Before that, you might ask ChatGPT to create a vector image of a pelican riding a bicycle and then run the output through a PNG to SVG converter...
These are tough benchmarks that test a model's reasoning by having it _write_ an SVG file by hand, which requires understanding how the file has to be written to produce the image. Even a professional would struggle with that! It's _not_ a benchmark of giving an AI the best tools to actually do this.
You've misunderstood. The parent was making a specific point: if you want an SVG of a pelican, the easiest way to AI-generate it is to get an image generator to create a (vector-styled) bitmap, then auto-vectorize it to SVG. But the point of this benchmark is that it's asking models to create an SVG the hard way, by writing its code directly.
I guess the idea is that by asking the model to do something that is inherently hard for it, we might learn something about the baseline smartness of each model, which could be considered a predictor for performance at other tasks too.
Yeah, that's part of the point of this. Getting a state-of-the-art text-generating LLM to generate SVG illustrations is an inappropriate application of it.
It's a fun way to deflate the hype. Sure, your new LLM may have cost XX million to train and beat all the others on the benchmarks, but when you ask it to draw a pelican on a bicycle it still outputs total junk.
Yeah, I suppose it is similar. I don't know their diameters or rotations, the distance between their centers, or which two dimensions you mean, so I would have to guess a lot about what you meant.
It's a proxy for abstract designing, like writing software or designing in a parametric CAD.
Most of the non-math design work of applied engineering, AFAIK, falls under the umbrella of what the pelican riding the bicycle tests.
You have to make a mental model and then turn it into applicable instructions.
Program code/SVG markup/parametric CAD instructions don't really differ in that aspect.
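To make the "mental model, then instructions" step concrete, here is a minimal Python sketch. It is only an illustration, not how any model or benchmark actually works: the `Bicycle` class, its parameters, and the frame proportions are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


@dataclass
class Bicycle:
    """Hypothetical 'mental model': a handful of parameters, not pixels."""
    wheel_radius: float = 40.0
    wheelbase: float = 120.0   # distance between the two wheel hubs
    ground_y: float = 160.0    # y of the ground line (SVG y grows downward)
    rear_x: float = 60.0       # x of the rear hub


def bicycle_svg(b: Bicycle) -> str:
    """Turn the model into explicit SVG drawing instructions."""
    hub_y = b.ground_y - b.wheel_radius  # both wheels rest on the ground
    rear = (b.rear_x, hub_y)
    front = (b.rear_x + b.wheelbase, hub_y)
    # Frame joints, placed with made-up proportions of the wheelbase.
    crank = (b.rear_x + 0.45 * b.wheelbase, hub_y)
    seat = (b.rear_x + 0.30 * b.wheelbase, hub_y - 1.5 * b.wheel_radius)
    head = (b.rear_x + 0.80 * b.wheelbase, hub_y - 1.5 * b.wheel_radius)

    stroke = {"fill": "none", "stroke": "black", "stroke-width": "3"}
    width = front[0] + b.wheel_radius + 20
    height = b.ground_y + 20
    svg = ET.Element(
        "svg", {"xmlns": SVG_NS, "viewBox": f"0 0 {width:g} {height:g}"}
    )

    # Two wheels.
    for cx, cy in (rear, front):
        ET.SubElement(
            svg,
            "circle",
            {"cx": f"{cx:g}", "cy": f"{cy:g}", "r": f"{b.wheel_radius:g}", **stroke},
        )

    # Rear triangle, then top tube plus down tube, then the fork.
    for tube in ((rear, crank, seat, rear), (seat, head, crank), (head, front)):
        points = " ".join(f"{x:g},{y:g}" for x, y in tube)
        ET.SubElement(svg, "polyline", {"points": points, **stroke})

    return ET.tostring(svg, encoding="unicode")
```

Calling `bicycle_svg(Bicycle())` returns SVG markup as a string. The point of the sketch is that every coordinate has to be derived from the model (wheels touching the ground, the fork ending at the front hub), which is the part the benchmark exercises.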
I would not assume that this methodology applies to applied engineering, as a former actual real tangible meat space engineer. Things are a little nuanced and the nuances come from a combination of communication and experience, neither of which any LLM has any insight into at all. It's not out there on the internet to train it with and it's not even easy to put it into abstract terms which can be used as training data. And engineering itself in isolation doesn't exist - there is a whole world around it.
Ergo, no, you can't just throw a bicycle into an LLM and have a parametric model drop out into SolidWorks, then have a machine make it and everyone buy it. That is the hope, really, isn't it? You end up with a useless shitty bike with a shit pelican on it.
The biggest problem we have in the LLM space is the fact that no one really knows any of the proposed use cases well enough, and neither does anyone being told that it works for those use cases.
This test isn't really about the quality of the image itself (multimodal models like gpt-image-1 or even standard diffusion models would be far superior); it's about following a spec that describes how to draw.
A similar test would be if you asked for the pelican on a bicycle through a series of LOGO instructions.
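For illustration, a LOGO-style version of the task could look something like the sketch below. It assumes a tiny, hypothetical subset of LOGO (`FD`, `BK`, `RT`, `LT`, `PU`, `PD`) and converts it to SVG path data; the function name and the command subset are choices made for this example, not part of any real benchmark.

```python
import math


def logo_to_svg_path(program: str, start=(0.0, 0.0)) -> str:
    """Interpret a tiny LOGO subset and return SVG path data (the `d` attribute)."""
    x, y = start
    heading = 0.0      # degrees; 0 means "up", and RT turns clockwise
    pen_down = True
    path = [f"M {x:g} {y:g}"]
    tokens = program.split()
    i = 0
    while i < len(tokens):
        cmd = tokens[i].upper()
        if cmd in ("PU", "PD"):          # pen up / pen down take no argument
            pen_down = cmd == "PD"
            i += 1
            continue
        arg = float(tokens[i + 1])       # every other command takes one number
        i += 2
        if cmd == "RT":
            heading += arg
        elif cmd == "LT":
            heading -= arg
        elif cmd in ("FD", "BK"):
            dist = arg if cmd == "FD" else -arg
            rad = math.radians(heading)
            x += dist * math.sin(rad)
            y -= dist * math.cos(rad)    # SVG's y axis points down
            # Round away float noise; adding 0.0 turns -0.0 into 0.0.
            x, y = round(x, 3) + 0.0, round(y, 3) + 0.0
            # Draw a line if the pen is down, otherwise just move.
            path.append(f"{'L' if pen_down else 'M'} {x:g} {y:g}")
        else:
            raise ValueError(f"unknown command: {cmd}")
    return " ".join(path)
```

A program such as `FD 10 RT 90 FD 10 RT 90 FD 10 RT 90 FD 10` traces a square. Writing the equivalent program for a pelican on a bicycle means tracking position and heading in your head the whole way, which is the same kind of spec-following the SVG test measures.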
When I was at university, they got some people from industry to talk to us all about our CVs and how to do interviews.
My CV had a stupid cliché, "committed to quality", which they correctly picked up on — "What do you mean?" one of them asked me, directly.
I thought this meant I was focussed on being the best. He didn't like this answer.
His example, blurred by 20 years of my imperfect human memory, was to ask me which is better: a Porsche, or a go-kart. Now, obviously (or I wouldn't be saying this), Porsche was a trick answer. Less obvious is that both were trick answers, because their point was that the question was under-specified. Quality is the match between the product and what the user actually wants, so if the user is a 10 year old who physically isn't big enough to sit in a real car's driver's seat and just wants to rush down a hill or along a track, none of the "quality" stuff that makes a Porsche a Porsche is of any relevance at all, but what does matter is the stuff that makes a go-kart into a go-kart… one of which is the affordability.
LLMs are go-karts of the mind. Sometimes that's all you need.
I disagree. Quality depends on your market position and what you are bringing to the market. Thus I would start with market conditions and work back to quality. If you can't reach your standards in the market then you shouldn't enter it. And if your standards are poor, you should be ashamed.