My only take home is they are all terrible and I should hire a professional.

jug · 2025-06-08T13:25:21 1749389121

Before that, you might ask ChatGPT to create a vector image of a pelican riding a bicycle and then run the output through a PNG to SVG converter...

Result: https://www.dropbox.com/scl/fi/8b03yu5v58w0o5he1zayh/pelican...

These are tough benchmarks that trial reasoning by having the model _write_ an SVG file by hand, which means understanding how the file has to be written to achieve this. Even a professional would struggle with that! It's _not_ a benchmark of what an AI can do when given the best tools for the job.

YuccaGloriosa · 2025-06-08T16:33:31 1749400411

I think you made an error there: PNG is a bitmap format.

sethaurus · 2025-06-08T16:59:16 1749401956

You've misunderstood. The parent was making a specific point: if you want an SVG of a pelican, the easiest way to AI-generate it is to get an image generator to create a (vector-styled) bitmap, then auto-vectorize it to SVG. But the point of this benchmark is that it's asking models to create an SVG the hard way, by writing its code directly.
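To give a sense of what "the hard way" involves, here is a minimal sketch in Python. It is not from the benchmark or from anyone in this thread: the function name `bicycle_svg`, the canvas size, and every coordinate are made-up assumptions for illustration. The point is only that each shape has to be placed by reasoning about numbers, without ever seeing the rendered result.

```python
import xml.etree.ElementTree as ET


def bicycle_svg(wheel_radius: int = 40) -> str:
    """Return SVG markup for a very crude bicycle (hypothetical layout)."""
    # All positions below are illustrative guesses, chosen "blind".
    rear_x, front_x, axle_y = 70, 210, 140
    parts = [
        '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 280 200">',
        # Two wheels, one circle each.
        f'<circle cx="{rear_x}" cy="{axle_y}" r="{wheel_radius}" '
        'fill="none" stroke="black"/>',
        f'<circle cx="{front_x}" cy="{axle_y}" r="{wheel_radius}" '
        'fill="none" stroke="black"/>',
        # Frame: a single polyline-style path from rear axle to front axle.
        f'<path d="M {rear_x} {axle_y} L 120 80 L 190 80 L {front_x} {axle_y}" '
        'fill="none" stroke="black"/>',
        '</svg>',
    ]
    return "\n".join(parts)


if __name__ == "__main__":
    svg_text = bicycle_svg()
    # Parsing confirms the markup is well-formed XML; it says nothing
    # about whether the drawing actually looks like a bicycle.
    ET.fromstring(svg_text)
    print(svg_text)
```

Even this toy version needs a consistent mental picture of where the wheels and frame sit relative to each other, and a pelican on top is far harder.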

keiferski · 2025-06-08T11:17:30 1749381450

As the other guy said, these are text models. If you want to make images use something like Midjourney.

Promoting a pelican riding a bicycle makes a decent image there.

keiferski · 2025-06-08T13:40:19 1749390019

* Prompting

dist-epoch · 2025-06-08T10:47:13 1749379633

Most of them are text-only models. Like asking a person born blind to draw a pelican, based on what they heard it looks like.

neepi · 2025-06-08T11:07:54 1749380874

That seems to be a completely inappropriate use case?

I would not hire a blind artist or a deaf musician.

__alexs · 2025-06-08T11:16:55 1749381415

I guess the idea is that by asking the model to do something that is inherently hard for it we might learn something about the baseline smartness of each model which could be considered a predictor for performance at other tasks too.

simonw · 2025-06-08T12:05:49 1749384349

Yeah, that's part of the point of this. Getting a state-of-the-art text-generating LLM to generate SVG illustrations is an inappropriate application of it.

It's a fun way to deflate the hype. Sure, your new LLM may have cost XX million to train and beat all the others on the benchmarks, but when you ask it to draw a pelican on a bicycle it still outputs total junk.

dist-epoch · 2025-06-08T12:33:23 1749386003

tried starting from an image:

https://chatgpt.com/share/684582a0-03cc-8006-b5b5-de51e5cd89...

lol: https://gemini.google.com/share/4d1746a234a8

dist-epoch · 2025-06-08T11:12:53 1749381173

The point is about exploring the capabilities of the model.

Like asking you to draw a 2D projection of a 4D sphere intersected with a 4D torus or something.

kevindamm · 2025-06-08T12:01:24 1749384084

Yeah, I suppose it is similar... I don't know their diameters, rotations, or the distance between their centers, or which two dimensions, so I would have to guess a lot about what you meant.

namibj · 2025-06-08T11:14:22 1749381262

It's a proxy for abstract designing, like writing software or designing in a parametric CAD.

Most of the non-math design work of applied engineering AFAIK falls under the umbrella that's tested with the pelican riding the bicycle. You have to make a mental model and then turn it into applicable instructions.

Program code/SVG markup/parametric CAD instructions don't really differ in that aspect.

neepi · 2025-06-08T11:34:05 1749382445

I would not assume that this methodology applies to applied engineering, as a former actual real tangible meat space engineer. Things are a little nuanced and the nuances come from a combination of communication and experience, neither of which any LLM has any insight into at all. It's not out there on the internet to train it with and it's not even easy to put it into abstract terms which can be used as training data. And engineering itself in isolation doesn't exist - there is a whole world around it.

Ergo, no, you can't just, say, throw a bicycle into an LLM and have a parametric model drop out into SolidWorks, then a machine makes it, and everyone buys it. That is the hope really, isn't it? You end up with a useless shitty bike with a shit pelican on it.

The biggest problem we have in the LLM space is the fact that no one really knows any of the proposed use cases well enough, and neither does anyone being told that it works for those use cases.

rjsw · 2025-06-08T11:56:02 1749383762

I don't think any of that matters, CEOs will decide to use it anyway.

neepi · 2025-06-08T12:34:44 1749386084

This is sad but true.

dist-epoch · 2025-06-08T11:41:26 1749382886

https://www.solidworks.com/lp/evolve-your-design-workflows-a...

neepi · 2025-06-08T12:38:59 1749386339

Yeah good luck with that. Seriously.

dmd · 2025-06-08T11:20:09 1749381609

Sorry, Beethoven, you just don’t seem to be a match for our org. Best of luck on your search!

You too, Monet. Scram.

wongogue · 2025-06-08T12:35:25 1749386125

Even Beethoven?

vunderba · 2025-06-08T19:00:05 1749409205

This test isn't really about the quality of the image itself (multimodals like gpt-image-1 or even standard diffusion models would be far superior) - it's about following a spec that describes how to draw.

A similar test would be if you asked for the pelican on a bicycle through a series of LOGO instructions.

spaceman_2020 · 2025-06-08T13:29:23 1749389363

My only take home is that a spanner can work as a hammer, but you probably should just get a hammer

GaggiX · 2025-06-08T12:48:39 1749386919

An expert at writing SVGs?

matkoniecz · 2025-06-08T10:48:16 1749379696

It depends on the quality you need and your budget.

neepi · 2025-06-08T11:06:53 1749380813

Ah yes the race to the bottom argument.

ben_w · 2025-06-08T11:47:43 1749383263

When I was at university, they got some people from industry to talk to us all about our CVs and how to do interviews.

My CV had a stupid cliché, "committed to quality", which they correctly picked up on — "What do you mean?" one of them asked me, directly.

I thought this meant I was focussed on being the best. He didn't like this answer.

His example, blurred by 20 years of my imperfect human memory, was to ask me which is better: a Porsche, or a go-kart. Now, obviously (or I wouldn't be saying this), Porsche was a trick answer. Less obvious is that both were trick answers, because their point was that the question was under-specified: quality is the match between the product and what the user actually wants, so if the user is a 10 year old who physically isn't big enough to sit in a real car's driver's seat and just wants to rush down a hill or along a track, none of the "quality" stuff that makes a Porsche a Porsche is of any relevance at all, but what does matter is the stuff that makes a go-kart into a go-kart… one of which is the affordability.

LLMs are go-karts of the mind. Sometimes that's all you need.

neepi · 2025-06-08T12:33:40 1749386020

I disagree. Quality depends on your market position and what you are bringing to the market. Thus I would start with market conditions and work back to quality. If you can't reach your standards in the market then you shouldn't enter it. And if your standards are poor, you should be ashamed.

Go-kart or Porsche is irrelevant.

ben_w · 2025-06-08T12:55:44 1749387344

> Quality depends on your market position and what you are bringing to the market.

That's the point.

The market for go-karts does not support Porsche.

If you bring a Porsche sales team to a go-kart race, nobody will be interested.

Porsche doesn't care about this market. It goes both ways: this market doesn't care about Porsche, either.