I get access to previews from OpenAI, Anthropic and Gemini pretty often. They're usually accompanied by an NDA and an embargo date - in this case the embargo was 10am Pacific this morning.
I won't accept preview access if it comes with any conditions at all about what I can say about the model once the embargo has lifted.
Simonw is a cheerful and straightforward AI journalist who likes to show, not just tell. He has done a good job aggregating and documenting the progress of LLM tools and models. As I understand it, OpenAI and Anthropic have both wisely decided to make sure he has up-to-date info, because they know he'll write about it.
Thanks for all your work, Simon! You're my favorite journalist in this space and I really appreciate your tone.
I get it - you would trust yourself if you said that. But it doesn't really matter whether you say it or not; what counts for your ongoing credibility is whether you preface every future blog post with a disclosure: that you got special access, a special deal, sponsorship, or that you didn't get any of those things.
You're a reviewer. This is how reviewers stay credible. If you don't disclose your relationship with the thing or company you're reviewing, I'm probably better off assuming you're paid.
And if your NDA says you can't write that in your preface, then logically, it is impossible to write a credible review in the first place.
Awesome, thanks a lot - that's important, but ... sorry, I just checked those, and I do think it's better to do it on a per-article basis, because a lot of your audience (I'm guessing) comes from external links, not from browsing your website.
This is (or should be) a pretty standard thing to do on YouTube review channels (the ones I would trust), and it's not a bad thing to remind people of on every occasion. Plus, it can function as a kind of "canary" in cases of particularly restrictive NDAs.
I like Simon, but he's not a journalist. A journalist would not have gone to OpenAI to glaze the GPT-5 release with Theo. I don't say this to discount Simon -- I appreciate his writing and analysis, but a journalist he isn't.
I actually talk to journalists on the AI beat quite often - I've had good conversations with them at publications including The Economist, the NY Times, the Washington Post, and Ars Technica.
They're not going to write up detailed reviews of things like the new Claude code interpreter mode though, because that's not of interest to a general enough audience.
Shoot, that's what I get for staying off Twitter and email for a week. Glad newsletters provide a little bit of a cushion these days, but hopefully someone snaps her up.
You normally keep up with staffing updates for writers at random internet blogs? That is mind-blowing. I don't think I've ever intentionally read the name of an article's author, and when I do it by mistake I've forgotten it two webpages down the road.
I've never used Twitter myself, but isn't that its purpose? You follow people you like because of what they do, and they keep you informed about what happens behind the curtain. OP mentioned being off Twitter; maybe they follow the author there and would've seen a tweet about it.
I fully expect a model to output an SVG made up of 1000x1000 rectangles (i.e. pixels) representing a raster image of a beautifully hand-drawn pelican riding a bicycle any day now :)
I got an amazing result from ChatGPT a while back - an SVG with a perfect illustration of a pelican riding a bicycle.
It was suspiciously good in fact... so I downloaded the SVG file and found out it had generated a raster image with its image tool and then embedded it as base64 binary image data inside an SVG wrapper!
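For what it's worth, that trick is easy to check for mechanically: the raster ends up in an SVG image element with a data: URI. A minimal sketch in Python (the filename is just a placeholder):

    # Sketch: is this SVG real vector art, or a wrapper around a raster image?
    import xml.etree.ElementTree as ET

    SVG_NS = "{http://www.w3.org/2000/svg}"
    XLINK_NS = "{http://www.w3.org/1999/xlink}"

    def embedded_rasters(svg_path):
        """Return data-URI prefixes of any raster images embedded in the SVG."""
        found = []
        for image in ET.parse(svg_path).iter(SVG_NS + "image"):
            # SVG 2 uses a plain href attribute; older files use xlink:href.
            href = image.get("href") or image.get(XLINK_NS + "href") or ""
            if href.startswith("data:image/"):
                found.append(href[:30])  # e.g. "data:image/png;base64,iVBORw0"
        return found

    print(embedded_rasters("pelican.svg"))  # non-empty => a raster in disguise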
You'll just have to move the goalposts then; perhaps it can be a multidimensional pelican saving the multiverse, or an invisible pelican that only you can see and critique.
How would that help, given that ChatGPT has apparently already figured out how to consistently and systematically game the benchmark by working in pixel space and only using SVG as a wrapper for a raster image?
FWIW, I could totally see a not hugely more advanced model using its native image generation capabilities and then running a vector extraction tool on it, maybe iteratively. (And maybe I would not consider that cheating anymore, since at some point it probably resembles what humans do?)
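That pipeline is already easy to wire up by hand, incidentally. A rough sketch, assuming potrace is installed and on the PATH (the filenames are made up):

    # Sketch of the "generate a raster, then vectorize it" pipeline.
    import subprocess
    from PIL import Image

    def raster_to_svg(png_path, svg_path):
        # potrace wants a 1-bit bitmap, so threshold the raster first.
        pbm_path = png_path.rsplit(".", 1)[0] + ".pbm"
        Image.open(png_path).convert("1").save(pbm_path)
        # -s selects potrace's SVG output backend.
        subprocess.run(["potrace", "-s", pbm_path, "-o", svg_path], check=True)

    raster_to_svg("pelican.png", "pelican.svg")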
Other things you can ask that they're still clearly not optimizing for are ASCII art and directions between different locations. Complete fabrications 100% of the time.
Well, I definitely hope they aren't trying to teach LLMs directions between locations, given what an idiotic use of compute and parameter space that would be. We already have excellent AIs for route planning. What they ought to optimize for, of course, is finally teaching them to say they don't know, or to automatically call a route-planning API when a user asks for directions.
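That last part is just the standard tool-calling pattern. A hedged sketch of what it might look like - the schema follows the OpenAI-style function-calling format, and get_directions plus its backing routing service are hypothetical:

    # Sketch: let the model delegate directions to a tool instead of guessing.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_directions",
            "description": "Turn-by-turn directions between two places.",
            "parameters": {
                "type": "object",
                "properties": {
                    "origin": {"type": "string"},
                    "destination": {"type": "string"},
                    "mode": {"type": "string",
                             "enum": ["driving", "walking", "transit"]},
                },
                "required": ["origin", "destination"],
            },
        },
    }]

    def get_directions(origin, destination, mode="driving"):
        # A real implementation would call a routing engine (e.g. OSRM),
        # never ask the model itself to invent a route.
        raise NotImplementedError("delegate to a real routing API here")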
Simon tends to write up reports of new LLM releases (with great community respect) and it's much easier with lead time if the provider is able to set up a preview endpoint.
I believe the criticism is that he's reporting on a pre-release LLM which isn't the same as the one you and I are going to be using a few weeks from now after they've downgraded it enough to work at scale.
simonw is Simon Willison, who's well known for a number of things, but these days mostly for his AI-centric blog and his tools. The AI companies give him early access to stuff.
Could you please stop breaking the site guidelines? You've been doing it repeatedly, we've asked you to stop several times, and you haven't stopped yet.