I’ve seen ChatGPT generate bad English, and I’ve seen the layer of logic / UI re-render the page; I think there is a simple spell checker that kicks in and tells the API to re-render and recheck.
I don’t believe for one second that LLMs reason, understand, or know anything.
There are plenty of times LLMs fail to generate correct sentences, and plenty of times they fail to generate correct words.
Around the time ChatGPT rolled out web search inside actions, you’d get really funky stuff back and could watch other code clearly trying to catch the runaway output.
o3 can be hot garbage if you ask it to expand a specific point inside a three-paragraph memo; the reasoning models perform very, very poorly when they are not summarizing.
There are times when the thing works like magic; other times, asking it to write me a PowerShell script that gets users by first and last name has it inventing commands and flags that don’t exist.
If the model ‘understood’ or ‘followed’ some sort of structure beyond parroting stuff it already knows about, it would be easy to spot and guide it via prompts. That is not the case even with the most advanced models today.
It’s clear that LLMs work best at specific small tasks that have a well-established pattern defined in a strict language or API.
I’ve broken o3 trying to have it lift working Python code into formal Python code. How? The person who wrote the code didn’t exactly code it the way a developer would structure a program. 140 lines of basic ‘grab some data, generate a table’ broke the AI, and it had the ‘informal’ solution right there in the prompt. So no, there is zero chance LLMs do more than predict.
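To make ‘lift working code into formal code’ concrete, here is a toy sketch of the shape I was asking for; the data, function names, and table are made up and far smaller than the actual 140-line script:

    # Hypothetical illustration: the original was ~140 lines of top-level
    # statements that grabbed some data and printed a table inline. The
    # "formal" target is the same behaviour, just wrapped in functions
    # with a main() entry point.
    import csv
    from io import StringIO

    RAW = "name,score\nalice,3\nbob,7\n"  # stand-in for the fetched data

    def load_rows(raw: str) -> list[dict]:
        """Parse the raw CSV text into a list of row dicts."""
        return list(csv.DictReader(StringIO(raw)))

    def render_table(rows: list[dict]) -> str:
        """Render the rows as a simple fixed-width text table."""
        lines = [f"{'name':<10}{'score':>5}"]
        lines += [f"{r['name']:<10}{r['score']:>5}" for r in rows]
        return "\n".join(lines)

    def main() -> None:
        print(render_table(load_rows(RAW)))

    if __name__ == "__main__":
        main()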
And to be clear, it one-shotted a whole thing for me last night using the GitHub/Codex/agent thing in VS Code, probably saved me 30 minutes, but god forbid you start from a bad / edge-case / poorly structured thing that doesn’t fit the mould.
It's reasonable to perceive most of the value in math and computer science as being "at the scale" where unpredictability arises from complexity, though scale may not really be the reason for the unpredictability.
But a lot of the trouble I have observed in these domains comes from unmodeled effects that must be modeled and reasoned about. The GPZ work shows the same thing the researcher here shows: it takes a lot of tinkering and a lot of context to produce semi-usable results. SNR appears quite low for now. In security specifically, there is much value in sanitizing input data and ensuring correct parsing. Do you think LLMs are in a position to do so?
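For concreteness, here is a minimal sketch of the kind of strict parsing/sanitizing I have in mind; the record format, field names, and limits are invented for illustration:

    # Hypothetical example: strictly parse "name,age,email" records and
    # reject anything that does not match the expected shape, instead of
    # accepting malformed input and hoping for the best downstream.
    import re

    EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

    def parse_record(line: str) -> dict:
        parts = line.strip().split(",")
        if len(parts) != 3:
            raise ValueError(f"expected 3 fields, got {len(parts)}")
        name, age_str, email = (p.strip() for p in parts)
        if not name or len(name) > 100:
            raise ValueError("name must be 1-100 characters")
        if not age_str.isdigit() or not (0 < int(age_str) < 150):
            raise ValueError("age must be a plausible integer")
        if not EMAIL_RE.match(email):
            raise ValueError("email does not look valid")
        return {"name": name, "age": int(age_str), "email": email}

    # Anything malformed fails loudly instead of being silently accepted.
    print(parse_record("Ada Lovelace, 36, ada@example.com"))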
I see LLMs as tools, so, sure, I think they’re in a position to do so, the same way pen-testing tools or spreadsheets are.
In the hands of an expert, I believe they can help. In the hands of someone clueless, they will just confuse everyone, much like any other tool the clueless person uses.
I know plenty of communities that abuse social security benefits; is it worth fixing? I’m fortunate enough to pay the median household income in just taxes to the government. Take my money; no.
These are poor people struggling to get by; they are going to cheat the system, and who wouldn’t when you live on the edge? It’s a drop in the bucket, all for a PR attempt.
Because this needs to be toned down a serious notch. I’ve spent the last year and a half in AI land just to do 95% data work and pretend that it’s now somehow magic AI, while OpenAI and the rest are seen as the magic that makes this all work.
Well put. They are useful building blocks, but ultimately a lot of the magic is in the data modeling, data engineering, and shaping happening behind the scenes. And because it's still slow and costly, it's hard to do that well ... and the democratizing frameworks for doing it well haven't been born yet.
You have a 4,500+ lb car with 0-60 acceleration of 2.x to 4.x seconds and crazy torque; if you drive an EV like a teenager, you will eat your tires after 10-12k miles.
I mean... yes/no? EVs are on AS slicks; the tyre itself has less tread on it than something like a CrossClimate 2. Overall they are replaced quicker and eat tread quicker. It's a ~50% difference. I have both ICE and EV.
The new tyres that are coming out, like the Pirelli P Zero, promise a longer lifespan, but that remains to be seen.
They just need to hoodwink an investor into giving them enough money to buy a factory producing magnets in China. Then they are going to slowly back away from their claims of doing anything special.
If they do not, they will run out of money and close after expending whatever capital they have.