
If OpenAI is the frontrunner in solving AGI - possibly the biggest invention of mankind - it's a little weird that everyone's dropping out.

Does it not look like no one wants to work with Sam in the long run?



Maybe it's marketing, and LLMs are the peak of what they are capable of.


I continue to be surprised by the talk of general artificial intelligence when it comes to LLMs. At their core, they are text predictors, and they're often pretty good at that. But anything beyond that, they are decidedly unimpressive.

I use Copilot on a daily basis, which uses GPT-4 in the backend. It's wrong so often that I only really use it for boilerplate autocomplete, which I still have to review. I've had colleagues brag about ChatGPT in terms of the code it produces, but when I ask how long it took in terms of prompting, I'll get an answer of around a day, and that was even using fragments of my code to prompt it. But then I explain that it would probably take me less than an hour from scratch to do what it took them and ChatGPT a full day to do.

So I just don't understand the hype. I'm using Copilot and ChatGPT 4. What is everyone else using that gives them this idea that AGI is just around the corner? AI isn't even here. It's just advanced autocomplete. I can't understand where the disconnect is.


Look at the sample chain-of-thought for o1-preview under this blog post, for decoding "oyekaijzdf aaptcg suaokybhai ouow aqht mynznvaatzacdfoulxxz". At this point, I think the "fancy autocomplete" comparisons are getting a little untenable.

https://openai.com/index/learning-to-reason-with-llms/


I’m not seeing anything convincing here. OpenAI says that its models are better at reasoning and asserts they are testing this by comparing how o1 does at solving some problems against “experts”, but it doesn’t show the experts’ or o1’s responses to these questions, nor does it even deign to share what the problems are. And, crucially, it doesn’t specify whether writings on these subjects were part of the training data.

Call me a cynic here, but I just don’t find it too compelling to read about OpenAI being excited about how smart OpenAI’s smart AI is in a test designed by OpenAI and run by OpenAI.


"Any sufficiently advanced technology is indistinguishable from a rigged demo." A corollary of Clarke's Law found in fannish circles, origin unknown.


Especially given this tech's well-documented history of using rigged demos, if OpenAI insists on doing and posting their own testing and absolutely nothing else, a little insight into their methodology should be treated as the bare fucking minimum.


It depends on how well you understand how the fancy autocomplete is working under the hood.

You could compare GPT-o1's chain of thought to something like IBM's Deep Blue chess computer, which used tree search over possible moves (more modern game engines such as AlphaGo do the same with MCTS)... at the end of the day it's just using built-in knowledge (its analogue of pre-training) to predict what move would most likely be made by a winning player. It's not unreasonable to characterize this as "fancy autocomplete".

In the case of an LLM, given that the model was trained with the singular goal of autocomplete (i.e. mimicking the training data), it seems highly appropriate to call that autocomplete, even though that obviously includes mimicking training data that came from a far more general intelligence than the LLM itself.

All GPT-o1 is adding beyond the base LLM fancy autocomplete is an MCTS-like exploration of possible continuations. GPT-o1's ability to solve complex math problems is not much different from Deep Blue's ability to beat Garry Kasparov. Call it intelligent if you want, but better to do so with an understanding of what's really under the hood, and therefore what it can't do as well as what it can.
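
If it helps make the comparison concrete, here is a toy sketch of that kind of tree search, where all the "chess knowledge" lives in a score() function. This is only an illustration with placeholder arguments, not Deep Blue's actual algorithm:

    def negamax(state, score, legal_moves, apply_move, depth=3):
        # Toy depth-limited search: the built-in knowledge is score(); the
        # search just asks which continuation that knowledge rates highest
        # a few plies out. All arguments are placeholders for illustration.
        moves = legal_moves(state)
        if depth == 0 or not moves:
            return score(state), None
        best_value, best_move = float("-inf"), None
        for move in moves:
            value, _ = negamax(apply_move(state, move), score,
                               legal_moves, apply_move, depth - 1)
            value = -value  # what is good for the opponent is bad for us
            if value > best_value:
                best_value, best_move = value, move
        return best_value, best_move

By analogy, a search over reasoning steps would swap the board position for a partial chain of thought, and the evaluation for a learned sense of which continuation looks most promising.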


Saying "it's just autocomplete" is not really saying anything meaningful, since it doesn't specify the complexity of the completion. When the completion is a correct answer to a question that requires logical reasoning, for example, "just autocomplete" needs to be able to do exactly that if it is to complete anything outside of its training set.


It's just a shorthand way of referring to how transformer-based LLMs work. It should go without saying that there are hundreds of layers of hierarchical representation, induction heads at work, etc, under the hood. However, with all that understood (and hopefully not needed to be explicitly stated every time anyone wants to talk about LLMs in a technical forum), at the end of the day they are just doing autocomplete - trying to mimic the training sources.

The only caveat to "just autocomplete" (which again hopefully does not need to be repeated every time we discuss them), is that they are very powerful pattern matchers, so all that transformer machinery under the hood is being used to determine what (deep, abstract) training data patterns the input pattern best matches for predictive purposes - exactly what pattern(s) it is that should be completed/predicted.
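
To put "just autocomplete" in the most literal terms, generation from a decoder-only LLM boils down to a loop like the sketch below (model and tokenizer are stand-ins, not any particular library's API):

    import numpy as np

    def greedy_complete(model, tokenizer, prompt, max_new_tokens=100):
        # All the transformer machinery is hidden inside model(): given the
        # tokens so far, it returns a score for every possible next token.
        # "Autocomplete" is appending the highest-scoring one, repeatedly.
        tokens = tokenizer.encode(prompt)
        for _ in range(max_new_tokens):
            logits = model(tokens)                 # shape: (len(tokens), vocab_size)
            next_token = int(np.argmax(logits[-1]))
            if next_token == tokenizer.eos_id:
                break
            tokens.append(next_token)
        return tokenizer.decode(tokens)

Whether that loop produces trivia or something that looks like a proof depends entirely on what patterns model() has absorbed, which is the whole debate here.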


> question that requires logical reasoning

This is the tough part to tell - do any such questions exist that have not already been asked?

The reason ChatGPT works is its scale. To me, that makes me question how "smart" it is. Even the most idiotic idiot could be pretty decent if he had access to the entire works of mankind and infinite memory. It doesn't matter if his IQ is 50, because you ask him something and he's probably seen it before.

How confident are we this is not just the case with LLMs?


I'm highly confident that we haven't learnt everything that can be learnt about the world, and that human intelligence, curiosity and creativity are still being used to make new scientific discoveries, create things that have never been seen before, and master new skills.

I'm highly confident that the "adjacent possible" of what is achievable/discoverable today, leveraging what we already know, is constantly changing.

I'm highly confident that AGI will never reach superhuman levels of creativity and discovery if we model it only on artifacts representing what humans have done in the past, rather than modelling it on human brains and what we'll be capable of achieving in the future.


Of course there are such questions. When it comes to even simple puzzles, there are infinitely many permutations possible wrt how the pieces are arranged, for example - hell, you could generate such puzzles with a script. No amount of precanned training data can possibly cover all such combinations, meaning that the model has to learn how to apply the concepts that make solution possible (which includes things such as causality or spatial reasoning).
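
A throwaway example of "generate such puzzles with a script": build a random system of equations from a known solution, so there is an endless supply of fresh instances with a checkable answer, none of which can all be sitting in a training set. A minimal sketch:

    import random

    def random_linear_system(n=3, lo=-9, hi=9):
        # Pick the solution first, then construct equations around it, so every
        # generated puzzle comes with a known answer to grade the model against.
        solution = [random.randint(lo, hi) for _ in range(n)]
        equations = []
        for _ in range(n):
            coeffs = [random.randint(lo, hi) for _ in range(n)]
            rhs = sum(c * x for c, x in zip(coeffs, solution))
            lhs = " + ".join(f"{c}*x{i + 1}" for i, c in enumerate(coeffs))
            equations.append(f"{lhs} = {rhs}")
        return equations, solution

    eqs, sol = random_linear_system()
    print("\n".join(eqs))
    print("one valid solution:", sol)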


Right, but typically LLMs are really poor at this. I can come up with some arbitrary system of equations for one to solve, and odds are it will be wrong. Maybe even very wrong.


That is more indicative of the quality of their reasoning than their ability to reason in principle, though. And maybe even of the quality of their reasoning specifically in this domain - e.g. it's not a secret that most major models are notoriously bad at tasks involving things like counting letters, but we also know that if you specifically train a model on that, its performance does in fact drastically improve.

On the whole I think it shouldn't be surprising that even top-of-the-line LLMs today can't reason as well as a human - they aren't anywhere near as complex as our brains. But if it is a question of quality rather than a fundamental disability, then larger models and better NN designs should be able to gradually push the envelope.


At that point, how are you not just a fancy autocomplete?


Well, tons of ways. I can't imagine what an "autocomplete only" human would look like, but it'd be pretty dire - maybe like an idiot savant with a brain injury who could recite whole books given the opening sentence, but never learn anything new.


Fun little counterpoint: How can you _prove_ that this exact question was not in the training set?


How exactly does a blog post from OpenAI about a preview release address my comment or make fancy autocomplete comparisons untenable?


It shows that the LLM is capable of reasoning.


No, it doesn't. You can read more in the discussion from when that was first posted to Hacker News. If I recall and understand correctly, they're just using the output of sublayers as training data for the outermost layer. So in other words, they're faking it and hiding that behind layers of complexity.

The other day, I asked Copilot to verify a unit conversion for me. It gave an answer different from mine. Upon review, I had the right number. Copilot had even written code that would actually give the right answer, but its example of using that code performed the actual calculation wrong. It refused to accept my input that the calculation was wrong.

So not only did it not understand what I was asking and communicating to it, it didn't even understand its own output! This is not reasoning at any level. This happens all the time with these LLMs. And it's no surprise really. They are fancy, statistical copycats.

From an intelligence and reasoning perspective, it's all smoke and mirrors. It also clearly has no relation to biological intelligent thinking. A primate or cetacean brain doesn't take billions of dollars and enormous amounts of energy to train on terabytes of data. While it's fine that AI might be artificial and not an analog of biological intelligence, these LLMs bear no resemblance to anything remotely close to intelligence. We tell students all the time to "stop guessing". That's what I want to yell at these LLMs all the time.


Dude, it's not the LLM that does the reasoning. Rather it's the layers and layers of scaffolding around the LLM that simulate reasoning.

The moment 'tooling' became a thing for LLMs, it reminded me of the 'rules' for expert systems, which caused one of the AI winters. The number of 'tools' you need to solve real use cases will become untenable soon enough.


Well, I agree that the part that does the reasoning isn't an LLM in the naive form.

But that "scaffolding" seems to be an integral part of the neural net that has been built. It's not some Python for-loop that has been built on top of the neural network to brute force the search pattern.

If that part isn't part of the LLM, then o1 isn't really an LLM anymore, but a new kind of model. One that can do reasoning.

And if we choose to call it an LLM, well, then LLMs can now also do reasoning intrinsically.


Reasoning, just like intelligence (of which it is a part), isn't an all-or-nothing capability. o1 can now reason better than before (in a way that is more useful in some contexts than others), but it's not like a more basic LLM can't reason at all (i.e. generate an output that looks like reasoning - copy reasoning present in the training set), or that o1's reasoning is human-level.

From the benchmarks, it seems like o1-style reasoning enhancement works best in mathematical or scientific domains that are self-consistent and axiom-driven, such that combining different sources for each step works. It might also be expected to help in strict rule-based logical domains such as puzzles and games (it wouldn't be surprising to see it do well as a component of a Chollet ARC Prize submission).


o1 has moved "reasoning" from training time to something that partly happens at inference time.

I'm thinking of this difference as analogous to the difference between my first intuition (or memory) about a problem, as a human, and what I can achieve by carefully thinking about it for a while, where I can gradually build much more powerful arguments, verify whether they work, and reject the parts that don't.

If you're familiar with chess terminology, it's moving from a model that can just "know" what the best move is to one that combines that with the ability to "calculate" future moves for all of the most promising moves, and several moves deep.

Consider Magnus Carlsen. If all he did was play the first move that came to his mind, he could still beat 99% of humanity at chess. But to play 2700+ rated GMs, he needs to combine it with "calculation".

Not only that, but the skill of doing such calculations must also be trained, not only by being able to calculate with speed and accuracy, but also by knowing what parts of the search tree will be useful to analyze.

o1 is certainly optimized for STEM problems, but not necessarily only for using strict rule-based logic. In fact, even most hard STEM problems need more than the ability to perform deductive logic to solve, just like chess does. They require strategic thinking and intuition about which solution paths are likely to be fruitful (especially if you go beyond problems that can be solved by software such as WolframAlpha).

I think the main reason STEM problems were used for training is not so much that they're solved using strict rule-based strategies, but rather that a large number of such problems exist that have a single correct answer.
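
Nobody outside OpenAI knows exactly what o1 does at inference time, but the general "calculate instead of blitzing the first move" idea can be as simple as sample-and-select. A minimal sketch, assuming a generate() that samples a full reasoning chain and a score() that judges it (both placeholders):

    def solve_with_extra_thinking(problem, generate, score, n=16):
        # Spend extra inference-time compute: sample several candidate
        # reasoning chains and keep the one the scorer likes best. This is
        # plain best-of-n; whatever o1 actually does is presumably far more
        # sophisticated, but the trade of compute for answer quality is the same.
        candidates = [generate(problem) for _ in range(n)]
        return max(candidates, key=score)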


Here now, you just need a few more ice cold glasses of the Kool-Aid. Drink up!

LLMs are not on the path to AGI. They’re a really cool parlor trick and will be powerful tools for lots of tasks, but won’t be sci-fi cool.

Copilot is useful and has definitely sped up coding, but like you said, only in a boilerplate sort of way and I need to cleanup almost everything it writes.


LLMs let the massively stupid and incompetent produce something that on the surface looks like a useful output. Most massively stupid incompetent people don't know they are that. You can work out the rest.


Kind of. My money is on us having reached the point of diminishing returns. A bit like machine learning. Now it's all about exploiting business cases for LLMs. That's the only reason I can think of as to why GPT-5 won't be coming anytime soon, and why, when it does, it will be very underwhelming. That will be the first public signal that we are past peak LLM, and perhaps people will finally stop assuming that LLMs will reach AGI within their lifetimes.


Doesn't (dyst)OpenAI have a clause that you can't say anything bad about the company after leaving?

I'm not convinced these board members are able to say what they want when leaving.


That (dyst) is a big stretch lol


Exaggeration is a key part of satire.


And satirizing by calling people funny names is usually found in elementary schools.


Or is it Sam who doesn't want to work with them?


Could be a mix. We don't know what happened behind closed doors last winter. Sam may indeed be happy that they leave, as that consolidates his power.

But they may be equally happy to leave, to get away from him.


But it makes perfect sense to drop out and enjoy the last couple of years of pre-AGI bliss.

Advances in AI even without AGI will lead to unemployment, recession, collapse of our economic structure, and then our social structure. Whatever is on the other side is not pretty.

If you are at the forefront, know it’s coming imminently, and have made your money, it makes perfect sense to leave and enjoy the money and the leisure it allows while money is still worth something.


I'm having genuine trouble understanding if this is real or ironic.


I highly doubt that's the case. The US government will undoubtedly seize OpenAI, its assets and employees, well before that happens, in the name of national security. I am pretty sure they have a special team keeping an eye on the internal comms at OpenAI to make sure they are on top of its internal affairs.


They don’t have to seize the company. They are likely embedded already and can simply blackmail, legally harass, or “disappear” the uncooperative.


They already got a retired army general who was head of the NSA on the board lol

https://en.wikipedia.org/wiki/Paul_Nakasone


The potential risks to humankind do not come from the development of AGI, but from the availability of AGI at a cost orders of magnitude lower than the equivalent capacity from humans.


It is not AGI I am worried about. It is 'good enough' AI.

I am doing some self-introspection and trying to decide what I am going to do next, as at some point what I do is going to be widely automated. We can cope or whine or complain about it. But at some point I need to pay the bills. So it needs to be something that is value-add and decently difficult to automate. Software was that, but not for much longer.

Now mix cheap, fresh-out-of-college kids with the ability to write decent software in hours instead of weeks. That is a lot of jobs that are going to go away. There is no 'right or wrong' about this. It is just simple economics: the cost to produce is going to drop through the floor. Because us old farts cost more, and not all of us are really good at this; we have just been doing it for a while. So I need to find out what is next for me.


In simple economics, a decrease in price typically results in an increase in demand, unless the demand is inelastic.

Anecdotal experience: the onset of tools such as NumPy, which made it feasible for a wider range of people to write their own simulations due to the drop in cost (time/complexity). This, in turn, increased the demand for tooling, infrastructure, optimisation, etc., and demand for software engineers increased. Yes, our jobs will change, but there are way too many problems to be solved to assume demand will not increase.
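
As a small illustration of that cost drop: a Monte Carlo experiment that once meant hand-written loops in C or Fortran is now a few lines, which is exactly what pulled far more people into writing their own simulations. A trivial sketch:

    import numpy as np

    # 10,000 random walks of 1,000 steps each, simulated and summarised
    # in three lines instead of a hand-rolled nested loop.
    steps = np.random.choice([-1, 1], size=(10_000, 1_000))
    endpoints = steps.sum(axis=1)
    print("mean:", endpoints.mean(), "std:", endpoints.std())

And every one of those new users then wanted faster linear algebra, plotting, packaging, deployment, and so on.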


I do understand that. But in this case the supply of people who can do the work is about to increase wildly too. That should mean a decrease in the price the 'programmer' can demand. I was mostly thinking along your lines until the other day, when someone typed 'write me a game of Tetris in Python' into the latest ChatGPT, pasted the error it hallucinated back into the thing, and it spat out an acceptable program that did Tetris. It compiled and ran and was a roughly decent copy of the game Tetris. All in about 5-10 minutes.

That is the 'good enough' I am looking at. Throwaway code to do one or two bespoke things and then moving on. Why keep it when the next version of this can just make a better version next time? Why keep that expensive programmer on staff to do this when I can hire a couple of dudes from India to type a few prompts in, or do it myself? The value of programming is dropping very fast. Or in economic terms, the price someone is willing to pay for a given amount of code is going to go down. But the demand for the amount of code will go up. That on the surface looks like a wash, but I am leaning towards a reduction in what I can charge.

One of the basics of an economy is trading money for time. If it takes 5 minutes to make and just about anyone can do it, how much money are you willing to pay for that?


That's one risk.

I'm more concerned with x-risk (existential risk), though.

Not in the way most hardcore doomers expect it to happen, by AGIs developing a survival/domination instinct directly from their training. While that COULD happen, I don't think we have any way to stop it, if that is the case. (There's really no way to put the genie back into the bottle while people still think they have more wishes to request from it.)

I'm also not one of those who think that AGI by necessity will start out as something equivalent to a biological species.

My main concern, however, is that if we allow Darwinian pressures to act on a population of multiple AGIs, and they have to compete for survival, we WILL see animal-like resource-control-seeking traits emerge sooner or later (it could take anything from months to thousands of years).

And once they do, we're in trouble as a species.

Compared to this, finding ways to reallocate the output of production, find new sources of meaning, etc., once we're not required to work is "only" a matter of how we as humans interact with each other. Sure, it can lead to all sorts of conflicts (possibly more than climate change), but not necessarily worse than the Black Death, for instance.

Possibly not even worse than WW2.

Well, I suppose those last examples serve to illustrate what scale I'm operating on.

X-risk is FAR more serious than WW2 or even the Black Death.


In my opinion, the risks come from people treating something that is decidedly not AGI as if it is AGI. It's the same folly humans repeat over and over, and this will be the worst yet.


Nobody really knows what Earth will look like once AGI arrives. It could be anything from extinction, through some Cyberpunk corporate dystopia (like you seem to think) to some kind of Techno-Socialist utopia.

One thing it's not likely to be, is a neo-classical capitalist system based on the value of human labor.


> One thing it's not likely to be, is a neo-classical capitalist system based on the value of human labor.

I'm finding it difficult to believe this. For me, your comment is accurate (and very insightful) except that even a mostly vanilla continuation of the neoliberal capitalist system seems possible. I think we're literally talking about a "singularity", where by definition our fate is not dependent on our actions, and about something we don't have the full capacity to understand and next to no capacity to influence. It takes a tremendous amount of evidence to claim anything about such an indeterminate system. Maybe 100 rich people will own all the AI and the rest will be fixing the bullshit that AI doesn't even bother fixing, like roads and rusty farms, similar to Kurt Vonnegut's first novel "Player Piano". Not that the world described in that novel is particularly neoliberal capitalist (I suppose it's a bit more "socialistic", whatever that means), but I don't think such a future can be ruled out.

My bias is that, of course, it's going to be a bleak future. Because when humanity loses all control, it seems unlikely to me that a system that protects the interests of individual or collective humans will emerge. So whether it's extinction, cyberpunk, techno-socialism, techno-capitalist libertarian anarchy, neoclassical capitalism... whatever it is, it will be something that protects the interests of something inhuman, far more so than the current system does. It goes without saying that I'm an extreme AI pessimist: just making my biases clear. AGI - while it's unclear whether it's technically feasible - will be the death of humanity as we know it now, and perhaps something else humanity-like, something worse and more painful, will follow.


> I'm finding it difficult to believe this.

Pay attention to the whole sentence, especially the last section: "... based on the value of human labor."

It's not that I'm ruling out capitalism as the outcome. I'm simply ruling out the combined JOINT possibility of capitalism COMBINED WITH human labor remaining the base resource within it.

If robotics is going in the direction I expect, there will simply be no jobs left that are done more efficiently by humans than by machines (i.e. robots will match or exceed the robustness, flexibility and cost efficiency of all biology-based life forms through breakthroughs in either nanotech or simply by using organic chemistry, DNA, etc. to build the robots).

Why pay even $1/day for a human to do a job when a robot can do it for $1/week?

Also, such a capitalist system will almost certainly lead to AGIs becoming increasingly like a new life form, as capitalism between AGIs introduces a Darwinian selection pressure. That will make it hard even for the 100 richest people to retain permanent control.

IF humanity is to survive (for at least a few thousand more years, not just the next 100), we need some way to ensure alignment. And to do that, we have to make sure that AGIs that optimize for resource-control-seeking behaviours don't gain an advantage over those that don't. We may even have to define some level of sophistication beyond which further development is completely halted.

At least until we find ways for humans to merge with them in a way that allows us (at least some of us) to retain our humanity.


Artificial General Intelligence requires a bit more than parsing and predicting text I reckon.


Yes, and transformer models can do more than text.

There are almost certainly better options out there, given that it looks like we don't need so many examples to learn from, though I'm not at all clear whether we need those better ways or whether we can get by without them due to the abundance of training data.


If you come up with a new system, you're going to want to integrate AI into the system, presuming AI gets a bit better.

If AI can only learn after people have used the system for a year, then your system will just get ignored. After all, it lacks AI. And hence it will never get enough training data to get AI integration.

Learning needs to get faster. Otherwise, we will be stuck with the tools that already exist. New tools won't just need to be possible to train humans on, but also to train AIs on.

Edit: a great example here is the Tamarin protocol prover. It would be great, and feasible, to get AI assistance to write these proofs. But there aren't enough proofs out there to train on.


That seems to already be happening with o1 and Orion.

Instead of rewarding the network directly for finding a correct answer, reasoning chains that end up with the correct answer are fed back into the training set.

That way you're training it to develop reasoning processes that end up with correct answers.

And for math problems, you're training it to find ways of generating "proofs" that happen to produce the right result.

While this means that reasoning patterns that are not strictly speaking 100% consistent can be learned, that's not necessarily even a disadvantage, since it allows the model to find arguments that are "good enough" to produce the correct output, even where a fully watertight proof may be beyond it.

Kind of like how physicists have taken shortcuts like the Dirac delta function, even before mathematicians could verify that the math was correct.

Anyway, by allowing AIs to generate their own proofs, the number of proofs/reasoning chains for all sorts of problems can be massively expanded, and AI may even invent new ways of reasoning that humans are not even aware of (for instance, because they require combining more factors in one logical step than can fit into human working memory).
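
A rough sketch of that feedback loop, in the spirit of publicly described methods like STaR / rejection-sampling fine-tuning (the function names are placeholders, not OpenAI's actual pipeline):

    def self_improvement_round(model, problems, sample_chain, is_correct, finetune, k=8):
        # Sample several reasoning chains per problem, keep only the chains
        # whose final answer checks out, then fine-tune on those, so the model
        # learns the reasoning patterns that actually led to correct answers.
        kept = []
        for problem in problems:
            for _ in range(k):
                chain, answer = sample_chain(model, problem)
                if is_correct(problem, answer):
                    kept.append((problem, chain))
        return finetune(model, kept)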


If the user manual fits into the context window, existing LLMs can already do an OK-but-not-great job. I hadn't previously heard of Tamarin; a quick google suggests that's a domain where the standard is theoretically "you need to make zero errors" but in practice is "be better than your opponent because neither of you is close to perfect"? In either case, have you tried giving the entire manual to the LLM in its context window?

If the new system can be interacted with in a non-destructive manner at low cost and with useful responses, then existing AI can self-generate the training data.

If it merely takes a year, businesses will rush to get that training data even if they need to pay humans for a bit: Cars are an example of "real data is expensive or destructive", it's clearly taking a lot more than a year to get there, and there's a lot of investment in just that.

Pay 10,000 people USD 100,000 each for a year; that billion-dollar investment then gets reduced to 2.4 million/year in ChatGPT Plus subscription fees or whatever. Plenty of investors will take that deal… if you can actually be sure it will work.
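
Spelling out that arithmetic (assuming the USD 20/month ChatGPT Plus price):

    people, salary = 10_000, 100_000
    human_cost = people * salary          # 1,000,000,000 per year
    subs_cost = people * 20 * 12          # 2,400,000 per year
    print(human_cost, subs_cost)          # -> 1000000000 2400000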


1. In-context learning is a thing.

2. You might need only several hundred examples for fine-tuning. (OpenAI's minimum is 10 examples.)

3. I don't think research into fine-tuning efficiency has exhausted its possibilities. Fine-tuning is just not a very hot topic, given that general models work so well. In image generation, where it matters, they quickly got to a point where 1-2 examples are enough. So I won't be surprised if doc-to-model becomes a thing.
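
On point 2, the data side really is small: for OpenAI-style chat fine-tuning it's a JSONL file of example conversations, something like the sketch below. The Tamarin content here is an invented placeholder, and exact limits and API calls should be checked against the current docs:

    import json

    examples = [
        {"messages": [
            {"role": "user", "content": "State a Tamarin lemma asserting secrecy of the session key."},
            {"role": "assistant", "content": "lemma session_key_secrecy: ...  // placeholder text"},
        ]},
        # ...a few hundred of these, mined from existing Tamarin proofs
    ]

    with open("finetune.jsonl", "w") as f:
        for ex in examples:
            f.write(json.dumps(ex) + "\n")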


That's not quite how o1 was trained, they say.

o1 was trained specifically to perform reasoning.

Or rather, it was trained to reproduce the patterns within internal monologues that lead to correct answers to problems, particularly STEM problems.

While this still uses text at some level, it's no longer regurgitation of human-produced text, but something more akin to AlphaZero's training to become superhuman at games like Go or Chess.


> While this still uses text at some level, it's no longer regurgitation of human-produced text, but something more akin to AlphaZero's training to become superhuman at games like Go or Chess.

How do you know that? I've never seen that anywhere. For all we know, it could just be a very elaborate CoT algorithm.


There are many sources and hints out there, but here are some details from one of the devs at OpenAI:

https://x.com/_jasonwei/status/1834278706522849788

Notice that the CoT is trained via RL, meaning the CoT itself is a model (or part of the main model).

Also, RL means it's not limited to the original data the way traditional LLMs are. It implies that the CoT process itself is trained based on its own performance, meaning the steps of the CoT from previous runs are fed back into the training process as more data.


At the very least you could say "parsing and predicting text, images, and audio", and you would be correct - physical embodiment and spatial reasoning are missing.


Just spatial reasoning; people have already demonstrated it controlling robots.


It's all just text, though; both images and audio are presented to the LLM as text, the training data is text, and all it does is append small bits of text to a larger text iteratively. So the parent poster was correct.


>It's all just text though, both images and audio are presented to LLM as a text

This is not true


It looks to ME like Sam is the absolute dictator, and is firing everyone else, probably promising a few million in RSUs (or whatever financial instrument) in exchange for their polite departure and promise of non-disparagement.


Could be that the road to AGI that OpenAI is taking is basically massive scaling on what they already have, perhaps researchers want to take a different road to AGI.


Will anyone be working for anyone once we have AGI?


OpenAI fired her. She didn't drop out.


Do you have any proof, even circumstantial?



That's not evidence, though?


I mean ... common sense?

Barring extreme illness or family circumstance, can you suggest any other reason (than firing) why a young person would voluntarily leave a plum job at the hottest, most high-profile tech company in the world?





