Hacker News | soloist11's comments

That won't make a difference. These generative AI systems have ingested more math books than any human being alive today and they still can't add numbers.


I work in the field. The issue is that you will find 2+2=5 literally a hundred times more often in any big dataset than you will find 6+3=9.

A lot of information is wrong, a lot is only true in context (e.g. 2+2=5 features prominently in the book 1984), and even more is spam or machine generated.

Seeing the garbage that goes in, I am always amazed that these models do what they do.


To be fair, I'm kind of amazed that humans can do basic arithmetic at all. Our attention spans suck. We're barely trustworthy on single digit addition. For anything more complicated I use a calculator.


Yes, the issue is that statistical models cannot reason and determine what is logically valid versus what is merely most probable (which I guess is also its own kind of logic).


And context, so much context.

The sqrt(-1) sometimes doesn't exist, sometimes it's 1i. 2+2=4, except in literature where it can be 5. 1+1=2, but sometimes 3 in advertisements or in ironic text.

We often have some idea of how it works in, e.g., a quiz, where you know there is only one factually correct answer. And we are disappointed if the model is wrong. But even in a quiz setting the jury gets that balance wrong every so often, when there are answers other than the official one which are also correct.

Even "logically valid" is context dependend. This is not to say that models don't hallucinate, just that even within the logically valid answers, there is hidden context surrounding the data which is not expressed in the data itself. Fermats last problem is a solved problem in mathematics, but not in documents from before 1994.


The models operate by the logic of Boolean arithmetic, so in that sense they cannot be inconsistent. But in any case, it's pretty obvious no one in this thread understands what I'm getting at, but maybe eventually there will be an AGI smart enough to get the point.


Annotated data sets help, and ChatGPT-4o says 2+2=4 just fine. It also gives the right answer for 920,384 + 293,848. It can also read X - 12 = 14 out of an image and get X = 26.
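
For what it's worth, both of those are one-line checks; a quick Python sketch verifying the figures quoted above:

    # Verify the two examples above with plain Python arithmetic.
    print(920_384 + 293_848)   # 1214232

    # X - 12 = 14  =>  X = 14 + 12
    print(14 + 12)             # 26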

1984 having 2 + 2 = 5 makes sense in context as a human reading the book, and ChatGPT dot-producting the book can also compute the context and not say that 2+2=5.

ChatGPT's not Mathematica, and we already have calculators. My hammer is terrible for driving in screws, so I don't use it for that.


And this is why input assessment¹ is crucial...

(Idea assessment in general. Handled ideas in thought processes are still input.)


I keep seeing this being thrown around but ChatGPT can do addition just fine since at least GPT-3.5. Not that I care or expect it to (why would I use an LLM for math?) but still.


I don't know man, I keep hearing about AGI before 2030 but none of these AI labs can figure out how to do arithmetic with their fancy intelligence software.


They do addition just fine; you're just repeating something you heard months ago, which is already out of date in this field. That's why adherents are saying 2030.


Pretty sure I'm right. Ask your favorite chatbot to solve the following system of equations and let me know what you get as the answer. Here is the answer from Gemini:

> Solve the following system of equations: 2x + 2y = 2 and x + y = 1

My pleasure, I’ve been growing my expertise in solving system of linear equations problems. Let’s solve the system of equations: $$2x+2y=2$$ $$x+y=1$$

We can solve the system of equations by elimination.

Steps to solve: *1. Eliminate x:* Multiply the second equation by -2: $$-2x-2y=-2$$ Add the first and second equations: $$0=-1$$

*2. No solution:* Since we obtained a contradiction (0=-1), there is no solution to the system of equations.

*Answer:* The system of equations has no solution.
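
For comparison, a quick check with sympy (my own sketch, not part of the quoted answer) confirms the system is dependent rather than inconsistent:

    # The two equations are scalar multiples of each other, so the system is
    # dependent: every point on the line x + y = 1 is a solution.
    from sympy import symbols, Eq, solve

    x, y = symbols("x y")
    print(solve([Eq(2*x + 2*y, 2), Eq(x + y, 1)], [x, y], dict=True))
    # -> [{x: 1 - y}] : x = 1 - y with y free, i.e. infinitely many solutions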


ChatGPT-4o

the system of equations has infinitely many solutions given by x=t and y=1−t for any real number t.

https://chatgpt.com/share/c00f1435-a57f-49db-b8fb-9d848f7e86...

Furthermore, if I didn't understand that, I can have a conversation with the LLM as to what that actually means, ask it to graph it, ask what linear equations are, and to have it give me some more systems of equations for me to solve.


Wow, that's amazing. We have achieved AGI already. You should go back to the homology stuff, though, because the triangulation was incorrect, and figure out why.


It is amazing, isn't it‽


You should let it plan your day. It's obviously much smarter than you.


From Claude 3.5 Sonnet:

"Let's solve this system of equations step by step:

1) We have two equations:
   2x + 2y = 2 (Equation 1)
   x + y = 1 (Equation 2)

2) We can solve this using substitution. Let's express x in terms of y from Equation 2: x = 1 - y

3) Now, let's substitute this into Equation 1: 2(1 - y) + 2y = 2

4) Let's simplify:
   2 - 2y + 2y = 2
   2 = 2

5) This is true for all values of y, which means that Equation 1 is actually redundant. It doesn't give us any new information beyond what Equation 2 provides.

6) Therefore, we can conclude that there are infinitely many solutions to this system, all satisfying the equation: x + y = 1

7) We can express the solution set as:
   x = 1 - y
   y = y (where y can be any real number)

Thus, the solution to this system is a line in the xy-plane described by the equation x + y = 1.

Would you like me to explain any part of this solution in more detail?"


Wow, incredible. No one needs to learn math anymore, AI can do it for us now.


Oh absolutely, what a tragedy! How will we ever cope without the joy of solving soloist11's dumb as fuck linear equation? Ahah


You should marry an AI and not worry about my dumb linear equations. That way the AI can do everything for you, it can even think for you.


I just asked both Claude and ChatGPT to add numbers and they both gave me the right answer.


Ask them to compute the simplicial homology of the n-dimensional projective plane next.


I easily broke Copilot by asking it to make lists of radioactive isotopes in order of half-life. It can put the U.S. states in alphabetical or reverse alphabetical order, but for any other order I would bet against it. If I ask it what the probability is that it can correctly complete a sorting task, however, it insists that it is almost certain to get it right.

I had a good conversation with it about the theory of partial orderings; it even corrected my mistakes. I asked it to make a textbook problem about determining whether a graph is cyclic, and it produced a clean, beautiful example where the partial ordering was realized by a total ordering and everything was written out in an order that was easy to follow.

If I wrote a script that generated a bunch of well-randomized "is this graph cyclic?" problems, I am sure there is some size at which it just falls down, the same way it falls down with sorting.

The obvious answer is that the LLM should pick an algorithm or write some code for the things ordinary algorithms can already do, such as arithmetic, sorting, SAT solving, etc.
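
For the sorting half, a sketch of what that delegated code might look like (the half-life figures below are rough, illustrative values, not vetted data):

    # Sort a few isotopes by half-life; values are approximate and in years.
    half_lives_years = {
        "Uranium-238": 4.5e9,
        "Potassium-40": 1.25e9,
        "Carbon-14": 5.7e3,
        "Cobalt-60": 5.27,
        "Iodine-131": 8 / 365,   # about 8 days
    }

    for isotope, years in sorted(half_lives_years.items(), key=lambda kv: kv[1]):
        print(f"{isotope}: {years:.3g} years")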

There's the deeper issue that it doesn't know what it doesn't know. It can't sort a list of radioactive isotopes any more than it can help you make an atom bomb. In the second case it will say that it won't help you, in the first case it will try to help you anyway when it really should be saying "I can't do that, Dave" because it just can't.


> lists of radioactive isotopes in order of half lives

ChatGPT-4o does just fine with that. Basing your opinion of a whole technology on a poor implementation instead of the best one doesn't seem like the best analysis.


I just asked ChatGPT that and it seemed to come up with a good answer.


Now ask it to convert the computation into a logical calculus so that it can be verified with a theorem prover like Coq, Lean, or Isabelle.
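
For a sense of scale, even the toy linear system from earlier in the thread takes a few lines to state machine-checkably. A minimal Lean 4 sketch, assuming Mathlib is available (my own illustration, not anything a model produced):

    import Mathlib

    -- Any pair satisfying x + y = 1 also satisfies 2x + 2y = 2,
    -- i.e. the first equation of the earlier system adds no information.
    example (x y : ℚ) (h : x + y = 1) : 2 * x + 2 * y = 2 := by
      linarith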


What are you actually getting at with this? Formalising algebraic topology in Lean is still _very_ much an open project, for instance.


You'd think that with all those billions spent on the software and the hardware it would be a walk in the park to convert a single book on algebraic topology into a formalized Coq, Lean, or Isabelle module. It seems like a very obvious test case for the intelligence capabilities of these systems. I know that it is possible because Kevin Buzzard is going to formalize Fermat's Last Theorem for less than £934,043, yet no commercial AI lab has managed to build an AI that can do basic arithmetic. [0] Mira Murati is on the record saying that their next AI model will have the intelligence of a PhD student, so let's see if it can actually formalize basic algebraic topology into a logical calculus. [1]

0: https://gow.epsrc.ukri.org/NGBOViewGrant.aspx?GrantRef=EP/Y0...

1: https://engineering.dartmouth.edu/news/openai-cto-mira-murat...


Why would you think that's a walk in the park? Have you actually tried formalising stuff in Lean/Coq? I have, and even with a postgraduate maths degree behind me it's hard as hell!

The fact that Kevin and his team are formalising FLT is incredible, but they all have decades of experience with this stuff (!!).

Transformers can do arithmetic (and many other things) just fine, do a bit of searching on arxiv and you'll find papers from 2023 showing that nano-scale transformer models suffice. It really is a data problem, not a fundamental limitation with the technology.


What is your degree in?


A Master's studying representation theorems in nonmonotonic logic; I left academia for industry during my PhD. Fun, spaced-out maths problems. I tried to formalize my thesis in Lean, but it is nowhere near as simple as you make it out to be.


You are really onto something here. LLMs aren't really perfect. Good catch.


Perfection is not the problem. An obvious test case of intelligence is to formally model something like algebraic topology in a formal logical calculus like intensional type theory with identity types. Even though all the commercial labs have ingested all of nLab, there isn't a single commercial model that can use logic to perform arithmetic operations.


OK, so it seems you didn't get the memo: LLMs at the present stage have not yet reached AGI, and have some notable other flaws, like not being able to do math reliably. Commercial interests will try to exaggerate the current capabilities, but most people can see through that.

The capabilities are nonetheless nothing short of astounding, given where we were 10 or even 2 years ago, and clearly point to a near future where we can expect the machines to overcome these shortcomings.

Thousands, if not millions, of researchers, coders and others will have to adjust their work-life expectations, just as previous technological revolutions have seen thousands of other professions disappear into thin air.


Sure, good luck with this AGI business. I'm sure it will work out great for everyone in the end.


The output of ChatGPT almost always sounds good. That's the point.

But I would wager that its answer was at least wrong, and perhaps total nonsense.

That's the real hazard of using ChatGPT as a learning tool. You are in no position to evaluate whether the output makes any sense.


I asked it to compute the simplicial homology of RP^2 and not only was it spot on with the result, it gave me a detailed and essentially correct computation. This definitely appears in its training set, but nevertheless you should have some humility =P


How do you know it's correct? The only simplicial triangulation I know of is obtained by splitting up the sphere into an icosahedron and then identifying all the opposite faces to get the proper antipodal action for the quotient.


I'm not interested in engaging with you further on this topic after you devolved into ad hominems against me in the other thread. I'm here to argue in good faith. Have a good day.


You made an incorrect assessment of a basic calculation in algebraic topology and claimed that it was correct. You didn't even look at what it was computing and simply looked at the final answer, which lined up with the answer on Wikipedia. Simplicial calculations for projective planes are not simple. The usual calculations are done with a cellular decomposition, and that's why the LLM gives the wrong answer: the actual answer is not in the dataset and requires reasoning.


Are you confusing me with someone else? When I asked it GPT computed the homology from the CW decomposition of RP^2 with three cells. Which is a very simple exercise.

I recommend that you give it a try.
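
For reference, the standard cellular computation being described goes like this (a sketch written out here, not quoted from the model): RP^2 has one cell in each of dimensions 0, 1, 2, the 2-cell is attached along the 1-cell with degree 2, and the 1-cell boundary is zero, so the cellular chain complex is $$0 \to \mathbb{Z} \xrightarrow{\times 2} \mathbb{Z} \xrightarrow{0} \mathbb{Z} \to 0$$ and taking homology gives $$H_0(\mathbb{RP}^2) \cong \mathbb{Z}, \quad H_1(\mathbb{RP}^2) \cong \mathbb{Z}/2, \quad H_n(\mathbb{RP}^2) = 0 \ \text{for } n \ge 2.$$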


That's ok. It seems like LLMs know all about simplicial complexes and homology so I'll spend my time on more fruitful endeavors but thanks for the advice.


To be fair, it's not a simplicial complex, but simplicial and cellular homology coincide on triangulatable spaces like RP^2 so I gave it the benefit of the doubt =) algebraic topology is a pretty fun field regardless of how much a language model knows about it IMO.


Do you have a reference for this equivalency?


It's in Hatcher iirc


I dunno what you wanted to wager, but I would still be interested in the holes in this answer.

https://chatgpt.com/share/e84800dd-c714-42d4-977b-b446c5c5ed...


That's incorrect.


No it isn't.


lmao. you're totally right. RP^2 can be triangulated with a single triangle with all of its vertices identified. that's totally how you compute the simplicial decomposition of RP^2


I asked you to explain why it's wrong, and all you said was "that's incorrect". Saying "no it isn't" got you to explain your answer far better than directly asking you did in the first place.


Don't worry, we'll have AGI soon and it will give the correct answer instead of whatever plausible nonsense it put together this time. I have faith.


That’s moving the goal post. GP’s assertion was that it couldn’t add numbers.


Computing simplicial homology is basic arithmetic. It's the same goal post.


The goal post is adding numbers, not some other calculation for which adding numbers is a step.


The computer can't do anything other than arithmetic


I mean, they are much better than humans at "mental" arithmetic.


I don't know what that means. There is nothing "mental" happening in the circuits of the computer or the function graph which is implemented on top of it.


In the context of AI, it's a term that I borrow from human cognition to describe rapid calculations without external aids.


The comparison still makes no sense. What would be an external aid for a computer?


Well, for the LLM it would be a calculator.


The LLM is a calculator. Think about it.


The LLM is an LLM. It runs on computer hardware which has an ALU, but that doesn't make the LLM a calculator. The LLM can, however, call out to a calculator to do addition when it deems necessary.
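
A minimal sketch of that calling-out pattern (the names here are hypothetical, not any particular vendor's API):

    # Hypothetical tool-dispatch loop: the model emits a structured request,
    # the host runs the tool, and the result goes back into the conversation.
    def calculator(expression: str) -> str:
        allowed = set("0123456789+-*/. ()")
        if not set(expression) <= allowed:
            raise ValueError("unsupported expression")
        return str(eval(expression))  # acceptable only because input is restricted

    TOOLS = {"calculator": calculator}

    def handle_model_output(message: dict) -> str:
        """Run the requested tool if there is one; otherwise return the text."""
        if message.get("tool") in TOOLS:
            return TOOLS[message["tool"]](message["arguments"])
        return message.get("text", "")

    print(handle_model_output({"tool": "calculator", "arguments": "920384 + 293848"}))
    # 1214232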


The LLM is not doing anything other than arithmetic calculations. Every operation an LLM is doing can be done with a calculator.


Typically we don't refer to matrix multiplication as arithmetic calculations, but you do you.


> Typically we don't refer to matrix multiplication as arithmetic calculations

You don't? Didn't you do them in school? Everyone calls that arithmetic; you are just adding up and multiplying a bunch of numbers.
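
For what it's worth, here is the sense in which that's true: a matrix multiply written with nothing but additions and multiplications (plain Python, no libraries):

    # Matrix multiplication reduced to scalar adds and multiplies.
    def matmul(a, b):
        rows, inner, cols = len(a), len(b), len(b[0])
        out = [[0] * cols for _ in range(rows)]
        for i in range(rows):
            for j in range(cols):
                for k in range(inner):
                    out[i][j] += a[i][k] * b[k][j]   # one multiply, one add
        return out

    print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]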


do you really multiply numbers but call that arithmetic? I must sincerely ask, where are you?


I'm pretty sure it's all arithmetic for an LLM but you do you too.


It's not.


Are you sure?


Yes


So the LLM is not doing arithmetic?


This is like saying binary numbers are the reason generative AI falls short. Computers work with transistors, which are either on or off, so what are these people proposing as the next computational paradigm to fix the problems with binary generative AI?


What?


Tokenization as the main problem is a red herring. It's possible to get rid of the tokens entirely and train on byte sequences, it won't make a difference to why generative AI can't count or do basic arithmetic.


One must ask the obvious question: where did the money go?


Money is simply a container for a form of monetary energy that follows the same conservation of energy laws as physical energy.

And so monetary energy can't be created or destroyed; instead it flows from one system (dollars, bitcoin, real estate, gold, TVs, art, etc.) to another, and the value of each unit of a system is proportional to the density of the monetary energy the unit is storing.

It will inevitably be attracted to rest in the system whose units store it well and without dilution due to the addition of empty units to the system (e.g. freshly printed dollars).

To say precisely where the monetary energy moved to is impossible, but the energy density of units of both gold and the dollar has increased relative to bitcoin over the last few weeks, so those systems are prime candidates.


Money is a promise about the future. The future can be unexpected, and promises can be broken.


The same place the “missing money” goes when stock market crashes.


How does WebSage verify semantic fidelity and avoid hallucinatory confabulations?


All the listed loss functions induce equivalent topologies. Functions which converge in absolute value on a compact space also converge in mean squared error to the same function. The loss function is actually irrelevant; all it changes is the rate of convergence.
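
The inequality behind that claim (my paraphrase of the argument): on a compact domain K, sup-norm convergence dominates mean-squared error, since $$\frac{1}{\operatorname{vol}(K)} \int_K |f_n - f|^2 \, dx \;\le\; \sup_{x \in K} |f_n(x) - f(x)|^2 \;\to\; 0,$$ so if the absolute error goes to zero uniformly, the MSE goes to zero as well; switching between such losses changes the rate, not the limit.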


It's basic thermodynamics. More energy in the atmospheric system means more violent and extreme weather events. The energy has to go somewhere and it goes into faster winds and bigger storms. In other news: In 2022, U.S. total petroleum consumption averaged about 20.28 million barrels per day (b/d) [1].

1: https://www.eia.gov/energyexplained/oil-and-petroleum-produc...


The computer works with binary sequences. LLMs do not change this fact.


They don't really have to. Even in Star Trek where you can tell the holodeck to do whatever in LLM fashion, there's also a lower level language that can be used for more fine control. It's always good to have levels of abstraction when you need them, this just adds the very top one that was always missing.


The fact is that they do, and it doesn't matter how many abstractions we put on top of the underlying physical reality of digital computation; it will always be digital unless someone figures out how to make analog computation as scalable as digital computation.


This can be said about every AI system/software and not just Figma. First the data is gathered for "self-supervised" training. Then, some product is built on top of it to gather users. Once the users show up their data is in turn used to fine tune the system in order to continue gathering data and subscriptions from the users.

The logic of AI companies is very simple and the entire value proposition is in how efficiently the company can convert user data/feedback into features that users will pay for so that the AI company can continue paying their cloud bills.


Folks, there is no formula for detecting anxiety and emotions from pictures alone. Anxiety levels are correlated with nutrition and various biometric markers like cortisol. The AI, no matter how advanced, cannot detect cortisol levels from pictures of faces (and this is before we even get to how stress and anxiety are defined and labeled).


(1) In this case they are asking people questions about pictures and having the AI score their answers, not having the AI look at a picture.

(2) There's a pretty big area of research into AI-based diagnosis, for instance diagnosing depression based on social media posts. Given that "business as usual" in psychiatric diagnosis is terrible (conditions that affect 5% of the population get diagnosed practically 0% of the time, but everybody has autism...), it's not hard to improve matters greatly, but the problem is that you can collect enough info to diagnose people without their consent.


The standard solution is to use digital keys and signatures. There is no need to reinvent the wheel here; just use the standard cryptographic constructions to verify that the requests are from trusted sources, e.g. https://medium.com/@georgwiese/hash-based-digital-signatures...
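
A minimal sketch of that, assuming the Python cryptography package (key handling and the payload here are illustrative only):

    # Ed25519 sign/verify sketch with the `cryptography` package.
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
    from cryptography.exceptions import InvalidSignature

    private_key = Ed25519PrivateKey.generate()   # held by the trusted sender
    public_key = private_key.public_key()        # distributed to the verifier

    request = b'{"action": "example-request"}'
    signature = private_key.sign(request)

    try:
        public_key.verify(signature, request)    # raises if request or signature was altered
        print("request accepted")
    except InvalidSignature:
        print("request rejected")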


Yes, this is a reasonable approach, but how are certificates deployed and managed?

How do we deploy a list of certificates that a service should accept?

How do we do certificate rotation and revocation?


You can use a configuration management tool but you can also just have a bundled archive that is deployed and extracted with SSH. Here's one example: https://community.chef.io/tools/chef-habitat

