Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I don't trust LLMs to do the kind of precise deterministic work you need in a spreadsheet.

It's one thing to fudge the language in a report summary, it can be subjective, however numbers are not subjective. It's widely known LLMs are terrible at even basic maths.

Even Google's own AI summary admits it which I was surprised at, marketing won't be happy.

Yes, it is true that LLMs are often bad at math because they don't "understand" it as a logical system but rather process it as text, relying on pattern recognition from their training data.





Seems like you're very confused about what this work typically entails. The job of these employees is not mental arithmatic. It's closer to:

- Log in to the internal system that handles customer policies

- Find all policies that were bound in the last 30 days

- Log in to the internal system that manages customer payments

- Verify that for all policies bound, there exists a corresponding payment that roughly matches the premium.

- Flag any divergences above X% for accounting/finance to follow up on.

Practically this involves munging a few CSVs, maybe typing in a few things, setting up some XLOOKUPs, IF formulas, conditional formatting, etc.

Will AI replace the entire job? No...but that's not the goal. Does it have to be perfect? Also no...the existing employees performing this work are also not perfect, and in fact sometimes their accuracy is quite poor.


> “Does it have to be perfect?”

Actually, yes. This kind of management reporting is either (1) going to end up in the books and records of the company - big trouble if things have to be restated in the future or (2) support important decisions by leadership — who will be very much less than happy if analysis turns out to have been wrong.

A lot of what ties up the time of business analysts is ticking and tying everything to ensure that mistakes are not made and that analytics and interpretations are consistent from one period to the next. The math and queries are simple - the details and correctness are hard.


Is this not belligerently ignoring the fact that this work is already done imperfectly? I can’t tell you how many serious errors I’ve caught in just a short time of automating the generation of complex spreadsheets from financial data. All of them had already been checked by multiple analysts, and all of them contained serious errors (in different places!)

No belligerence intended! Yes, processes are faulty today even with maker-checker and other QA procedures. To me it seems the main value of LLMs in a spreadsheet-heavy process is acceleration - which is great! What is harder is quality assurance - like the example someone gave regarding deciding when and how to include or exclude certain tables, date ranges, calc, etc. Properly recording expert judgment and then consistently applying that judgement over time is key. I’m not sure that is the kind of thing LLMs are great at, even ignoring their stochastic nature. Let’s figure out how to get best use out of the new kit - and like everything else, focus on achieving continuously improving outcomes.

There’s actually different classes of errors though. There’s errors in the process itself versus errors that happen when performing the process.

For example, if I ask you to tabulate orders via a query but you forgot to include an entire table, this is a major error of process but the query itself actually is consistently error-free.

Reducing error and mistakes is very much modeling where error can happen. I never trust an LLM to interpret data from a spreadsheet because I cannot verify every individual result, but I am willing to ask an LLM to write a macro that tabulates the data because I can verify the algorithm and the macro result will always be consistent.

Using Claude to interpret the data directly for me is scary because those kinds of errors are neither verifiable nor consistent. At least with the “missing table” example, that error may make the analysis completely bunk but once it is corrected, it is always correct.


Very much agreed

Speak for yourself and your own use cases. There are a huge diversity of workflows with which to apply automation in any medium to large business. They all have differing needs. Many excel workflows I'm personally familiar with already incoporate a "human review" step. Telling a business leader that they can now jump straight to that step, even if it requires 2x human review, with AI doing all of the most tediuous and low-stakes prework, is a clear win.

>Speak for yourself and your own use cases

Take your own advice.


I'm taking a much weaker position than the respondent: LLMs are useful for many classes of problem that do not require zero shot perfect accuracy. They are useful in contexts where the cost of building scaffolding around them to get their accuracy to an acceptable level is less than the cost of hiring humans to do the same work to the same degree of accuracy.

This is basic business and engineering 101.


>LLMs are useful for many classes of problem that do not require zero shot perfect accuracy. They are useful in contexts where the cost of building scaffolding around them to get their accuracy to an acceptable level is less than the cost of hiring humans to do the same work to the same degree of accuracy.

Well said. Concise and essentially inarguable, at least to the extent it means LLMs are here to stay in the business world whether anyone likes it or not (barring the unforeseen, e.g. regulation or another pressure).


There is another aspect to this kind of activity.

Sometimes there can be an advantage in leading or lagging some aspects of internal accounting data for a time period. Basically sitting on credits or debits to some accounts for a period of weeks. The tacit knowledge to know when to sit on a transaction and when to action it is generally not written down in formal terms.

I'm not sure how these shenanigans will translate into an ai driven system.


> Sometimes there can be an advantage in leading or lagging some aspects of internal accounting data for a time period.

This worked famously well for Enron.


That’s the kind of thing that can get a company into a lot of trouble with its auditors and shareholders. Not that I am offering accounting advice of course. And yeah, one can not “blame” and ai system or try to ai-wash any dodgy practices.

Checking someone elses spreadsheet is a fucking nightmare. If your company has extremely good standards it's less miserable because at least the formatting etc will be consistent...

The one thing LLMs should consistently do is ensure that formatting is correct. Which will help greatly in the checking process. But no, I generally don't trust them to do sensible things with basic formulation. Not a week ago GPT 5 got confused whether a plus or a minus was necessary in a basic question of "I'm 323 days old, when is my birthday?"


I think you have a misunderstanding of the types of things that LLMs are good at. Yes you're 100% right that they can't do math. Yet they're quite proficient at basic coding. Most Excel work is similar to basic coding so I think this is an area where they might actually be pretty well suited.

My concern would be more with how to check the work (ie, make sure that the formulas are correct and no columns are missed) because Excel hides all that. Unlike code, there's no easy way to generate the diff of a spreadsheet or rely on Git history. But that's different from the concerns that you have.


> Yes you're 100% right that they can't do math.

The model ought to be calling out to some sort of tool to do the math—effectively writing code, which it can do. I'm surprised the major LLM frontends aren't always doing this by now.


MS Office Tools menu has a "Spreadsheet Compare" application. It is quite good for diffing 2 spreadsheets. Of course it cannot catch logic errors, human or ML.

I've built spreadsheet diff tools on Google sheets multiple times. As the needs grows I think we will see diffs and commits and review tools reach customers

hey Collin! I am working on an AI agent on Google Sheets, I am curious if any of your designs are out in the public. We are trying to re-think how diffs should look like and want to make something nicer than what we currently have, so curious.

Hi! Nothing public nor generic enough to be a good building block. I found myself often frustrated by the tools that came out of the box but I believe better apis could make this slightly easier to solve.

The UX of spreadsheet diffs is a hard one to solve because of how weird the calculation loops are and how complicated the relationship between fields might be.

I've never tried to solve this for a real end user before in a generic way - all my past work here was for internal ability to audit changes and rollback catastrophes. I took a lot of shortcuts by knowing which cells are input data vs various steps of calculations -- maybe part of your ux is being able to define that on a sheet by sheet basis? Then you could show how different data (same formulas) changed outputs or how different formulas (same data) did differently?

Spreadsheets are basically weird app platforms at this point so you might not be able to create a single experience that is both deep and generic. On the other hand maybe treating it as an app is the unlock? Get your AI to noodle on what the whole thing is for, then show diff between before and after stable states (after all calculation loops stabilize or are killed) side by side with actual diffs of actual formulas? I feel like Id want to see a diff as a live final spreadsheet and be able to click on changed cells and see up the chain of their calculations to the ancestors that were modified.

Fun problem that sounds extremely complicated. Good luck distilling it!


> Most Excel work is similar to basic coding

Excel is similar to coding in BASIC, a giant hairy ball of tangled wool.


So do it in basic code where numbering your line G53 instead of G$53 doesn't crash a mass transit network because somebody's algorithm forgot to order enough fuel this month.

proficient != near-flawless.

> Most Excel work is similar to basic coding so I think this is an area where they might actually be pretty well suited.

This is a hot take. One I'm not sure many would agree with.


Excel work of people who make a living because of their excel skills (Bankers, VCs, Finance pros) is truly on the spectrum of basic coding. Excel use by others (Strategy, HR, etc.) is more like crude UI to manipulate small datasets (filter, sort, add, share and collaborate). Source: have lived both lives.

Maybe LLMs will enable a new type of work in spreadsheets. Just like in coding we have PR reviews, with an LLM it should be possible to do a spreadsheet review. Ask the LLM to try to understand the intent and point out places where the spreadsheet deviates from the intent. Also ask the LLM to narrate the spreadsheet so it can be understood.

That first condition "try to understand the intent" is where it could go wrong. Maybe it thinks the spreadsheet aligns with the intent, but it misunderstood the intent.

LLMs are a lossy validation, and while they work sometimes, when they fail they usually do so 'silently'.


Maybe we need some kind of method, framework to develop intent. Most of things that go wrong in knowledge working are down to lack of common understanding of intent.

> The one thing LLMs should consistently do is ensure that formatting is correct.

In JavaScript (and I assume most other programming languages) this is the job of static analysis tools (like eslint, prettier, typescript, etc.). I’m not aware of any LLM based tools which performs static analysis with as good a results as the traditional tools. Is static analysis not a thing in the spreadsheet world? Are there the tools which do static analysis on spreadsheets subpar, or offer some disadvantage not seen in other programming languages? And if so, are LLMs any better?


Just use a normal static analysis tool and shove the result to an LLM. I believe Anthropic properly figured that agents are the key, in addition to models, contrary to OpenAI that is run by a psycho that only believes in training the bigger model.

Last time, I gave claude an invoice and asked it to change one item on it, it did so nicely and gave me the new invoice. Good thing I noticed it had also changed the bank account number..

The more complicated the spreadsheet and the more dependencies it has, the greater the room for error. These are probabilistic machines. You can use them, I use them all the time for different things, but you need to treat them like employees you can't even trust to copy a bank account number correctly.


We’ve tried to gently use them to automate some of our report generation and PDF->Invoice workflows and it’s a nightmare of silent changes and absence of logic.. basic things like specifically telling it “debits need to match credits” and “balance sheets need to balance” that are ignored.

Yeah, asking llm to edit one specific thing in a large or complex document/ codebase is like those repeated "give me the exact same image" gifs. It's fundamentally a statistical model so the only thing we can be _certain_ of is that _it's not_. It might get the desired change 100% correct but it's only gonna get the entire document 99 5%

Something that Claude Sonnet does when you use it to code is write scripts to test whether or not something is working. If it does that for Excel (e.g. some form of verification) it should be fine.

Besides, using AI is an exercise in a "trust but verify" approach to getting work done. If you asked a junior to do the task you'd check their output. Same goes for AI.


The use cases for spreadsheets are much more diverse than that. In my experience, spreadsheets just as often used for calculation. Many of them do require high accuracy, rely on determinism, and necessitate the understanding of maths ranging from basic arithmetic to statistics and engineering formulas. Financial models, for example, must be built up from ground truth and need to always use the right formulas with the right inputs to generate meaningful outputs.

I have personally worked with spreadsheet based financial models that use 100k+ rows x dozens of columns and involve 1000s of formulas that transform those data into the desired outputs. There was very little tolerance for mistakes.

That said, humans, working in these use cases, make mistakes >0% of the time. The question I often have with the incorporation of AI into human workflows is, will we eventually come to accept a certain level of error from them in the way we do for humans?


Sysadmin of a small company. I get asked pretty often to help with a pivot table, vlookup, or just general excel functions (and smartsheet, these users LOVE smartsheet)

Indeed, in a small enough org, the sysadmin/technologist becomes support of last resort for all the things.

> these users LOVE smartsheet

I hate smartsheet…

Excel or R. (Or more often, regex followed by pen and paper followed by more regex.)


They're coming to me for pivot tables....

Handing them regex would be like giving a monkey a bazooka


>Does it have to be perfect? Also no.

Yeah, but it could be perfect, why are there humans in the loop at all? That is all just math!


I don't trust humans to do the kind of precise deterministic work you need in a spreadsheet!

Right, we shouldn’t use humans or LLMs. We should use regular deterministic computer programs.

For cases where that is not available, we should use a human and never an LLM.


I like to use Claude Code to write deterministic computer programs for me, which then perform the actual work. It saves a lot of time.

I had a big backlog of "nice to have scripts" I wanted to write for years, but couldn't find the time and energy for. A couple of months after I started using Claude Code, most of them exist.


That’s great and the only legitimate use case here. I suspect Microsoft will not try to limit customers to just writing scripts and will instead allow and perhaps even encourage them to let the AI go ham on a bunch of raw data with no intermediary code that could be reviewed.

Just a suspicion.


"regular deterministic computer programs" - otherwise known as the SUM function in Microsoft Excel

I don’t see the issue so much as the deterministic precision of an LLM, but the lack of observability of spreadsheets. Just looking at two different spreadsheets, it’s impossible to see what changes were made. It’s not like programming where you can run a `git diff` to see what changes an LLM agent made to a source code file. Or even a word processing document where the text changes are clear.

Spreadsheets work because the user sees the results of complex interconnected values and calculations. For the user, that complexity is hidden away and left in the background. The user just sees the results.

This would be a nightmare for most users to validate what changes an LLM made to a spreadsheet. There could be fundamental changes to a formula that could easily be hidden.

For me, that the concern with spreadsheets and LLMs - which is just as much a concern with spreadsheets themselves. Try collaborating with someone on a spreadsheet for modeling and you’ll know how frustrating it can be to try and figure out what changes were made.


>I don't trust LLMs to do the kind of precise deterministic work

not just in a spreadsheet, any kind of deterministic work at all.

find me a reliable way around this. i don't think there is one. mcp/functions are a band aid and not consistent enough when precision is important.

after almost three years of using LLMs, i have not found a single case where i didn't have to review its output, which takes as long or longer than doing it by hand.

ML/AI is not my domain, so my knowledge is not deep nor technical. this is just my experience. do we need a new architecture to solve these problems?


ML/AI is not my domain but you don’t have to get all that technical to understand that LLMs run on probability. We need a new architecture to solve these problems.

Most real-world spreadsheets I've worked with were fragile and sloppy, not precise and deterministic. Programmers always get shocked when they realize how many important things are built on extremely messy spreadsheets, and that people simply accept it. They rather just spend human hours correcting discrepancies than trying to build something maintainable.

Usually this is very hard because the tasks and the job often subtly shifts in somewhat unpredictable and unforeseen ways and there is no neat clean abstraction that you can just implement as an application. Too hererogeneous, too messy, too many exceptions. If you develop some clean elegant solution, next week there will be something that your shiny app doesn't allow and they'd have to submit a feature request or whatever.

In Excel, it's possible to just ad hoc adjust things and make it up as you go. It's not clean but very adaptable and flexible.


"I don't trust LLMs to do the kind of precise deterministic work" => I think LLM is not doing the precise arithmetic. It is the agent with lots of knowledge (skills) and tools. Precise deterministic work is done by tools (deterministic code). Skills brings domain knowledge and how to sequence a task. Agent executes it. LLM predicts the next token.

Sure, but this isn't requiring that the LLM do any math. The LLM is writing formulas and code to do the math. They are very good at that. And like any automated system you need to review the work.

Exactly, and if it can be done in a way that helps users better understand their own spreadsheets (which are often extremely complex codebases in a single file!) then this could be a huge use case for Claude.

> I don't trust LLMs to do the kind of precise deterministic work you need in a spreadsheet.

I was thinking along the same lines, but I could not articulate as well as you did.

Spreadsheet work is deterministic; LLM output is probabilistic. The two should be distinguished.

Still, its a productivity boost, which is always good.


Do you trust humans to be precise and deterministic, or even to be especially good at math?

This is talking about applying LLMs to formula creation and references, which they are actually pretty good at. Definitely not about replacing the spreadsheet's calculation engine.


I trust humans to not be able to shoot the company on the foot without even realizing it.

Why are we suddenly ok with giving every underpaid and exploited employee a foot gun and expect them to be responsible with it???


Maybe my experience is unique, but I have seen so many security issues from underpaid and exploited employees being successfully phished, among many other abuses.

If your experience with the lowest, most-abused employees is better than mine, I envy you.


They're not great at arithmetic but at abstract mathematics and numerical coding they're pretty good actually.

It's widely known LLMs are terrible at even basic maths.

Claude for Excel isn't doing maths. It's doing Excel. If the llm is bad at maths then teaching it to use a tool that's good at maths seems sensible.


> I don't trust LLMs to do the kind of precise deterministic work you need in a spreadsheet.

Rightly so! But LLMs can still make you faster. Just don't expect too much from it.


If LLMs can replace mathematica for me when I'm doing affine yield curve calculations they can do a DCF for some banker idiots

LLMs are just a tool, though. Humans still have to verify them, like with very other tools out there

Eh, yes. In theory. In practice, and this is what I have experienced personally, bosses seem to think that you now have interns so you should be able to do 5x the output.. guess what that means. No verification or rubber stamp.

I couldn’t agree more. I get all my perfectly deterministic work output from human beings!

If only we had created some device that could perform deterministic calculations and then wrote software that made it easy for humans to use such calculations.

ok but humans are idiots, if only we could make some sort of Alternate Idiot, a non-human but every bit as generally stupid as humans are! This A.I would be able to do every stupid thing humans did with the device that performed deterministic calculations only many times faster!

Yes and when the AI did that all the stupid humans could accept its output without question. This would save the humans a lot of work and thought and personal responsibility for any mistakes! See also Israel’s Lavender for an exciting example of this in action.

you might trust when the precision is extremely high and others agree with that.

high precision is possible because they can realize that by multiple cross validations


ChatGPT is actively being used as a calculator.



Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: