
I pull messy data from a remote source (think OCR-ed invoices, for example) and need to clean it up. Every day I get around 1k new rows. The way it's messed up changes frequently, and while I don't care about it being 100% correct, any piece of code (relying on rules, regex, heuristics and the like) would break within a couple of weeks. That means I'd need at least a part-time developer on my team, costing me a couple of thousand dollars per month.

Or I can pass each row through an LLM and get structured, clean output for a couple of dollars per month. Sure, it doesn't work 100% of the time, but I don't need it to, and neither did the human-written code.

Effectively, the LLM resulted in one less developer hired on our team.
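
For concreteness, the whole thing boils down to something like this rough sketch (assuming the OpenAI Python SDK; the model name, prompt and field list are illustrative placeholders, not my exact setup):

    import json
    from openai import OpenAI  # official OpenAI Python SDK

    client = OpenAI()  # picks up OPENAI_API_KEY from the environment

    PROMPT = (
        "Extract vendor, invoice_number, date (ISO 8601) and total (number) "
        "from this OCR-ed invoice row. Reply with a single JSON object; "
        "use null for any field you cannot read.\n\n"
    )

    def clean_row(raw_row: str) -> dict:
        # One call per messy row; JSON mode keeps the parsing trivial.
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user", "content": PROMPT + raw_row}],
            response_format={"type": "json_object"},
        )
        return json.loads(resp.choices[0].message.content)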



It resulted in one less developer, but you're still using that tool, right? Didn't a human (you) point the LLM at this problem and think this through?

That fired developer now has the toolset to become a CEO much, much more easily than in the pre-LLM era. You didn't really make him obsolete; you made him redundant. I'm not saying he's going to become a CEO, but trudging through programming problems is much easier for him now, on the whole.

Redundancies happen all the time and they don't end careers. Companies get bought, traded, and merged. Whenever this happens the redundant folk get the axe, then move on and get re-recruited into another comfy tech job. That's it, really.


Human or LLM, the trick with messy inputs from scanned sources is having robust sanity combs that look for obvious FUBARs, plus a means by which end data users can review the asserted values against the original raw image sources (and flag them for review | alteration).

At least in my past experience with volumes of transcribed data for applications that are picky about accuracy.
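
Concretely, a sanity comb can be as dumb as a handful of range and format checks that route anything suspicious to a human alongside a link back to the raw scan. A rough sketch in Python; the field names and thresholds are invented for illustration:

    from dataclasses import dataclass, field

    @dataclass
    class Row:
        vendor: str | None
        total: float | None
        date: str | None          # ISO 8601 string, if it could be read
        source_image: str         # path back to the original scan
        flags: list[str] = field(default_factory=list)

    def sanity_comb(row: Row) -> Row:
        # Cheap checks for obvious FUBARs; anything flagged goes to a reviewer
        # who sees the asserted values next to the raw image.
        if row.total is None or row.total <= 0 or row.total > 1_000_000:
            row.flags.append("implausible total")
        if not row.vendor:
            row.flags.append("missing vendor")
        if not (row.date and row.date[:4].isdigit()):
            row.flags.append("unparseable date")
        return row

    if __name__ == "__main__":
        # Deliberately mangled example row: all three checks should fire.
        rows = [Row(vendor="", total=-3.0, date="2O24-01-05",
                    source_image="scan_0001.png")]
        for r in rows:
            print(sanity_comb(r).flags, "->", r.source_image)

The review side is then just a table of flagged rows with the scan displayed next to the asserted values.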





