I pull messy data from a remote source (think OCR-ed invoices) and need to clean it up. Every day I get around 1k new rows. The way in which it's messed up changes frequently, and while I don't care about it being 100% correct, any piece of code (relying on rules, regex, heuristics, and other such stuff) would break within a couple of weeks. This means I'd need at least a part-time developer on my team, costing me a couple thousand per month.
Or I can pass each row through an LLM and get clean, structured output for a couple of dollars per month. Sure, it doesn't work 100% of the time, but I don't need that, and neither could the human-written code deliver it.
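In practice the per-row call is tiny. A rough sketch of what I mean, assuming the OpenAI Python client; the model name, prompt, and invoice fields here are placeholders, not my actual setup:

    import json
    from openai import OpenAI  # assumption: the OpenAI Python client; any provider works the same way

    client = OpenAI()

    SYSTEM_PROMPT = (
        "You clean OCR-ed invoice rows. Return JSON with keys "
        "vendor, invoice_number, date (YYYY-MM-DD) and total (number). "
        "Use null for anything you cannot read."
    )

    def clean_row(raw_row: str) -> dict:
        """Send one messy row to the model, get structured fields back."""
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": raw_row},
            ],
            response_format={"type": "json_object"},  # ask for parseable JSON
            temperature=0,
        )
        return json.loads(resp.choices[0].message.content)

    # ~1k rows a day at a few hundred tokens each lands in the dollars-per-month range.
    cleaned = [clean_row(line) for line in open("todays_rows.txt")]

When the way the data is messed up changes, I tweak the prompt instead of rewriting a pile of regexes.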
Effectively, the LLM resulted in one less developer hired on our team.
It resulted in one less developer, but you're still using that tool, right? Didn't a human (you) point the LLM at this problem and think this through?
That fired developer now has the toolset to become a CEO much, much more easily than in the pre-LLM era. You didn't really make him obsolete. You made him redundant. I'm not saying he's going to become a CEO, but trudging through programming problems is much easier for him on the whole.
Redundancies happen all the time and they don't end careers. Companies get bought, traded, and merged. Whenever this happens, the redundant folk get the axe. They move on and get re-recruited into another comfy tech job. That's it, really.
Human or LLM, the trick with messy inputs from scanned sources is having robust sanity checks that catch obvious FUBARs, plus a means by which end data users can review the asserted values against the original raw image sources (and flag them for review or alteration).
At least, that's been my experience with volumes of transcribed data for applications that are picky about accuracy.
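To make that concrete, the kind of sanity check I mean can be pretty dumb and still catch most of the damage. A sketch with made-up field names and plausibility bounds; anything it flags goes to a human with the original scan attached:

    from datetime import date

    def sanity_check(row: dict) -> list[str]:
        """Return reasons this extracted row looks FUBAR (empty list = passes)."""
        problems = []
        if not row.get("invoice_number"):
            problems.append("missing invoice number")
        total = row.get("total")
        if total is None or not (0 < total <= 1_000_000):  # made-up plausibility bounds
            problems.append(f"implausible total: {total!r}")
        try:
            if date.fromisoformat(row.get("date") or "") > date.today():
                problems.append("invoice dated in the future")
        except ValueError:
            problems.append(f"unparseable date: {row.get('date')!r}")
        return problems

    def route(row: dict, image_path: str, review_queue: list) -> None:
        """Suspicious rows go to the review queue alongside the raw source image."""
        reasons = sanity_check(row)
        if reasons:
            review_queue.append({"row": row, "image": image_path, "reasons": reasons})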
I do! I posted this further down, but I guess the way the ranking algorithm here works, it got super deprioritized. Full repost:
I can chip in from my tech consulting job, where we ship a few GenAI projects to several AWS clients via Amazon Bedrock. I'm senior level, but most people here are pretty much insulated.
I think whoever commented here about more complex problems being tackled (and the nature of these problems becoming broader) is right on the money. Newer patterns around LLM-based applications are emerging, and having seen them first-hand, they feel like a slight paradigm shift in programming. But they are still, at heart, programming questions.
A practical example: a company sees a GenAI chatbot and wants one of their own, based on their in-house knowledge base.
Right then and there, a whole slew of new business needs ensues, each requiring human input to make it work:
- Is training your own LLM needed? See a Data Engineer/Data engineering team.
- If going with a ready-made solution, which LLM to use instead? Engineer. Any level.
- Infrastructure around the LLM of choice. Get DevOps folk in here. Cost assessment is real and LLMs are pricey. You have to be on top of your game to estimate stuff here.
- Guard rails, output validation (see the sketch below). Engineers.
- Hooking up to whatever app front-end the company has. Engineers come to the rescue again.
All of these are valid needs for engineers, architects/staff/senior, what have you: programmers. At the end of the day, these problems devolve into the same ol' https://programming-motherfucker.com
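To make the guard rails / output validation bullet concrete (the sketch mentioned above): the usual shape is a hard contract the LLM output must satisfy before anything downstream sees it, with a retry or fallback when it doesn't. This is an illustrative sketch using Pydantic, not any particular client's setup:

    from pydantic import BaseModel, ValidationError, field_validator

    class ChatbotAnswer(BaseModel):
        """Contract the LLM output must satisfy before the front-end sees it."""
        answer: str
        sources: list[str]  # knowledge-base document ids the answer is grounded in

        @field_validator("answer")
        @classmethod
        def non_empty(cls, v: str) -> str:
            if not v.strip():
                raise ValueError("empty answer")
            return v

    def validated_answer(raw_json: str, regenerate, retries: int = 2) -> ChatbotAnswer:
        """Parse and validate the model output; retry a couple of times, then fall back."""
        try:
            return ChatbotAnswer.model_validate_json(raw_json)
        except ValidationError:
            if retries > 0:
                return validated_answer(regenerate(), regenerate, retries - 1)
            return ChatbotAnswer(answer="Sorry, I couldn't find that in our docs.", sources=[])

Nothing exotic, which is the point: it's still plain old input validation, just aimed at a model instead of a user.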
Do you have any examples of where/how that would work? It has seemed to me like a lot of the hype is "they'll be good" with no further explanation.