
I can definitely echo the challenges of debugging non-trivial LLM apps, and making sure you have the right evals to validate progress. I spent many hours optimizing Copilot Workspace, and there is definitely both an art and a science to it :)

That said, I’m optimistic that tool builders can take on a lot of that responsibility and create abstractions that allow developers to focus solely on their code and the problem at hand.



For sure! As a user, I would love to have some sort of debugger-like behavior for inspecting the LLM's output generation. Maybe the ability for the LLM to keep running some tests until they pass? That sort of thing would make me want to try this :)


See the Langtail app (I'm not the maker).


I'm sure we'll share some of the strategies we used here in upcoming talks. It's, uh, "nontrivial". And it's not just "what text do you stick in the prompt".



