hhimanshu's comments | Hacker News

Great resource, definitely a good place to take the next step. As I read through it in detail, a natural question came up (based on my software development experience): how do I evaluate the correctness of the output an LLM produces for given inputs? Clearly, unit tests with fixed input/output pairs won't help, so learning methods to evaluate as we develop iteratively would be very useful - something like the sketch below is what I have in mind.
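For example, here is a rough sketch of what I mean - property-based checks that score any output against invariants instead of exact matches (the required keys here are purely illustrative):

    import json

    def evaluate(llm_output: str) -> dict:
        """Score one LLM response on invariants that should hold for any input."""
        checks = {}
        try:
            parsed = json.loads(llm_output)  # structural check: is it valid JSON?
            checks["valid_json"] = True
            # hypothetical schema: the output must carry these keys
            checks["has_required_keys"] = (
                isinstance(parsed, dict) and {"name", "email"} <= parsed.keys()
            )
        except json.JSONDecodeError:
            checks["valid_json"] = False
            checks["has_required_keys"] = False
        return checks

    # aggregate over many sampled inputs and track the pass rate per check,
    # rather than asserting exact string equality on any single example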

Thanks for sharing the article!



Maybe you could pick a real-world agent workflow (a toy version from your production experience, trimmed down) and showcase how all these factors come together in a project.

I am inspired by the simplicity of these 12 factors and definitely want to learn more with an example that embraces these factors.


I link in a few places to https://github.com/got-agents/agents, where I have a few of these real agents.


Thank you, I will take a look


I am wondering how libraries like DSPy [0] fit into your Factor 2 [1].

As I was reading, I saw a mention of BAML:

> the above example uses BAML to generate the prompt ...

In my experience, hand-writing prompts to extract structured information from unstructured data has never been easy. With DSPy, my experience has been quite good so far.

Since you have used the raw prompt from BAML, what do you think of using the raw prompts from DSPy [2]?
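For concreteness, here is a minimal sketch of the DSPy flow I mean (the signature fields and model name are just illustrative): declare the schema, let DSPy build the prompt, then dump the raw prompt it actually sent:

    import dspy

    dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

    class ExtractContact(dspy.Signature):
        """Extract structured contact info from unstructured text."""
        text: str = dspy.InputField()
        name: str = dspy.OutputField()
        email: str = dspy.OutputField()

    extract = dspy.Predict(ExtractContact)
    result = extract(text="Reach Jane Doe at jane@example.com")
    print(result.name, result.email)

    # show the exact prompt/completion DSPy sent, per the tutorial in [2]
    dspy.inspect_history(n=1)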

[0] https://dspy.ai/

[1] https://github.com/humanlayer/12-factor-agents/blob/main/con...

[2] https://dspy.ai/tutorials/observability/#using-inspect_histo...


Interesting - I think I have to side with the Boundary (YC W23) folks on this one: if you want bleeding-edge performance, you need to be able to open the box and hack on the insides.

I don't fully agree with this article https://www.chrismdp.com/beyond-prompting/ but the comparison of punch cards -> assembly -> C -> higher-level languages is quite useful here.

I just don't know when we'll get the right abstraction - I don't think LangChain or DSPy are the "C programming language" of AI yet (they could get there!).

For now I'll stick to my "close to the metal" workbench, where I can inspect tokens, reorder special tokens like system/user/JSON, and keep up with the idiosyncrasies of new models without being stuck waiting for library support.
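Roughly, the workbench looks like this - a minimal sketch against the OpenAI chat API (the model name and JSON shape are placeholder assumptions), with the message roles fully under my control:

    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            # full control over role ordering and content, no framework in between
            {"role": "system", "content": 'Reply with a JSON object: {"answer": string}.'},
            {"role": "user", "content": "Summarize factor 2 in one sentence."},
        ],
        response_format={"type": "json_object"},  # force a JSON-mode completion
    )
    print(resp.choices[0].message.content)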


It's always true that you need to drop down a level of abstraction in order to extract the ultimate performance (e.g., I wrote a decent-sized game plus engine entirely in C about 10 years ago and played with SIMD vectors to optimise the render loop).

However, I think the vast majority of use cases will not require this level of control, and we will abandon prompts once the tools improve.

LangChain and DSPy are not there for me either - I think the whole idea of prompting + evals needs a rethink.

(full disclosure: I'm working on such a tool right now!)


I'd be interested to check it out.

Here's a take I adapted from someone on the NotebookLM team on swyx's podcast:

> the only way to build really impressive experiences in AI, is to find something right at the edge of the model's capability, and to get it right consistently.

So in order to build something very good / better than the rest, you will always benefit from being able to bring in every optimization you can.


I think the building blocks of the most impressive experiences will come from choosing the exact right point to involve an LLM, the orchestration of the component pieces, and the user experience.

That's certainly what I found in games. The games that felt magical to play were never the ones with the best hand-rolled engine.

The tools aren't there yet to ignore prompts, and you'll always need to drop down to raw prompting sometimes. I'm looking forward to a future where wrangling prompts is only needed for 1% of my system.


Yeah, the issue is when you're baked into a tool stack/framework where you can't customize in that 1% of cases. A lot of tools try to get the abstractions right so you can "customize everything you would want to", but they miss the mark in some cases.


100%. You can't and shouldn't wrap every interaction. We need a new approach.


Looking forward to the new tool.


Yes, I love writing code, and AI tools like Claude and GitHub Copilot are great at it. But when solving a business problem, we still need to "see it" before we start writing the code (at least that's my preferred way).


This is a good project. I built my first Chrome extension using https://crxjs.dev/vite-plugin/getting-started and there was a lot of fiddling to get styles working.

That could also be because it was my first attempt at browser extension development: https://chromewebstore.google.com/detail/bettermenu-for-door...

There is still a lot more work to do, so I will check out your project. Thanks for sharing!


That's great! I'm glad you will give it a try


Spark works great for me too!


Thank you. Could you elaborate on this method, please?


I have been working on exactly one side project since 2013 (I have been working since 2011 and made a few failed projects, but learned a lot). In 2013, we had exactly the same problem: tracking expenses and doing personal budgeting. My wife was using spreadsheets, so I wrote software (using Python and Flask) and released a webapp. Later we realized that a mobile app would be much more useful, so I rewrote the backend (Java EE) and front end (Objective-C), but never released it.

We do, however, have 3 years' worth of our own data (we never did any analysis or charts). Since our app never made it to the App Store, it crashes every 7 days and I need to reinstall it using Xcode (very painful). I plan to rewrite it again in Scala (my latest favorite) and use React Native to build the app. This time I would like to publish it.


I enjoy working with Akka to build reactive applications. Clean, readable, testable.

