Alan Kay described [1] what he considers the first object-oriented system, made in the 60s by an unknown programmer. It was a tape-based storage system, where the "format" of the tape was a set of routines to read, write, etc. at known offsets on the tape.
This looks like RAG...? That's fine, RAG is a very broad approach and there's lots to be done with it. But it's not distinct from RAG.
Searching by embedding is just a way to construct queries, like ILIKE or tsvector. It works pretty nicely, but it's not distinct from SQL given pg_vector/etc.
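To make that concrete, embedding search can be written as an ordinary SQL query once pg_vector is installed. A minimal sketch (the table, column, and client setup here are hypothetical):

```typescript
import { Client } from "pg";

// Hypothetical schema: memories(id, content, embedding vector(1536)).
// The point is only that "search by embedding" is just another ORDER BY.
async function searchMemories(client: Client, queryEmbedding: number[]) {
  const { rows } = await client.query(
    `SELECT id, content
       FROM memories
      ORDER BY embedding <-> $1::vector
      LIMIT 5`,
    [JSON.stringify(queryEmbedding)] // pgvector accepts '[0.1, 0.2, ...]' literals
  );
  return rows;
}
```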
The more distinctive feature here seems to be some kind of proxy (or monkeypatching?) – is it rewriting prompts on the way out to add memories to the prompt, and creating memories from the incoming responses? That's clever (but I'd never want to deploy that).
From another comment it seems like you are doing an LLM-driven query phase. That's a valid approach in RAG. Maybe these all work together well, but SQL seems like an aside. And it's already how lots of normal RAG or memory systems are built; it doesn't seem particularly unique...?
RAG, or Retrieval Augmented Generation, is an AI technique that improves large language models (LLMs) by connecting them to external knowledge bases to retrieve relevant, factual information before generating a response. This approach reduces LLM "hallucinations," provides more accurate and up-to-date answers, and allows for responses grounded in specialized or frequently updated data, increasing trust and relevance.
I was unaware what RAG referred to; perhaps others were too.
I'm trying to see if there's something specifically for streaming/generators. I don't think so? Of course you can use callbacks, but you have to implement your own sentinel to mark the end, and other little corner cases. It seems like you can create a callback to an anonymous function, but then the garbage collector probably can't collect that function?
---
I don't see anything about exceptions (though Error objects can be passed through).
I get how it works: remotePromise.map(callback) will invoke the callback to see how it behaves, then make it behave similarly on the server. But it seems awfully fragile... I am assuming something like this would fail (in this case probably silently losing the conditional):
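For example (a hypothetical sketch; `friendsPromise` and the field names are invented):

```typescript
// The mapper receives an RpcPromise, not a plain object, so
// `friend.isBestFriend` is itself an RpcPromise and is always truthy.
// The conditional silently always takes the first branch.
const names = friendsPromise.map(friend =>
  friend.isBestFriend ? friend.nickname : friend.name
);
```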
I think the biggest question I have is: how would I apply this to my boring stateless-HTTP server? I can imagine something where there's a fairly simple, neutral worker that the browser connects to and that proxies to my server. But then my server can also get callbacks that it can use to connect back to the browser, and put those callbacks (capabilities?) into a database or something. Then it can connect to a worker (maybe?) and do server-initiated communication. But that's only good for a session. It has to be rebuilt when the browser's network connection is interrupted, or if the browser page is reloaded.
I can imagine building that on top of Cap'n Web, but it feels very complicated and I can equally imagine lots of headaches.
Note that the dispose method will be called automatically when the caller disposes the stub or when they disconnect the RPC session. The `end()` method is still useful as a way to distinguish a clean end vs. an abort.
In any case, you implement this interface, and pass it over the RPC connection. The other side can now call it back to write chunks. Voila, streaming.
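A minimal sketch of what the receiving side might look like, assuming the `RpcTarget` base class used in Cap'n Web's examples (the `ChunkSink` name and method shapes are mine, not a prescribed interface):

```typescript
import { RpcTarget } from "capnweb";

// The receiver implements a writer object, passes it over RPC, and the
// sender calls it back with chunks.
class ChunkSink extends RpcTarget {
  private chunks: Uint8Array[] = [];

  async write(chunk: Uint8Array): Promise<void> {
    this.chunks.push(chunk);
  }

  // Called on a clean end; disposal/disconnect covers the abort case.
  async end(): Promise<void> {
    console.log(`received ${this.chunks.length} chunks`);
  }
}

// The caller would pass `new ChunkSink()` as a parameter of some RPC method,
// e.g. `await api.downloadTo(new ChunkSink())` (hypothetical method name),
// and the other side writes into it.
```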
That said, getting flow control right is a little tricky here: if you await every `write()`, you won't fully utilize the connection, but if you don't await, you might buffer excessively. You end up wanting to count the number of bytes that aren't acknowledged yet and hold off on further writes if it goes over some threshold. Cap'n Proto actually has built-in features for this, but Cap'n Web does not (yet).
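A hedged sketch of doing that bookkeeping by hand, assuming a writer stub with the `write()`/`end()` shape above (the `Writer` type and threshold are illustrative, not library features):

```typescript
interface Writer {
  write(chunk: Uint8Array): Promise<void>;
  end(): Promise<void>;
}

const HIGH_WATER_MARK = 1 << 20; // allow ~1 MiB of unacknowledged writes

async function pump(writer: Writer, chunks: AsyncIterable<Uint8Array>): Promise<void> {
  let inFlight = 0;
  const pending = new Set<Promise<void>>();

  for await (const chunk of chunks) {
    inFlight += chunk.byteLength;
    const p: Promise<void> = writer.write(chunk).then(() => {
      inFlight -= chunk.byteLength;
      pending.delete(p);
    });
    pending.add(p);

    // Don't await every write (that would serialize round trips), but do
    // stall once too many bytes are unacknowledged.
    while (inFlight > HIGH_WATER_MARK && pending.size > 0) {
      await Promise.race(pending);
    }
  }

  await Promise.all(pending);
  await writer.end();
}
```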
Workers RPC actually supports sending `ReadableStream` and `WritableStream` (JavaScript types) over RPC. I'd like to support that in Cap'n Web, too, but haven't gotten around to it yet. It'd basically work exactly like above, but you get to use the standard types.
---------------------
Exceptions work exactly like you'd expect. If the callee throws an exception, it is serialized, passed back to the caller, and used to reject the promise. The error also propagates to all pipelined calls that derive from the call that threw.
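A small illustration, with hypothetical `api`, `getUser()`, and `.profile` names:

```typescript
// `api` stands in for a connected RPC session's main stub (hypothetical).
declare const api: any;

try {
  const user = api.getUser(42);        // pipelined call, returns an RpcPromise
  const profile = await user.profile;  // pipelined property access
  console.log(profile);
} catch (err) {
  // If getUser() threw on the server, the error is serialized back here and
  // every call pipelined on its result rejects with the same error.
  console.error("remote call failed:", err);
}
```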
---------------------
The mapper function receives, as its parameter, an `RpcPromise`. So you cannot actually inspect the value, you can only pipeline on it. `friend.isBestFriend ?` won't work, because `friend.isBestFriend` will resolve as another RpcPromise (for the future property). I suppose that'll be considered truthy by JavaScript, so the branch will always evaluate true. But if you're using TypeScript, note that the type system is fully aware that `friend` is type `RpcPromise<Friend>`, so hopefully that helps steer you away from doing any computation on it.
I'll definitely be watching out for more built-in streaming support. Being able to throw the standard types directly over the wire and trust that the library will handle optimally utilizing the connection would make this the RPC library that I've been looking for all year.
Re: RpcPromise, I'm pretty sure all logical operations on it will produce unexpected results. TypeScript isn't going to complain about using an RpcPromise as a boolean.
Overloading .map() does feel a bit too clever here, as it has this major difference from Array.map. I'd rather see it as .mapRemote() or something that immediately sticks out.
I can imagine an `RpcPromise.filterRemote(func: (p: RpcPromise) => RpcPromise)` that only allows filtering on the truthiness of properties; in that case the types really would save someone from confusion.
I guess if the output type of map was something like:
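(A hypothetical sketch; `RpcPromise` is declared locally as a stand-in for the real type:)

```typescript
declare class RpcPromise<T> { private _value: T; }

// The mapper may only return an RpcPromise, or an object whose properties
// are all RpcPromises; plain constants or locally computed values would
// fail to type-check.
type RemoteOnly<U> =
  | RpcPromise<U>
  | { [K in keyof U]: RpcPromise<U[K]> };

declare function map<T, U>(
  source: RpcPromise<T[]>,
  fn: (item: RpcPromise<T>) => RemoteOnly<U>
): RpcPromise<U[]>;
```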
... then you'd catch most cases, because there's no good reason to have any constant/literal value in the return value. Almost anywhere a non-RpcPromise value appears, it's likely because a value was calculated in a way that won't work.
Though another case occurs to me that might not be caught by any of this:
result = aPromise.map(friend => ({...friend, nickname: getNickname(friend.id, userId)}))
The spread operator is a pretty natural thing to use in this case, and it probably doesn't work on an RpcPromise?
There were some responses about educational expectations, but I would love to hear how folks in these Asian countries specifically deal with cell phones, social media, and these general media/online distractions.
Respect (or lack thereof) goes both ways, for both the writer and the reader. I have frequently felt disrespected when documentation, planning docs, etc. that I produce go unread. In the end I mostly rely on oral transmission of knowledge, because then at least I can read the room and know whether I'm providing some value to people, and ultimately we're both trapped in the room together and have to invest the same amount of time.
The LLM isn't always smart, but it's always attentive. It rewards that effort in a way that people frequently don't. (Arguably this is a company culture issue, but it's also a widespread issue.)
Great framing of the problem. I do think it's a culture issue with "Agile" practices in particular: by design, there is no time budgeted for reading, writing, reflection, or discussion. Sprint, sprint, sprint.
In organizations that value innovation, people will spend time reading and writing. It's a positive feedback loop, almost a litmus test of quality of the work culture.
My experience writing in a professional setting is that people mostly don't read what I write, and the more effort I put into being thorough, the less likely it is to be read.
Agreed, and I would argue the super quick turnaround time of an interactive discussion during a planning phase makes this much more enjoyable.
I also enjoy discussing solutions with people in real time. But writing documentation in a vacuum, without any feedback or even knowing if someone will read the spec?? Soul-draining stuff.
In fact, the best of both worlds would be having a discussion with someone else (real person) while an AI agent listens, takes notes, and provides feedback / insights using different models. Vetting your ideas etc.
Probably ignoring things like robots.txt, I'm guessing? But I'd be curious what exactly the list of things is, and if it's growing. Would it go as far as ChatGPT filling in CAPTCHAs?
autocomplete="off" is an instance of something that user agents willfully ignore based on their own heuristics, and I'm assuming accessibility tools have always ignored a lot of similar things.
The surgeon general warning came along with a large number of other measures to reduce smoking. If that warning had an effect, I would guess that effect was to prime the public for the other measures and generally to change consensus.
BUT, I think it's very likely that the surgeon general warning was closer to a signal that consensus had been achieved. That voice of authority didn't actually _tell_ anyone what to believe, but was a message that anyone could look around, consult many sources, and see that there was a consensus on the bad effects of smoking.
1. I really like the "commitment" concept. That solves a real conversational problem where the AI can be too easy to redirect, moving on too fluidly from previous conversational beats. And the AI will easily make commitments that it can't or won't keep, so tracking them is good.
2. Reflection is a good approach. I think this is generally in the zone of "memory", though a more neutral term like insight or observation can be better for setting expectations. There are a lot of systems using explicit memory management, with tools to save, load, or search memories, and I don't think that's very good. I include both techniques in my work because sometimes the AI wants to KNOW that it has remembered something. But maybe the commitment idea is a better way to think about it. Reflection lets the memory be built from a larger context. And usually the peak moment when a memory would be explicitly stored isn't actually the final moment, so a reflective memory will be more nuanced and correct.
3. It's good to create a model for personality. I should probably be more explicit in my own work, though I guess I focus mostly on behavioral aspects: how the AI should act toward the user, not what the AI's "identity" is. But generally I don't trust scores. A score implies a rubric already embedded in the model, and to the degree that one even exists, the rubric is unstable, not portable between models, and changes can be arbitrary. Instead I like to use terms that imply the rubric. So if you take the Big Five, then I'd create terms for each attribute and score, and use those terms exclusively, ignoring numbers entirely. For instance, for neuroticism you might have Unflappable → Even-keeled → Sensitive → Reactive → Vulnerable (see the sketch after this list).
4. I can't tell if Emergence Metrics are prescriptive or descriptive. I'm guessing it's actually unclear in the implementation as well. The AI can pretend to be all kinds of things, but I think you are trying to get past just pretend.
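A minimal sketch of the score-to-term mapping from point 3, assuming scores normalized to 0–1 (the tier labels, boundaries, and function name are all illustrative):

```typescript
// Map a numeric trait score to a descriptive term so prompts use the term,
// never the number. These are arbitrary, evenly spaced tiers.
const NEUROTICISM_TERMS = [
  "Unflappable", "Even-keeled", "Sensitive", "Reactive", "Vulnerable",
] as const;

function neuroticismTerm(score: number): string {
  const clamped = Math.min(Math.max(score, 0), 1);
  const idx = Math.min(
    NEUROTICISM_TERMS.length - 1,
    Math.floor(clamped * NEUROTICISM_TERMS.length),
  );
  return NEUROTICISM_TERMS[idx];
}

// e.g. neuroticismTerm(0.1) === "Unflappable", neuroticismTerm(0.9) === "Vulnerable"
```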
Thanks for commenting! These are really helpful points, and a super helpful framing.
Here's where my thinking is going (I could be totally wrong, but this is new ground for me):
You nailed the problem on commitments. A lot of AIs will say “I’ll do X” and then immediately let the thread drift. PMM logs those as `commit_open` events and tracks them as open promises. They don’t close unless there’s actual evidence (a file, a PR link, or at minimum a `Done:` marker that gets picked up by the BehaviorEngine).
That’s why my close rates look brutally low right now. I’d rather see a truthful 0.000% than a fake 100% “done.”
Over time, the evidence hooks should help close more loops, but always with proof. Or at least that's what I'm trying to nail down. lol
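Roughly, the loop looks like this (a simplified sketch of the idea, not the actual PMM code; names are illustrative):

```typescript
// A commitment opens when the AI says it will do something, and only
// closes when evidence is attached.
interface Commitment {
  id: string;
  text: string;       // what the AI said it would do
  openedAt: Date;
  evidence?: string;  // file path, PR link, or a "Done:" marker
  closedAt?: Date;
}

class CommitmentLog {
  private commitments = new Map<string, Commitment>();

  open(id: string, text: string): void {
    this.commitments.set(id, { id, text, openedAt: new Date() });
  }

  // Closing requires evidence; a bare "done" claim is not enough.
  close(id: string, evidence: string): boolean {
    const c = this.commitments.get(id);
    if (!c || !evidence) return false;
    c.evidence = evidence;
    c.closedAt = new Date();
    return true;
  }

  closeRate(): number {
    const all = [...this.commitments.values()];
    return all.length === 0 ? 0 : all.filter(c => c.closedAt).length / all.length;
  }
}
```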
I went with “reflection” because it emphasizes the recursive/self-referential aspect, but “insight” or “observation” might be clearer. Functionally, it’s closer to what you described: building memory from a broader context rather than snapshotting a single moment.
The personality scores are just a raw, blunt tool at the moment. Right now I’m using IAS/GAS metrics as scaffolding, but I don’t think numbers are the endgame. I am leaning toward descriptors, or tiers within the traits, as stable representations of states. The question is, how far down do I nest?
The emergence metrics are supposed to be descriptive. I’m trying to measure what’s happening, not tell the model what it should become. In early runs, they’re mostly flat, but the hope is that with continuity and reflection, I'll see them drift in ways that track identity change over time.
If I were to be completely honest, this is a thought experiment being fleshed out. How can I create a personal AI that's model agnostic, portable, and develops in alignment in a manner that is personalized to the person using it?
So far, things seem to be tracking in the right direction from what I can see. Either that, or I'm constructing the world's most amazing AI confabulation LARP machine. :)
Either way, I'm pulling my hair out in the process.
Thought: if one of these automation tools wants to do some deep research task, is it legit if it just goes to chatgpt.com or notebooklm.google.com?
Obviously Anthropic or OpenAI doesn't need to do this, but there are a dozen other browser automation tools which aren't backed with these particular features, and whose users are probably already paying for one of these services.
When ChatGPT first came out there were lots of people using extensions to get "free" API calls (that just ran the completion through the UI). They blocked these, and there are terms of service or whatever to disallow them. But these companies are going to try to construct a theory where they can ignore those rules in other services' terms of service. And then... turnabout's fair play?
So, prior art! :)
[1] https://www.cs.tufts.edu/comp/150FP/archive/alan-kay/smallta... (page 4)