AFAICS this has nothing to do with "open-source personal AI engines".
The recorded history is stored in a SQLite database and is quite trivial to examine[0][1]. A simple script could extract the information and feed it to your indexer of choice. Developing such a script isn't a task for a browser engineering team.
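For instance, here's a minimal sketch for Firefox (the path is profile-specific; Chrome keeps the equivalent in a SQLite file named "History" with a "urls" table):

```python
import sqlite3

# Pull URLs, titles, and visit times out of Firefox's places.sqlite.
# Adjust the path to your own profile directory.
db_path = "/path/to/profile/places.sqlite"

# Open read-only so we don't touch the browser's live database.
con = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
rows = con.execute("""
    SELECT p.url, p.title, h.visit_date
    FROM moz_places p
    JOIN moz_historyvisits h ON h.place_id = p.id
    ORDER BY h.visit_date DESC
""")
for url, title, visit_date in rows:
    # visit_date is microseconds since the Unix epoch; this is where
    # you'd hand the record to your indexer of choice.
    print(visit_date, title, url)
con.close()
```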
The question remains whether the indexer would really benefit from real-time ingestion while browsing.
This is a good headline. LLMs are remarkably good at writing code. Writing code isn't the same thing as delivering working software.
A human expert needs to identify the need for software, decide what the software should do, figure out what's feasible to deliver, build the first version (AI can help a bunch here), evaluate what they've built, show it to users, talk to them about whether it's fit for purpose, iterate based on their feedback, deploy and communicate the value of the software, and manage its existence and continued evolution in the future.
Some of that stuff can be handled by non-developer humans working with LLMs, but a human expert who understands code will be able to do this stuff a whole lot more effectively.
I guess the big question is if experienced product management types can pick up enough technical literacy to work like this without programmers, or if programmers can pick up enough PM skills to work without PMs.
My money is on both roles continuing to exist and benefit from each other, in a partnership that produces results a lot faster because the previously slow "writing the code" part is a lot faster than it used to be.
If you can get malicious instructions into the context of even the most powerful reasoning LLMs in the world, you'll still be able to trick them into outputting vulnerable code like this if you try hard enough.
I don't think the fact that small models are easier to trick is particularly interesting from a security perspective, because you need to assume that ANY model can be prompt injected by a suitably motivated attacker.
On that basis I agree with the article that we need to be using additional layers of protection that work against compromised models, such as robust sandboxed execution of generated code and maybe techniques like static analysis too (I'm less sold on those; I expect plenty of malicious vulnerabilities could sneak past them).
Me too. When they removed the option to download books I liberated everything I had ever bought, moved to Kavita+koreader and will never buy a kindle book again.
I jailbroke both kindles. And use koreader on them which now supports progress sync with Kavita which is amazing! So I don't really lose functionality.
Yeah I think so. In the LangChain world, LangFlow took off because it was the "no-code", n8n-style version layered on top of LangChain. To me it was always frustrating that it wasn't one ecosystem / fully interoperable. We're looking to make sure there's a good solution that works in either modality for agents.
Yes, it's a mess, and there will be a lot of churn, you're not wrong, but there are foundational concepts underneath it all that you can learn and then it's easy to fit insert-new-feature into your mental model. (Or you can just ignore the new features, and roll your own tools. Some people here do that with a lot of success.)
The foundational mental model to get the hang of is really just:
* An LLM
* ...called in a loop
* ...maintaining a history of stuff it's done in the session (the "context")
* ...with access to tool calls to do things. Like, read files, write files, call bash, etc.
Some people call this "the agentic loop." Call it what you want, you can write it in 100 lines of Python. I encourage every programmer I talk to who is remotely curious about LLMs to try that. It is a lightbulb moment.
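If you want to see it concretely, here's roughly what that loop looks like with the OpenAI SDK and a single bash tool (the model name is a placeholder, and a real agent adds more tools and permission prompts):

```python
import json
import subprocess
from openai import OpenAI

client = OpenAI()

# One tool: run a shell command. Real agents add read_file, write_file, etc.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "bash",
        "description": "Run a shell command and return its output.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}]

messages = [{"role": "user", "content": input("> ")}]  # the "context"

while True:  # the agentic loop
    msg = client.chat.completions.create(
        model="gpt-4o", messages=messages, tools=TOOLS
    ).choices[0].message
    messages.append(msg)  # keep the history of everything done so far
    if not msg.tool_calls:  # no more tools to call: we're done
        print(msg.content)
        break
    for call in msg.tool_calls:
        cmd = json.loads(call.function.arguments)["command"]
        out = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": out.stdout + out.stderr,
        })
```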
Once you've written your own basic agent, if a new tool comes along, you can easily demystify it by thinking about how you'd implement it yourself. For example, Claude Skills are really just:
1) Skills are just a bunch of files with instructions for the LLM in them.
2) Search for the available "skills" on startup and put all the short descriptions into the context so the LLM knows about them.
3) Also tell the LLM how to "use" a skill. Claude just uses the `bash` tool for that.
4) When Claude wants to use a skill, it uses the "call bash" tool to read in the skill files, then does the thing described in them.
and that's more or less it, glossing over a lot of things that are important but not foundational like ensuring granular tool permissions, etc.
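A sketch of what steps 2 and 3 could look like (the layout here, one SKILL.md per skill with a "description:" line in its frontmatter, is my assumption of the convention, not Anthropic's actual code):

```python
from pathlib import Path

def load_skill_summaries(skills_dir: str = "skills") -> str:
    """Collect one-line descriptions from skills/<name>/SKILL.md files."""
    summaries = []
    for skill_file in sorted(Path(skills_dir).glob("*/SKILL.md")):
        for line in skill_file.read_text().splitlines():
            if line.startswith("description:"):
                desc = line.removeprefix("description:").strip()
                summaries.append(f"- {skill_file.parent.name}: {desc}")
                break
    return "\n".join(summaries)

# Steps 2 and 3: put the short descriptions into the context up front,
# and tell the model to pull in the full skill with the bash tool.
system_prompt = (
    "You have these skills. Before using one, read its full SKILL.md "
    "with the bash tool:\n" + load_skill_summaries()
)
```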
Yet a week ago I used Claude Code for my personal finances (not taxes) - I downloaded over a year’s worth of my bank account data. Since I pay for most things by card, if I buy lunch, it’s there.
With a single prompt (and about 10 minutes), it produced an analysis. It solved all the technical issues by itself (e.g., realizing it wasn’t CSV but TSV) and ran quite a few different explorations with Pandas. It was able to write an overview, find items that were likely misclassified, etc.
Everything I checked by hand was correct.
So, instead of pursuing a project to write an AI tool for personal finance, I ended up concluding: "just use Claude Code." As a side note, I used 14 months of data by mistake - I only meant to analyze 2 months, since I didn't believe it would handle a larger set, but I misclicked the year. The file was 350 KB.
English is a language in which words can have more than one meaning depending on the context they are used in.
"I am sick" meaning that I am not feeling well.
In the context of someone watching a performance: "This is sick!" as an exclamation that something they saw was impressive.
In the context of someone watching something gut wrenching: "This is sick!" as an exclamation that something they saw was horrible and unpleasant.
In the context of someone stepping in a puddle of vomit while walking around London: "This is sick!?" as a dismayed exclamation of realisation that what they stepped in was a puddle of vomit.
I can go on, and so can anyone who can speak English.
So, please stop this bullshit of "I want this word to mean this one thing that it has not meant for at least the past 30 years."
It's extremely useful that we can agree on what words mean despite the fuzziness of English. You are intentionally trying to muddy the water.
Another fun one. The Cosmic Calendar. [1] Imagine breaking down the history of the universe into a single year. It really offers some amazing perspective on the length of life, and what it means.
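For a feel of the compression involved (13.8 billion years squeezed into one calendar year), a few lines of arithmetic:

```python
# Map 13.8 billion years of cosmic history onto a single calendar year.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
scale = SECONDS_PER_YEAR / 13.8e9  # calendar seconds per real year

for label, years_ago in [("dinosaurs go extinct", 66e6),
                         ("first Homo sapiens", 300e3),
                         ("all of recorded history", 5e3)]:
    print(f"{label}: the last {years_ago * scale:,.0f} calendar seconds")
# Recorded history lands in the final ~11 seconds of December 31.
```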
Yes. The training process requires big expensive GPUs. The model it produces has 561M parameters, which should run on even a high-end mobile phone (I run 4B models on my iPhone).
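Back-of-envelope weight memory for 561M parameters, ignoring KV cache and runtime overhead:

```python
params = 561e6
for precision, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{precision}: {params * bytes_per_param / 2**30:.2f} GiB")
# fp16 is ~1.04 GiB and int4 is ~0.26 GiB, comfortably within a
# modern phone's memory budget.
```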
I've always thought about the best way to contribute to humanity: number of people you help x how much you help them. I think what Karpathy is doing is one of the highest leverage ways to achieve that.
Our current world is built on top of open source projects. This is possible because there are a lot of free resources for learning to code, so anyone from anywhere in the world can learn and make a great piece of software.
I just hope the same will happen with the AI/LLM wave.
Cool. Is there a simple "howto" on running this repo with training on W&B for a programmer like me who has never done model training flows? Maybe you could share the steps you took?
Curious to try it someday on a set of specialized documents. Though as I understand it, the cost of running this is whatever a GPU with 80GB of VRAM rents for, which kind of leaves hobbyists and students out. Unless some cloud is donating GPU compute capacity.
I recommend McGill's Back Mechanic book, which is an end-user focused distillation of his academic work.
It suggests simple tests to discover exactly where your pain is coming from, then appropriate exercises to mechanically strengthen the right area, and a few workarounds to avoid stressing that area in everyday life, e.g. alternate ways to pick up light items from the floor.
McGill's big three are three simple exercises that are generally good for those with no patience for ordering a book, and intros to them can be found all over YouTube.
It'll be interesting to see if they can still design and build an all-new airplane from the ground up. The last all-new design was the 787, initiated in 2003 and launched in 2009, and its development was fraught with problems. Before that was the 777 in the early 90s (pre-McDonnell takeover), and the 757/767 in the early 80s.
There's a phenomenon that often occurs in large organizations: once their markets mature, everybody who can build a product end-to-end leaves or gets forced out, leaving only people with highly specialized maintenance skillsets. The former group has no work to do, after all, so why should the company keep them around? But then if the market ecosystem shifts and a new product is necessary, the company no longer has the capacity to build ground-up new products. All those people have left, and won't come anywhere near the company.
Steve Jobs spoke eloquently about this phenomenon in an old interview: [0]
In 2023 a friend and I started a monthly dinner club with the goal of eating around the world without getting on a plane. We gather once a month at a restaurant on Long Island for a meal focused on a theme or region of the world. The meals run 10+ courses and include a drink. We work with the restaurant to craft a menu that is as close to authentic to the region as possible.
Our first dinner was with 13 friends and has since grown into a group of just about 1,000 members. Last year we generated around $140k for local restaurants on off nights (dinners are on Tues and Wed when business is slow).
Now we are working on evolving into more of a lifestyle brand for people who love food. I'm currently working on our clothing line and new site, which we quietly launched a few days ago (there's still a few odds and ends to finish): https://www.deadchefssociety.com. Would love any feedback!
This is a solo startup that I've been working on for 2 years now. It's a labor of love and I'm very lucky and thankful that it surprisingly makes enough to pay all of our bills. Still constantly feeling FOMO over all of my startup buddies working with AI and LLMs while I plug away at old maps and GIS.
It gets ~80K MAUs and is just slowly and consistently growing organically through word of mouth in history-focused communities. I'm currently playing with expanding the coverage internationally, as I still only support the US, which is a wickedly fun project.
The way it works is the user registers / imports MCP (Model Context Protocol) servers they would like to use. All the tools of those servers are imported, and then the firewall uses structured LLM calls to decide which types of action each tool performs among:
- read private data (e.g. read a local file or read your emails)
- perform an activity on your behalf (e.g. send an email or update a calendar invite)
- read public data (e.g. search the web)
The idea is that if all 3 types of tool calls are performed in a single context session, the LLM is vulnerable to jailbreak attacks (e.g. reads personal data -> reads poisoned public data with malicious instructions -> LLM gets tricked and posts personal data).
Once all the tools are classified, the user can go in and make any adjustments, and then they're given the option to set up the gateway as an MCP server in their LLM client of choice. For each LLM session the gateway keeps track of all tool calls and, in particular, which action types have been raised in the session. If a tool call is attempted that would raise all action types for a session, it gets blocked and the user gets a notification, which sends them to the firewall UI, where they can see the offending tool calls and decide to either block the most recent one or add the triggering "set" to an allowlist.
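In rough pseudocode (names invented here; the real classification comes from the structured LLM calls described above), the session logic amounts to something like:

```python
from enum import Flag, auto

class Action(Flag):
    READS_PRIVATE = auto()  # e.g. read a local file or your emails
    ACTS_FOR_USER = auto()  # e.g. send an email
    READS_PUBLIC = auto()   # e.g. search the web

TRIFECTA = Action.READS_PRIVATE | Action.ACTS_FOR_USER | Action.READS_PUBLIC

class SessionGuard:
    """Track which action types a session has raised and block any tool
    call that would complete all three."""
    def __init__(self, classifications: dict[str, Action]):
        self.classifications = classifications  # tool name -> action types
        self.raised = Action(0)

    def allow(self, tool_name: str) -> bool:
        would_raise = self.raised | self.classifications[tool_name]
        if would_raise == TRIFECTA:
            return False  # block and notify; user may allowlist this set
        self.raised = would_raise
        return True

guard = SessionGuard({"read_email": Action.READS_PRIVATE,
                      "web_search": Action.READS_PUBLIC,
                      "send_email": Action.ACTS_FOR_USER})
guard.allow("read_email")  # True
guard.allow("web_search")  # True
guard.allow("send_email")  # False: would complete the trifecta
```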
Next steps are transitioning from the web UI for the product to a desktop app with a much cleaner and more streamlined UI. We're still working on improving the UX but the backend is solid and we would really like to get some more feedback for it.
[0] https://www.youtube.com/watch?v=1a9HLrwvUO4&t=15s