You should probably change the name; I thought it was associated with Manus. Usually projects like this will specifically say in the tagline “An Open Source alternative to X.”
Other than that it looks very cool; reading through the code now. A lot of these projects lean very heavily on Browser Use.
Then why don’t they co-locate teams when they get RTO’d? I keep hearing about people who have to go sit at a mandatory hot desk but are still stuck on Zoom all day. Seems like the worst of both worlds.
It’s ordinary corporate dysfunction. The mandates come top-down. People in management don’t think too hard about exceptions. The people making decisions are far-removed from the consequences of their decisions.
It’s not really an exception though. These are the same people who spent the last 20 years singing the praises of offshoring and follow-the-sun. It’s just trend chasing.
Honestly I think the mistake we made was calling it “work from home” instead of “telecommuting”.
> Honestly I think the mistake we made was calling it “work from home” instead of “telecommuting”.
I am curious. Why do you think calling it telecommuting would have made any significant difference? And what difference would it have made? Where do you imagine we would be today if more people referred to it with that word?
My drive-by opinion: "telecommuting" has an advantage in optics/marketing and flexibility over "work from home" for both business leaders and employees. If I tell a board of directors or shareholders that "80% of our workforce performs some fraction of their weekly tasks ______", I imagine the following:
- "from home" elicits images of relaxation and lost productivity, while "via telecommuting" sounds like a commute still takes place and work is just as productive
- "from home" sounds like retreating to a comfort zone, while "via telecommuting" sounds like embracing a new technology or skill
- "from home" sounds like remote workers ought to be performing their tasks from their domicile only, while "via telecommuting" sounds like remote workers can do their work wherever they are
If businesses had adopted "telecommuting" terminology, I believe business leaders would not feel obligated to push back in order to regain productivity. I think it's easier to attack the trend of WFH given the points above. I actually agree with the proposal that WFH is weak terminology, but I had never sat down and thought about it before.
From the perspective of initial adoption, I think it would have happened just as fast. Workers were thrust into remote work arrangements during COVID, and everyone would have quickly gotten the gist of what "telecommuting" means, so it would have been the new buzzword to attract talent in job listings just as "remote" or "work from home" have been. Just without the downsides in CEO perception.
RTOs generally have nothing to do with any of the things they say. They are just layoffs.
You can't argue with them about the effectiveness of remote work. They aren't trying to optimize work. They are trying to fire people.
Working from home doesn't fire people, being more productive and happy doesn't fire people. Your mental well being doesn't have any bearing on how many people they need to fire.
Reasoning models spend a whole bunch of time reasoning before returning an answer. I was toying with QwQ 32B last night and ran into one question where it spent 18 minutes at 13 tok/s in the <think> phase before returning a final answer. I value local compute, but reasoning models aren’t terribly feasible at this speed since you don’t really need to see the first 90% of their thinking output.
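To illustrate that last point, here is a minimal sketch (assuming a local llama.cpp-style server with an OpenAI-compatible streaming endpoint; the URL, model name, and question are placeholders) that discards the <think>…</think> phase as it streams and only prints the final answer:

```python
# Sketch: stream a chat completion from a local OpenAI-compatible server
# (e.g. llama.cpp's llama-server) and hide the <think>...</think> phase,
# printing only the final answer. URL, model name, and question are placeholders.
# Note: a tag split across two chunks is ignored here for brevity.
import json
import requests

URL = "http://localhost:8080/v1/chat/completions"  # assumed local endpoint
payload = {
    "model": "qwq-32b",  # placeholder model name
    "stream": True,
    "messages": [{"role": "user", "content": "How many primes are below 100?"}],
}

in_think = False
with requests.post(URL, json=payload, stream=True) as resp:
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue
        data = line[len(b"data: "):]
        if data == b"[DONE]":
            break
        delta = json.loads(data)["choices"][0]["delta"].get("content") or ""
        while delta:
            if in_think:
                end = delta.find("</think>")
                if end == -1:
                    delta = ""  # still thinking; discard
                else:
                    delta = delta[end + len("</think>"):]
                    in_think = False
            else:
                start = delta.find("<think>")
                if start == -1:
                    print(delta, end="", flush=True)
                    delta = ""
                else:
                    print(delta[:start], end="", flush=True)
                    delta = delta[start + len("<think>"):]
                    in_think = True
print()
```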
Exactly! I run it on my old T7910 Dell workstation (2x 2697A V4, 640GB RAM) that I built for way less than $1k. But so what, it's about ~2 tokens/s. Just like you said, it's cool that it runs at all, but that's it.
It's meant to be a test/development setup for people to prepare the software environment and tooling for running the same on more expensive hardware. Not to be fast.
I remember people trying to run the game Crysis using CPU rendering. They got it to run and move around. People did it for fun and the "cool" factor. But no one actually played the game that way.
It's the same thing here. CPUs can run it but only as a gimmick.
> It's the same thing here. CPUs can run it but only as a gimmick.
No, that's not true.
I work on local inference code via llama.cpp, on both GPU and CPU on every platform, and the bottleneck is much more ram / bandwidth than compute.
A crappy 2022 mid-range Android CPU in a Pixel Fold gets you roughly the same speed as a 2024 Apple iPhone GPU, with Metal acceleration that dozens of very smart people hack on.
Additionally, and perhaps more importantly, Arc is a GPU, not a CPU.
The headline of the thing you're commenting on, the very first thing you see when you open it, is "Run llama.cpp Portable Zip on Intel GPU"
Additionally, the HN headline includes "1 or 2 Arc A770"
It's both compute and bandwidth constrained - just like trying to run Crysis on CPU rendering.
The A770 has 16GB of RAM. You're shuffling data to the GPU at a rate of 64GB/s, which is orders of magnitude slower than the GPU's internal VRAM bandwidth. Hence, this setup is memory-bandwidth constrained.
However, once you want to use it to do anything useful like a longer context size, the CPU compute will be a huge bottleneck for time-to-first-token as well as tokens/s.
Trying to run a model this large, and a thinking one at that, on CPU RAM is a gimmick.
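To put rough numbers on the bandwidth point above, a back-of-envelope sketch (the active-parameter count, quantization, and transfer rate below are assumptions, not measurements):

```python
# Back-of-envelope decode speed for a bandwidth-bound setup.
# All figures are assumptions: ~37B active parameters per token for a
# large MoE model, ~4.5 bits/weight quantization, ~64 GB/s transfer rate.
active_params = 37e9        # assumed active parameters touched per token
bits_per_weight = 4.5       # assumed Q4-class quantization
bandwidth_gb_s = 64         # assumed effective GB/s to the GPU

bytes_per_token = active_params * bits_per_weight / 8
tokens_per_s = bandwidth_gb_s * 1e9 / bytes_per_token
print(f"~{bytes_per_token / 1e9:.1f} GB read per token "
      f"-> ~{tokens_per_s:.1f} tokens/s upper bound")
# ~20.8 GB per token -> ~3.1 tokens/s, before compute or KV-cache costs.
```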
Okay, let's stipulate LLMs are compute and bandwidth sensitive (of course!)...
#1, should highlight it up front this time: We are talking about _G_PUs :)
#2 You can't get a single consumer GPU that has enough memory to load a 670B parameter model, so there's some magic going on here. It's notable and distinct. This is probably due to FlashMoE, given its prominence in the link.
TL;DR: 1) these are Intel _G_PUs, and 2) it is a remarkable, distinct achievement to be loading a 670B parameter model on only one or two cards
Can you share what LLMs you run on such small devices and what use case they address?
(Not a rhetorical question; it's just that I see a lot of work on local inference for edge devices with small models, but I could never get a small model to work for me. So I'm curious about other people's use cases.)
Excellent and accurate q. You sound like the first person I've talked to who might appreciate the full exposition here; apologies if this is too much info. TL;DR is you're def not missing anything, and we're just beginning to turn a corner and see some rays of hope, where local models are a genuine substitute for remote models in consumer applications.
#1) I put a lot of effort into this and, quite frankly, it paid off absolutely 0 until recently.
#2) The "this" in "I put a lot of effort into this" means: I left Google 1.5 years ago and have been quietly building an app that is LLM-agnostic, in service of coalescing a lot of next-gen thinking re: computing I saw that is A) now possible due to LLMs and B) was shitcanned in 2020 because Android won politically, because all that next-gen thinking seemed impossible given it required a step change in AI capabilities.
This app is Telosnex (telosnex.com).
I have a couple of stringent requirements I enforce on myself: it has to run on every platform, and it has to support local LLMs just as well as paid ones.
I see that as essential for avoiding continued algorithmic capture of the means of info distribution, and I believe that on a long enough timeline, all the rushed hacking people have done to llama.cpp to get model after model supported will give way to UX improvements.
You are completely, utterly, correct to note that the local models on device are, in my words, useless toys, at best. In practice, they kill your battery and barely work.
However, things did pay off recently. How?
#1) llama.cpp landed a significant opus of a PR by @ochafik that normalized tool handling across models, as well as implemented what the models need individually for formatting
#2) Phi-4 mini came out. Long story, but tl;dr: till now there have been various gaping flaws with each Phi release. This one looked absent of any issues. So I hacked support for its tool vagaries on top of what @ochafik landed, and all of a sudden I'm seeing the first local model sub-Mixtral 8x7B that reliably handles RAG flows (i.e. generate a search query, then accept 2K tokens of parsed web pages and answer a question following the directions I give it) and tool calls (i.e. generate a search query, or file operations like here: https://x.com/jpohhhh/status/1897717300330926109)
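For anyone curious what those tool calls look like from the client side, a minimal sketch (not Telosnex code; it assumes a local llama-server running Phi-4 mini behind an OpenAI-compatible /v1/chat/completions endpoint, and the search_web tool is a made-up example):

```python
# Sketch: ask a local model for an OpenAI-style tool call.
# Endpoint, model name, and the search_web tool are illustrative assumptions.
import json
import requests

URL = "http://localhost:8080/v1/chat/completions"  # assumed llama-server

tools = [{
    "type": "function",
    "function": {
        "name": "search_web",
        "description": "Search the web and return result snippets.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

resp = requests.post(URL, json={
    "model": "phi-4-mini",  # whatever name the local server was started with
    "messages": [{"role": "user", "content": "Who won the 2022 World Cup?"}],
    "tools": tools,
}).json()

# The model either answers directly or emits a structured tool call.
message = resp["choices"][0]["message"]
for call in message.get("tool_calls") or []:
    args = json.loads(call["function"]["arguments"])
    print(call["function"]["name"], args)  # e.g. search_web {'query': '...'}
# A RAG flow then feeds the parsed search results back as a "tool" message
# and asks the model to answer from that context.
```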
That's because the OP is linking to the quickstart guide. There are benchmark numbers on the GitHub repo's root page, but it does not appear to include the new DeepSeek yet:
There’s a good chance the reason this is public is because he was using gist to transfer ChatGPT code between his personal computer and his government computer.
Probably true. I work in a regulated space and I have done this to get my zsh config to a secure laptop, but I have always had the common sense to mark the gist as hidden (can still be accessed if you know the hash, obvs).
I think Ruby can get popular again with the sort of contrarian things Rails is doing, like helping developers exit the cloud.
There isn’t really a much more productive web dev setup than Rails + your favorite LLM tool. It will take time to win Gen Z back to Rails, though, and away from Python/TS or Go/Rust.
My impression is that a Rails app is an unmaintainable, dynamically-typed ball of mud that might give you the fast upfront development to get to market or get funded, but will quickly fall apart at scale, e.g. the Twitter fail whale. And Ruby is too full of "magic" that quickly makes it hard to tell what's going on, or easy to accidentally make something grossly inefficient if you don't understand the magic, which defeats the point of the convenience. Is this perception outdated, and if so, what changed?
If the Twitter fail whale is your concern, then your perception is outdated. Twitter started moving off Ruby in 2009. Both the CRuby VM and Rails have seen extensive development over the decade and a half since.
I never worked at Twitter, but based on the timeline it seems very likely they were running on the old Ruby 1.8.x line, which was a pure AST interpreter. The VM is now a bytecode interpreter that has been optimized over the intervening years. The GC is considerably more robust. There's a very fast JIT compiler included. Many libraries have been optimized and bugs squashed.
If your concern is Rails, please note that also has seen ongoing development and is more performant, more robust, and I'd say better architected. I'm not even sure it was thread-safe when Twitter was running on it.
You don't have to like Ruby or Rails, but you're really working off old data. I'm sure there's a breaking point in there somewhere, but I very much doubt most apps will hit it before going bust.
The CRuby interpreter alone is at least 2-3x faster than it was at Fail Whale time, and the JIT doubles that to 4-6x. Rails itself has also gotten 1.5x to 2x faster.
And then you have CPUs that are 20-30x faster compared to 2009, SSDs that are 100x-1000x faster, and databases that are much more battle-tested and far easier to scale.
Sometimes I wonder: maybe we could remake Twitter with Rails again to see how well it goes.
My issue with Ruby (and Rails) has always been the "ball of mud" problem that I feel originates from its extensive use of syntactical sugar and automagic.
Rails can become a ball of mud as much as any other framework can.
It's not the fastest language, but it's faster than a lot of dynamic languages. Other than the lack of native types, you can manage pretty large Rails apps easily. Chime, Stripe, and Shopify all use RoR, and they all have very complex, high-scale financial systems.
The strength of a tool is limited by the person who uses it.
Python? Ruby with YJIT, JRuby, or Truffle Ruby usually beats Python code in benchmarks.
I haven’t seen direct comparisons, but I wouldn’t be surprised if Truffle Ruby was already faster than Elixir, Erlang, or PHP for single-threaded CPU-bound tasks too.
Of course that’s still way behind other languages but it’s still surprisingly good.
In my work I’ve seen that TruffleRuby codebases merging Ruby and Java libraries can easily keep pace with Go in terms of requests per second. Of course, the JVM uses more memory to do it. I mostly write Go code these days but Ruby is not necessarily slow. And it’s delightful to code in.
> Python? Ruby with YJIT, JRuby, or Truffle Ruby usually beats Python code in benchmarks.
Isn't that moving the goal post a lot?
We went from 'faster than a lot of others' to 'competing for worst in class'.
I'm not trying to be facetious; I'm curious, as I often read "X is really fast" where X is a functional/OOP language that nearly always ends up being some combination of slow and huge memory overhead. Even then, most Schemes (or Lisps in general) are faster.
Being faster single threaded against runtimes that are built specifically for multithreaded, distributed workloads is also perhaps not a fair comparison, esp. when both runtimes are heavily used to write webservers. And again, Erlang (et al) come out faster even in those benchmarks.
Is TruffleRuby production (eg. Rails) ready? If so, is it that much faster?
I remember that when the infamous "Truffle beats all Ruby implementations" article came out, a lot of Rubyists were shooting it down; however, that was several years ago by now.
Moving the goal posts? Perhaps I misunderstand what you are asking.
Python is not the worst-in-class scripting language. For example, Perl and Tcl are both slower than Python.
Originally you just asked "such as?" [i.e., which dynamic languages Ruby is faster than], implying Ruby is slower than every other dynamic language, which is not the case.
JRuby is faster than MRI Ruby for some Rails workloads and very much production ready.
Truffle Ruby is said to be about 97% compatible with MRI on the rubyspec, but IMHO it isn't production-ready for Rails yet. It does work well enough for many standalone non-Rails tasks, though, and could potentially be used for running Sidekiq jobs.
The reason to mention the alternative ruby runtimes is to show that there's nothing about the language that means it can't improve in performance (within limits).
Whilst it's true that ruby is slower than Common Lisp or Scheme, ruby is still improving and the gap is going to greatly reduce, which is good news for those of us that enjoy using it.
Thank you for a great answer; I did not mean any ill will and apologize if that was how it came across.
Perl, Tcl, Smalltalk, etc. are basically non-existent where I'm from, so they didn't occur to me.
Perhaps I'm projecting a lot here. I have worked a lot on high-performance systems and am often triggered by claims of performance, e.g. 'X is faster than C', which 99.9% of the time is false by two orders of magnitude. That didn't happen here.
Java's HotSpot was originally designed for Smalltalk and Self, two very dynamic systems designed to be complete graphical workstations. Perl, Tcl, Python, and Ruby, as originally implemented, were not even close to the original Smalltalk JIT described in Peter Deutsch's 1984 paper "Efficient Implementation of the Smalltalk-80 System"!
The "Ruby is faster than C" thing is because of YJIT: they are moving a lot of the CRuby standard library and core language out of C and into Ruby code so that YJIT can optimize it better, akin to Java and its bytecode being able to optimize things on the fly instead of just once at compile time.
Personally I use Lightsail on AWS plus Cloudflare, because there is always an off-ramp to try some of the fancy stuff, but you can always go back to just using cheap VMs behind Cloudflare. You can also put it all behind a VPC, and you can use CDK/CloudFormation, so that's also nice.
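For what it's worth, the "cheap VM in a VPC, defined with CDK" part is only a few lines; a minimal sketch in Python (it uses plain EC2 rather than Lightsail, and the construct names and sizes are placeholder assumptions):

```python
# Minimal AWS CDK (v2, Python) sketch: one cheap VM in a VPC.
# Construct names, instance size, and AMI choice are placeholder assumptions;
# fronting it with Cloudflare is DNS/proxy configuration outside this stack.
import aws_cdk as cdk
from aws_cdk import aws_ec2 as ec2

class CheapVmStack(cdk.Stack):
    def __init__(self, scope, construct_id, **kwargs):
        super().__init__(scope, construct_id, **kwargs)
        vpc = ec2.Vpc(self, "Vpc", max_azs=2)  # the private network boundary
        ec2.Instance(
            self, "Web",
            vpc=vpc,
            vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PUBLIC),
            instance_type=ec2.InstanceType("t3.micro"),
            machine_image=ec2.MachineImage.latest_amazon_linux2(),
        )

app = cdk.App()
CheapVmStack(app, "CheapVmStack")
app.synth()
```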
I gave up on using GCP even though the products like BigQuery are way better just because I got burned too many times like with the Google Domains -> Squarespace transition.
I’m thinking of switching back to a bare-metal provider now, like Vultr or DO (would love to know what people are using these days; I haven’t used bare-metal providers since ~2012).
Also, completely unrelated does anyone know what the best scraping proxy is these days for side projects (data journalism, archiving etc.)?
I think they define "emergency" a bit more widely than we are used to with other providers. For an urgent router change I was notified almost 2 months in advance.
For "real" unplaned emergencies I had in total like 5min of downtime last year, when some other router died.
I just wrote up a very similar comment. It’s really nice to see that there are other people who understand the limits of LLMs in this hype cycle.
Like all the people surprised by DeepSeek, when it has been clear for the last 2 years that there is no moat in foundation models and all the value is in 1) high-quality data, which becomes more valuable as the internet fills with AI junk, and 2) building the UX on top that makes specific tasks faster.
The argument has never changed; it has always been the same.
LLMs do not think and they do not perform logic; they approximate thought. The reason CoT works is the main feature of LLMs: they are extremely good at picking reasonable next tokens based on the context.
LLMs are, and always have been, good at three types of tasks:
- Closed-form problems where the answer is in the prompt (CoT, prompt engineering, RAG); see the sketch after this list
- Recall from the training set as the parameter space increases (15B -> 70B -> almost 1T now)
- Generalization and zero-shot tasks as a result of the first two (this is also what causes hallucinations, which are a feature, not a bug; we want the LLM to imitate thought, not be a Q&A expert system from 1990)
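To make the first bullet concrete, a minimal sketch of an answer-is-in-the-prompt request (the retrieved snippet, endpoint, and model name are illustrative assumptions; the point is the model only has to extract and rephrase, not recall):

```python
# Sketch: a closed-form, answer-in-the-prompt request (the RAG pattern).
# The retrieved snippet, endpoint, and model name are illustrative assumptions.
import requests

context = ("Order #8812 shipped on 2024-03-04 via DHL, "
           "tracking number JD014600003829.")
prompt = ("Answer using ONLY the context below. If the answer is not in the "
          f"context, say 'not found'.\n\nContext:\n{context}\n\n"
          "Question: What is the tracking number for order #8812?")

resp = requests.post("http://localhost:8080/v1/chat/completions", json={
    "model": "local-model",  # placeholder
    "messages": [{"role": "user", "content": prompt}],
    "temperature": 0,  # extraction, not creativity
}).json()
print(resp["choices"][0]["message"]["content"])  # expected: JD014600003829
```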
If you keep being fooled by LLMs, thinking they are AGI after every impressive benchmark, even though everyone keeps telling you that in practice LLMs are not good at tasks that are poorly defined, require niche knowledge, or require a special mental model, that is on you.
I use LLMs every day; they speed up many tasks that would take 5-15 minutes down to 10-120 seconds (worst case, with re-prompts). Many times my tasks take longer than if I had done them myself, because it’s not my work, I’m just copying it. But overall I am more productive because of LLMs.
Does an LLM speeding up your work mean that LLMs can replace humans?
Personally, I still don’t think LLMs can replace humans at the same level of quality, because they are imitating thought, not actually thinking. Now the question among the corporate overlords is: will you reduce operating costs by XX% per year (wages) while reducing the quality of service for customers? The last 50 years have shown us the answer…