You should probably change the name; I thought it was associated with Manus. Usually projects like this will specifically say in the tagline “An Open Source alternative to X.”
Other than that it looks very cool; reading through the code now. A lot of these projects lean very heavily on Browser Use.
Then why don’t they co-locate teams when they get RTO’d? I keep hearing about people who have to go sit at a mandatory hot desk but are still stuck on Zoom all day. Seems like the worst of both worlds.
It’s ordinary corporate dysfunction. The mandates come top-down. People in management don’t think too hard about exceptions. The people making decisions are far-removed from the consequences of their decisions.
It’s not really an exception though. These are the same people who spent the last 20 years singing the praises of offshoring and follow-the-sun. It’s just trend chasing.
Honestly I think the mistake we made was calling it “work from home” instead of “telecommuting”.
> Honestly I think the mistake we made was calling it “work from home” instead of “telecommuting”.
I am curious. Why do you think calling it telecommuting would have made any significant difference? And what difference would it have made? Where do you imagine we would be today if more people referred to it with that word?
My drive-by opinion: "telecommuting" has an advantage in optics/marketing and flexibility over "work from home" for both business leaders and employees. If I tell a board of directors or shareholders that "80% of our workforce performs some fraction of their weekly tasks ______", I imagine the following:
- "from home" elicits images of relaxation and lost productivity, while "via telecommuting" sounds like a commute still takes place and work is just as productive
- "from home" sounds like retreating to a comfort zone, while "via telecommuting" sounds like embracing a new technology or skill
- "from home" sounds like remote workers ought to be performing their tasks from their domicile only, while "via telecommuting" sounds like remote workers can do their work wherever they are
If businesses had adopted "telecommuting" terminology, I believe business leaders would not feel obligated to push back in order to regain productivity. I think it's easier to attack the trend of WFH given the points above. I actually agree with the proposal that WFH is weak terminology, but I had never sat down and thought about it before.
From the perspective of initial adoption, I think it would have happened just as fast. Workers were thrust into remote work arrangements during COVID, and everyone would have quickly gotten the gist of what "telecommuting" means, so it would have been the new buzzword to attract talent in job listings just as "remote" or "work from home" have been. Just without the downsides in CEO perception.
RTOs generally have nothing to do with any of the things they say. They are just layoffs.
You can't argue with them about the effectiveness of remote work. They aren't trying to optimize work. They are trying to fire people.
Working from home doesn't fire people, being more productive and happy doesn't fire people. Your mental well being doesn't have any bearing on how many people they need to fire.
Reasoning models spend a whole bunch of time reasoning before returning an answer. I was toying with QwQ 32B last night and ran into one question where it spent 18 minutes at 13 tok/s in the <think> phase before returning a final answer. I value local compute, but reasoning models aren’t terribly feasible at this speed since you don’t really need to see the first 90% of their thinking output.
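To illustrate that last point, here is a minimal sketch (assuming a local llama.cpp-style server with an OpenAI-compatible streaming endpoint; the URL, model name, and question are placeholders) that discards the <think>…</think> phase as it streams and only prints the final answer:

```python
# Sketch: stream a chat completion from a local OpenAI-compatible server
# (e.g. llama.cpp's llama-server) and hide the <think>...</think> phase,
# printing only the final answer. URL, model name, and question are placeholders.
# Note: a tag split across two chunks is ignored here for brevity.
import json
import requests

URL = "http://localhost:8080/v1/chat/completions"  # assumed local endpoint
payload = {
    "model": "qwq-32b",  # placeholder model name
    "stream": True,
    "messages": [{"role": "user", "content": "How many primes are below 100?"}],
}

in_think = False
with requests.post(URL, json=payload, stream=True) as resp:
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue
        data = line[len(b"data: "):]
        if data == b"[DONE]":
            break
        delta = json.loads(data)["choices"][0]["delta"].get("content") or ""
        while delta:
            if in_think:
                end = delta.find("</think>")
                if end == -1:
                    delta = ""  # still thinking; discard
                else:
                    delta = delta[end + len("</think>"):]
                    in_think = False
            else:
                start = delta.find("<think>")
                if start == -1:
                    print(delta, end="", flush=True)
                    delta = ""
                else:
                    print(delta[:start], end="", flush=True)
                    delta = delta[start + len("<think>"):]
                    in_think = True
print()
```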
Exactly! I run it on my old T7910 Dell workstation (2x 2697A V4, 640GB RAM) that I built for way less than $1k. But so what, it's about ~2 tokens/s. Just like you said, it's cool that it runs at all, but that's it.
It's meant to be a test/development setup for people to prepare the software environment and tooling for running the same on more expensive hardware. Not to be fast.
I remember people trying to run the game Crysis using CPU rendering. They got it to run and move around. People did it for fun and the "cool" factor. But no one actually played the game that way.
It's the same thing here. CPUs can run it but only as a gimmick.
> It's the same thing here. CPUs can run it but only as a gimmick.
No, that's not true.
I work on local inference code via llama.cpp, on both GPU and CPU on every platform, and the bottleneck is much more ram / bandwidth than compute.
A crappy 2022 mid-range Android CPU in a Pixel Fold gets you roughly the same speed as a 2024 Apple iPhone GPU, with Metal acceleration that dozens of very smart people hack on.
Additionally, and perhaps more importantly, Arc is a GPU, not a CPU.
The headline of the thing you're commenting on, the very first thing you see when you open it, is "Run llama.cpp Portable Zip on Intel GPU"
Additionally, the HN headline includes "1 or 2 Arc A770"
It's both compute and bandwidth constrained - just like trying to run Crysis on CPU rendering.
The A770 has 16GB of RAM. You're shuffling data to the GPU at a rate of 64GB/s, which is orders of magnitude slower than the GPU's internal VRAM bandwidth. Hence, this setup is memory-bandwidth constrained.
However, once you want to use it to do anything useful like a longer context size, the CPU compute will be a huge bottleneck for time-to-first-token as well as tokens/s.
Trying to run a model this large, and a thinking one at that, on CPU RAM is a gimmick.
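To put rough numbers on the bandwidth point above, a back-of-envelope sketch (the active-parameter count, quantization, and transfer rate below are assumptions, not measurements):

```python
# Back-of-envelope decode speed for a bandwidth-bound setup.
# All figures are assumptions: ~37B active parameters per token for a
# large MoE model, ~4.5 bits/weight quantization, ~64 GB/s transfer rate.
active_params = 37e9        # assumed active parameters touched per token
bits_per_weight = 4.5       # assumed Q4-class quantization
bandwidth_gb_s = 64         # assumed effective GB/s to the GPU

bytes_per_token = active_params * bits_per_weight / 8
tokens_per_s = bandwidth_gb_s * 1e9 / bytes_per_token
print(f"~{bytes_per_token / 1e9:.1f} GB read per token "
      f"-> ~{tokens_per_s:.1f} tokens/s upper bound")
# ~20.8 GB per token -> ~3.1 tokens/s, before compute or KV-cache costs.
```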
Okay, let's stipulate LLMs are compute and bandwidth sensitive (of course!)...
#1, should highlight it up front this time: We are talking about _G_PUs :)
#2 You can't get a single consumer GPU that has enough memory to load a 670B parameter model, so there's some magic going on here. It's notable and distinct. This is probably due to FlashMoE, given its prominence in the link.
TL;DR: 1) these are Intel _G_PUs, and 2) it is a remarkable, distinct achievement to be loading a 670B parameter model on only one or two cards
Can you share what LLMs you run on such small devices and what use case they address?
(Not a rhetorical question; it's just that I see a lot of work on local inference for edge devices with small models, but I could never get a small model to work for me. So I'm curious about other people's use cases.)
Excellent and accurate q. You sound like the first person I've talked to who might appreciate the full exposition here; apologies if this is too much info. TL;DR is you're def not missing anything, and we're just beginning to turn a corner and see some rays of hope, where local models are a genuine substitute for remote models in consumer applications.
#1) I put a lot of effort into this and, quite frankly, it paid off absolutely 0 until recently.
#2) The "this" in "I put a lot of effort into this" means: I left Google 1.5 years ago and have been quietly building an app that is LLM-agnostic, in service of coalescing a lot of next-gen thinking re: computing I saw that is A) now possible due to LLMs and B) was shitcanned in 2020 because Android won politically, because all that next-gen thinking seemed impossible given it required a step change in AI capabilities.
This app is Telosnex (telosnex.com).
I have a couple of stringent requirements I enforce on myself: it has to run on every platform, and it has to support local LLMs just as well as paid ones.
I see that as essential for avoiding continued algorithmic capture of the means of info distribution, and I believe that on a long enough timeline, all the rushed hacking people have done to llama.cpp to get model after model supported will give way to UX improvements.
You are completely, utterly, correct to note that the local models on device are, in my words, useless toys, at best. In practice, they kill your battery and barely work.
However, things did pay off recently. How?
#1) llama.cpp landed a significant opus of a PR by @ochafik that normalized tool handling across models, as well as implemented what the models need individually for formatting
#2) Phi-4 mini came out. Long story, but tl;dr: till now there have been various gaping flaws with each Phi release. This one looked absent of any issues. So I hacked support for its tool vagaries on top of what @ochafik landed, and all of a sudden I'm seeing the first local model sub-Mixtral 8x7B that reliably handles RAG flows (i.e. generate a search query, then accept 2K tokens of parsed web pages and answer a question following the directions I give it) and tool calls (i.e. generate a search query, or file operations like here: https://x.com/jpohhhh/status/1897717300330926109)
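For anyone curious what those tool calls look like from the client side, a minimal sketch (not Telosnex code; it assumes a local llama-server running Phi-4 mini behind an OpenAI-compatible /v1/chat/completions endpoint, and the search_web tool is a made-up example):

```python
# Sketch: ask a local model for an OpenAI-style tool call.
# Endpoint, model name, and the search_web tool are illustrative assumptions.
import json
import requests

URL = "http://localhost:8080/v1/chat/completions"  # assumed llama-server

tools = [{
    "type": "function",
    "function": {
        "name": "search_web",
        "description": "Search the web and return result snippets.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

resp = requests.post(URL, json={
    "model": "phi-4-mini",  # whatever name the local server was started with
    "messages": [{"role": "user", "content": "Who won the 2022 World Cup?"}],
    "tools": tools,
}).json()

# The model either answers directly or emits a structured tool call.
message = resp["choices"][0]["message"]
for call in message.get("tool_calls") or []:
    args = json.loads(call["function"]["arguments"])
    print(call["function"]["name"], args)  # e.g. search_web {'query': '...'}
# A RAG flow then feeds the parsed search results back as a "tool" message
# and asks the model to answer from that context.
```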
That's because the OP is linking to the quickstart guide. There are benchmark numbers on the GitHub repo's root page, but it does not appear to include the new DeepSeek yet:
There’s a good chance the reason this is public is because he was using gist to transfer ChatGPT code between his personal computer and his government computer.
Probably true. I work in a regulated space and I have done this to get my zsh config to a secure laptop, but I have always had the common sense to mark the gist as hidden (can still be accessed if you know the hash, obvs).
I think Ruby can get popular again with the sort of contrarian things Rails is doing, like helping developers exit the cloud.
There isn’t really a much more productive web dev setup than Rails + your favorite LLM tool. It will take time to win Gen Z back to Rails, though, and away from Python/TS or Go/Rust.
My impression is that a Rails app is an unmaintainable, dynamically-typed ball of mud that might give you the fast upfront development to get to market or get funded, but will quickly fall apart at scale, e.g. the Twitter fail whale. And Ruby is too full of "magic" that quickly makes it hard to tell what's going on, or easy to accidentally make something grossly inefficient if you don't understand the magic, which defeats the point of the convenience. Is this perception outdated, and if so, what changed?
If the Twitter fail whale is your concern, then your perception is outdated. Twitter started moving off Ruby in 2009. Both the CRuby VM and Rails have seen extensive development over the decade and a half since.
I never worked at Twitter, but based on the timeline it seems very likely they were running on the old Ruby 1.8.x line, which was a pure AST interpreter. The VM is now a bytecode interpreter that has been optimized over the intervening years. The GC is considerably more robust. There's a very fast JIT compiler included. Many libraries have been optimized and bugs squashed.
If your concern is Rails, please note that also has seen ongoing development and is more performant, more robust, and I'd say better architected. I'm not even sure it was thread-safe when Twitter was running on it.
You don't have to like Ruby or Rails, but you're really working off old data. I'm sure there's a breaking point in there somewhere, but I very much doubt most apps will hit it before going bust.
The CRuby interpreter alone is at least 2-3x faster than it was at Fail Whale time, and the JIT doubles that to 4-6x. Rails itself has also gotten 1.5x to 2x faster.
And then you have CPUs that are 20-30x faster compared to 2009, SSDs that are 100x-1000x faster, and databases that are much more battle-tested and far easier to scale.
Sometimes I wonder: maybe we could remake Twitter with Rails again to see how well it goes.
My issue with Ruby (and Rails) has always been the "ball of mud" problem that I feel originates from its extensive use of syntactical sugar and automagic.
Rails can become a ball of mud as much as any other framework can.
It's not the fastest language, but it's faster than a lot of dynamic languages. Other than the lack of native types, you can manage pretty large Rails apps easily. Chime, Stripe, and Shopify all use RoR, and they all have very complex, high-scale financial systems.
The strength of a tool is limited by the person who uses it.
Python? Ruby with YJIT, JRuby, or Truffle Ruby usually beats Python code in benchmarks.
I haven’t seen direct comparisons, but I wouldn’t be surprised if Truffle Ruby was already faster than Elixir, Erlang, or PHP for single-threaded CPU-bound tasks too.
Of course that’s still way behind other languages but it’s still surprisingly good.
In my work I’ve seen that TruffleRuby codebases merging Ruby and Java libraries can easily keep pace with Go in terms of requests per second. Of course, the JVM uses more memory to do it. I mostly write Go code these days but Ruby is not necessarily slow. And it’s delightful to code in.
> Python? Ruby with YJIT, JRuby, or Truffle Ruby usually beats Python code in benchmarks.
Isn't that moving the goal post a lot?
We went from 'faster than a lot of others' to 'competing for worst in class'.
I'm not trying to be facetious; I'm curious, as I often read "X is really fast" where X is a functional/OOP language that nearly always ends up being some combination of slow and huge memory overhead. Even then, most Schemes (or Lisps in general) are faster.
Being faster single threaded against runtimes that are built specifically for multithreaded, distributed workloads is also perhaps not a fair comparison, esp. when both runtimes are heavily used to write webservers. And again, Erlang (et al) come out faster even in those benchmarks.
Is TruffleRuby production (eg. Rails) ready? If so, is it that much faster?
I remember that when the infamous "Truffle beats all Ruby implementations" article came out, a lot of Rubyists were shooting it down; however, that was several years ago by now.
Moving the goal posts? Perhaps I misunderstand what you are asking.
Python is not the worst-in-class scripting language. For example, Perl and Tcl are both slower than Python.
Originally you just asked "such as?" [i.e., which dynamic languages Ruby is faster than], implying Ruby is slower than every other dynamic language, which is not the case.
JRuby is faster than MRI Ruby for some Rails workloads and very much production ready.
Truffle Ruby is said to be about 97% compatible with MRI on the rubyspec, but IMHO it isn't production-ready for Rails yet. It does work well enough for many standalone non-Rails tasks, though, and could potentially be used for running Sidekiq jobs.
The reason to mention the alternative ruby runtimes is to show that there's nothing about the language that means it can't improve in performance (within limits).
Whilst it's true that ruby is slower than Common Lisp or Scheme, ruby is still improving and the gap is going to greatly reduce, which is good news for those of us that enjoy using it.
Thank you for a great answer; I did not mean any ill will and apologize if that was how it came across.
Perl, Tcl, Smalltalk, etc. are basically non-existent where I'm from, so they didn't occur to me.
Perhaps I'm projecting a lot here. I have worked a lot on high-performance systems and am often triggered by claims of performance, e.g. 'X is faster than C', which 99.9% of the time is false by two orders of magnitude. That didn't happen here.
Java's HotSpot was originally designed for Smalltalk and Self, two very dynamic systems designed to be complete graphical workstations. Perl, Tcl, Python, and Ruby, as originally implemented, were not even close to the original Smalltalk JIT described in Peter Deutsch's 1984 paper "Efficient Implementation of the Smalltalk-80 System"!
The "Ruby is faster than C" thing is because of YJIT: they are moving a lot of the CRuby standard library and core language out of C and into Ruby code so that YJIT can optimize it better, akin to Java and its bytecode being able to optimize things on the fly instead of just once at compile time.
Personally I use Lightsail on AWS plus Cloudflare, because there is always an off-ramp to try some of the fancy stuff, but you can always go back to just using cheap VMs behind Cloudflare. You can also put it all behind a VPC, and you can use CDK/CloudFormation, so that's also nice.
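For what it's worth, the "cheap VM in a VPC, defined with CDK" part is only a few lines; a minimal sketch in Python (it uses plain EC2 rather than Lightsail, and the construct names and sizes are placeholder assumptions):

```python
# Minimal AWS CDK (v2, Python) sketch: one cheap VM in a VPC.
# Construct names, instance size, and AMI choice are placeholder assumptions;
# fronting it with Cloudflare is DNS/proxy configuration outside this stack.
import aws_cdk as cdk
from aws_cdk import aws_ec2 as ec2

class CheapVmStack(cdk.Stack):
    def __init__(self, scope, construct_id, **kwargs):
        super().__init__(scope, construct_id, **kwargs)
        vpc = ec2.Vpc(self, "Vpc", max_azs=2)  # the private network boundary
        ec2.Instance(
            self, "Web",
            vpc=vpc,
            vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PUBLIC),
            instance_type=ec2.InstanceType("t3.micro"),
            machine_image=ec2.MachineImage.latest_amazon_linux2(),
        )

app = cdk.App()
CheapVmStack(app, "CheapVmStack")
app.synth()
```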
I gave up on using GCP even though the products like BigQuery are way better just because I got burned too many times like with the Google Domains -> Squarespace transition.
I’m thinking of switching back to a bare-metal provider now, like Vultr or DO (would love to know what people are using these days; I haven’t used bare-metal providers since ~2012).
Also, completely unrelated does anyone know what the best scraping proxy is these days for side projects (data journalism, archiving etc.)?
I think they define "emergency" a bit more widely than we are used to with other providers. For an urgent router change I was notified almost 2 months in advance.
For "real" unplaned emergencies I had in total like 5min of downtime last year, when some other router died.
I just wrote up a very similar comment. It’s really nice to see that there are other people who understand the limits of LLMs in this hype cycle.
Like all the people surprised by DeepSeek, when it has been clear for the last 2 years that there is no moat in foundation models and all the value is in 1) high-quality data, which becomes more valuable as the internet fills with AI junk, and 2) building the UX on top that makes specific tasks faster.
The argument has never changed; it has always been the same.
LLMs do not think and they do not perform logic; they approximate thought. The reason CoT works is the main feature of LLMs: they are extremely good at picking reasonable next tokens based on the context.
LLMs are, and always have been, good at three types of tasks:
- Closed-form problems where the answer is in the prompt (CoT, prompt engineering, RAG); see the sketch after this list
- Recall from the training set as the parameter space increases (15B -> 70B -> almost 1T now)
- Generalization and zero-shot tasks as a result of the first two (this is also what causes hallucinations, which are a feature, not a bug; we want the LLM to imitate thought, not be a Q&A expert system from 1990)
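To make the first bullet concrete, a minimal sketch of an answer-is-in-the-prompt request (the retrieved snippet, endpoint, and model name are illustrative assumptions; the point is the model only has to extract and rephrase, not recall):

```python
# Sketch: a closed-form, answer-in-the-prompt request (the RAG pattern).
# The retrieved snippet, endpoint, and model name are illustrative assumptions.
import requests

context = ("Order #8812 shipped on 2024-03-04 via DHL, "
           "tracking number JD014600003829.")
prompt = ("Answer using ONLY the context below. If the answer is not in the "
          f"context, say 'not found'.\n\nContext:\n{context}\n\n"
          "Question: What is the tracking number for order #8812?")

resp = requests.post("http://localhost:8080/v1/chat/completions", json={
    "model": "local-model",  # placeholder
    "messages": [{"role": "user", "content": prompt}],
    "temperature": 0,  # extraction, not creativity
}).json()
print(resp["choices"][0]["message"]["content"])  # expected: JD014600003829
```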
If you keep being fooled by LLMs, thinking they are AGI after every impressive benchmark, even though everyone keeps telling you that in practice LLMs are not good at tasks that are poorly defined, require niche knowledge, or require a special mental model, that is on you.
I use LLMs every day; they speed up many tasks that would take 5-15 minutes down to 10-120 seconds (worst case, with re-prompts). Many times my tasks take longer than if I had done them myself, because it’s not my work, I’m just copying it. But overall I am more productive because of LLMs.
Does an LLM speeding up your work mean that LLMs can replace humans?
Personally, I still don’t think LLMs can replace humans at the same level of quality, because they are imitating thought, not actually thinking. Now the question among the corporate overlords is: will you reduce operating costs by XX% per year (wages) while reducing the quality of service for customers? The last 50 years have shown us the answer…