Hacker News | rudedogg's comments

A setup like this seems perfect for a little shed to code in that has solar power.

I've wanted to build something like Roald Dahl's writing shed: https://youtu.be/AsxTR09_iWE?t=294 for a while.

I live in a climate with cold winters though, so I'd hate to invest in something like this and not be able to use it for a significant part of the year. I guess I could put a small pellet or wood stove in it...


> If you want fast compile times then use a compiler w/ no optimization passes, done. The compile times are now linear w/ respect to lines of code & there is provably no further improvement you can make on that b/c any further passes will add linear or superlinear amount of overhead depending on the complexity of the optimization.

Umm, this is completely wrong. Compiling involves a lot more than optimization passes, and language design as well as compiler design can make or break compile times. Parsing is relatively easy to make fast and linear, but the other stages (semantic analysis) are not. That's why we see a huge range of compile times across programming languages that are (mostly) the same.


You see negativity; I see disappointment that OpenAI isn’t trying to innovate and is instead hoping it can replay Google Search’s history for itself.

This regresses the incentives back to what we had with search engines where what I need (answers) and what OpenAI needs (money from ads) are at odds.

Search engines used to be very useful too, until the endless profit-driven A/B testing boiled us all


Five hundred gajillion dollars spent so we can end up in the same place except with these five men making all the money instead of those five men. Whee.

Don't forget all the extra electricity required to achieve roughly the same thing!

It’s not just that; if asking the AI agent is going to give me the same experience Google Search gives me today, that’s a few hundred billion dollars we could’ve just not spent.

There is a massive difference between an AI agent understanding the intent of my question, and keyword search on old (pre-enshittified) search engines.

Even if OpenAI needs to feed the VC beast, there will always be open-source LLMs that can be used freely inside home-made search engines.


Arguably we are in a worse place than when Google Search was not yet enshittified.

Google is a public company! You could have been one of "those five men".

Yes of course, let me buy a sizable chunk of one of the largest companies in the world so I can be one of the five men.

Owning a few shares is not the same thing as actually making all the money someone at the top of Google is making.


It was started in a garage. George Bush didn't give them the company.

Is this a surprise though?

This is the culture of America in a nutshell. Steve Jobs was a weirdo in that regard and an outlier.


Who said anything about being surprised?

No, and no.

Agreed. One big difference, though, is that the local AI tech we have as an alternative to OpenAI is significantly better compared to the local alternatives we had for Google. You can run a reasonably powerful AI on your own machine right now. Sure, it’s not going to be as good. And the cost of GPUs, RAM, and electricity is important to keep in mind. But the point is it’s not all-or-nothing and you are not beholden to these corporations.

There is also plenty of research going on to make models more efficient and powerful at small sizes. So that shift in the power gradient seems like it’s going to continue.


I agree with this, but there were/are alternatives to Google that are functional but not as good. People still ended up choosing to use Google.

Even ads in magazines were much better than what we have now. Those ads are contextual (a tech mag won’t have ads for gardening), so apart from the repetition, what’s shown may not be something you need, but you’re more likely to make a mental note of it because you’re already in the relevant context.

Ads in ChatGPT was the most obvious outcome from day 1.

And this is not a bad thing; otherwise you can only imagine how many businesses will close when Google traffic starts to decline.

Everyone likes to hate on ads, but the reality is that without ads, 99% of users even on Hacker News would be jobless, as the companies where they work would have no way to find clients, and even if they managed to find some, those clients wouldn't be able to sell and would go out of business.


Ads haven't made it in yet; instead they're charging a fee on completed purchases:

> Merchants pay a small fee on completed purchases, but the service is free for users, doesn’t affect their prices, and doesn’t influence ChatGPT’s product results.


Yes, so they’re being incentivized to highlight products which they get a cut of, and more strongly to highlight more expensive products.

This is called affiliate marketing and it’s toxic.


> Ads in ChatGPT was the most obvious outcome from day 1

Agreed.

Tech companies always do this. With Ads, we’re back into speculation territory, and the “how do we pay for and justify all this shit?” can gets kicked down the road.

Can’t we actually solve problems in the real world instead? Wouldn’t people be willing to pay if AI makes them more productive? Why do we need an ad-supported business model when the product is only $20/mo?


> Wouldn’t people be willing to pay if AI makes them more productive? Why do we need an ad-supported business model when the product is only $20/mo?

This was always fake reasoning (ads are there because people want everything for free!), but then paid HBO started showing ads, the smart TV you purchased started showing ads, the car you bought with money started showing ads...

([some business model] + ads) will simply always generate more profit than [some business model] (at least that's how they think). Even if you already pay, if they also shove some ads in your eyes, they can make even more money. Corporations don't work the way humans do. There is no "enough". The CEO's job is to grow the company and make more profit each quarter, and they answer to the shareholders. It's not like, OK, now we can pay all our bills, we don't need more revenue. You always need the maximum possible revenue.


There are probably other ways to be profitable without ads. I'm optimistic we, as a society, will find those ways.

I think ads are great, but the tactics (tracking) around them aren't really headed in a good direction.

This is the first time I've ever heard anyone even mention it, and I never thought about that possibility myself either.

Since it's tapped into etsy and shopify, ChatGPT might actually have a lot more power to shop local if you give it that as a constraint

There is nothing "local" about Etsy, and there hasn't been for over ten years. You can find all the same "handmade" products on AliExpress, and often on Amazon.

Fine... Shopify then? There exist brick and mortar stores near me with Shopify sites.

Etsy is thoroughly fucked and full of mass-produced junk. "Local" could just mean buying from the nearest person who's reselling stuff from Ali Express.

And have you noticed what sellers on Amazon are doing? Foreign companies are setting up distribution in the US and registering their US companies with Amazon as "small businesses" and "minority-owned businesses", making those labels utterly useless.


A lot more power than what?

Like trusting the SEO from Googling "<Product> made near <my city>", or Amazon

Free market 101 - profits over needs

reminds me of the animation of google's results page progressively becoming more and more "sponsored links"

Ahem, the article is about ChatGPT checkout and e-commerce, and a relatively quick read shows no mention of ads.

Sure, this may go in a Google-style-monopoly direction or an Amazon-style-monopoly direction. I don't know which. I would indeed expect a large dose of enshittification to be involved.

You're welcome to argue this leads to ads. But jumping to "this is ads" and getting a dozen pearl-clutching replies is a symptom of HN's own crude enshittification, jeesh.


I just ran this on a simple change I’ve previously asked Sonnet 4 and Opus 4.1 to make, and it fails too.

It’s a simple substitution request where I provide a Lint error that suggests the correct change. All the models fail. I could ask someone with no development experience to do this change and they could.

I worry everyone is chasing benchmarks to the detriment of general performance. Or the next-token weights for the incorrect change outweigh my simple but precise instructions. Either way it’s no good.

Edit: With a follow-up “please do what I asked” sort of prompt it came through, while Opus just loops. So there’s that, at least.


> I worry everyone is chasing benchmarks to the detriment of general performance.

I've been worried about this for a while. I feel like Claude in particular took a step back in my own subjective performance evaluation in the switch from 3.7 to 4, while the benchmark scores leaped substantially.

To be fair, benchmarking has always been the most difficult problem to solve in this space, so it's not surprising that benchmark development isn't exactly keeping pace with all of the modeling/training development happening.


Not that it was better at programming, but I really miss Sonnet 3.5 for educational discussions. I've sometimes considered that what I actually miss was the improvement 3.5 delivered over other models at that time. Though, given that my system message for Sonnet since 3.7 has primarily been instructing it to behave like a human and have a personality, I really think we lost something.

I still use 3.5 today in Cursor. It's still the best model they've produced for my workflow. It's twice as fast as 4 and doesn't vomit pointless comments all over my code.

> I worry everyone is chasing benchmarks to the detriment of general performance.

I’m not sure this is entirely what you’re driving at, but the example I always think of in my head is “I want an AI agent that will scan through my 20 to 30,000 photos, remove all the duplicates, then organize them all in some coherent fashion.” that’s the kind of service I need right now, and it feels like something AI should be able to do, yet I have not encountered anything that remotely accomplishes this task. I’m still using Dupe Guru and depending on the ref system to not scatter my stuff all over further.

Sidebar, if anybody has any recommendations for this, I would love to hear them lol


azure vision / "cognitive services" can do this for literally a few bucks

am i even on hacker news? how do people not know there are optimized models for specific use cases? not everything has to run through an LLM (nor should it)

https://azure.microsoft.com/en-us/pricing/details/cognitive-...


This is hardly the fluid, turn key solution I am talking about, so I don’t know why you’re talking like this to me and acting like the answer is so obvious. Frankly your tone was rude and unnecessary. Not everyone on HN shares the same knowledge and experience about all the same subjects, let alone all the ones you expect all of us to know.

The reality of that specific ask is it would not be difficult to build, but I believe it would be extremely difficult to build and offer at a price that users would pay for. So you're unlikely to find a commercial offering that does that using a (V)LM.

Yeah I imagine so. Hell I would pay like $100 for them to just do it once. If they really could do it with like 99% accuracy I would pay upwards of $300 tbh. Still, that’s probably not good enough lol

Hey bro, I'd like to take this project using Claude for $300 :) Do you mind contacting me? stxcth9aoj at mozmail.com

I made this as a first step in the process of organizing large numbers of images. Once you have the keywords and descriptions in the metadata, it should be possible to have a more powerful text-only LLM come up with an organizing scheme and enact it by giving it file or scripting access via MCP. Thanks for reminding me that I need to work on that step now that local LLMs are powerful enough.

* https://github.com/jabberjabberjabber/ImageIndexer


Very cool, thanks for sharing!

Perceptual hash. I have a Python script that does just this, which I wrote a million years ago: https://gist.github.com/base698/42d24be9309520fe8ad768844868...

I used it to match frames between video streams of different quality. It operates on grayscale.
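
For anyone curious, here's a minimal sketch of the average-hash flavor of that idea (not the linked gist; it assumes Pillow is installed, and the file names are just placeholders):

    from PIL import Image

    def average_hash(path, size=8):
        # Downscale to a tiny grayscale thumbnail, then set one bit per pixel
        # depending on whether it's brighter than the mean.
        img = Image.open(path).convert("L").resize((size, size))
        pixels = list(img.getdata())
        mean = sum(pixels) / len(pixels)
        return sum(1 << i for i, p in enumerate(pixels) if p > mean)

    def hamming(a, b):
        # Number of differing bits between two hashes.
        return bin(a ^ b).count("1")

    # Near-duplicate photos usually differ by only a few bits:
    # print(hamming(average_hash("a.jpg"), average_hash("b.jpg")) <= 5)

The imagehash package on PyPI wraps the same idea (plus pHash/dHash variants) if you'd rather not hand-roll it.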


More like churning benchmarks... Release a new model at max power, get all the benchmark glory, silently reduce model capability in the following weeks, and repeat by releasing a newer, smarter model.

That (thankfully) can't compound, so it would never be more than a one-time offset. E.g. if you report a score of 60% on SWE-bench Verified for new model A, dumb A down to score 50%, and then report a 20% improvement over A with new model B, it's pretty obvious when your last two model blog posts both say 60%.
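
A toy calculation of the same point, using the comment's own illustrative numbers:

    reported_A = 0.60              # score published when model A launched
    degraded_A = 0.50              # what A actually scores after the quiet downgrade
    reported_B = degraded_A * 1.2  # model B announced as "20% better than A"
    print(reported_B)              # 0.6 -- identical to A's published score, so the offset is visible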

The only way around this is to never report on the same benchmark version twice, and they include too many benchmarks to realistically swap them all out every release.


The benchmarks are not typically ongoing; we do not often see comparisons between week 1 and week 8. Sprinkle in a bit of training on the benchmarks and you can ensure higher scores for the next model. A perfect scam loop to keep people happy until they wise up.

> The benchmarks are not typically ongoing; we do not often see comparisons between week 1 and week 8

You don't need to compare "A (Week 1)" to "A (Week 8)" to be able to show "B (Week 1)" is genuinely x% better than "A (Week 1)".


As I said, sprinkle a bit of benchmark pollution into the training and you have your loop. Each iteration will be better at the benchmarks if that's the goal, and that goal/context reinforces itself.

Sprinkling in benchmark training isn't a loop; it's just plain cheating. Regardless, not all of these benchmarks are public, and even with mass collusion across the board, it wouldn't make sense that only open-weight LLMs have been improving.

At this point it would be an interesting idea to collect examples where LLMs miserably fail, in the form of a community database. I have examples myself...

Any such examples are often "closely guarded secrets" to prevent them from being benchmaxxed and gamed - which is absolutely what would happen if you consolidated them in a publicly available centralized repository.

Since such a database should evolve continuously, I wouldn't see that as a problem. The important thing is that each example is somehow verifiable, in the form of an unmodifiable test setup. So the LLM provides a solution, which is executed against the test to verify it. Something like the ACID3 tests... But sure, it can probably be gamed somehow in all setups...
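
To make that concrete, here's a toy sketch of what a single entry in such a database could look like (the task and all names are made up for illustration): the test is frozen, and only the candidate solution varies.

    # Frozen checker for one community example; an LLM's proposed solution is
    # passed in as a callable and never gets to modify the test cases.
    def check_reverse_words(candidate):
        cases = [
            ("hello world", "world hello"),
            ("a b c", "c b a"),
            ("single", "single"),
        ]
        return all(candidate(inp) == expected for inp, expected in cases)

    # A correct candidate passes:
    print(check_reverse_words(lambda s: " ".join(reversed(s.split()))))  # True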

This seems like a non-issue, unless I'm misunderstanding. If failures can be used to help game benchmarks, companies are doing so. They don't need us to avoid compiling such information, which would be helpful to actual users.

People might want to use the same test scenario in the future to see how much the models have improved. We can't do that if the example gets scraped into the training data set.

That's what I was thinking too; the models have the same data sources (they have all scraped the internet, GitHub, book repositories, etc.), and they all optimize for the same standardized tests. Other than marginally better scores on those tests (and they will cherry-pick them to look better), how do the various competitors still differentiate themselves from each other? What's the USP?

The LLM (the model) is not the agent (e.g. Claude Code) that uses LLMs.

LLMs improve slowly, but the agents are where the real value is produced: when should it write tests, when should it try to compile, how to move forward from a compile error, can it click on your web app to test its own work, etc.


Downvoted because you didn’t mention the prompt and the issue.

>It’s a simple substitution request where I provide a Lint error that suggests the correct change. All the models fail. I could ask someone with no development experience to do this change and they could.

I don't understand why this kind of thing is useful. Do the thing yourself and move on. For every one problem like this, AI can do 10 better/faster than I can.


How can I trust it to do the complicated task well when it fails to do the simple thing?

The jagged edge effect: you can trust it to do some tasks extremely well, but a slightly different task might consistently fail. Your job as a tool user is to understand when it’ll work and when it won’t - it isn’t an oracle or a human.

It's not about simple vs. complex. It's about the types of tasks the AI has been trained on: pattern-matching, thinking, reasoning, research.

Tasks like linting and formatting a block of code are pretty simple, but also very specialized. You're much better off using formatters/linters than an AI.


I want the bot to do the drudge work, not me. I want the bot to fix lint errors the linter can't safely autofix, not me.

You're talking about designing a kitchen where robots do the cooking and humans do ingredient prep and dishwashing. We prefer kitchens where we do the cooking and use tools or machines to prep and wash dishes.

I don't want it to be an "architect" or "designer". I want it to write the annoying boilerplate. I don't want it to do the coding and me to do the debugging, I want to code while it debugs. Anything else and you are the bot's assistant, not vice-versa.


An agent being tasked to resolve simple issues from a compiler/test suite/linter/etc. is a pretty typical use case. It's not clear in this example whether the linter was capable of auto-fixing the problem, so ordinarily this would be a case where you'd hope an LLM would shine, given specific, accurate context and a known solution.

One reason is to simply say “fix all lints” and have the model do it

You don't understand how complete unreliability is a problem?

So instead of just "doing things" you want a world where you try it ai-way, fail, then "do thing" 47 times in a row, then 3 ai-way saved you 5 minutes. Then 7 ai-way fail, then try to remember hmm did this work last time or not? ai-way fails another 3 times. "do thing" 3 times. How many ai-way failed today? oh it wasted 30% of the day and i forget which ways worked or not, i better start writing that all down. Lets call it the MAGIC TOME of incantations. oh i have to rewrite the tome again the model changed


Dang, I was really excited about this too.

I guess I'll either stick with sqlite-vec or give turso another look. I'm not fond of the idea of a SQLite fork though.

Do you know of anything else I should take a look at? I know you use a lot of this stuff for your open-source AI/ML work. I'd like something I can use on-device.


You can point DuckDB at a SQLite file and query it through DuckDB's own columnar engine. I'm not sure if that's what you need, though.
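
For example, something along these lines should work via DuckDB's SQLite extension (the file and table names here are placeholders):

    import duckdb

    con = duckdb.connect()
    con.execute("INSTALL sqlite;")  # one-time install of the extension
    con.execute("LOAD sqlite;")
    con.execute("ATTACH 'vectors.db' AS src (TYPE sqlite);")  # a plain SQLite file
    # The SQLite tables are now queryable through DuckDB's engine:
    print(con.execute("SELECT count(*) FROM src.items").fetchall())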

If you look at even the Claude/OpenAI chat UIs, they kind of suck. Not sure why you think someone else can't/won't do it better. Yes, the big players will copy what they can, but they also need to chase insane growth and get every human on earth paying for an LLM subscription.

A tool that is good for everyone is great for no one.

Also, I think we're already seeing the limits on the "value" of a chat interface. Now they're all chasing developers, since there's real potential to improve productivity (or, sadly, cut costs) there. But even that is proving difficult.


Zed isn’t special; I doubt Sublime Text has thousands of dependencies. It’s a language/culture problem.

Edit: Ghostty is a good counter-example that is open source. https://github.com/ghostty-org/ghostty/tree/main/pkg


Zed is closer to IntelliJ or VSCode than to Sublime Text.


In the amount of bloat, yes.


It is also important to note that this is not specific to Zed. As someone else has mentioned, it is a cultural problem. I picked Zed as an example because that is what I compiled most recently, but it is definitely not limited to Zed. There are many Rust projects that pull in over 1,000 dependencies and do much less than Zed.


Yeah tbh one time I had a Rust job and their back-end had like 700-800 dependencies.


Is this kind of an Elm replacement?


I think another split is between:

- people who have gone down the webview path, and know how difficult it is to do well

- people who have been told they can simply package their webapp into a native application

You can probably guess which group has more people

