
I think you could extrapolate that and say folks are primarily using GenAI for things they aren't specialists in.

America really can't afford this right now. We spent _trillions_ on the last middle east operation.

Money has never been a problem for America; it can just create it as per its needs.

When every country is lining up to buy US debt, the US can print as much money as it wishes. Lately, however, buying US debt does not seem as attractive as it used to be.

I don't think our current administration cares what we can afford.

America can ALWAYS afford war.

We can totally afford it, DOGE deleted a bazillion dollars worth of waste, so we have plenty of money to burn on another crusade in the Middle East! /s

Maybe they're throwing them in the garbage and getting Macs? When the choice is leave the ecosystem or embrace 1984: the desktop, it seems like a worthwhile complaint. Microsoft burned everyone's trust by thinking they can charge money for an OS AND sell your data.

Apple also arbitrarily drops OS support on devices. On mobile devices they used to be the leading edge, but on desktop they end support faster than Microsoft does. Microsoft guaranteed 10 years of support when Windows 10 launched, but Apple is dropping support for their 2020 Intel Macs after this year. Plus, in terms of "1984", Apple's notarization system is as bad as it gets, and it's even harder to turn off than Windows Smartscreen.

Apple does have better speakers, though. So that's a nice plus.


Whoa there - the Apple ecosystem is also 1984: the desktop.

I have an old MacBook that even after a factory reset or a wipe and reinstall from scratch is unusably slow and can’t update to the latest OS, and so isn’t compatible with updates to software I use or new stuff I might be interested in -

Like come on. Let’s not pretend this is a M$ problem. Apple is just as greedy in terms of what is effectively planned obsolescence.


The problem doesn't seem to apply to other Apple devices. The iPhone 11 I've been using every day for the last 6 years is still working exactly the same as when it was new, all while still getting the newest OS updates.

Curious to hear what folks are doing with Gemini outside of the coding space and why you chose it. Are you building your app so you can swap the underlying GenAI easily? Do you "load balance" your usage across other providers for redundancy or cost savings? What would happen if there was ever some kind of spot market for LLMs?

In my experience, Gemini 2.5 Pro really shines in some non-coding use cases such as translation and summarization via Canvas. The gigantic context window and large usage limits help in this regard.

I also believe Gemini is much better than ChatGPT in generating deep research reports. Google has an edge in web search and it shows. Gemini’s reports draw on a vast number of sources, thus tend to be more accurate. In general, I even prefer its writing style, and I like the possibility of exporting reports to Google Docs.

One thing that I don’t like about Gemini is its UI, which is miles behind the competition. Custom instructions, projects, temporary chats… these things either have no equivalent in Gemini or are underdeveloped.


If you're a power user, you should probably be using Gemini through AI Studio rather than the "basic user" version. That allows you to set system instructions, temperature, structured output, etc. There's also NotebookLM. Google seems to be building a bunch of side projects on Gemini and seeing what sticks, and the generic Gemini app/web chat is just one of those.

My complaint is that any data within AI Studio can be kept by Google and used for training purposes — even if using the paid tier of the API, as far as I know. Because of that, I end up only using it rarely, when I don’t care about the fate of the data.

This is only true for the free tier. Paid AI Studio users have strong privacy protections.

Can you elaborate on “paid”? Because I honestly still have no idea if my usage of AI Studio is used for training purposes.

I have Google Workspace Business Standard, which comes with some pro AI features. E.g., Gemini chat clearly shows “Pro” and says something like “chats in your organization won’t be used for training”. On AI Studio it’s not clear at all. I do have some version of paid AI services through Google, but no idea if it applies to AI Studio. I did create some dummy Google Cloud project which allowed me to generate an API key, but afaik I still haven’t authorized any billing method.


Thank you for clarifying that. I’ve researched this once again and confirmed that Google treats all AI Studio usage as private if there’s at least one API project with billing enabled in an account.

For translation you'll still be limited by the 65K output limit on longer texts, though, I suppose?

Yes. I haven't had problems with the output limit so far, as I do translations iteratively, over each section of longer texts.

What I like the most about translating with Gemini is that its default performance is already good enough, and it can be improved via the one million tokens of the context window. I load into the context my private databases of idiomatic translations, separated by language pair and subject area. After doing that, the need for manually reviewing Gemini translations is greatly diminished.


I’ve found the 2.5 pro to be pretty insane at math. Having a lot of fun doing math that normally I wouldn’t be able to touch. I’ve always been good at math, but it’s one of those things where you have to do a LOT of learning to do anything. Being able to breeze through topics I don’t know with the help of AI and a good CAS + sympy and Mathematica verification lets me chew on problems I have no right to be even thinking about considering my mathematical background. (I did minor in math.. but the kinds of problems I’m chewing on are things people spend lifetimes working on. That I can even poke at the edges of them thanks to Gemini is really neat.)

I can throw a pile of NDAs at it and it neatly pulls out relevant stuff from them within a few seconds. The huge context window and excellent needle-in-a-haystack performance are great for this kind of task.

The NIAH performance is a misleading indicator for performance on the tasks people really want the long context for. It's great as a smoke/regression test. If you're bad on NIAH, you're not gonna do well on the more holistic evals.

But the long context eval they used (MRCR) is limited. It's multi-needle, so that's a start, but it's not evaluating long-range dependency resolution or topic modeling, which are the things you actually care about beyond raw retrieval for downstream tasks. Better than nothing, but not great for just throwing a pile of text at it and hoping for the best. Particularly for out-of-distribution token sequences.

I do give Google some credit though: they didn't try to hide how poorly they did on that eval. But there's a reason you don't see them adding RULER, HELMET, or LongProc to this. The performance is abysmal after ~32k.

EDIT: I still love using 2.5 Pro for a ton of different tasks. I just tend to have all my custom agents compress the context aggressively for any long context or long horizon tasks.


> The performance is abysmal after ~32k.

Huh. We've not seen this in real-world use. 2.5 pro has been the only model where you can throw a bunch of docs into it, give it a "template" document (report, proposal, etc), even some other-project-example stuff, and tell it to gather all relevant context from each file and produce "template", and it does surprisingly well. Couldn't reproduce this with any other top tier model, at this level of quality.


We're a G Suite shop, so I set aside a ton of time trying to get 2.5 Pro to work for us. I'm not entirely unhappy with it, it's a highly capable model, but the long context implosion significantly limits it for the majority of task domains.

We have long context evals using internal data that are leveraged for this (modeled after longproc specifically) and the performance across the board is pretty bad. Task-wise for us, it's about as real world as it gets, using production data. Summarization, Q&A, coding, reasoning, etc.

But I think this is where the in-distribution vs out-of-distribution distinction really carries weight. If the model has seen more instances of your token sequences in training and thus has more stable semantic representations of them in latent space, it would make sense that it would perform better on average.

In my case, the public evals align very closely with performance on internal enterprise data. They both tank pretty hard. Notably, this is true for all models after a certain context cliff. The flagship frontier models predictably do the best.


MRCR does go significantly beyond multi-needle retrieval - that's why the performance drops off as a function of context length. It's still a very simple task (reproduce the i^th essay about rocks), but it's very much not solved.

See contextarena.ai and the original paper https://arxiv.org/abs/2409.12640

It also seems to match up well with evals like https://fiction.live/stories/Fiction-liveBench-Feb-21-2025/o...

The other evals you mention are not necessarily harder than this relatively simple one.


Sure. I didn't imply (or didn't mean to imply, at least) that I thought MRCR was solved. I was only pointing out that it's closer to testing raw retrieval than long-range dependency resolution like LongProc does. If retrieval is great but the model still implodes on the downstream task, the benchmark doesn't tell you the whole story. The point of my original comment was that even the frontier models are nowhere near as good at long context tasks as what I see anecdotally claimed about them in the wild.

> The other evals you mention are not necessarily harder than this relatively simple one.

If you're comparing MRCR to, for example, LongProc, I do think the latter is much harder. Or at least, much more applicable to long-horizon task domains where long context accumulates over time. But I think it's probably more accurate to say it's a more holistic, granular eval by comparison.

The tasks require the model to synthesize and reason over information that is scattered throughout the input context and across previously generated output segments. Additionally, the required output is lengthy (up to 8K tokens) and must adhere to a specific, structured format. The scoring is also more flexible than MRCR: you can use row-level F1 scores for tables, execution-based checks for code, or exact matches for formatted traces.

Just like NIAH, I don't think MRCR should be thrown out wholesale. I just don't think it can be pressed into the service of representing a more realistic long context performance measure.

EDIT: also wanted to note that using both types of evals in tandem is very useful for research and training/finetuning. If LongProc tanks and you don't have the NIAH/MRCR context, it's hard to know what capabilities are regressing. So using both in a hybrid eval approach is valuable in certain contexts. For end users only trying to gauge current inference-time performance, I think evals like RULER and LongProc have much higher value.


Right, the way I see it, MRCR isn't a retrieval task in the same vein as RULER. It’s less about finding one (or multiple) specific facts and more about piecing together scattered information to figure out the ordering of a set of relevant keys. Of course, it’s still a fairly simple challenge in the grand scheme of things.

LongProc looks like a fantastic test for a different but related problem, getting models to generate long answers. It seems to measure a skill the others don't. Meanwhile, RULER feels even more artificial than MRCR, since it's almost entirely focused on that simple "find the fact" skill.

But I think you're spot-on with the main takeaway: the best frontier models are still struggling with long context. The DeepMind team points this out in the paper with that Pokemon example and the MRCR evaluation scores themselves.


Gemini Flash 2.0 is an absolute workhorse of a model at extremely low cost. It's obviously not going to measure up to frontier models in terms of intelligence but the combination of low cost, extreme speed, and highly reliable structured output generation make it really pleasant to develop with. I'll probably test against 2.5 Lite for an upgrade here.

I want to know what use cases you're using it for, if it's not confidential.

We use it by having a Large Model delegate to Flash 2.0. Let's say you have a big collection of objects and a SOTA model identifies the need to edit some properties of one of them. Rather than have the Large Model perform a tool call or structured output itself (potentially slow/costly at scale), it can create a small summary of the context and change needed.

You can then provide this to Flash 2.0 and have it generate the full object or diffed object in a safe way using the OpenAPI schema that Gemini accepts. The controlled generation is quite powerful, especially if you create the schema dynamically. You can generate an arbitrarily complex object with full typing, restrict valid values by enum, etc. And it's super fast and cheap and easily parallelizable. Have 100 objects to edit? No problem, send 100 simultaneous Flash 2.0 calls. It's Google, they can handle it.
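
To make that concrete, here's a minimal sketch of the delegation step, assuming the google-genai Python SDK; the field names, prompt, and model name are illustrative placeholders, not the actual setup described above:

    # Hedged sketch: a schema-constrained edit delegated to Flash.
    # All field names and the prompt below are hypothetical.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key="YOUR_KEY")

    # Build the schema dynamically per object type; enums restrict valid values.
    schema = types.Schema(
        type=types.Type.OBJECT,
        properties={
            "name": types.Schema(type=types.Type.STRING),
            "status": types.Schema(type=types.Type.STRING,
                                   enum=["active", "archived"]),
            "priority": types.Schema(type=types.Type.INTEGER),
        },
        required=["name", "status"],
    )

    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents="Context summary from the large model: archive the 'Q3 report' "
                 "object and set its priority to 2. Emit the updated object.",
        config=types.GenerateContentConfig(
            response_mime_type="application/json",
            response_schema=schema,
        ),
    )
    print(response.text)  # JSON conforming to the schema

For the 100-call fan-out, the SDK also exposes an async client (client.aio.models.generate_content), which should work with asyncio.gather, subject to your rate limits.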


I use it extensively for https://lexikon.ai - in particular one part of what Lexikon does involves processing large amounts of images, and the way Google charges for vision is vastly cheaper compared to the big alternatives (OpenAI, Anthropic)

Wow, if I knew that someone was using your product on my conversation with them I'd probably have to block them.

I mean I've copy pasted conversations and emails into ChatGPT as well, it often gives good advice on tricky problems (essentially like your own personalized r/AmITheAsshole chat). This service seems to just automate that process.

I use Gemini 2.5 Flash (non-thinking) as a thought partner. It helps me organize my thoughts, or maybe even gives some new input I didn't think of before.

I really like to use it for self-reflection as well, where I just input my thoughts and maybe concerns and see what it has to say.


It basically made a university physics exam for me. It almost one-shot it as well. Just uploaded some exams from previous years together with a latex template and told it to make me a similar one. Worked great. Also made it do the solutions.

Simple unstructured to structured data transformation.

I find Flash and Flash Lite are more consistent than others as well as being really fast and cheap.

I could swap to other providers fairly easily, but don't intend to at this point. I don't operate at a large scale.


I use it for https://toolong.link (YouTube summaries with images), because only Gemini has easy access to YouTube, and it has a gigantic context window.
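
If you're wondering what "easy access" means here: the Gemini API accepts a YouTube URL directly as a file part. A rough sketch with the google-genai Python SDK, if I'm reading the docs right; the URL and prompt are placeholders:

    # Hedged sketch: pass a YouTube URL straight to Gemini as file_data.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key="YOUR_KEY")
    response = client.models.generate_content(
        model="gemini-2.5-pro",
        contents=types.Content(parts=[
            types.Part(file_data=types.FileData(
                file_uri="https://www.youtube.com/watch?v=VIDEO_ID")),
            types.Part(text="Summarize this video, noting key visual moments."),
        ]),
    )
    print(response.text)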

Turning local real estate agents' websites into RSS, to get new properties on the market before they get uploaded to real estate marketplace platforms.

I give it the HTML, it finds the appropriate selector for the property item, and then I use an HTML-to-RSS tool to publish the feed.
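
A rough sketch of the selector-finding step (the HTML-to-RSS tool above isn't named, so this stops at extracting the matched items; the file name and prompt are made up):

    # Hedged sketch: ask Gemini for the listing selector, then extract items.
    from google import genai
    from bs4 import BeautifulSoup

    client = genai.Client(api_key="YOUR_KEY")
    html = open("agent_listings.html").read()  # placeholder page

    resp = client.models.generate_content(
        model="gemini-2.0-flash",
        contents="Below is a real estate agent's listings page. Reply with "
                 "ONLY the CSS selector matching each property item.\n\n" + html,
    )
    selector = resp.text.strip()

    # Each matched element becomes one entry in the eventual RSS feed.
    for item in BeautifulSoup(html, "html.parser").select(selector):
        print(item.get_text(" ", strip=True)[:120])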


I've yet to run out of free image gen credits with Gemini, so I use it for any low-effort image gen like when my kids want to play with it or for testing prompts before committing my o4 tokens for better quality results.

Web scraping - creating semi-structured data from a wide variety of horrific HTML soups.

Absolutely do swap out models sometimes, but Gemini 2.0 Flash is the right price/performance mix for me right now. Will test Gemini 2.5 Flash-Lite tomorrow though.


Yes, we implemented a separate internal service that interfaces with an LLM, so callers can be agnostic as to what provider or model is being used. Haven't needed to load balance between models though.
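
For anyone curious, the shape of such a shim can be as small as this (all names invented, not the commenter's actual service):

    # Hedged sketch of a provider-agnostic LLM interface.
    from typing import Protocol

    class ChatProvider(Protocol):
        def complete(self, prompt: str) -> str: ...

    class GeminiProvider:
        def complete(self, prompt: str) -> str:
            raise NotImplementedError  # call the Gemini API here

    class OpenAIProvider:
        def complete(self, prompt: str) -> str:
            raise NotImplementedError  # call the OpenAI API here

    def get_provider(name: str) -> ChatProvider:
        # Callers only ever see ChatProvider; swapping models is a config change.
        return {"gemini": GeminiProvider, "openai": OpenAIProvider}[name]()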

Low-latency LLM for my home automation. Anecdotally, Gemini was much quicker than OpenAI in responding to simple commands.

In general, when I need "cheap and fast" I choose Gemini.


I tried swapping for my project, which involves having the LLM summarize and critique medical research, and didn't have great results. The prompt that works best with my main LLM fucks up the intended format when fed to other LLMs. Thinking about refining prompts for each different LLM but haven't gotten there.

My favorite personal use of Gemini right now is basically as a book club. Of course it's not as good as my real one, but I often can't get them to read the books I want, and Gemini is always ready when I want to explore themes. It's often more profound than the book club too, and seems a bit less likely to tunnel vision. Before LLMs I found exploring book themes pretty tedious; often I would have to wait a while to find someone who had read it, but now I can get into it as soon as I'm done reading.


It's very good at automatically segmenting and recognizing handwritten and badly scanned text. I use it to make spreadsheets out of handwritten petitions.

Interesting article, but I can't get over the quoting of the author's own social media posts, it's really cringey.

We all only have so much time on this earth. Why is it fair for some folks to spend a disproportionate amount of that time toiling away for basic sustenance while others spend their days on their personal hotel sized yachts?

For the same reason that it's fair when one seed falls on asphalt and dies when another seed falls in shit and thrives: it isn't. Fairness turns out not to be one of the forces of the universe.

Wealth inequality isn't some random thing though, our government and economic system enables it. It's not like we have no control over it.

At the most fundamental level it is. Abilities of individuals vary wildly, which translates into productivity, and therefore wealth. As the economy becomes more globalized and more knowledge-based, differences in ability are magnified even more. A skilled antelope hunter can at best hunt 5x more than the median hunter, but someone who makes a killer app can make billions. Sure, government can play a role in redistributing that wealth, but that's an intervention, not the default state of things.

> Abilities of individuals vary wildly, which translates into productivity

Even if I accept your premise, it doesn’t explain why some places have far more wealth inequality than others despite having similar differences in abilities in those places. There might be some innate differences in abilities, but the magnification of those differences is socially constructed. It’s not a fact of nature.

You also are discounting luck. Some people are lucky. They were born with enormous inherited wealth. Or, their business happened to be in the right place in the right time.


So what, what's wrong with luck? You are pretending that inherited wealth came from nowhere. Someone did it, and they love their family more than randoms. So of course their family benefits.

Why should I work hard for strangers who vilify me?


It’s at odds with meritocracy. We should reward competence and hard work, not membership in a dynasty.

Do you have kids? If so, are you going to send him/her to a private school, assuming you can afford it? What about stumping up money for extracurriculars or tutoring? Or if you're not really wealthy, what about giving immaterial aid like tutoring yourself? All of these things are "at odds with meritocracy", but that doesn't mean it's a slam dunk argument against them.

Government plays the interventionist role of enabling the killer app maker to make billions of dollars, rather than zero antelopes - or more realistically, a meager amount of food doing menial labor for the local warlord. That's the thing many app makers seem to have forgotten.

I'm coming from a libertarian perspective, so I'm certainly not trying to use that to justify no-exit totalitarian thinking. But it's still important to remember that base truth when analyzing the overall outcomes of our current system.


> Abilities of individuals vary wildly, which translates into productivity, and therefore wealth.

Show me literally any study that correlates the amount of work performed/the value of work/the ability of the worker with wealth. I'll wait.

I have read study after study after paper after paper, research on research, research verifying research, over and over, so many they have run utterly into a black ichor that issues from my eyes when people talk this brand of shit. The best predictor in the world of having wealth is being born into it. The second best is marrying into it. The third best is striking it lucky at the free market lottery, entry into which also requires some level of wealth and not a tiny amount of it either.


> Show me literally any study that correlates the amount of work performed/the value of work/the ability of the worker with wealth. I'll wait.

Not too long ago I dug a large hole, and then filled it back in again. It was very difficult and tiring, and entirely useless.

If you accept that I don’t deserve money for this, then you reject the premise that effort/work is the only factor determining value, and “utility” or value to others also matters.


There’s no objective way to determine how much of a product’s utility is created by whom. For example, if I invent a thingamajig and hire people to build and sell it, how can we determine what percentage of the value comes from me, the workers, or the users who find new ways to use it? We can’t.

As a result, money gets distributed based on the relative power of those involved in the process. Business owners typically hold the most power, in-demand workers have some leverage, and others have less. So being rich doesn’t necessarily mean you’ve created a lot of value for others, it may just mean you’ve held positions of power.

Getting rid of these positions of power is the way to create a more equal and prosperous society.


>Show me literally any study that correlates the amount of work performed/the value of work/the ability of the worker with wealth. I'll wait.

This is trivially true if you accept the premise that "value of work" is the same as "amount paid", because the statement basically becomes "show me literally any study that correlates salary with wealth". However, I suspect you reject the market wage as "value of work", and would rather have some subjective measure like "social value" or whatever. As imperfect as the market wage is, it's as objective a measure as we can get, and letting people use whatever subjective measure they want means the argument will go nowhere, because you can define your value function to be whatever you want.

>I have read study after study after paper after paper, research on research, research verifying research, over and over, so many they have run utterly into a black ichor that issues from my eyes when people talk this brand of shit. The best predictor in the world of having wealth is being born into it. The second best is marrying into it. The third best is striking it lucky at the free market lottery.

My claim isn't that wealth right now is distributed 100% meritocratically, only that inequalities will emerge even if we somehow reset everyone's wealth, and therefore the claim that "Wealth inequality isn't some random thing" is incorrect.


> This is trivially true if you accept the premise that "value of work" is the same as "amount paid"

I do not even remotely accept your premise. A short list of jobs that are crucial to modern life but chronically underpaid:

* Teachers

* Nursing/care staff

* Daycare workers

* Janitorial staff

* Delivery/logistics workers

FAR from an exhaustive list.

> However I suspect you reject the market wage as "value of work"

Considering how many working poor there are I'd say there's a solid reason for rejection. If people are working full time hours and still unable to meet their needs, clearly something is wrong.

> only that inequalities will emerge even if we somehow reset everyone's wealth, and therefore the claim that "Wealth inequality isn't some random thing" is incorrect.

This is an utter non-sequitur to anything I was talking about. You assert that value of work is tied to the wealth of the one doing the work. I challenged this by pointing out numerous whole categories of laborer that are and have been underpaid for some time. You assert that this is a subjective measurement. I don't know what to really say here.

If doing work that needs doing for the understood full time hours we as a society have stated is not a path to at least a stable life, if not a particularly luxurious one, then what's the point of working? And, more concerningly, why would anyone take up that job that being the case? Nurse and teacher retention right now is horrific specifically because the pay isn't very good and it's a very demanding job, and as a result we have a shortage of both. But we still need them.


>letting people use whatever subjective measure they want will mean the argument will go nowhere because you can define your value function to whatever you want.

I guess the question is: is wealth/income inequality something to be targeted? Also, there's the question of whether the living standards of the poorest improve faster if we target wealth/income inequality.

Empirically, the only working path to wealth equality is for everyone to be poor.

For a society to become wealthy, those who produce more wealth need to get to end up with more wealth.


Wealth inequality broadly follows the Pareto distribution, which is natural and does derive from randomness. We could define fairness as a flat distribution and redistribute accordingly, but that requires continual work to be done against the random Pareto distribution or it reverts. It's do-able, but it requires a long term consensus that doesn't currently exist.

There is no one Pareto distribution; it is a family of distributions, with different parameters meaning different intercepts, and therefore different levels of inequality.
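
For concreteness, the standard Type I Pareto facts behind this (textbook results, not from the thread):

    % Survival function: x_m is the minimum wealth, alpha the shape parameter.
    \Pr(X > x) = \left( \frac{x_m}{x} \right)^{\alpha}, \qquad x \ge x_m
    % For alpha > 1, the Gini coefficient is G = 1 / (2\alpha - 1),
    % so a smaller alpha (a fatter tail) means more inequality.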

To some extent. If you vote in a rep and he lies about his policy, the people are stuck with him for 2 or 6 years. We don't have mechanisms to remove someone like that (some states might, but it's not a federal mechanism); his colleagues would need to impeach him, which they hesitate to do unless something damning occurs.

But that unfairness is itself based on forces of the universe, in this case: a seed can grow in shit, a seed cannot grow in asphalt.

To extend your metaphor, we have tons of available "surface area" for people to fall on paved with asphalt, to suit the preferences of those sitting in shit. These are not fixed things. We placed the asphalt. We can tear it up, if we so choose.


We have the surface area for the next generation, but since unconstrained life grows exponentially, it runs out within a few generations. Isn't some kind of homeostasis preferable to repeated booms and busts?

If unconstrained life grows exponentially why is it every developed country is having a birthrate crisis? It seems most populations tend to naturally stabilize once parents realize they don't need to have like 7 kids hoping 2 or 3 survive childhood.

When someone asks how something is fair, coming back with "life is like that" or "life isn't fair" is not a valid response. Humanity should strive to make its systems as fair as possible while accepting that unfairness will still exist. Why would theft, etc., be a crime if not for the idea of fairness? You can make the same "life is unfair" argument to defend theft, but that's not the way it should be / is.

>Humanity should strive to make the systems as fair as possible while accepting the fact that unfairness will still exist.

The standard argument against this is that "inequality is a good thing because it leads to innovation" or whatever.


So the question is "why is life unfair?"

Or is it "why do bad beginnings with lots of drudgery not lead to yacht ownership?"


It’s not fair, just like most of life due to the genes/parents/geography/etc you are born to.

I don't think open sourcing is going to fix their adoption issue. Like the other comments mention, you need to be worth the time investment to gain traction. If Dark was truly as revolutionary as it was marketed to be, it wouldn't have had problems staying source available, IMO. Folks will pay or put up with whatever it takes to be in the ecosystem (such as CUDA).

I agree it won't fix it, but IMO it will remove one of the barriers to adoption. The problem with doing something revolutionary is that it's only going to be revolutionary in some ways, and it has to compete with things that are mature in ways you are not. And the original version (now called Darklang-Classic) was quite immature in an awful lot of ways that made it difficult to build on.

That's being addressed with the new version of course!


Why did you change your license to Apache? You folks were big proponents of source available licensing, you even worked with Heather on your original one AFAIK.

Some reasoning is laid out here[0], but I'm curious to hear more too.

[0]: https://blog.darklang.com/darklang-goes-open-source/


Source-available was a hurdle for adoption, and maybe it would have been a problem had we hit it big, but for the last few years we've wanted to get rid of it.

IPOs are probably one of the worst gambles retail investors can make. Almost all of the financials are juiced, and more often than not the stock tanks in the near term post-IPO. You can't really make an informed buying decision, best to let the dust settle.

All value has been extracted by VC before the IPO. All the IPO does is dump the stock on institutions and retail investors.

This is trivially proven false, at least as a generalization. Looking at recent-ish IPOs:

Spotify: up 380% since IPO

ServiceNow: 4,000%

Shopify: 685%

Meta: 1,600%


There are at least 150-250 IPOs every year, good luck picking winners like those four.

If https://site.warrington.ufl.edu/ritter/files/IPOs-VC-backed.... is right, the last time there were 150+ tech IPOs in a year was 2000 (1992-2000 were all anomalously high). 2021 is the only year to break 100 since then, and only three other years cracked 50.

Um, perhaps it's worth reviewing the stock charts for MAANG, whose early investors extracted a tiny % of their total market caps.

The subtlety is that you can buy shares on the open market post-IPO, so when you buy "pre-IPO" you're just getting the one-time "pop."


All that I've seen with AI in the workplace is my coworkers becoming dumber. Asking them technical questions they should know the answer to has turned into "let me LLM that for you" and giving me an overly broad response. It's infuriating. I've also hilariously seen this in meetings, where folks are being asked something and awkwardly fill time while they wait for a response which they try and not read word for word.

That's totally normal actually. When you ask, you have to tell them "Think through this step by step."
