
Tracks very well with something I saw recently, that the biggest fans of and users of generative AI for writing (and in your case music) are people who want to write a book but never got around to doing it, not people who want to read and pay for books

Yes, I have no commercial interest when it comes to music, it's just something I find joy in. Using Suno did not detract from that joy. If music is a hobby, Suno is an incredibly fun tool.

DJs and producers have been getting hate for years: "it's just a guy with a laptop on stage", "he isn't really playing those instruments", etc. Or think of a band leader, someone who composes but doesn't actually play the individual parts. I tried thinking of Suno this way and it helped ease whatever "guilt" I had about my own creative integrity.


So long as the music is just for you and never published, I wish you well

> that the biggest fans of and users of generative AI for writing (and in your case music) are people who want to write a book but never got around to doing it, not people who want to read and pay for books

That makes sense, right? At the advent of computer DAWs, the biggest fans and users weren't people listening to music, but people who wanted to make music. Production tools are meant for people producing things, not for consumers, as it should be :)


“I never got around to it” is a useful filter. Actual artists are able to pass through it because they are driven to do the work.

If some retired lawyer wants to “write” a novel, good for them I guess. But AI is not the only reason it won’t be worth reading. The other problem is that the “writer” is actually just a reader. Consuming and producing are totally different.


It's very odd reading this because, to me, the Beats were never regarded well by authority figures, teachers, or the other established credentialers of literature you interact with as a kid. They were seen the way comic books, video games, etc. were: junk for people who like junk.

I think the appeal of them was never that they had great, enviable lives. Ginsberg's famous refrain is that he saw the best minds of his generation destroyed by madness. Doesn't that resonate with young people, especially today, who have all the acumen, follow all the rules, and end up priced out of any kind of normal middle-class life? Sure, it's not the same thing the Beats faced, but isn't the experience of seeing a society from the outside and never being able to join it (or, for the Beats, never wanting to) a common one?

They were talented writers who didn't fit into the times they lived in, who made choices that made their lives worse (and documented them extensively), and who reached for drink and drugs (and Eastern spirituality) to numb themselves against a world they felt so apart from. How different is that from many famous writers across many times and places in history?


Writing direction (RTL or LTR) and alphabet alone don't make two languages different.

Hindi and Urdu are 90% the exact same language and are mutually intelligible (an Urdu speaker and a Hindi speaker can have a complete conversation with each other), but each is written differently (one LTR, the other RTL) and with a different alphabet.


See also Croatian, Serbian, and Bosnian. I also find Chinese interesting: Mandarin and (formal) Cantonese have a near-identical written language, while the spoken languages are completely different. Views on whether those are different languages or dialects vary wildly.

In my book, the distinction between languages and dialects is so arbitrary that the best method is simply to ask the people who speak them. If they consider them different languages (which Maltese speakers seemingly do), I call them different languages.


Mandarin-Cantonese is very interesting and, to my knowledge, a unique example where the same written language is read completely differently by two different groups of people.

I don't buy the argument of just asking the speakers. There are cultural, political, etc. reasons people may think things which don't conform with reality. Many Hindi-Urdu speakers get insulted by the reality that the languages are pretty much the same because they don't want to identify with people from another country their country is constantly at war with.


Unfair comparison, imo. Irish (Gaelic) is a language which was intentionally suppressed for centuries.

What relevance does that have? I'd say it's more important to acknowledge the fact that there are zero Irish speakers who don't also speak English. Including it as an official EU language is an ideological project rather than a pragmatic one.

Because there is a cause to revert the intentional damage done to Irish by the former rulers of the land. With Frisian there was no such suppression. I think official language status helps provide resources to conservationists of various languages. And trying to conserve a language most of its speakers don't care about is a lot different from trying to conserve a language people do care about but which was suppressed for many years, and so is harder to conserve.

It's also an enormous waste of money, which this admin loves: billions to Israel, $40B to Argentina, $1T+ to bomb boats in the Gulf of Mexico. Waste everywhere, and the bill goes to the average American, whose economic prospects haven't improved.



> I dare you to define what you mean by waste.

I'll just go with the dictionary:

>to allow to be used inefficiently or become dissipated

So why are we bailing out Argentina, or bombing boats off South America? We can't be efficient if we can't even explain our reasoning, and I've heard very little reasoning from the administration.

The best I heard was "we're hitting drug dealers". Even if I believed that, spending hundreds of billions to attack boats with drugs on them sounds horribly inefficient. Drugs are not an immediate threat to people, and we have many methods, through negotiation, to simply limit or stop such imports.

I've heard zero justification anywhere on the Argentina issue. It seems even many Republicans do not like this approach.


> We can't be efficient if we can't even explain our reasoning

You're confusing transparency with efficiency. Military and international politics decisions often need public lies or omissions for political reasons but that doesn't mean they're inefficient for their intended purpose. If you word it more honestly as "The government can't be efficient if it doesn't explain its reasoning to the public", then it obviously doesn't follow from the definition of waste.


>You're confusing transparency with efficiency

They go hand in hand. Or is it fine that the government openly lies while claiming to be "America First"?

>that doesn't mean they're inefficient for their intended purpose.

And that's what I'm asking: what is the intended purpose? I can't discern it, and even with my most cynical interpretations I don't see how this is an efficient route.

Transparency would help a lot in evaluating if they aren't being wasteful. But as is, it seems to be a bunch of special interests all clashing with one another in the White House. They don't make sense because there's no unified plan.

Which meets the above definition of "waste"


How about spending on things that will have no benefit to the average US citizen, and in fact might just make things worse without solving the stated problem?

How about "stuff that is internally logically inconsistent"?

How about they spend that fucking money on funding food stamps or any of the other programs affected by the shutdown? If they can illegally move money elsewhere then they can do so for making sure people can eat. I'm going to need to start sending money out of my savings to support my parents because this administration is so inept that they're taking away benefits that our tax dollars have already paid for.

Actually, this sounds like a really fun exploit to try to come up with. We saw the Agent select items and put them in a cart...

Why would you even comment that Codex CLI is potentially worth switching an enormous amount of spend over ($70k) and give literally 0 evidence of why it's better? That's all you've got? "Trust me bro"?

With Gravity's Rainbow you can either read it and make peace with the fact that every acronym, reference, character, plot point etc. won't be fully understood or remembered (that's kind of the point with this type of writing style, imo), or you can read it with a companion guide kinda like how some people play video games.

I also liked The Crying of Lot 49. Inherent Vice is also a bit of an easier time.


> this year, they decided to pilot a tool that would extract that info (using "GPT-5o-mini": https://www.thalamusgme.com/blogs/methodology-for-creation-a...).

Mind-boggling idea to do this because OCR and pulling info out of PDFs has been done better and for longer by so many more mature methods than having an LLM do it


Nit: I'd say, as someone who spent a fair amount of time doing it in the life insurance space, that parsing arbitrary PDFs is very much not a solved problem without LLMs. Parsing a particular PDF is, at least until they change their table format or whatever.

I don’t think this idea is totally cursed, I think the implementation is. Instead of using it to shortcut filling in grades that the applicant could spot check, like a resume scraper, they are just taking the first pass from the LLM as gospel.
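That resume-scraper pattern (LLM first pass, applicant confirms before anything is accepted) can be sketched in a few lines. Everything here is illustrative: the extraction function is a stub standing in for a real model call, and the field names are invented.

```python
def extract_grades_llm(transcript_text):
    """Stand-in for an LLM extraction call; returns field -> guessed value."""
    # A real implementation would prompt a text/vision model here.
    return {"Surgery": "Honors", "Pediatrics": "Pass"}

def confirm_with_applicant(field, guessed, answer_fn):
    """Show the model's guess; the applicant may correct it or accept it."""
    corrected = answer_fn(field, guessed)
    return corrected if corrected is not None else guessed

def spot_checked_grades(transcript_text, answer_fn):
    """LLM produces a first pass; nothing is final until a human signs off."""
    first_pass = extract_grades_llm(transcript_text)
    return {
        field: confirm_with_applicant(field, guess, answer_fn)
        for field, guess in first_pass.items()
    }

# The applicant corrects one field and accepts the other.
corrections = {"Pediatrics": "High Pass"}
final = spot_checked_grades("...transcript text...",
                            lambda field, guess: corrections.get(field))
print(final)  # {'Surgery': 'Honors', 'Pediatrics': 'High Pass'}
```

The point of the sketch is where the trust boundary sits: the model's output is a draft, and the human correction step is what makes it safe to use downstream.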


Right - the problem with PDF extraction is always the enormous variety of shapes that data might take in those PDFs.

If all the PDFs are the same format you can use plenty of existing techniques. If you have no control at all over that format you're in for a much harder time, and vision LLMs look perilously close to being a great solution.

Just not the GPT-5 series! My experiments so far put Gemini 2.5 at the top of the pack, to the point where I'd almost trust it for some tasks - but definitely not for something as critical as extracting medical grades that influence people's ongoing careers!


> put Gemini 2.5 at the top of the pack

I have come to the same conclusion having built a workflow that has seen 10 million+ non-standardized PDFs (freight bill of ladings) with running evaluations, as well as against the initial "ground-truth" dataset of 1,000 PDFs.

Humans: ~65% accurate

Gemini 1.5: ~72% accurate

Gemini 2.0: ~88% accurate

Gemini 2.5: ~92%* accurate

*Funny enough we were getting a consistent 2% improvement with 2.5 over 2.0 (90% versus 88%) until as a lark we decided to just copy the same prompt 10x. Squeezed 2% more out of that one :D
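A running evaluation like that boils down to a field-level comparison against the ground-truth set. This is a minimal sketch of such a harness; the document IDs and field names are made up, not the poster's actual schema.

```python
def field_accuracy(extracted_docs, ground_truth_docs):
    """Fraction of ground-truth fields the extractor got exactly right."""
    correct = total = 0
    for doc_id, truth in ground_truth_docs.items():
        extracted = extracted_docs.get(doc_id, {})
        for field, true_value in truth.items():
            total += 1
            if extracted.get(field) == true_value:
                correct += 1
    return correct / total if total else 0.0

# Hypothetical ground truth for two bill-of-lading PDFs.
ground_truth = {
    "bol_001": {"shipper": "Acme Co", "weight": "1200 lb"},
    "bol_002": {"shipper": "Globex", "weight": "880 lb"},
}
# Model output with one field wrong out of four.
model_output = {
    "bol_001": {"shipper": "Acme Co", "weight": "1200 lb"},
    "bol_002": {"shipper": "Globex", "weight": "800 lb"},
}
print(field_accuracy(model_output, ground_truth))  # 0.75
```

Exact string match is the strictest possible scoring; a real harness would likely normalize whitespace, units, and casing before comparing, which is itself a source of disagreement between evaluators.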


Gemini 3.0 is rumored to drop any day now, will be very interesting to see the score that gets for your benchmark here.


As long as the ergonomics with the SDK stay the same. Jumping up to a new model this far in is something I don't want to contemplate wrestling with honestly. When we were forced off of 1.5 to 2.0 we found that our context strategy had to be completely reworked to recover and see better returns.


>Just not the GPT-5 series! My experiments so far put Gemini 2.5 at the top of the pack, to the point where I'd almost trust it for some tasks

Got it. The non-experts are holding it wrong!

The laymen are told "just use the app" or "just use the website". No need to worry about API keys or routers or wrapper scripts that way!

Sure.

Yet the laymen are expected to maintain a mental model of the failure modes and intended applications of Grok 4 vs Grok 4 Fast vs Gemini 2.5 Pro vs GPT-4.1 Mini vs GPT-5 vs Claude Sonnet 4.5...

It's a moving target. The laymen read the marketing puffery around each new model release and think the newest model is even more capable.

"This model sounds awesome. OpenAI does it again! Surely it can OCR my invoice PDFs this time!"

I mean, look at it:

    GPT‑5 not only outperforms previous models on benchmarks and answers questions more quickly, but—most importantly—is more useful for real-world queries.

    GPT‑5 is our best model yet for health-related questions, empowering users to be informed about and advocate for their health. The model scores significantly higher than any previous model on HealthBench , an evaluation we published earlier this year based on realistic scenarios and physician-defined criteria.

    GPT‑5 is much smarter across the board, as reflected by its performance on academic and human-evaluated benchmarks, particularly in math, coding, visual perception, and health. It sets a new state of the art across math (94.6% on AIME 2025 without tools), real-world coding (74.9% on SWE-bench Verified, 88% on Aider Polyglot), multimodal understanding (84.2% on MMMU), and health (46.2% on HealthBench Hard)

    The model excels across a range of multimodal benchmarks, spanning visual, video-based, spatial, and scientific reasoning. Stronger multimodal performance means ChatGPT can reason more accurately over images and other non-text inputs—whether that’s interpreting a chart, summarizing a photo of a presentation, or answering questions about a diagram.
And on and on it goes...


"The non-experts are holding it wrong!"

We aren't talking about non-experts here. Go read https://www.thalamusgme.com/blogs/methodology-for-creation-a...

They're clearly competent developers (despite mis-identifying GPT-5-mini as GPT-5o-mini) - but they also don't appear to have evaluated the alternative models, presumably because of this bit:

"This solution was selected given Thalamus utilizes Microsoft Azure for cloud hosting and has an enterprise agreement with them, as well as with OpenAI, which improves overall data and model security"

I agree with your general point though. I've been a pretty consistent voice in saying that this stuff is extremely difficult to use.


> The laymen

The solution architects, leads, product managers, and engineers behind this feature are now laymen who shouldn't do due diligence on a system used for an extremely important task? Who shouldn't test it across a wide range of input PDFs for accuracy and accept nothing below 100%?


I've been doing PDF data extraction with LLMs at my day job, and in my experience, to get them sufficiently reliable for a document of even moderate complexity (say, one with tables, form fields, that kind of thing), you end up writing prompts so tightly coupled to the format of the document that there's nothing but downside versus doing the same thing with traditional computer vision systems. It works (ask me again in a couple of years, once the underlying LLMs have been switched out, whether it's turned into whack-a-mole and long-missed data-corruption issues... I'd bet it will), but using an LLM isn't gaining us anything at all.

Like, this company could have done the same projects we've been doing, but probably gotten them done faster (and certainly with better performance and lower operational costs), any time in the last 15 years or so. We're doing them now because "we gotta do 'AI'!", so there's funding for it, but they could have just spent less money doing it with OpenCV or whatever years and years ago.


Eh, I guess we've looked at different PDFs and models. Gemini 2.5 Flash is very good, and Gemini 2.0 and Claude 3.7 were passable at parsing complicated tables out of image chunks, and we had a fairly small prompt that worked in >90% of cases. Where we had failures, they were almost always from asking the model to do something infeasible (like parse a table whose header was on a previous, not-provided page).

If you have a better way to parse PDFs using opencv or whatever, please provide this service and people will buy it for their RAG chat bots or to train vlms.


Would it be helpful if an LLM created bounding boxes for "traditional" OCR to work on? I.e., allowing extraction of information from an arbitrary PDF as if it were a "particular PDF"?


The parent says

> that information is buried in PDFs sent by schools (often not standardized).

I don't think OCR will help you there.

An LLM can help, but _trusting_ it is irresponsible. Use it to help a human quickly find the grade in the PDF, don't expect it to always get it right.


Don't most employers do OCR on the resumes sent in? I get that a resume is a more standard format. Maybe that's the rub.


The challenge here is that it's not just OCR for extracting text from a resume, this is about extracting grades from school transcripts. That's a LOT harder, see this excellent comment: https://news.ycombinator.com/item?id=45581480


I would love to hear more about the solutions you have in mind, if you're willing.

The particular challenge here I think is that the PDFs are coming in any flavor and format (including scans of paper) and so you can't know where the grades are going to be or what they'll look like ahead of time. For this I can't think of any mature solutions.


I would assume they OCR first, then extract whatever info they need from the result using LLMs

Edit: Does sound like it - "Cortex uses automated extraction (optical character recognition (OCR) and natural language processing (NLP)) to parse clerkship grades from medical school transcripts."


It's a bit difficult to derive exactly what they're using here. There's quite a lot of detail in https://www.thalamusgme.com/blogs/methodology-for-creation-a... but still mentions "OCR models" separately from LLMs, including a diagram that shows OCR models as a separate layer before the LLM layer.

But... that document also says:

"For machine-readable transcripts, text was directly parsed and normalized without modification. For non-machine-readable transcripts, advanced Optical Character Recognition (OCR) powered by a Large Language Model (LLM) was applied to convert unstructured image-based data into text"

Which makes it sounds like they were using vision-LLMs for that OCR step.

Using a separate OCR step before the LLMs is a lot harder when you are dealing with weird table layouts in the documents, which traditional OCR has usually had trouble with. Current vision LLMs are notably good at that kind of data extraction.
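The two-path flow that quote describes (direct text parsing for machine-readable transcripts, a vision-model OCR step otherwise) can be sketched as a simple dispatch. The helper names and the text-layer check here are mine, not Thalamus's; a real check would use a PDF library rather than scanning bytes for a marker.

```python
def has_embedded_text(pdf_bytes):
    """Crude heuristic: PDFs with a text layer usually declare fonts."""
    return b"/Font" in pdf_bytes

def parse_text_layer(pdf_bytes):
    # Placeholder for direct parsing and normalization of embedded text.
    return "grades parsed from embedded text"

def ocr_with_vision_llm(pdf_bytes):
    # Placeholder for rendering pages and sending them to a vision model.
    return "grades transcribed by a vision model"

def extract_transcript(pdf_bytes):
    """Machine-readable path when possible, vision-OCR path otherwise."""
    if has_embedded_text(pdf_bytes):
        return parse_text_layer(pdf_bytes)
    return ocr_with_vision_llm(pdf_bytes)

print(extract_transcript(b"%PDF-1.7 ... /Font ..."))   # text-layer path
print(extract_transcript(b"%PDF-1.4 scanned image"))   # vision-OCR path
```

The interesting failure mode is the dispatch itself: a scanned transcript with a stray embedded font would silently take the wrong path, which is one reason to keep a human check on the output regardless of route.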


Thanks, I didn't see that part!


Welcome to the world of greybeards, baffled by everyone using AWS at 100s to 100000s of times the cost of your own servers.


Spectre/Meltdown, finding out your six-month order of SSDs was stolen after opening empty boxes in the datacenter, and having to write RCAs for customers after your racks go over the PSUs' limit are things y'all greybeards seem to gloss over in your calculations, heh


The article didn't say they were lying to investors. In fact, it points out several times that they have trouble raising money even from private investors like SoftBank.

