
A brief search on Chinese social media turns up the following:

- the cases are rare, once every several months

- the affected battery pack is usually replaced free of charge

- many of the cases come from Chinese owners in North America, and one even involves an imported Model 3

Considering that Chinese media prefer reporting Tesla defects to gain clicks, I would assume this is not a Shanghai-specific, broadly affecting problem.


I suppose that may depend on how one defines a "broadly affecting problem". 2/3 (~2.9k) of the cases are for 2021 vehicles, and maybe it's possible those relate to specific batches shipped only to South Korea or something, but that still wouldn't explain the other 1/3 (~1.6k) of cases.

Tesla hasn't provided statistics about cases in the Chinese market, but I have a hard time believing it's really a handful of battery issues per year (regardless of what one assumes about social media). That's just too far beyond any reasonable quality-level expectation for ~half a million cars over the time period.


I have never considered posting to social media about automotive issues, unless something truly egregious happened that I felt I should warn people about. If you're seeing complaints, the problem is probably fairly frequent.

That's you, not the average Chinese owner. Compared with a known problem on another well-selling car, I can say this Tesla problem is indeed rare.

I loved typesetting my bachelor's thesis with Typst (though with LaTeX math formulas), and it's even more promising now that it can embed PDF figures as of this July (see issue #145).


Someone just copy-pasted an implementation from a random Chinese blog, completely unaware of what the key means.


> copy-pasted an implementation from a random Chinese blog

But.. the blog was chosen by a series of dice rolls, guaranteed to be random!


All 4s for some reason. [1]

[1] https://xkcd.com/221/


Chinese coordinates definitely can be converted to WGS-84; it's Google that didn't do it. Look at the Shenzhen River in OpenStreetMap: the streets of Hong Kong and Shenzhen align with each other perfectly.
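
For the curious: the GCJ-02 obfuscation is a deterministic offset with publicly known constants, so you can invert it numerically. A rough Python sketch, following the forward transform used by the various open-source "eviltransform" implementations (constants per those implementations; accuracy is typically a meter or two, and the usual "is this point inside China" check is omitted):

    import math

    A = 6378245.0                # semi-major axis used by GCJ-02
    EE = 0.00669342162296594323  # squared eccentricity

    def _tlat(x, y):
        r = -100.0 + 2.0*x + 3.0*y + 0.2*y*y + 0.1*x*y + 0.2*math.sqrt(abs(x))
        r += (20.0*math.sin(6.0*x*math.pi) + 20.0*math.sin(2.0*x*math.pi)) * 2.0/3.0
        r += (20.0*math.sin(y*math.pi) + 40.0*math.sin(y/3.0*math.pi)) * 2.0/3.0
        r += (160.0*math.sin(y/12.0*math.pi) + 320.0*math.sin(y*math.pi/30.0)) * 2.0/3.0
        return r

    def _tlon(x, y):
        r = 300.0 + x + 2.0*y + 0.1*x*x + 0.1*x*y + 0.1*math.sqrt(abs(x))
        r += (20.0*math.sin(6.0*x*math.pi) + 20.0*math.sin(2.0*x*math.pi)) * 2.0/3.0
        r += (20.0*math.sin(x*math.pi) + 40.0*math.sin(x/3.0*math.pi)) * 2.0/3.0
        r += (150.0*math.sin(x/12.0*math.pi) + 300.0*math.sin(x/30.0*math.pi)) * 2.0/3.0
        return r

    def wgs_to_gcj(lat, lon):
        dlat = _tlat(lon - 105.0, lat - 35.0)
        dlon = _tlon(lon - 105.0, lat - 35.0)
        rlat = lat / 180.0 * math.pi
        magic = 1 - EE * math.sin(rlat) ** 2
        dlat = (dlat * 180.0) / ((A * (1 - EE)) / (magic * math.sqrt(magic)) * math.pi)
        dlon = (dlon * 180.0) / (A / math.sqrt(magic) * math.cos(rlat) * math.pi)
        return lat + dlat, lon + dlon

    def gcj_to_wgs(lat, lon):
        # no closed-form inverse; iterating the forward transform converges fast
        wlat, wlon = lat, lon
        for _ in range(3):
            glat, glon = wgs_to_gcj(wlat, wlon)
            wlat += lat - glat
            wlon += lon - glon
        return wlat, wlon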


Really great job, I've been happy so far. It would be better if the text were searchable via OCR.


ah it's not? that would be a blocker for me


We truly live in the future already, where OCR is a normal expectation. (And possible)


With a little searching you can find it's a laboratory within the CS department of THU. It's a fairly large lab though, not one of those led by just one or two professors.


The person who runs Compiler Explorer tried to collect the public shortlinks and do the redirection himself:

Compiler Explorer and the Promise of URLs That Last Forever (May 2025, 357 points, 189 comments)

https://news.ycombinator.com/item?id=44117722


The same happens with whisper-large-v3 on Chinese transcription: silence is transcribed as something like "please upvote, share and favourite this video". I suspect they trained the model on random YouTube videos without carefully picking really useful data.


In Chinese, it always added something like "For study/research purpose only. Please delete after 48 hours." This is what volunteers add to the subtitles of (pirated) movies/shows.


Fair. If AI companies are allowed to download pirated content for "learning", why can't ordinary people?


There is so much damning evidence that AI companies have committed absolutely shocking amounts of piracy, yet nothing is being done.

It only highlights how the world really works. If you have money you get to do whatever the fuck you want. If you're just a normal person you get to spend years in jail or worse.

Reminds me of https://www.youtube.com/watch?v=8GptobqPsvg


There's actually a lot of court activity on this topic, but the law moves slowly and is reluctant to issue injunctions where harm is not obvious.

It's more that the law about "one guy decides to pirate twelve movies to watch them at home and share with his buddies" is already well settled, but the law about "a company pirates 10,000,000 pieces to use as training data for an AI model (a practice that the law already says is legal in an academic setting, i.e. universities do this all the time and nobody bats an eye)" is more complicated and requires additional trials to resolve. And no, even though the right answer may be self-evident to you or me, it's not settled law; if the force of law is applied poorly, suddenly what the universities are doing runs afoul of it, and basically nobody wants that outcome.


What’s ironic to me is that had these companies pirated only a single work, wouldn’t that be a chargeable crime?

Clearly Bonnie and Clyde shouldn’t have been prosecuted. Imagine they were just robbing banks for literary research purposes. They could have then used the learnings to write a book and sell it commercially…

Or imagine one cracks 10,000 copyrighted DVDs and then sells 30-second clips… (a derivative work).

To me, for profit companies and universities have a huge difference — the latter is not seeking to directly commercially profit from copyrighted data.


There is a distinction that very few people make, but that the courts thankfully seem to grasp:

Training on copyrighted material is a separate claim from skirting payment for that material.

Which pretty much boils down to: "If they put it out there for everyone to see, it's probably OK to train on it, if they put it behind a paywall and you don't pay, the training part doesn't matter, it's a violation."


Whether it’s legal slash fair use to train on copyrighted material is only one of the questions currently being asked though. There’s a separate issue at play where these companies are pirating the material for the training process.

By comparison, someone here brought up that it might be transformative fair use to write a play heavily based on Blood Meridian, but you still need to buy a copy of the book. It would still be infringement to pirate the e-book for your writing process, even if the end result was legal.


If they bought material at large scale, the seller might require them to sign a contract demanding royalties if the material is used for training an AI. So buying legally is a way to walk into a trap.


They can buy individual works like anyone else.

Or they can negotiate a deal at scale with whatever price / restrictions make sense to both parties.

I don’t see a way they could be “trapped”. Worst case they pay retail price.


What is the precedent on that kind of agreement?

The only thing I've been able to find is the note that since copyright is federal law, state contract law actually can't supersede it, to wit: if you try to put a clause in the contract that says the contract is void if I use your work to make transformative fair-use works (or I owe you a fee), that clause is functionally unenforceable (for the same reason that I don't owe you a fee if I make transformative fair-use works of your creations in general).


So if I download copyrighted material like the new disney movie with fansubs and watch it for training purposes instead of enjoyment purposes it's fine? In that case I've just been training myself, your honor. No, no, I'm not enjoying these TV shows.

Because it's important to grasp the scale of these copyright violations:

* They downloaded, and admitted to using, Anna's Archive: millions of books and papers, most of which are paywalled, but they pirated them instead

* They acquired movies and TV shows and used unofficial subtitles distributed by websites such as OpenSubtitles, which are typically used for pirated media. Official releases such as DVDs tend to have official subtitles that don't sign off with "For study/research purpose only. Please delete after 48 hours" or "Subtitles by %some_username%"


I don't know what is confusing here, perhaps my comment isn't clear.

If you skirt payment, it's a violation. If it's free, but still copyrighted, it's likely not a violation.


They've done both, so my confusion is about why you are bringing this up?


OpenSubtitles has nothing to do with pirated media. Transcripts/translations are fair use. Their own use case is fair use as well.


OpenSubtitles is almost exclusively used with pirated media. Official copies come with official subtitles. OpenSubtitles itself is legal, but that's not the point at all.


If you owe the bank $1,000 you have a problem.

If you owe the bank $100,000,000 the bank has a problem.

We live in an era where the president of the United States uses his position to pump crypto scams purely for personal profit.


10% for the big don


The corpses of filmmakers and authors and actors are buried in unmarked graves out behind those companies' corporate headquarters. Unimaginable horror, that piracy. Why has no one intervened?

>If you're just a normal person you get to spend years in jail or worse.

Not that I'm a big fan of the criminalization of copyright infringement in the United States, but who has ever spent years in jail for this?

Besides, if it really bothered you, we might not see this weird tone-switch from one sentence to the next, where you seem to think that piracy is shocking and "something should be done" and then that "it's not good that someone should spend time in jail for it". What gives?


> who has ever spent years in jail for this?

Aaron Swartz?

EDIT: apparently he wasn't in jail; he was out on bail while the case was ongoing - but the shortest plea deal would still have had him in jail for 6 months, and the potential penalty was 35 to 50 years.


Nope, he didn't go to jail.


> Besides, if it really bothered you, we might not see this weird tone-switch from one sentence to the next, where you seem to think that piracy is shocking and "something should be done" and then that "it's not good that someone should spend time in jail for it". What gives?

What a weirdly condescending way to interpret my post. My point boils down to: Either prosecute copyright infringement or don't. The current status quo of individuals getting their lives ruined while companies get to make billions is disgusting.


> Either prosecute copyright infringement or don't

This is the absolute core of the issue. Technical people see law as code, where context can be disregarded and all that matters is specifying the outputs for a given set of inputs.

But law doesn’t work that way, and it should not work that way. Context matters, and it needs to.

If you go down the road of “the law is the law and billion dollar companies working on product should be treated the same as individual consumers”, it follows that individuals should do SEC filings (“either require 10q’s or don’t!”), and surgeons should be jailed (“either prosecute cutting people with knives or don’t!”).

There is a lot to dislike about AI companies, and while I believe that training models is transformative, I don’t believe that maintaining libraries of pirated content is OK just because it’s an ingredient to training.

But insisting that individual piracy to enjoy entertainment without paying must be treated exactly the same as datasets for model training is the absolute weakest possible argument here. The law is not that reductive.


> But law doesn’t work that way, and it should not work that way. Context matters, and it needs to.

As Anatole France famously quipped:

"The law, in its majestic equality, forbids the rich and poor alike to sleep under bridges, to beg in the streets, and to steal bread."


Pretty funny that your argument boils down to: It's okay to break the law if you do it as a company.

Copyright laws target everyone. SEC laws don't.


Not sure if I was unclear or you’re disingenuous. But that is not at all what I said.


It doesn't matter whether it's transformative. Copyright covers derivative works.


No one (in the US) has been jailed for downloading copyrighted material.


https://en.wikipedia.org/wiki/Aaron_Swartz

And the US is not the only jurisdiction


That's not the same as piracy, though. He wasn't downloading millions of scientific papers from libgen or sci-hub; he was downloading them directly from JSTOR. Indeed, none of his charges were for copyright infringement. They were for stuff like "breaking and entering" and "unauthorized access to a computer network".


The exact same charges could apply to the AI scrapers illegitimately accessing random websites.


No, they couldn't, since the then-novel and untested strained interpretation of the CFAA that the prosecutor was relying on has since been tested in the courts and soundly rejected.


I haven’t seen any accusations that they’ve done that, though. Usually people get pirated material from sources that intentionally share pirated material.


They're not just training on pirated content, they've also scraped literally the entire internet and used that too.


Scraping the public internet is also not a CFAA violation


CFAA bans accessing a protected computer without authorization. Hitting URLs denied by robots.txt has been argued to be just that.


> Hitting URLs denied by robots.txt has been argued to be just that.

"Has been argued" -- sure, but never successfully; in fact, in HiQ v. LinkedIn, the 9th Circuit ruled (twice, both before and on remand again after and applying the Supreme Court ruling in Van Buren v. US) against a cease and desist on top of robots.txt to stop accessing data on a public website constituting "without authorization" under the CFAA.


Now do every other jurisdiction


CFAA was mentioned specifically, which means only US jurisdiction is relevant here.


Part of the accusation comes from the fact that Swartz accessed the downloads through an MIT network closet, which the AI companies weren't doing. The equivalent would be if OpenAI broke into a wiring closet at Disneyland to download Disney movies.


The CFAA is vague enough to punish unauthorized access to a computer system. I don't have an example case in mind, but people have gotten in trouble for scraping websites before while ignoring e.g. robots.txt


The CFAA might be vague, but the case law on scraping has pretty much been resolved to "it's legal except in very limited circumstances". It's regrettable that less-resourced defendants were harassed before large corporations were able to secure such rulings, but the rulings that allowed scraping occurred before the AI companies' scraping was done, so it's unclear why AI companies in particular should be getting flak here.


Aaron Swartz was not jailed or even charged for copyright infringement. The discussion and the comment I replied to is centered around US companies and jurisdiction.


The thread is centered around US companies, but not US jurisdiction.


There could be a moral question. For example, a researcher might not want to download a pirated paper and cause a loss to a fellow researcher. But it becomes pretty stupid to pay when everyone, including large reputable companies endorsed by the government, is just downloading the content for free. Maybe his research will help develop faster chips to win against China, so why should he pay?

Would it be "fair use" to download pirated papers for research instead of buying them?

Also, I was gradually migrating from obtaining software from questionable sources to open-source software, thinking that this was going out of fashion and nobody torrents apps anymore, but it seems I was wrong?

Or another example: if someone wants to make contributions to Wine but needs Windows to develop the patch, what would be the right choice: buy it, or download a free copy from a questionable source?


Researchers don't get paid when their papers are downloaded, though. They pay to have their papers downloaded, and the middleman makes money on both sides. Piracy is the only moral option for them. There is a reason every single competent professor in the western world will email you a free copy of their papers if you ask nicely.


What about people filming movies in the cinema (for learning of course)? [1]

[1] https://www.thefederalcriminalattorneys.com/unauthorized-rec...


No, if you revolutionize both the practice and philosophy of computing and advance mankind to the next stage of its own intellectual evolution, you get to do whatever the fuck you want.

Seems fair.


Hm. Not a given that it's an advance.


I get the common cynical response to new tech, and the reasons for it.

We wish we lived in a world where change was reliably positive for our lives. Often changes are sold that way, but they rarely are.

But when new things introduce dramatic capabilities that former things couldn't match (compare every chatbot before LLMs), it is as clear an objective technological advance as has ever happened.

--

Not every technical advance reliably or immediately makes society better.

But whether or when technology improves the human condition is far more likely to be a function of human choices than the bare technology. Outcomes are strongly dependent on the trajectories of who has a technology, when they do, and how they use it. And what would be the realistic (not wished for) outcome of not having or using it.

For instance, even something as corrosive as social media, as it is today, could have existed in strongly constructive forms instead. If society viewed private surveillance, unpermissioned collation across third parties, and weaponizing of dossiers via personalized manipulation of media, increased ad impact and addictive-type responses, as ALL being violations of human rights to privacy and freedom from coercion or manipulation. And worth legally banning.

Ergo, if we want tech to more reliably improve lives, we need to ban obviously perverse human/corporate behaviors and conflicts of interest.

(Not just shade tech. Which despite being a pervasive response, doesn't seem to improve anything.)


At the risk of stepping on a well-known land mine around here, how'd you do on the IMO problem set this year?


I didn't participate. I probably wouldn't have done well. I disagree with your framing.


Well, wait, if somebody writes a computer program that answers 5 of 6 IMO questions/proofs correctly, and you don't consider it an "advance," what would qualify?

Either both AI teams cheated, in which case there's nothing to worry about, or they didn't, in which case you've set a pretty high bar. Where is that bar, exactly? What exactly does it take to justify blowing off copyright law in the larger interest of progress? (I have my own answers to that question, including equitable access to the resulting models regardless of how impressive their performance might be, but am curious to hear yours.)


The technology is capable in a way that never existed before. We haven't yet begun to see the impacts of that. I don't think it will be a good for humanity.

Social networks as they exist today represent technology that didn't exist decades ago. I wouldn't call it an "advancement" though. I think social media is terrible for humans in aggregate.


I notice you've motte-and-baileyed from "revolutionize both the practice and philosophy of computing and advance mankind to the next stage of its own intellectual evolution" to simply "is considered an 'advance'".


You may have meant to reply to someone else. recursive is the one who questioned whether an advance had really been made, and I just asked for clarification (which they provided).

I'm pretty bullish on ML progress in general, but I'm finding it harder every day to disagree with recursive's take on social media.


Except that the jury's (at best) still out on whether the influence of LLMs and similar tech on knowledge workers is actually a net good, since it might stunt our ability to think critically and solve problems while confidently spewing hallucinations at random, and while model alignment remains unregulated, haphazard, and (again at best) more of an art than a science.


Well, if it's no big deal, you and the other copyright maximalists who have popped out of the woodwork lately have nothing to worry about, at least in the long run. Right?


It's not about copyright _maximalism,_ it's about having _literally any regard for copyright_ and enforcing the law in a proportionate way regardless of who's breaking the laws.

Everyone I know has stories about their ISP sending nastygrams threatening legal action over torrenting, but now that corporations (whose US legal personhood appears to matter only when it benefits them) are doing it as part of the development of a commercial product that they expect to charge people for, that's fine?

And in any case, my argument had nothing to do with copyright (though I do hate the hypocrisy of the situation), and whether or not it's "nothing to worry about" in the long run, it seems like it'll cause a lot of harm before the benefits are felt in society at large. Whatever purported benefits actually come of this, we'll have to deal with:

- Even more mass layoffs that use LLMs as justification (not just in software, either). These are people's livelihoods; we're coming off of several nearly-consecutive "once-in-a-generation" financial crises, a growing affordability crisis in much of the developed world, and stagnating wages. Many people will be hit very hard by layoffs.

- A seniority crisis as companies increasingly try to replace entry-level jobs with LLMs, meaning that people in a crucial learning stage of their jobs will have to either replace much of the learning curve for their domain with the learning curve of using LLMs (which is dubiously a good thing), or face unemployment, and leaving industries to deal with the aging-out of their talent pools

- We've already been heading towards something of an information apocalypse, but now it seems more real than ever, and the industry's response seems to broadly be "let's make the lying machines lie even more convincingly"

- The financial viability of these products seems... questionable right now, at best, and given that the people running the show are opening up data centres in some of the most expensive energy markets around (and in the US's case, one that uniquely disincentivizes the development of affordable clean energy), I'm not sure that anyone's really interested in a path to financial sustainability for this tech

- The environmental impact of these projects is getting to be significant. It's not as bad as Bitcoin mining yet, AFAIK, but if we keep on, it'll get there.

- Recent reports show that the LLM industry is starting to take up a significant slice of the US economy, and that's never a good sign for an industry that seems to be backed by so much speculation rather than real-world profitability. This is how market crashes happen.


>why ordinary people cannot

They can. I don't think anyone got prosecuted for using an illegal streaming site or downloading from sci-hub, for instance. What people do get sued for is seeding, which counts as distribution. If anything AI companies are getting prosecuted more aggressively than "ordinary people", presumably because of their scale. In a recent lawsuit Anthropic won on the part about AI training on books, but lost on the part where they used pirated books.


As I understand it, people have gotten in trouble for filming in the cinema; there is a separate law for that.


But in that case, even though filming isn't technically distribution, it's clearly a step toward distributing copies? To take this to the extreme, suppose you ripped a blu-ray and made a thousand copies, but haven't packaged or sold them yet. If the FBI busted in, you'd probably be prosecuted for "conspiracy to commit copyright infringement" at the very least.


It's just "training"


You seem to equate "training" (with scare quotes) with someone actually pirating a blu-ray, but they really aren't equivalent. Courts so far have ruled that training is fair use and it's not hard to see why. Unlike copying a movie almost verbatim (as with ripping a blu-ray), AI companies are actually producing something transformative in the form of AI models. You don't have to like AI models, or the AI companies' business models, but it strains credulity to pretend ripping a blu-ray is somehow equivalent to training an AI model.


Who's to say why I downloaded and am now watching a movie? Is it for my enjoyment? Is it because I'm training my brain? How is me training my brain any different from companies training their LLMs?

Same goes for recording: I'm just training my skills of recording. Or maybe I'm just recording it so I can rewatch it later, for training purposes, of course.


>Who's to say why I downloaded and am now watching a movie? Is it for my enjoyment? Is it because I'm training my brain? How is me training my brain any different from companies training their LLMs?

None of this is relevant because Anthropic was only let off the hook for training, not for pirating the books itself. As far as the court cases are playing out, there doesn't appear to be a special piracy exemption for AI companies.

>Same goes for recording: I'm just training my skills of recording. Or maybe I'm just recording it so I can rewatch it later, for training purposes, of course.

You can certainly use that as a defense. That's why we have judges; otherwise there's going to be some smartass caught with 1 kg of coke claiming it's for "personal consumption" rather than distribution.

None of this matters in reality, though. If you're caught with AV gear in a movie theater once, you'd likely be ejected and banned from the establishment/chain, not have the FBI/MPAA go after you for piracy. If you come again, you'd likely be prosecuted for trespassing. In the cases where they're going after someone in particular for making these rips, they usually have a dossier of evidence, like surveillance/transaction history showing that the same individual has been repeatedly recording movies, and watermarks correlating the screenings that the person has been in to files showing up on torrent sites.


> If you're caught with AV gear in a movie theater once, you'd likely be ejected and banned from the establishment/chain, not have the FBI/MPAA go after you for piracy

Good example, because this is exactly what websites are doing with LLM companies, who are doing their damnest to evade the blocks. Which brings us back around to "trespassing" or the CFAA or whatever.


>Which brings us back around to "trespassing" or the CFAA or whatever.

That argument is pretty much dead after https://en.wikipedia.org/wiki/Van_Buren_v._United_States and https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn


https://www.rvo.nl/onderwerpen/octrooien-ofwel-patenten/vorm...

I'll leave all other jurisdictions up to you.


IANAL, but reading a bit on this topic: the relevant part of the copyright law for AI isn't academia, it's transformative work. The AI created by training on copyrighted material transforms the material so much that it is no longer the original protected work (collage and sampling are the analogous transformations in the visual-arts and music industries).

As for actually gathering the copyrighted material: I believe the jury hasn't even been empaneled for that yet (in the OpenAI case), but the latest ruling from the court is that copyright may have been violated in the creation of their training corpus.


AFAIK, downloading or watching pirated stuff isn't something you'll get in trouble for. Hosting and distributing it is what will get you.


Well, it just shows that they've downloaded subtitles.


Interesting, in Russian, it often ends with "Subtitles by %some_username%"


That is not the case here - I never encountered this with whisper-large-v3 or similar ASR models. Part of the reason, I guess, is that those subs are burnt into the movie, which makes them hard to extract. Standalone subs need the corresponding video to match the audio and text. So nothing beats YouTube videos, which are already aligned.


At least for English, those "fansubs" aren't typically burnt into the movie*, but ride along in the video container (MP4/MKV) as subtitle streams. They can typically be extracted as SRT files (plain text with sentence-level timestamps).

*Although it used to be more common for AVI files in the olden days.
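
For reference, an SRT cue is just a sequence number, a millisecond-precision time range, and the text; the content below is made up, echoing the sign-off mentioned upthread:

    1
    00:01:02,500 --> 00:01:05,000
    For study/research purpose only.

    2
    00:01:05,200 --> 00:01:07,000
    Please delete after 48 hours.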


SRT is ancient. Nowadays everyone uses ASS subtitles, which can be arbitrarily styled.


In general? In the past I've known ASS to be used a lot for things like anime, but less for live action shows.


I have also found them inside MKVs as the subtitle track. I think SRT was the default because most content was ripped from DVD/BD, but now most content comes from streaming sources and you need to convert the subtitles anyway.


WebVTT (a SubRip successor) is probably more widely used than ASS


By legit providers, probably.


flashbacks of trying to track down subs sync’d to a specific release


Indeed, with another model I would get persistent transcriptions of silent parts as 'Thanks for watching!' or '[MUSIC]'. It's pretty dumb that this failure mode wasn't caught in some QA process, and that there are now multiple transcription models suffering from the same issue. Silent parts in your input audio seem like they should be a very common occurrence...


When I was taught mathematics, the zero value was always considered the most important edge case. You prove something for N=0 (or N=1), then for N=M+1.

It's even more important in audio DSP: processing near-zeroes can end up being extremely CPU-intensive; look up denormal/subnormal floats.
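
You can watch a value go subnormal yourself; a minimal sketch in Python with NumPy (the decay loop stands in for something like a filter tail ringing down):

    import numpy as np

    x = np.float32(1.0)
    for _ in range(140):               # exponential decay, e.g. a filter tail
        x = np.float32(x * 0.5)

    tiny = np.finfo(np.float32).tiny   # smallest *normal* float32, ~1.18e-38
    print(x)                           # ~7.2e-43: nonzero but subnormal
    print(0 < x < tiny)                # True -> slow path on many CPUs

Which is why DSP code often flushes such values to zero (or adds a tiny offset) to stay out of that range.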


Yeah, I studied mathematics (algebra and number theory), and zero is often the point sporting discontinuities or weird asymptotic behavior.

Quite a lot of algorithms use some form of division, and zero is the only number in our typical structures (Z, Q, R, C) that you cannot divide by.


In machine integer arithmetic, one must also beware of division by -1, which can convert MIN_INT into MIN_INT with a signed overflow and violate some arithmetic invariants, such as sign (negative divided by negative is _usually_ positive).
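
To make that concrete, a sketch in Python, whose ints don't overflow, so the 32-bit wraparound is emulated by hand (in C/C++ this exact division is undefined behavior and commonly traps on x86):

    INT32_MIN = -2**31

    def div_i32(a, b):
        # emulate 32-bit two's-complement wraparound of the quotient
        # (Python floors while C truncates, but this division is exact)
        q = (a // b) & 0xFFFFFFFF
        return q - 2**32 if q >= 2**31 else q

    print(div_i32(INT32_MIN, -1))  # -2147483648: negative/negative stayed negative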


Well, now in this brave new age of AI we can enjoy computer programs crashing with an

    Error: division by please upvote, share and like!


This also works; I upvoted your comment.


I have discovered a truly marvelous proof of how to smash that like and subscribe button, which this comment box is too small to contain.


Signed by Pierre de FermAIt


NaN


Denormals are flushed to zero by default on most GPUs by the way.


Makes total sense, execution time is bounded. The point is it's still a case you must consider (what if near-zero is distinct from zero and significant?)


whisper MUST be combined with silence detection / VAD


Ah, the good old "you're holding it wrong".

What good is a speech recognition tool that literally hears imaginary voices?


Considering that if you DO use VAD (voice activity detection), it's the best open-weights voice recognition model by a very wide margin, it's quite good. I'd be willing to bet that commercial products that "don't have this problem" are using VAD as well, and that this is well known to them. But Whisper is just the weights, and I suppose a simple reference implementation, not a full product.


> What good is a speech recognition tool that literally hears imaginary voices?

Well, if it is supposed to work after silence detection, then it is good for speech recognition, I guess. It's like blaming a wheel for being circular because you can't sit on it. It's a part of a larger machine.


Just lay the wheel on its side and it makes a fine seat.


>imaginary voices

On the other hand, I can imagine that when things get quiet and the signal-to-noise ratio gets close to zero, random background audio (or randomness introduced in the transcription model) will be enough to tickle a critical number of neurons and elicit hallucinations.

The related thought exercise is this: Try scanning across the band with an AM or sideband radio, and after a while your brain will start to wonder "was that a voice I just heard, or music perhaps?" when in reality it was just environmental static.


Yes, you are holding it wrong. The good of it is that it does not output imaginary voices when used with VAD.

Show us a technology with better results that does not use VAD. If you can't, then I'm not sure what you're arguing against except superficialities so inconsequential that I can't comprehend the condescension. The results speak for themselves.


faster-whisper has a min_silence_duration_ms option
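
Roughly like this (a minimal sketch; model size and file name are placeholders):

    from faster_whisper import WhisperModel

    model = WhisperModel("large-v3")
    segments, info = model.transcribe(
        "input.wav",
        vad_filter=True,  # run Silero VAD first and drop non-speech chunks
        vad_parameters={"min_silence_duration_ms": 500},
    )
    for seg in segments:
        print(f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text}")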


There are much higher quality VAD solutions available


Please name a couple to get someone started who's hacking on webapps?

I'd really appreciate it.


(as would future readers, I'm sure)



I last used silero but haven't kept up with the state of the art, so I didn't mention it
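
In case it helps as a starting point, basic silero-vad usage looks roughly like this (loaded via torch.hub; the file name is a placeholder):

    import torch

    model, utils = torch.hub.load('snakers4/silero-vad', 'silero_vad')
    get_speech_timestamps, _, read_audio, _, _ = utils

    wav = read_audio('input.wav', sampling_rate=16000)
    # [{'start': ..., 'end': ...}, ...] sample offsets where speech was detected
    print(get_speech_timestamps(wav, model, sampling_rate=16000))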


So if a tool has a process to make it perform at its best, then it's a problem?

Do you also moan that you have to prep a surface before applying glue or it won't stick? Or that you need to drill a guide hole before making a larger one in wood? Or that you need to use truly prime numbers for a security key to actually be safe?


What's a good starter VAD lib, and if you know, the best implementation of something like this to use in a browser-based app?

Say if I wanted to use it for Voice Nav, or Voice Input, but not piss off random people speaking the wrong language.


If that's truly the case then they should make it part of the product, IMHO.


How is it not the case? It is unusable without VAD or editing. I don't understand what you're questioning

I agree their products could be better "end to end" integrated. Meanwhile there is a continuously-improving field of work for detecting speech (which Whisper is incapable of). They offer official "cookbooks" with guidance on an approach they recommend: https://cookbook.openai.com/examples/whisper_processing_guid...

> At times, files with long silences at the beginning can cause Whisper to transcribe the audio incorrectly. We'll use Pydub to detect and trim the silence.

(Official OpenAI quote)
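
The Pydub approach they describe looks roughly like this (a sketch; the threshold is a guess to tune per recording):

    from pydub import AudioSegment
    from pydub.silence import detect_leading_silence

    audio = AudioSegment.from_file("input.wav")
    trim_ms = detect_leading_silence(audio, silence_threshold=-42.0)  # dBFS
    audio[trim_ms:].export("trimmed.wav", format="wav")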


What's VAD?


Voice Activity Detection (it predicts whether a short clip contains speech, eg to mute your microphone when you aren't speaking).


Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?


I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.


I can. He was asking if Babbage was cheating.

You put in 2+2 - the right figures. The machine says 4 - the right answer. If you put in the wrong figures, like 3+3, will the machine still say 4? It's easy to make a machine that always says 4.

The people who asked him that question, however, probably had a different scam demonstrated to them all the time. Remember the Mechanical Turk? Babbage's reply paints him as very honest. It shows that he couldn't even conceive that someone might try to trick the royal court (or whoever it was) into accepting a fake device.


Having zero exposure to any form of computation for your entire life, as was true of the vast majority of people in the early 19th century.


What's the defence for the current population?


When YouTube began building automatic transcriptions for captions, it regularly flagged any noise or music -- typically industrial noise -- as "[foreign]".

If it couldn't understand it, it was "foreign" for the longest time.


Hey, Netflix occasionally still puts "[foreign music]" in its English subtitles; it always cracks me up.


[speaks japanese]

To be fair, there is a difference between when subtitles match the source language and when they don't. The former are often verbatim.


Haha, yes, it's fair when English subtitles write something like [speaks Japanese], especially when at least one of the characters is not supposed to understand what's being said (when they do, it's more appropriate to write "[in Japanese]: let's go shopping!").

Netflix sometimes takes the cake with what I consider the most outrageous option: writing "[in English]" when they mean "in whatever language the protagonist considers native", which is mind-bogglingly wrong and hilarious at the same time.

They do this with the English subtitles of the German production "Die Kaiserin" ("The Empress"): whenever Sisi is speaking in another language, say French, the subtitles will say "[in French] I love you...", and when she switches back to German they will say "[in English] I love you...". WTF, Netflix? Note this is unrelated to understanding German; it's mostly Netflix looking down on its customers and assuming they cannot comprehend there are people in the world for whom their native tongue is different to the viewer's native tongue.

This has happened in more shows, enough to know it's not a fluke, though Netflix is inconsistent about it.


[laughs in Japanese]


Yeah, I can confirm seeing that a fair bit specifically during non-verbal parts of videos when someone is using a tool.


Can confirm as well, although to my recollection it just shows up as if it's a word the transcription model heard, not "[foreign]" in brackets like with "[Music]" or "[Applause]". It's especially weird to me because I recall the auto-transcriptions being reasonably serviceable when they first rolled them out, only to degrade over time to the point where it was hallucinating the word "foreign" and dropping letters from words or using weird abbreviations (like "koby" for "kilobyte", "TBTE" for "terabyte", or, most memorably weirdly, transcribing the phrase "nanosecond-by-nanosecond" as "nond by nanc") if it didn't decide it heard another one entirely.

I also noticed a couple of months ago that YouTube seems to have quietly rolled out a new auto-transcription model that can make reasonable guesses at where capitalization, punctuation, and sentence boundaries should go. It seems to have degraded even more rapidly than the old one, falling victim to the same kinds of transcription errors, and its ability to recognize things like music and applause seems worse than the old one's. The new model also has a different hallucination for silence and noise it can't classify: where the old one would have hallucinated the word "foreign", the new one thinks it's hearing the word "heat", often repeated ("Heat. Heat.").


That's interesting; the few times I tried playing with Whisper, I had the impression that YouTube-style videos or random cellphone videos were something it did particularly badly with (compared to movies). My guess at the time was that most of the training material might be subtitles and raw screenplays.

The videos I tried to transcribe were also Mandarin Chinese, using whisper-large-v3. Besides the usual complaints that it would phonetically "mishear" things and generate nonsense, it was still surprisingly good, compared to other software I played around with.

That said, it would often invent names for the speakers and prefix their lines, or randomly switch between simplified and traditional Chinese. For the videos I tested, intermittent silence would often result in repeating the last line several times, or occasionally, it would insert direction cues (in English for some reason). I've never seen credits or anything like that.

In one video I transcribed, somebody had a cold and was sniffling. Whisper decided the person was crying (transcribed as "* crying *"; a cough was turned into "* door closing *"). It then transcribed the next line as something quite unfriendly. It didn't do that anymore after I cut the sniffling out (but then the output switched back to traditional Chinese again).


Similar in the English model. It's pretty clear they trained on YouTube videos where creators put that in otherwise silent sections to ensure it shows up for people with CC on.


The number one hallucination in my transcriptions was "Subtitles by the Amara.org community".


> I suspect they trained the model on some random YouTube video without carefully picking really useful data.

They trained the model on every YouTube video they could, and hoped the aggregate was useful data.


This reminds me: some years ago, as Google was expanding its translation service, someone tried translating text into and out of an obscure African language (I don't recall which) and it always came out as weird Biblical-sounding semi-gibberish.

My revelation was that machine translation needs a corpus of bilingual documents to learn from, and if the language is sufficiently obscure, there may not be any bilingual documents except for the Bible, which missionaries have translated into just about every language on Earth.


This is totally happening with other models too, at least with Spanish. Many transcriptions will end with something that roughly translates to "Thanks for watching!" even if it's never present in the original audio.


oh yeah this happens a lot on reddit on videos in foreign languages


lmao


Stellantis brands' sales in China had been in decline long before the emergence of EVs. There is indeed some presence of Stellantis in China via Leapmotor (laugh).


I use fprintd and it works well with GNOME + a built-in Elan sensor. It does need more complex configuration than Touch ID or Windows Hello, though.


In Fedora it's (supposed to be) pretty simple: just go into Settings -> Users and add your fingerprint. In practice I usually have to use dnf to nuke PAM and reinstall it manually for it to start working. But they have a good skeleton set up. Still no pre-desktop authentication, though.

