I assume people talk about transcription, not translation. Translation in youtube ime is indeed horrible in all languages I have tried, but transcription in english is good enough to be useful. However, the more technical jargon a video uses, the worse transcription is (translation is totally useless in anything technical there).
Automatic transcription in English heavily depend on accent, sound quality, and how well the speaker is articulating. It will often mistake words that sound alike to make non-sensible sentences, randomly skip words, or just inserts random words for no clear reason.
It does seem to do a few clever things. For lyrics it seem to first look for existing transcribed lyrics before making their own guesses (Timing however can be quite bad when it does this). Outside of that, AI transcribed videos is like an alien who has read a book on a dead language and is transcribing based on what the book say that the word should sound like phonetically. At times that can be good enough.
(A note on sound quality. It not the perceived quality. Many low res videos has perfectly acceptable, if somewhat lossy sound quality, but the transcriber goes insane. It likes prefer 1080p videos with what I assume much higher bit-rate for the sound.)
In the times I have noticed the transcription be bad, my speech comprehension itself is even worse. So I still find it useful. It is not substitution for human created (or at least curated) subtitles by any means, but better than nothing.