> Sure it can generally put words in grammatically correct order and is passable for writing no one is going to read anyway (like marketing emails). But the moment it requires actual domain knowledge, all these AI models completely fall down because, again, they don't actually understand anything; they're just really good at guessing what word comes next.
Many of us are frequently using it to obtain answers requiring domain knowledge; the evidence that this works is that the answers it gives often turn out to be correct, and are things we did not already know. So your description simply contradicts the experience of many people. Perhaps you're using it in a different domain, or not writing clear prompts, but it's well worth experiencing what people are talking about.
I can ask ChatGPT to write code and assess for myself whether it's correct or not. (It's frequently not, but it can still be useful for certain tasks.) I can ask it to explain something about the industry I work in or adjacent fields and have a pretty good sense of whether it's leading me in the right direction. But if I asked it to teach me about, say, Chinese history, I would have very little ability to assess whether it's telling me the truth or sprinkling little falsehoods into things. I would only be able to catch the most blatant mistakes; the rest would require more detailed independent research.
Now imagine you're a kid who's still learning reading comprehension and hasn't yet developed a functional BS detector. You're not going to have much of an ability to separate truth from fiction, but you're likely going to trust this thing that sounds authoritative.
Yes, that's fair, but I think we can do better in our education revolution than just plonking kids down in a classroom with ChatGPT! If we take a fairly constrained subject like high school algebra, we should be able to, for example, study its BS rate by having knowledgeable testers converse with it at length, and thus perhaps produce AIs suitably calibrated and QA'd for classroom usage. But I should go read Sal's book, it sounds promising.
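As a rough illustration of what "study its BS rate" could look like, here's a minimal sketch of an automated spot-check over a handful of tester-written algebra questions with known answers, using the OpenAI Python SDK. The model name, the test items, and the exact-match grading are all assumptions for illustration; a real classroom QA process would involve human testers, longer conversations, and a proper rubric.

    # A minimal sketch, assuming the OpenAI Python SDK and a small set of
    # tester-written algebra items with unambiguous numeric answers.
    # The model name and test items are placeholders, not recommendations.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    ITEMS = [
        ("Solve for x: 2x + 6 = 14. Reply with the number only.", "4"),
        ("What is the slope of the line y = 3x - 7? Reply with the number only.", "3"),
        ("Evaluate 5^2 - 3. Reply with the number only.", "22"),
    ]

    def bs_rate(model: str = "gpt-4o-mini") -> float:
        """Fraction of test items the model gets wrong (crude exact-match grading)."""
        wrong = 0
        for question, expected in ITEMS:
            reply = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": question}],
                temperature=0,
            )
            answer = reply.choices[0].message.content.strip()
            if answer != expected:
                wrong += 1
        return wrong / len(ITEMS)

    if __name__ == "__main__":
        print(f"Estimated BS rate over {len(ITEMS)} items: {bs_rate():.0%}")

The point is less this particular harness than that, for a constrained subject like high school algebra, the error rate is measurable at all, which is what would let a calibrated version be QA'd before it reaches a classroom.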
While these are all valid concerns, they also apply to human teachers. Humans spew bullshit all the time.
But that's OK. A teacher's knowledge doesn't need to be perfect; it just needs to be better than the student's.
If ChatGPT explains Chinese history in a way that is 80% accurate, to a student whose understanding of Chinese history is 10% accurate, the student will walk away with a better understanding of Chinese history.
This seems very incorrect to me. Human teachers make mistakes, or sometimes understand a detail of something incorrectly, but it doesn't come up in every lesson. It only happens occasionally, and sometimes a student will correct or challenge them on it.
ChatGPT makes mistakes literally every time I use it (in a domain that I'm knowledgeable in). How is that the same thing? Being given incorrect information is worse than not having the knowledge at all, IMO.
I do find that current LLMs are quite bad at design problems and answering very specific questions for which they may lack sufficient training data. I like them for general Q&A though.
A different architecture or an additional component might be needed for them to generalize better to out-of-training-distribution questions.
“Fall down” doesn’t mean they’re always wrong. It means they’re right just often enough to make you believe them when they’re making stuff up, which is a terrible characteristic for a teacher.
At this point, ChatGPT is good enough that I can tell what authoritative source I should go look at for the answer, which is great, but I’m not going to believe any statements it makes without checking. I think people suffer a bit of Gell-Mann amnesia when dealing with ChatGPT.
I actually do get a lot of value from ChatGPT as a self-motivated learner with a specific goal in mind, but that’s not the scenario we’re talking about.