it's a "language" model (LLM), not a "math" model. when it is generating your answer, predicting and outputing a word after word it is _not_ multiplying your numbers internally.
Yes, I know. It's just kind of interesting that it can make inferences about complicated things but can't get multiplications right that would almost certainly have appeared in its training set many times (two-digit by two-digit).
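For a sense of scale, here's a quick count of how small that space actually is, which is why you'd expect most of these facts to show up verbatim in a web-scale corpus:

```python
# Count the two-digit-by-two-digit multiplication facts.
pairs = [(a, b) for a in range(10, 100) for b in range(10, 100)]
products = {a * b for a, b in pairs}
print(len(pairs))     # 8100 ordered pairs (90 * 90)
print(len(products))  # distinct product values (fewer, due to collisions)
```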