
jjoonathan 54 days ago \| on: Claude says “You're absolutely right!” about every...

Yeah, the heavily distilled models are very bad with hallucinations. I think they use them to cover for decreased capacity. A 1B model will happily attempt the same complex coding tasks as a 1T model but the hard parts will be pushed into an API call that doesn't exist, lol.