Is MOE then basically divide and conquer? I have no deep knowledge of this so I ...

declaredapple · on Feb 16, 2024

> I assumed MOE was where each expert analyzed the problem in a different way

Uh, sorta, but not at all like the parent described. You have multiple "experts" and one or more routing layers that decide which experts each token gets sent to. Usually every token is sent to at least 2. You can't just send half the tokens to one expert and half to another.

Also the "experts" are not "domain experts" - there is not a "programming expert" and an "essay expert".
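
To make the routing concrete, here's a rough sketch of top-2 routing in plain Python. Everything in it is made up for illustration: the names (`route_token`, `moe_layer`, `make_expert`), the sizes, the random router weights, and the toy "experts" that just scale their input. Renormalizing the gate weights over only the chosen experts is one common choice, not the only one, and real implementations batch all of this on tensors instead of looping.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(token, router_weights, k=2):
    """Return [(expert_index, gate_weight), ...] for the k highest-scoring experts."""
    # One router logit per expert: dot product of the token with that expert's row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    # Keep the k experts with the largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Assumption: gate weights are a softmax over the selected logits only.
    gates = softmax([logits[i] for i in top])
    return list(zip(top, gates))


def moe_layer(tokens, router_weights, experts, k=2):
    """Route every token independently and mix the chosen experts' outputs."""
    outputs = []
    for token in tokens:
        mixed = [0.0] * len(token)
        for idx, gate in route_token(token, router_weights, k):
            expert_out = experts[idx](token)
            mixed = [m + gate * e for m, e in zip(mixed, expert_out)]
        outputs.append(mixed)
    return outputs


def make_expert(scale):
    """Hypothetical stand-in for an expert network: scales its input."""
    return lambda token: [scale * x for x in token]


random.seed(0)
DIM, NUM_EXPERTS = 4, 8  # illustrative sizes only
router_weights = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
experts = [make_expert(i + 1) for i in range(NUM_EXPERTS)]
tokens = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(3)]

for position, token in enumerate(tokens):
    print(position, route_token(token, router_weights))
```

The thing to notice is that the choice happens per token from the router's scores, so neighboring tokens in the same sentence can land on different experts. Nothing in there assigns an expert to a topic.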