
Is MOE then basically divide and conquer? I have no deep knowledge of this, so I assumed MOE was where each expert analyzed the problem in a different way and then there was some map-reduce-like operation on the generated expert results. Kind of like a random forest, but for inference.


> I assumed MOE was where each expert analyzed the problem in a different way

Uh, sorta, but not at all like the parent described. You have multiple "experts" and one or more routing layers that decide which experts each token gets sent to. Usually every token is sent to at least 2. You can't just send half the tokens to one expert and half to another.

Also the "experts" are not "domain experts" - there is not a "programming expert" and an "essay expert".
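
To make the routing part concrete, here is a rough sketch of per-token top-2 routing in plain Python. Everything in it is made up for illustration: the names (route_token, moe_layer), the toy router weights, and the "experts", which are just functions that scale the input. Real models use learned matrices and small feed-forward networks, and the exact weighting scheme varies; this one assumes a softmax over only the selected experts' scores.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a small list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, k=2):
    # Pick the k experts with the highest router scores for ONE token,
    # then normalize just those k scores into mixing weights (assumption).
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))

def moe_layer(token, router, experts, k=2):
    # router: one weight vector per expert; score = dot(weights, token).
    logits = [sum(w * x for w, x in zip(row, token)) for row in router]
    out = [0.0] * len(token)
    # Only the k selected experts run; the others are skipped entirely.
    for idx, weight in route_token(logits, k):
        expert_out = experts[idx](token)
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out

# Hypothetical toy setup: 4 "experts" that just scale their input.
experts = [lambda t, s=s: [s * x for x in t] for s in (1.0, 2.0, 3.0, 4.0)]
router = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

for token in ([2.0, 1.0], [-1.0, -3.0]):
    logits = [sum(w * x for w, x in zip(row, token)) for row in router]
    chosen = [i for i, _ in route_token(logits)]
    print(token, "-> experts", chosen, "->", moe_layer(token, router, experts))
```

The point of the toy: the choice is made per token from the router's scores, so two tokens in the same sequence can land on different experts, and nothing about the experts is tied to a topic like "programming" or "essays".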



