Why do we have weird LLM sizes like 3B, 7B, 13B, 33B, 70B? Who invented those? (reddit.com)
16 points by mtmail on Dec 19, 2023 | 7 comments


Heh. Those aren't the weird ones. Those are just the base model sizes of the popular semi-open foundation models. The weird ones are the "merged" models, where people arbitrarily combine layers from multiple models to make larger ones with more layers, like SOLAR-10.7B (https://huggingface.co/upstage/SOLAR-10.7B-v1.0), a 10.7B model built by stacking layers from 7B Mistral fine-tunes. Or the 22B "llamas".
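
For the curious, here is a rough sketch of what these layer-stacking "frankenmerges" do, assuming a Mistral/Llama-style model loaded with Hugging Face transformers. The layer ranges are only illustrative, not SOLAR's actual recipe:

    import copy
    from torch import nn
    from transformers import AutoModelForCausalLM

    # Load a 32-block Mistral-style base; decoder blocks live in model.model.layers.
    base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
    blocks = base.model.layers

    # Keep blocks 0-23, then append copies of blocks 8-31: 48 blocks total.
    # Parameter count grows roughly with the block count, which is how ~7B
    # turns into a ~10B "merge" (illustrative ranges, not SOLAR's exact split).
    merged = [blocks[i] for i in range(24)] + \
             [copy.deepcopy(blocks[i]) for i in range(8, 32)]

    base.model.layers = nn.ModuleList(merged)
    base.config.num_hidden_layers = len(base.model.layers)

The stitched model is then usually fine-tuned again so the duplicated layers learn to cooperate.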


Does anyone have suggestions for the best models to run on an M3 Max with 128GB RAM?


Probably the Mixtral-8x7B.


Thanks for the tip, will play with Mixtral.


mixtral-8x7b-v0.1.Q8_0.gguf
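
If it helps anyone, a minimal sketch of loading that quant through the llama-cpp-python bindings; the path and prompt are placeholders, and n_gpu_layers=-1 offloads all layers to Metal on Apple Silicon:

    from llama_cpp import Llama

    # Load the quantized Mixtral GGUF; -1 offloads every layer to the GPU.
    llm = Llama(
        model_path="mixtral-8x7b-v0.1.Q8_0.gguf",
        n_gpu_layers=-1,
        n_ctx=4096,
    )

    out = llm("Q: What is a mixture-of-experts model? A:", max_tokens=128)
    print(out["choices"][0]["text"])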


What determines the number of parameters? How do you decide you need 3B here and 7B there?
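
Roughly speaking, the count falls out of the architecture's dimensions: vocabulary size, hidden size, FFN size, and number of layers. A back-of-the-envelope sketch using Llama-2-7B-like numbers (ignoring norms and biases):

    # Rough parameter count for a Llama-2-7B-style decoder.
    vocab, d_model, n_layers, d_ffn = 32000, 4096, 32, 11008

    embed   = vocab * d_model            # token embeddings
    attn    = 4 * d_model * d_model      # Q, K, V, O projections per block
    mlp     = 3 * d_model * d_ffn        # gate, up, down projections per block
    lm_head = vocab * d_model            # output projection

    total = embed + n_layers * (attn + mlp) + lm_head
    print(f"{total/1e9:.2f}B parameters")    # ~6.74B, marketed as "7B"

The "3B vs 7B" choice is then mostly a compute/memory budget decision: pick the dimensions that fit the hardware and data you have.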


> What are LLM Parameters? LLM Parameters are settings that you can adjust to control how the LLM generates texts. They can affect the quality, diversity, and creativity of the generated texts. Some of the common LLM parameters are temperature, number of tokens, top-p, presence penalty, and frequency penalty.

> In this post, I will explain what these parameters are and how to use them effectively. I will also show you some examples of texts that can be generated by LLMs with different parameter values.

https://michaelehab.medium.com/the-secrets-of-large-language...
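
Note those are decoding-time knobs rather than the weight counts behind the "3B/7B" names. In the Hugging Face transformers API they look roughly like this, with the values chosen purely as illustrations (repetition_penalty standing in for the presence/frequency penalties of the OpenAI-style API):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok   = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
    model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

    inputs = tok("Write a haiku about winter.", return_tensors="pt")
    out = model.generate(
        **inputs,
        do_sample=True,
        temperature=0.8,         # sharper or flatter sampling distribution
        top_p=0.9,               # nucleus sampling cutoff
        repetition_penalty=1.1,  # rough analogue of presence/frequency penalties
        max_new_tokens=64,       # "number of tokens"
    )
    print(tok.decode(out[0], skip_special_tokens=True))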



