Why do we have weird LLM sizes like 3B, 7B, 13B, 33B, 70B? Who invented those? (reddit.com)
16 points by mtmail on Dec 19, 2023 | 7 comments


Heh. Those aren't the weird ones. Those are just the base model sizes of the popular semi-open foundation models. The weird ones are the "merged" models, where people arbitrarily combine layers from multiple models to make larger ones with more layers, like SOLAR-10.7B (https://huggingface.co/upstage/SOLAR-10.7B-v1.0), a 10.7B model built by stacking layers from 7B Mistral fine-tunes. Or the 22B "llamas".
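
For the curious, here is a rough sketch of what these layer-stacking "frankenmerges" do, assuming a Mistral/Llama-style model loaded with Hugging Face transformers. The layer ranges are only illustrative, not SOLAR's actual recipe:

    import copy
    from torch import nn
    from transformers import AutoModelForCausalLM

    # Load a 32-block Mistral-style base; decoder blocks live in model.model.layers.
    base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
    blocks = base.model.layers

    # Keep blocks 0-23, then append copies of blocks 8-31: 48 blocks total.
    # Parameter count grows roughly with the block count, which is how ~7B
    # turns into a ~10B "merge" (illustrative ranges, not SOLAR's exact split).
    merged = [blocks[i] for i in range(24)] + \
             [copy.deepcopy(blocks[i]) for i in range(8, 32)]

    base.model.layers = nn.ModuleList(merged)
    base.config.num_hidden_layers = len(base.model.layers)

The stitched model is then usually fine-tuned again so the duplicated layers learn to cooperate.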


Does anyone have suggestions for the best models to run on an M3 Max with 128GB RAM?


Probably the Mixtral-8x7B.


Thanks for the tip, will play with Mixtral.


mixtral-8x7b-v0.1.Q8_0.gguf
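
If it helps anyone, a minimal sketch of loading that quant through the llama-cpp-python bindings; the path and prompt are placeholders, and n_gpu_layers=-1 offloads all layers to Metal on Apple Silicon:

    from llama_cpp import Llama

    # Load the quantized Mixtral GGUF; -1 offloads every layer to the GPU.
    llm = Llama(
        model_path="mixtral-8x7b-v0.1.Q8_0.gguf",
        n_gpu_layers=-1,
        n_ctx=4096,
    )

    out = llm("Q: What is a mixture-of-experts model? A:", max_tokens=128)
    print(out["choices"][0]["text"])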


What determines the number of parameters? How do you decide you need 3B here and 7B there?
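
Roughly speaking, the count falls out of the architecture's dimensions: vocabulary size, hidden size, FFN size, and number of layers. A back-of-the-envelope sketch using Llama-2-7B-like numbers (ignoring norms and biases):

    # Rough parameter count for a Llama-2-7B-style decoder.
    vocab, d_model, n_layers, d_ffn = 32000, 4096, 32, 11008

    embed   = vocab * d_model            # token embeddings
    attn    = 4 * d_model * d_model      # Q, K, V, O projections per block
    mlp     = 3 * d_model * d_ffn        # gate, up, down projections per block
    lm_head = vocab * d_model            # output projection

    total = embed + n_layers * (attn + mlp) + lm_head
    print(f"{total/1e9:.2f}B parameters")    # ~6.74B, marketed as "7B"

The "3B vs 7B" choice is then mostly a compute/memory budget decision: pick the dimensions that fit the hardware and data you have.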


> What are LLM Parameters? LLM Parameters are settings that you can adjust to control how the LLM generates texts. They can affect the quality, diversity, and creativity of the generated texts. Some of the common LLM parameters are temperature, number of tokens, top-p, presence penalty, and frequency penalty.

> In this post, I will explain what these parameters are and how to use them effectively. I will also show you some examples of texts that can be generated by LLMs with different parameter values.

https://michaelehab.medium.com/the-secrets-of-large-language...
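
Note those are decoding-time knobs rather than the weight counts behind the "3B/7B" names. In the Hugging Face transformers API they look roughly like this, with the values chosen purely as illustrations (repetition_penalty standing in for the presence/frequency penalties of the OpenAI-style API):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok   = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
    model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

    inputs = tok("Write a haiku about winter.", return_tensors="pt")
    out = model.generate(
        **inputs,
        do_sample=True,
        temperature=0.8,         # sharper or flatter sampling distribution
        top_p=0.9,               # nucleus sampling cutoff
        repetition_penalty=1.1,  # rough analogue of presence/frequency penalties
        max_new_tokens=64,       # "number of tokens"
    )
    print(tok.decode(out[0], skip_special_tokens=True))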



