The output of an autoregressive model is a probability for each token in the vocabulary to appear next after the input sequence. Computing these probabilities is strictly deterministic given the prior context and the model's weights.
Based on that probability distribution, a variety of text generation strategies are possible. The simplest, greedy decoding, is to always pick the token with the highest probability. To allow for creativity, a random number generator is used instead to choose among the possible tokens, weighted by their probabilities of course.
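For illustration, here's a minimal sketch of both strategies in plain NumPy (the function name and the toy three-token distribution are made up for the example):

```python
import numpy as np

def sample_next_token(probs: np.ndarray, greedy: bool = False) -> int:
    """Pick the next token id from the model's output distribution."""
    if greedy:
        return int(np.argmax(probs))              # greedy decoding
    rng = np.random.default_rng()
    return int(rng.choice(len(probs), p=probs))   # weighted random draw

probs = np.array([0.7, 0.2, 0.1])    # toy distribution over 3 tokens
print(sample_next_token(probs, greedy=True))   # always 0
print(sample_next_token(probs))                # usually 0, sometimes 1 or 2
```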
Temperature rescales the logits before they are turned into probabilities by the softmax. As temperature increases, the probabilities approach 1/vocabulary size and the output becomes completely random; for very small temperature values, generation approaches greedy decoding.
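A quick way to see the effect, again as a toy sketch with made-up logits:

```python
import numpy as np

def softmax_with_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    z = logits / temperature
    z -= z.max()              # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([3.0, 1.0, 0.2])
for t in (0.1, 1.0, 10.0):
    print(t, softmax_with_temperature(logits, t).round(3))
# t -> 0 piles all the mass on the argmax (greedy-like);
# t -> infinity flattens everything toward 1/vocabulary_size.
```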
If all you want is a spam filter, you're better off replacing the output layer of an LLM with one that has just two outputs, and fine-tuning that on a public collection of spam mails plus some "ham" from your inbox.
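With the Hugging Face transformers library, for instance, that head swap is essentially one call; a minimal sketch, assuming GPT-2 as the base model (the actual fine-tuning loop is omitted):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Loads the pretrained body and replaces the LM head with a freshly
# initialized classification head with two outputs: spam vs. ham.
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=2)
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# GPT-2 has no pad token by default; reuse EOS so batching works.
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id

# Fine-tune on labeled (text, 0/1) pairs as usual, e.g. with
# transformers.Trainer or a plain PyTorch training loop.
```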