jffry | 7 months ago | on: Gemma 3 QAT Models: Bringing AI to Consumer GPUs
With `ollama run gemma3:27b-it-qat "What is blue"`, GPU memory usage is just a hair over 20GB, so no, probably not without a nerfed context window
woadwarrior01 | 7 months ago
Indeed, the default context length in ollama is a mere 2048 tokens.
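For anyone hitting that limit: ollama lets you raise the context window via the `num_ctx` parameter, e.g. in a custom Modelfile. A minimal sketch (the model tag is from the comment above; `8192` is an arbitrary example value, and VRAM usage grows with `num_ctx`):

```
# Modelfile — derive a variant of the QAT model with a larger context window.
# Note: KV-cache memory scales with num_ctx, so this raises VRAM usage.
FROM gemma3:27b-it-qat
PARAMETER num_ctx 8192
```

Build and run it with `ollama create gemma3-8k -f Modelfile` followed by `ollama run gemma3-8k`; alternatively, inside an interactive `ollama run` session, `/set parameter num_ctx 8192` applies the same setting for that session only.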