jffry | 7 months ago | on: Gemma 3 QAT Models: Bringing AI to Consumer GPUs
With `ollama run gemma3:27b-it-qat "What is blue"`, GPU memory usage is just a hair over 20GB, so no, probably not without a nerfed context window
woadwarrior01 | 7 months ago
Indeed, the default context length in ollama is a mere 2048 tokens.
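For anyone hitting that limit: ollama lets you raise the context window via the `num_ctx` parameter, e.g. in a custom Modelfile. A minimal sketch (the model tag is from the comment above; `8192` is an arbitrary example value, and VRAM usage grows with `num_ctx`):

```
# Modelfile — derive a variant of the QAT model with a larger context window.
# Note: KV-cache memory scales with num_ctx, so this raises VRAM usage.
FROM gemma3:27b-it-qat
PARAMETER num_ctx 8192
```

Build and run it with `ollama create gemma3-8k -f Modelfile` followed by `ollama run gemma3-8k`; alternatively, inside an interactive `ollama run` session, `/set parameter num_ctx 8192` applies the same setting for that session only.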