
Longer context is definitely slower, especially for local models. Hosted models running on who knows what kind of overpowered hardware can crunch through it pretty fast though. There's also prompt (token) caching available for OpenAI, Anthropic, Gemini, and DeepSeek, which can dramatically speed up processing of a long-context prompt whose prefix has already been cached by an earlier request.
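Rough sketch of what that looks like with Anthropic's API, where caching is opt-in per content block (a minimal sketch assuming the `anthropic` Python SDK's cache_control field; the file name, model alias, and token figures are placeholders):

    # Sketch: explicit prompt caching with the Anthropic Python SDK.
    # Assumes the `anthropic` package is installed and ANTHROPIC_API_KEY is set.
    import anthropic

    client = anthropic.Anthropic()

    # A large chunk of context you expect to reuse across many requests.
    long_context = open("big_reference_doc.txt").read()

    response = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": long_context,
                # Mark this block as cacheable; later requests that share the
                # same prefix can read it from the cache instead of reprocessing it.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[{"role": "user", "content": "Summarize the key points."}],
    )

    # Usage metadata reports tokens written to / read from the cache.
    print(response.usage)

OpenAI's caching, by contrast, kicks in automatically for prompts over roughly 1024 tokens that share a prefix with a recent request, so there's nothing to change in the call itself.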


