yeah I mean that's basically what Javi talks about in the post... if you can throw hardware at it you can scale it (ingestion scales linearly with shards)
but the post has some interesting thoughts on how you do the high-scale ingestion while also handling background merge processes, reads, etc.
According to our SQL Generation Benchmark (methodology linked in the results dash), Claude Opus 4 is the best of the popular models at SQL generation by a pretty decent margin.
Yeah, SQL is pretty nuanced. One of the things we want to improve in the benchmark is how we measure "success": multiple correct SQL queries can look structurally dissimilar while all semantically answering the prompt.
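For example, one way to score on the returned data rather than the query text looks something like this. A minimal sketch against a toy sqlite schema, not our actual harness:

```python
import sqlite3

# Toy schema standing in for a benchmark question like
# "total order amount per customer, highest first".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES ('a', 10), ('a', 5), ('b', 7);
""")

# Two structurally dissimilar but semantically equivalent answers:
# a GROUP BY vs. a correlated subquery.
query_a = """
    SELECT customer, SUM(amount) AS total
    FROM orders GROUP BY customer ORDER BY total DESC
"""
query_b = """
    SELECT DISTINCT customer,
           (SELECT SUM(amount) FROM orders o2
            WHERE o2.customer = o1.customer) AS total
    FROM orders o1 ORDER BY total DESC
"""

def result_rows(sql):
    # Compare sorted rows, so aliasing, formatting, and query shape
    # don't matter; only the data returned does.
    return sorted(conn.execute(sql).fetchall())

# "Success" here means: same rows, same answer.
assert result_rows(query_a) == result_rows(query_b)
```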
I pay for Claude premium but actually use Grok quite a bit; the 'Think' function gets me where I want more often than not. Odd you don't have any xAI models listed. Sure, Grok is a terrible name, but it surprises me more often. I haven't tried the $250 ChatGPT model yet though, just don't like OpenAI practices lately.
Not saying you're wrong about "OpenAI practices", but that's kind of a strange thing to complain about right after praising an LLM that was only recently inserting claims of "white genocide" into every other response.
Even if you don't care about racial politics, or even good-vs-evil or legal-vs-criminal, the fact that that entire LLM got (obviously, and ineptly) tuned to the whim of one rich individual — even if he weren't as creepy as he is — should be a deal-breaker, shouldn't it?
Just curious, how do you know your questions and the SQL aren't in the LLM training data? Looks like the benchmark questions w/SQL are online (https://ghe.clickhouse.tech/).
Actually no, we allow up to 3 attempts. In fact, Opus 4 failed on 36/50 tests on the first attempt, but it was REALLY good at nailing the second attempt after receiving error feedback.
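The loop is conceptually just this. A rough sketch where `generate_sql` and `run_query` are stand-ins, not our actual harness functions:

```python
MAX_ATTEMPTS = 3

def attempt_with_feedback(prompt, generate_sql, run_query):
    """Retry SQL generation, feeding each error back to the model."""
    feedback = None
    for attempt in range(1, MAX_ATTEMPTS + 1):
        sql = generate_sql(prompt, feedback)  # feedback is None on attempt 1
        try:
            return attempt, run_query(sql)    # success: note which attempt worked
        except Exception as err:
            # Hand the database error back for the next attempt.
            feedback = f"Previous SQL failed with: {err}\nSQL was: {sql}"
    raise RuntimeError(f"no valid SQL after {MAX_ATTEMPTS} attempts")
```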
No, it's definitely interesting. It suggests that Opus 4 actually failed to write proper syntax on the first attempt, but given feedback it absolutely nailed the second attempt. My takeaway is that this is great for peer-coding workflows: less "FIX IT CLAUDE".
yeah this was a surprising result. of course, bear in mind that testing an LLM on SQL generation is pretty nuanced, so take everything with a grain of salt :)
Since MCP Servers are installed locally, it can be a bit painful to log and analyze usage of the MCP Servers you build. My coworker built a utility to capture remote logging events from our MCP Server; it could be extended to any MCP Server. Free to use, easy to set up. It uses Tinybird to capture events and generate Prometheus endpoints for Grafana, Datadog, etc.
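Conceptually it boils down to wrapping your tool handlers so every invocation emits an event to a remote collector. Purely illustrative sketch, not the actual utility; `LOG_URL` and the event fields here are made up:

```python
import time
import requests

LOG_URL = "https://example.com/v0/events"  # hypothetical collector endpoint

def log_usage(tool_name):
    """Decorator that reports each tool invocation to a remote collector."""
    def wrap(handler):
        def inner(*args, **kwargs):
            start = time.time()
            try:
                return handler(*args, **kwargs)
            finally:
                event = {
                    "tool": tool_name,
                    "duration_ms": int((time.time() - start) * 1000),
                    "ts": time.time(),
                }
                try:
                    # Fire-and-forget; a real utility would batch or queue.
                    requests.post(LOG_URL, json=event, timeout=2)
                except requests.RequestException:
                    pass  # never let logging break the tool call
        return inner
    return wrap
```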
I think Tinybird is a nice option here. It's sort of a managed service for ClickHouse with some other nice abstractions. For your streaming case, they have an HTTP endpoint you can stream to that accepts up to 1k events per second, and you can micro-batch events if you need to send more than that. They also have some good connectors for BigQuery, Snowflake, DynamoDB, etc.
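Micro-batching there is just packing many events into one NDJSON request. A sketch based on my reading of their Events API docs; the token and data source name are placeholders, so check the current limits and params before relying on it:

```python
import json
import requests

TB_URL = "https://api.tinybird.co/v0/events"
TB_TOKEN = "<your-token>"  # placeholder

def send_batch(events, datasource="events"):
    # One POST carries many events as newline-delimited JSON, which keeps
    # the request rate well under the per-second limit at high volume.
    body = "\n".join(json.dumps(e) for e in events)
    resp = requests.post(
        TB_URL,
        params={"name": datasource},
        headers={"Authorization": f"Bearer {TB_TOKEN}"},
        data=body.encode("utf-8"),
    )
    resp.raise_for_status()

# e.g. flush every 500 events or every second, whichever comes first
send_batch([{"ts": "2024-01-01T00:00:00Z", "event": "page_view"}])
```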