
What kinds of SQL queries could ClickHouse not handle? Were the limitations about expressivity of queries, performance, or something else? I'm considering using CH for storing observability (particularly tracing) data, so I'm curious about any footguns or other reasons it wouldn't be a good fit.


I'm editing the transcript right now, and he says it's more about exposing a nice API to the user.

E.g., ClickHouse's interval support, which is an important type for observability, was lacking. You couldn't subtract datetimes to get an interval. If you compared a 2 millisecond interval to a one second one, it wouldn't look at the unit and would say 2 ms is bigger, etc. So he had to go to the dev team, and after enough back and forth, instead of fixing it they decided to return an error, and he had to insist for a long time before they actually implemented a proper solution.
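To give you an idea, this is roughly the kind of query he ran into trouble with (my reconstruction, not from the transcript):

  -- subtracting DateTimes gave you a plain number of seconds, not an Interval
  SELECT toDateTime('2024-01-02 00:00:00') - toDateTime('2024-01-01 00:00:00');

  -- and per his complaint, comparing intervals of different units just
  -- compared the raw numbers (2 > 1), ignoring the unit entirely
  SELECT toIntervalMillisecond(2) > toIntervalSecond(1);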

Quoting him: "But like these endless issues with ClickHouse's flavor of SQL were problematic."

Another problem seemed to be that benefiting from very big scaling, with things like data in Parquet at rest plus a local cache, basically meant handing all your money to AWS, because the self-hosted version didn't expose a way to do that yourself. ClickHouse scales fine at my size, so I can only take his word on that front since I'm nowhere near that big.

Funnily enough, after that they moved to Timescale, and the performance didn't work for their use case.

They landed on DataFusion after a lot of trial and error.

But it's a really interesting perspective on the whole thing; you can see he is kinda obsessed with the user experience. The guy wrote a popular marshmallow alternative, two popular celery alternatives and one popular watchdog alternative, all FOSS.

These kinds of people are the source of all the imposter syndrome in the world.

I'll publish that video next week on Bite Code if I can. If I can't, it will have to wait three weeks because I'm leaving for a bit. But Charlie Marsh's (uv's author) is up, if you are into overachievers.


One of the devs working on Logfire here. Part of it was the level of support. Like Samuel said, the ClickHouse folks were not receptive to bug reports. The Timescale team is leagues ahead in that sense; they’re super responsive and helpful.

Ultimately, one of the reasons for choosing DataFusion was that it’s a much more approachable project, and indeed we’ve already gotten tremendous bidirectional benefit: the DataFusion team has helped us figure out some complex bits and we’ve done significant upstream contributions. By the way, DataFusion is now the fastest single-node query engine on ClickBench: https://datafusion.apache.org/blog/2024/11/18/datafusion-fas...

Another reason we use DataFusion is multi-tenancy: we found it hard to implement multi-tenancy with row-level security (RLS) and the like. We’ve had much better luck with the extensibility of DataFusion.


Isn't git-worktree designed for this kind of situation, where you could open each PR or branch in its own directory and switch your editor / other tooling between workspaces without losing state between them?
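Something like (branch and path names made up for illustration):

  git worktree add ../pr-123 pr-123    # check the branch out in its own directory
  git worktree list                    # all checkouts sharing the one repo
  git worktree remove ../pr-123        # clean up once the PR lands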


You can already do that with git branches. Worktrees are a cleaner way to share history, but they're irrelevant to merge conflicts and multiple PRs.


Yeah, you could do this, but it gets kind of annoying in my experience. It is another solution, though; most of what you do in jj you can do in git, jj is just more convenient.


Hence, Objective Caml! This can be modeled in OCaml as an object type,

  type foo_bar = < _foo : int; _bar : string >
For a family of types matching any object with those methods, I think you can write something like

  type 'a has_foo_bar = < _foo : int; _bar : string; .. > as 'a
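And any object that structurally provides those methods just works. A quick sketch using the definitions above:

  (* a closed object with exactly the two methods *)
  let fb = object
    method _foo = 42
    method _bar = "hello"
  end

  (* accepts any object with at least _foo and _bar, thanks to the row variable *)
  let describe (o : _ has_foo_bar) =
    Printf.sprintf "%s: %d" o#_bar o#_foo

  let () = print_endline (describe fb)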


One of my favorite features in OCaml is labeled arguments. You get a similar flavor of partial application, but without the strict argument order requirement (most of the time -- higher-order functions are strict about the labels of function arguments).
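A small sketch of what I mean (the function names are just for illustration):

  (* labeled arguments can be applied in any order, including partially *)
  let range ~start ~stop = List.init (stop - start) (fun i -> start + i)

  let from_five = range ~start:5   (* partial application by label *)
  let upto_ten  = range ~stop:10   (* works even out of order *)

  let () =
    assert (from_five ~stop:8 = [5; 6; 7]);
    assert (upto_ten ~start:8 = [8; 9])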


> No complex dev setup required, or RPC framework needed, it’s the same old monolith, just operated differently.

It seems to me like "operated differently" is doing a lot of heavy lifting that often involves those same frameworks or dev/testing environments. If a monolith used to communicate between workloads in-process, now there needs to be some way for those workloads to communicate between processes, and it has to continue to work in dev. The example in the article mentions roundtripping through something like Redis or Postgres, but now doesn't your dev environment need to spin up a database? What if the communication pattern isn't a shared cache, but instead a direct function call that, say, spins up a new goroutine to process some work asynchronously? Now you need to operate and maintain either an RPC protocol or an event queue, and make sure those things can be started up and discoverable during testing and local development.


It's pretty normal to have a local dev database on your machine. It's how we did it for decades. Are there really developers in the wild now who have never been exposed to working with a locally installed SQL instance?

These days you can even set it up in docker, automatically migrate the database to the latest version and install a bunch of test data. So you can wipe the whole thing and go back to a good state if you muck it up with a bad or ill-thought-out local migration, etc.
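For example (image, name and credentials are obviously placeholders):

  docker run --rm -d --name dev-db \
    -e POSTGRES_PASSWORD=dev -p 5432:5432 postgres:16
  # run your migrations and seed script against localhost:5432,
  # then `docker rm -f dev-db` to throw it all away and start fresh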

Same with redis, etc.

And it's still much, much simpler than a microservice architecture.


You should already be spinning up caches and databases in dev anyway?

I agree though that the article is missing some explicit insight into how this change is handled on the local dev environment. I'm assuming the local dev environment run commands were also updated to be these three commands, one per workload.

Basically, this distinction should be represented throughout all environments: dev/test/prod.

