I was under the impression that Yugabyte requires signing a CLA to contribute, which leads me to avoid it for fear of them relicensing the thing when the VCs start squeezing.
Also: it's quite unusual and single-vendor driven. Seems like too much of a risk longer term, but that is just my take.
EDIT: in response to your question, I did run a PoC of it, but I ran into issues where I wasn't able to create very large indexes without the statement timing out on me. Basic hand-benchmarking of complex joins on very large tables was very slow, if the queries finished at all. I suppose systems like this and CockroachDB really need short, simple statements and high client concurrency rather than large, complex queries.
That’s normal for building indices on large tables, regardless of the RDBMS. Increase the timeout, and build them with the CONCURRENTLY option.
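A rough sketch of what that looks like on the Postgres side (the table and column names here are made up):

```sql
-- Lift the per-statement timeout for this session only (0 = unlimited).
SET statement_timeout = 0;

-- Build the index without holding a long exclusive lock on the table.
CREATE INDEX CONCURRENTLY idx_orders_customer_id
    ON orders (customer_id);
```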
> Query speed
Without knowing your schema and query I can’t say with any certainty, but it shouldn’t be dramatically slower than single-node Postgres, assuming your table statistics are accurate (have you run ANALYZE <table>?), necessary indices are in place, and there aren’t some horrendously wrong parameters set.
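Something like the following is usually the first diagnostic step (table and query are hypothetical):

```sql
-- Refresh planner statistics for the table in question.
ANALYZE orders;

-- Then look at what the planner actually chooses and where the time goes.
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > now() - interval '7 days';
```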
Not sure about the CLA process, but the database is already under a restrictive, proprietary license:
## Free Trial
Use to evaluate whether the software suits a particular application for less than 32 consecutive calendar days, on behalf of you or your company, is use for a permitted purpose.
It's not really clear what this means (what is a permitted purpose?), but it seems the intent is that after 32 days, you are expected to pay up. Or at least prepare for a future when the infrastructure to charge customers is in place (if it isn't there yet).
Thanks. I think that only covers the commercial bits they run themselves though:
"The entire database with all its features (including the enterprise ones) is licensed under the Apache License 2.0
The binaries that contain -managed in the artifact and help run a managed service are licensed under the Polyform Free Trial License 1.0.0."
Index creation shouldn't be controlled by the statement timeout but by backfill_index_client_rpc_timeout_ms, which defaults to 24 hours. It may have been lower in older versions.
It seems CockroachDB got all the love here indeed. We use Yugabyte and we are happy with it; for our use cases it is a lot faster and easier to work with than CockroachDB.
I'm curious about this as well. I often see people talk about CockroachDB in production, but I don't think I've ever heard of anyone running Yugabyte. But it is definitely under active development.
I found two threads discussing it from the past year:
Yugabyte (like CockroachDB and TiDB) is based on mapping relations onto an LSM-tree-based KV store, where ranges of keys are assigned to different nodes and each range is managed through a Raft group. That kind of structure has very different performance characteristics compared to Postgres' page-based MVCC. In particular, LSM trees are not a free lunch.
Query execution is also very different when a table's data is spread over multiple nodes. For example, joins are done on the query executor side by executing remote scans against each participating storage node and then merging the results. That's always going to be slower than a system that already has all the data locally.
YB also lacks some index optimizations. There is some work to make bitmap index scans work in YB, which will give a huge performance boost to many queries, but it's incomplete. YB does have some optimizations (like loose index scans) that Postgres does not have. So it's probably fair to say that YB is a lot slower than PG for some things and a little faster at others.
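For context, a bitmap index scan is how stock Postgres combines several indexes in a single query; a hypothetical example of the kind of query that benefits:

```sql
-- Two separately indexed columns on a hypothetical table.
CREATE TABLE events (
    id         bigint PRIMARY KEY,
    user_id    bigint,
    event_type text
);
CREATE INDEX ON events (user_id);
CREATE INDEX ON events (event_type);

-- Postgres can satisfy the OR by scanning both indexes, OR-ing the
-- resulting bitmaps, and fetching the matching rows once. Without
-- bitmap scans, the planner falls back to a single index or a seq scan.
EXPLAIN
SELECT *
FROM events
WHERE user_id = 42 OR event_type = 'login';
```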
I think it's fundamentally not a bad architecture, just different from Postgres. So even though they took the higher layers from Postgres, there's a whole bunch of rearchitecting needed in order to make the higher layers work with the lower ones. You do get some Postgres stuff for free, but I wonder if the amount of work here is worth it in the end. So much in Postgres makes the assumption of a local page heap.
What we see in cases where someone takes Postgres and replaces the guts (Greenplum, Cloudberry, and of course YDB) is that it becomes a huge effort to keep up with new Postgres versions. YDB is on Postgres 12, which came out in 2019, and is slowly upgrading to 15, which came out in 2022. By the time they've upgraded to 15, it will probably be 2-3 versions behind, and the work continues.
Worth noting: Yugabyte was tested by Kyle Kingsbury back in 2019, which uncovered some deficiencies. Not sure what the state is today. The YB team also runs their own Jepsen tests now as part of CI, which is a good sign.
Regarding the last point:
> Yugabyte was tested by Kyle Kingsbury back in 2019, which uncovered some deficiencies. Not sure what the state is today. The YB team also runs their own Jepsen tests now as part of CI, which is a good sign.
> What we see in cases where someone takes Postgres and replaces the guts (Greenplum, Cloudberry, and of course YDB) is that it becomes a huge effort to keep up with new Postgres versions.
The first upgrade is the hardest, but after that we will have the framework in place to perform subsequent upgrades much sooner.
When the pg11-to-pg15 upgrade becomes available, it will be in-place and online without affecting DML; no other pg fork offers this capability today.
I was referring to the effort by the developers to keep the forked codebase itself up to date with mainline. Isn't that the main hurdle?
My understanding is that you are patching a lot of core Postgres code rather than providing the functionality through any kind of plugin interface, so every time there is a major Postgres release, "rebasing" on top of it is a large effort.
That, to my knowledge, is why Greenplum fell behind so much. It took them four years to get from 9.6 to 12, and I believe that's where they are today.
Citus is an extension. That's the best you can get by staying outside the core db.
If you want a true distributed architecture then you need to change the query optimizer, DDL, transaction, and even query-stats components. At which point it ends up being a fork.
Yes, the merges are hard. But pg12 changed lots of fundamental things, making it very challenging. Pg15 to pg17 should be much simpler.
Yugabyte is Postgres compatible, not actually Postgres.
It's also only compatible insofar as you can use only a subset of Postgres features; they only support the most basic things like SELECT, views, etc.
Triggers, NOTIFY, etc. were out of scope the last time I checked (which was admittedly a while ago).
I only skimmed that blog post after a while because it reads like a pitch from an MBA targeted at venture capitalists.
I feel like that's not actually a distinction that matters to application developers, because they know that's just a technical detail that only concerns the developers of the database. Ultimately, all the compatibility has to be implemented in the storage engine. The fact that they're using Postgres's porcelain is surely a time saver for them, but of no consequence to the consumers/users of the database.
YugabyteDB supports much more than the basic things. I've been a dev advocate for Yugabyte for 3+ years, and triggers have always been there. LISTEN/NOTIFY is not yet there (it is an anti-pattern for horizontal scalability, but we will add it since some frameworks use it). Not yet 100% compatible, but there's no Distributed SQL database with more PG compatibility. Many (Spanner, CRDB, DSQL) are only wire protocol + dialect. YugabyteDB runs Postgres code and provides the same behavior (locks, isolation levels, datatype arithmetic...).
One thing possibly holding some folks back is the Postgres version it's pinned to. Right now YDB has PostgreSQL 12 compatibility. Support for PG15 is under active development, so hopefully it's a 2025 feature. I really wanted to be able to actually use YugabyteDB for once, but our developers reportedly are using PG15+ features.
Right now it's based on PostgreSQL 11.2 with some patches pulled from newer releases. The upgrade will be to PG 15, and includes work to make further PG upgrades easier (think online upgrading a cluster on postgresql major versions).
The best way I have found is to set up keepalived -> pgbouncer -> Postgres. Use repmgr to manage replication and barman for backups. Set up a VIP with keepalived with a small script that checks if the server is the primary. You lose about 7-9 pings during a failover; have keepalived check about every 2 seconds and flip after 3 consecutive failures.
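The primary check itself can be as simple as one query against the local node (a sketch; wiring it into keepalived's check script is left out):

```sql
-- Returns false on the primary and true on a standby, so the
-- keepalived check script should succeed only when this is false.
SELECT pg_is_in_recovery();
```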
> If the PostgreSQL backend is cancelled while waiting to acknowledge replication (as a result of packet cancellation due to client timeout or backend failure) transaction changes become visible for other backends. Such changes are not yet replicated and may be lost in case of standby promotion.
This sounds like the two generals problem, which has no solution. But I may be misunderstanding.
"[The two generals problem] said that you can't achieve consensus (both safety and liveness at the same time), they did not say you have to sacrifice safety under message-losses or asynchrony conditions. So Paxos preserves safety under all conditions and achieves liveness when conditions improve outside the impossibility realm (less message losses, some timing assumptions start to hold)." http://muratbuffalo.blogspot.com/2010/10/paxos-taught.html
Wouldn't the simple fix be to delay backend (= one connection) closing until all pending replication it initiated is finished?
That still leaves actual crashes, which would need to use the shared memory to store the list of pending replications before the recovery of transactions is finished.
MySQL has had first-class replication and failover built into it for years. You deploy a server, enable binlogs, clone the server (Percona's XtraBackup can live-clone a server locally or remotely) and start the new instance. Then you point the replica at the master using 'CHANGE MASTER TO <host> ...' and it starts pulling binlogs and applying them. On the master, you can repeat this last step, making the replica its master. This means that you have multi-master replication. And it just works. There are some other tools you can use to detect failure of the current master and switch to another; that is up to you.
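Roughly, the replica-side commands look like this (hostnames and credentials are placeholders; newer MySQL versions spell these CHANGE REPLICATION SOURCE TO, START REPLICA, and SHOW REPLICA STATUS):

```sql
-- Point the replica at the primary and start pulling binlogs.
CHANGE MASTER TO
    MASTER_HOST = 'primary.example.com',
    MASTER_USER = 'repl',
    MASTER_PASSWORD = 'secret',
    MASTER_AUTO_POSITION = 1;  -- assumes GTIDs are enabled on both servers

START SLAVE;

-- Verify replication is running and check the lag.
SHOW SLAVE STATUS;
```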
There are also solutions like MySQL Cluster and Galera which provide a more cluster-like solution with synchronous replication. If you've got a suitable use case (low writes, high reads, and no gigantic transactions) this can work extremely well. You bootstrap a cluster on node 1, and new members automatically take a copy of the cluster data when they join. You can have 3 or 5 node clusters, and reads are distributed across all nodes since it's synchronous. Beware though, operating one of these things requires care. I've seen people suffer read downtime or full cluster outages by doing operations without understanding how they work under the hood. And if your cluster is hard-down, you need to pick the node with the "most recent transaction" to re-bootstrap the cluster or you can lose transactions.
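On the operational point: a Galera node's health is visible from SQL through the wsrep status variables, for example:

```sql
-- How many nodes are currently in the cluster.
SHOW STATUS LIKE 'wsrep_cluster_size';

-- 'Primary' means this node belongs to the quorum component.
SHOW STATUS LIKE 'wsrep_cluster_status';

-- 'Synced' means the node has caught up and is serving traffic.
SHOW STATUS LIKE 'wsrep_local_state_comment';
```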
There's regular single primary/n replica replication built in. There's no built in automatic failover.
There's also Group replication built in. This can be either single primary/n replica with automatic election of a new primary during a failure, or it can be multi-primary.
Then there's Galera, which is similar to the multi-primary mode of Group replication.
I've been running some small galera clusters for close to a decade with very good results. It's multi-master, and mostly "just works", though once every year or three I have to go in and re-bootstrap the cluster from the node with the highest sequence number if all of the nodes have gone down for one reason or another. I have some ansible playbooks and scripts that help with it.
My use case is a fairly small database of email address mappings. I have a mail server that sends out emails on behalf of our users, using their outside e-mail address (gmail, yahoo, whathaveyou). In order to allow replies, but prevent being an open relay, I have a milter that creates a lookalike address, with a token in it ([email protected] -> [email protected]).
I store those mappings in the mysql database, less than 10K of those. So a trivial database, very high read to write ratio.
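For illustration only, the mapping table for that kind of setup can be as small as this (a hypothetical schema, not necessarily theirs):

```sql
-- Maps the generated lookalike address to the user's real outside
-- address; written rarely, read on every reply.
CREATE TABLE address_map (
    token_address VARCHAR(255) NOT NULL PRIMARY KEY,
    real_address  VARCHAR(255) NOT NULL,
    created_at    TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP
);
```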
More recently, maybe 3-4 years ago, I added another couple of tables that store all the logs. I have a job that reads the postfix logfiles and writes out all the log information so we can show users the status of every one of their outgoing e-mails. That upped the amount of traffic to the database by quite a lot, but it's still pretty simple: basically no transactions, just simple single insert statements, and IIRC one overview table that gets some updates, then a cleanup job to delete them after 30-90 days.
Galera has been a real workhorse.
For a while I was going to go with CockroachDB, and I set up a POC. It was pretty robust to set up and cluster; I never ran into having to re-bootstrap the cluster. But, at the time, postfix couldn't write directly to CockroachDB because cockroach could only do UTF-8, and the postfix pgsql code would only do latin-1 or something. This has changed since, and I've thought about switching back to cockroach, but I hear there are some licensing issues I may have to be aware of.
We use proxysql (https://proxysql.com) which works very well. We have not seen any downtime for years. We wrote our own master promotion code a very long time ago; it has proven to be very robust.
Writing Clojure without spaghetti isn't too hard, and definitely more practical than waiting for a Jepsen alternative to come along.
The Jepsen author gave a great talk on all the performance engineering work that has gone into it; Jepsen is near enough an entire DBMS in its own right: https://www.youtube.com/watch?v=EUdhyAdYfpA
It seems like a compelling option:
* Much closer to Postgres compatibility than CockroachDB.
* A more permissive license.
* Built-in connection manager [1], which should simplify deployment.
* Supports both high availability and geo-distribution, which is useful if scaling globally becomes necessary later.
That said, I don't see it mentioned around here often. I wonder if anyone here has tried it and can comment on it.
--
1: https://docs.yugabyte.com/preview/explore/going-beyond-sql/c...