This is mostly correct, but it's worth mentioning that Cloudberry substantially predates Greenplum going closed source. It just got quite a boost when that change happened. Different dev team too; afaik none of the original Greenplum team was involved with Cloudberry until very recently.
Also, Greenplum 7 tracks postgres 14. Which is still old at this point, but not so bad as 12....
I also don't think I'd call the architecture ancient. Just very tightly coupled to postgres' own (as a fork of postgres that tries to ingest new versions from upstream every year or two) and paying the overhead of that choice in the modern landscape.
Source: former member of the Greenplum Kernel team.
Thanks for the context. In what way would you say Cloudberry lags behind Greenplum technology-wise? I see newer Greenplum versions have a lot of planner improvements.
Greenplum 7 is listed as tracking Postgres 12 in the release announcement [1], and the release notes for later 7.x versions don't mention anything. Is there a newer release with higher compatibility?
When I say ancient, I mean that it's a "classical" shared-nothing design where the database is partitioned and hosted as parallel, self-contained replica servers, where each node runs as a shard that could, in theory, be queried independently of the master database. This is in contrast to newer architectures where data is sharded at the heap level (e.g. Yugabyte, CockroachDB) and/or compute is separated from data (e.g. Aurora, ClickHouse, Neon, TiDB).
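To make the "classical" shared-nothing idea concrete, here is a toy sketch in Python. It is not Greenplum's actual code: the `Segment` and `Coordinator` classes, the `crc32` hash, and the four-segment cluster size are all assumptions made up for illustration. Each segment holds only its own rows and can answer a query by itself; the coordinator just routes rows by hashing a distribution key and merges partial results.

```python
import zlib

NUM_SEGMENTS = 4  # hypothetical cluster size, chosen for illustration


class Segment:
    """A self-contained shard: owns its rows and can be queried on its own."""

    def __init__(self, seg_id):
        self.seg_id = seg_id
        self.rows = []

    def insert(self, row):
        self.rows.append(row)

    def partial_sum(self, column):
        # Each segment aggregates only the rows it stores locally.
        return sum(row[column] for row in self.rows)


class Coordinator:
    """Routes rows by hashing a distribution key and merges partial results."""

    def __init__(self, num_segments=NUM_SEGMENTS):
        self.segments = [Segment(i) for i in range(num_segments)]

    def segment_for(self, key):
        # crc32 is a stand-in hash (an assumption); it is stable across runs.
        index = zlib.crc32(str(key).encode()) % len(self.segments)
        return self.segments[index]

    def insert(self, row, dist_key="id"):
        self.segment_for(row[dist_key]).insert(row)

    def total(self, column):
        # Scatter the aggregate to every segment, then gather and combine.
        return sum(seg.partial_sum(column) for seg in self.segments)


coord = Coordinator()
for i in range(100):
    coord.insert({"id": i, "amount": i * 2})
```

Calling `coord.total("amount")` combines the per-segment sums, while `coord.segments[0].partial_sum("amount")` shows that a single segment can be asked directly, which is the "queried independently of the master" property described above.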
Cloudberry, last I checked, took their snapshot of all the Greenplum utilities way before the repos got archived and development went private. The backup/restore, DR, upgrade, and other such utilities seem to leave a lot on the table. I haven't checked in a bit, so it's possible they've picked back up some of that progress.
You're completely right, I had the wrong PG version in my memory. Embarrassing, thanks for catching that.
All the Greenplum utilities you mentioned here are also open-sourced and available for Cloudberry, but some of them are not in the main repo of Apache Cloudberry (this is more a matter of adhering to the Apache Software Foundation's regulations than a technical limitation).
Here is the unofficial roadmap of Cloudberry:
1. Continuously upgrading the PostgreSQL core version, maintaining compatibility with Greenplum Database, and strengthening the product's stability.
2. End-to-end performance optimization to support near real-time analytics, including streaming ingestion, vectorized batch processing, JIT compilation, incremental materialized views, PAX storage format, etc.
3. Supporting lakehouse applications by fully integrating open data lake table formats represented by Apache Iceberg, Hudi, and Delta Lake.
4. Gradually transforming Cloudberry Database into a data foundation supporting AI/ML applications, based on Directory Table, pgvector, and PostgresML.
Delighted to see Greenplum mentioned in this article, and equally pleased to see Apache Cloudberry mentioned in the comments. Greenplum has been open-source for nearly a decade, forming a fairly mature global open-source ecosystem, with many core developers distributed around the world (they were not necessarily hired by Pivotal/VMware/Broadcom). The point of forking Greenplum as Cloudberry wasn't to outdo Greenplum Database, but to foster a more neutral and open community around an MPP database with a substantial global following. To that end, the project was donated to the Apache Software Foundation following Greenplum's decision to go closed source. Since the project is in its early stages within the Apache incubator, our immediate goal is to build a solid foundation that adheres to Apache standards. Instead of introducing extensive new features, we are concentrating on developing a stable and compatible open-source alternative to Greenplum.