
Agreed with most of this, but I'm skeptical of the rsc.io/script DSL approach. I'll try it, though, because Russ is often right.

shameless advert: do you wish testify were implemented with generics and go-cmp, and had a more understandable surface area? Check out my small library, testy https://github.com/peterldowns/testy
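For a quick taste, here's roughly what usage looks like (a sketch; see the README for the full API):

```go
package example

import (
	"testing"

	"github.com/peterldowns/testy/assert"
	"github.com/peterldowns/testy/check"
)

func TestExample(t *testing.T) {
	// check.* records a failure but keeps the test running;
	// assert.* fails the test immediately, like testify's require.
	check.Equal(t, 3, 1+2)         // generic, compared with go-cmp
	assert.Equal(t, "ab", "a"+"b") // no interface{} in your test code
}
```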

shameless advert: do you want to write tests against your postgres database, without each new test adding seconds to your test suite? Check out pgtestdb: the marginal cost of each test is measured in tens of milliseconds, and each test gets a unique, isolated postgres database with all your migrations applied. https://github.com/peterldowns/pgtestdb



Just helped a junior set up dockertest on a new service, so I'm interested in checking this out. Are you familiar with testcontainers or dockertest? How would you compare pgtestdb to them? We use postgres/pgx.


Great question! pgtestdb requires that you run a postgres server somehow; it then connects to that server and handles all of the database creation. The README goes into more detail, but you should easily be able to point pgtestdb at a dockertest-managed postgres server.

One massive advantage of pgtestdb over using dockertest on its own is that it runs your migrations for you in a very efficient way that amortizes their cost toward zero as the number of tests grows. This is much faster than naively creating a new database and re-running the migrations for each test.
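Here's roughly what a test looks like (a sketch adapted from the README; the golang-migrate adapter and the localhost:5433 connection details are assumptions, so point them at wherever your postgres is actually running):

```go
package myapp

import (
	"testing"

	_ "github.com/jackc/pgx/v5/stdlib" // database/sql driver
	"github.com/peterldowns/pgtestdb"
	"github.com/peterldowns/pgtestdb/migrators/golangmigrator"
)

func TestMyQuery(t *testing.T) {
	t.Parallel() // safe: every test gets its own database
	migrator := golangmigrator.New("migrations") // directory of .sql migrations
	db := pgtestdb.New(t, pgtestdb.Config{
		DriverName: "pgx",
		User:       "postgres",
		Password:   "password",
		Host:       "localhost",
		Port:       "5433",
		Options:    "sslmode=disable",
	}, migrator)
	// db is a *sql.DB connected to a fresh, fully-migrated database;
	// it's cleaned up automatically when the test passes.
	_ = db
}
```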

I happen to recommend using docker-compose over dockertest out of some slight personal preference that is:
- one part separation of concerns (it's weird to me to have the test code handle the state of the postgres container),
- one part developer experience (it's nice for devs to be able to easily manage their postgres server instance with existing docker tools outside of golang code),
- one part infra (it's nice to be able to use any method to provide and manage the postgres server, particularly in CI),
- and one part totally arbitrary.


Yeah, I think we'll give it a try today. We have a mix of devs using devcontainers, running pg themselves, or working in a cloud environment, but they all have a pg instance running already.

For migrations we have an in-house tool, but it looks like we just need to satisfy the interface and we should be good to use it. Any non-obvious tips on writing a migrator?

Yeah, in some cases (devcontainers) we're using docker-compose for the dev/pg/other systems, and dockertest with docker-in-docker to spin up the ephemeral pg instance for tests. It works fine, but it's a little more complicated than I'd like.


I'm happy to hear you're going to give it a try! No special tips beyond looking at the existing docs and migrators to see what they do. If you're stuck, please file an issue and I can help debug. I'd really like to know what works and what doesn't. (It would also be nice to just hear that it all works!)
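Roughly, an adapter for your in-house tool looks like this (an illustrative sketch; check pgtestdb.Migrator in the repo for the authoritative method set, which may differ from what I show here):

```go
package migrators

import (
	"context"
	"database/sql"

	"github.com/peterldowns/pgtestdb"
)

// InHouseMigrator is a hypothetical adapter around your own migration tool.
type InHouseMigrator struct {
	Dir string // directory containing your migration files
}

// Hash must return a stable identifier for the current migration state;
// pgtestdb uses it to decide whether a cached template database can be
// reused. One non-obvious tip: hash the *contents* of your migration
// files, not just their names, so editing a migration invalidates the
// template and forces a rebuild.
func (m *InHouseMigrator) Hash() (string, error) {
	// e.g. sha256 over every file in m.Dir, in sorted order (elided)
	return "content-hash", nil
}

// Migrate runs your in-house tool against the template database.
func (m *InHouseMigrator) Migrate(ctx context.Context, db *sql.DB, _ pgtestdb.Config) error {
	// invoke your tool against db here
	return nil
}

// Compile-time check that the adapter satisfies the interface.
var _ pgtestdb.Migrator = (*InHouseMigrator)(nil)
```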


That project looks interesting. I'm using testcontainers and I'm thinking the two could be combined. Currently we set up the db container for the test suite, run migrations once, then have each test run in a separate transaction that's rolled back after the test completes.

Want to see if there are any speed improvements and better separation if we switch to pgtestdb. The transaction approach has worked well, but on some larger integration tests I've seen conflicts due to our test fixture IDs.
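For comparison, our current setup is essentially this pattern (a simplified sketch; assumes a shared *sql.DB that already has migrations applied):

```go
package myapp

import (
	"database/sql"
	"testing"
)

// testTx hands each test its own transaction and rolls it back on cleanup,
// so fixture data never persists between tests.
func testTx(t *testing.T, db *sql.DB) *sql.Tx {
	t.Helper()
	tx, err := db.Begin()
	if err != nil {
		t.Fatalf("begin transaction: %v", err)
	}
	t.Cleanup(func() { _ = tx.Rollback() })
	return tx
}
```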


pgtestdb should work for your use case, and because each test gets its own database, you shouldn't run into any conflicts caused by different tests writing data with the same IDs. If it doesn't work, please let me know and/or file a bug on GitHub.

One other nice thing about pgtestdb is that you can run all your tests in parallel, and if a test fails, the related database is left up for you to connect to via psql (which helps a lot with debugging). Compared to your transaction setup, the only data in that database will be from the test that failed.


I had never heard of template databases before, and I wish I had - this would have saved me an enormous amount of time late last year!


You're not alone. I'm sure other people have done this before me, but I haven't seen it published anywhere, and I rediscovered the technique independently. The logic behind this library has been ported to both Scala and TypeScript without issue; I hope you and others can benefit from the idea even if you don't use my implementation.
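The underlying technique is tiny. Roughly, the library does the equivalent of this (illustrative names, simplified error handling):

```go
package main

import (
	"database/sql"
	"fmt"

	_ "github.com/jackc/pgx/v5/stdlib"
)

func main() {
	// Connect as an admin user to the server's maintenance database.
	admin, err := sql.Open("pgx", "postgres://postgres:password@localhost:5432/postgres?sslmode=disable")
	if err != nil {
		panic(err)
	}
	defer admin.Close()

	// 1. Create the template and run all migrations against it, once (elided).
	must(admin.Exec(`CREATE DATABASE my_template`))

	// 2. Mark it as a template and block new connections: Postgres refuses
	//    to clone a database that has active sessions.
	must(admin.Exec(`ALTER DATABASE my_template WITH IS_TEMPLATE true ALLOW_CONNECTIONS false`))

	// 3. Each test clones the template. The clone is a cheap file-level
	//    copy, which is why it takes tens of milliseconds rather than a
	//    full migration run.
	for i := 0; i < 3; i++ {
		must(admin.Exec(fmt.Sprintf(`CREATE DATABASE test_%d TEMPLATE my_template`, i)))
	}
}

func must(_ sql.Result, err error) {
	if err != nil {
		panic(err)
	}
}
```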


It's really great. It reminds me of using overlay filesystems for tests so that you can maintain a clean read-only template and then run all your tests non-destructively on the overlay.


This seems amazing; I will try it.


pgtestdb looks great. Looks like it leverages Postgres template DBs. We use MySQL more often than Postgres; I may take a stab at creating something similar for MySQL. Nicely done!


I haven't really been able to find a way that works well on MySQL (well, MariaDB). Transactions are too unreliable and magical in MariaDB, in-memory databases/tables have all sorts of caveats and limitations, and there isn't really anything like PostgreSQL's schemas or templates.

The best I could come up with was running the MariaDB server with libeatmydata, which I believe was pretty much intended for this. It's not a generic "works on any MariaDB server" approach, but it's good enough.


Dang. The only thing I've found so far is a `mysqldump --no-data` and then a restore from the dump. Not fast at all. Maybe you could pre-provision a few thousand DBs this way in parallel and write a service to hand out DB handles...


Using tmpfs for MySQL/MariaDB's data directory helps tremendously. If you're using Docker natively on Linux, `docker run --tmpfs /var/lib/mysql ...` will do the trick. The only downside is that each container restart is slightly slower, since the database instance has to be re-initialized from scratch.

Tuning the database server settings can help a lot too. You can add overrides at the very end of your `docker run` command line, and they'll be passed as command-line args to the database server. For example, use `--skip-performance-schema` to avoid the overhead of performance_schema if you don't need it in your test/CI environment.

For MySQL 8 in particular, I've found a few additional options tend to help: `--skip-log-bin --skip-innodb-adaptive-hash-index --skip-innodb-log-writer-threads`

(Those don't help on MariaDB, since it already defaults to disabling the binary log and adaptive hash index; and it doesn't have separate log writer threads.)

A lot of other options may be workload-specific. My product Skeema [1] can optionally use ephemeral containerized databases [2] for testing DDL and linting database objects. Its workload is very DDL-heavy, which means the settings can be tuned quite differently than for a typical DML-based workload.

[1] https://github.com/skeema/skeema/

[2] https://www.skeema.io/docs/options/#workspace



