When working on a new app I don't always know the exact DB schemas I need. Maybe my mind is mush, but I have a feeling lots of devs deal with this. What's your preferred way of working in this scenario? Use NoSQL and just throw everything into objects? Just write a bunch of `ALTER TABLE` commands? Let an ORM handle it?
Well first of all, I agree with a book called Databases Demystified, which recommends that you sort out your user interface before you design your database. To a programmer that might sound backwards: isn't the database the foundation of your interface? The reason it works is that your user interface helps you decide what is important to store. There are countless facts you could store about your subject, and if you design your database first, you will likely choose the wrong facts or format them the wrong way.
After that, I don't change my database much. But when I do, I prefer vi and psql:
> vi tables.sql
begin;
drop schema stuff cascade;
create schema stuff;
set search_path to stuff;
create table blah (
    id serial primary key,
    name text unique not null,
    kind int not null references kinds
        on update cascade,
    color int not null references colors
        on update cascade,
    note text
);
rollback;
> psql
psql=> \i tables.sql # yay, no errors
psql=> \q
> vi tables.sql # change rollback to commit
> psql
psql=> \i tables.sql # table created
Whenever I want to make a change, I edit tables.sql in vi, then run psql and use its \i command to execute the file, which drops the schema with all its tables and recreates everything. I don't know whether other databases treat schemas the way Postgres does, though: as folders for tables.
By the way, you only want to do this on your development database, while you're still building it and before users have put data in it, because it drops every table along with its data. After you've gone live, it's alter table, baby.
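For example, once the blah table above is live, a change becomes a couple of targeted statements instead of a drop-and-recreate (the specific changes here are just for illustration):

begin;
alter table stuff.blah rename column note to description;
alter table stuff.blah alter column name type varchar(200);
commit;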
Even after go-live the drop-and-recreate technique is still useful for things you can safely drop and recreate, like views and functions. But instead of the drop-schema-cascade command, you put commands at the top to drop each of the views and functions, usually in the reverse of the order you create them, in case there are dependencies.
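Here's a sketch of that shape, with a made-up line_total function and order_totals view (the order_lines table is hypothetical):

begin;
-- drop in reverse order of creation; "if exists" keeps the script rerunnable
drop view if exists order_totals;
drop function if exists line_total(int, numeric);

create function line_total(qty int, price numeric) returns numeric
language sql as $$ select qty * price $$;

create view order_totals as
    select id, line_total(qty, price) as total
    from order_lines;
commit;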
If you like GUIs or are new to SQL, this way can seem daunting. I think it is the happiest way long term, though.
I would say just the opposite. Model your domain first. Why tie your database to your user interface? Your user interface may change. You may write a separate API to be consumed by another team, etc.
This is the answer that I would give. Over time, my development has shifted to modeling the domain first, including simple APIs that exercise those domain models well before I intend to write a single table.
To answer the question as asked, I use pgAdmin or the corresponding management tool for the database at hand to update it. My first actual migration is a stable version of the database.
100% right. Trying to design the perfect database schema and only then building the website is a trap almost every intermediate programmer falls into at least once. To add to the above, it is also a good idea to have the drop-and-recreate script insert dummy data; there are modules for all major web frameworks that generate data like that. It lets you quickly get a rough idea of whether the UI design will work or not.
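In the tables.sql style above, that can be as simple as a few inserts at the bottom of the script, assuming the kinds and colors tables it references each have a serial id and a name column:

-- serial ids restart at 1 because the schema was just recreated
insert into kinds (name) values ('widget'), ('gadget');
insert into colors (name) values ('red'), ('blue');
insert into blah (name, kind, color) values
    ('sample one', 1, 1),
    ('sample two', 2, 2);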
Never use an ORM that wants you to define your schema anywhere except the database. Never use NoSQL just to avoid writing SQL. (There are lots of valid reasons to use NoSQL, but that is not one of them).
First week or so: Change the DB as much as you like, directly, using the visual tools or create/alter statements as you see fit.
Once you're ready to do things right: use change scripts. Save every create/alter statement, as well as data population and modification queries, into .sql files in a directory, named sensibly so that you can run them in order to exactly reproduce your schema at any point, on any machine.
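A minimal sketch of the idea; the file naming and the version table are just one possible convention:

-- 000_schema_version.sql: track which scripts have been applied
create table schema_version (
    version int primary key,
    applied_at timestamptz not null default now()
);

-- 001_create_users.sql: each later script is one transaction
begin;
create table users (
    id serial primary key,
    email text unique not null
);
insert into schema_version (version) values (1);
commit;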
Important to note: this is not hard. SQL is not hard. Avoid tooling written by people who think SQL is hard and who therefore build elaborate workflows, which are in fact hard, in an effort to shield you from having to touch the database.
He never said anything about SQL being hard. In fact, I think MongoQuery is much more obtuse than SQL.
That being said, when you don't know what your schema is going to look like, why not use a NoSQL solution: either a pure NoSQL store or the JSON support built into modern relational databases? If you write your code correctly and have one module handle retrieval for each domain, you should be able to isolate changes.
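In Postgres, for instance, the JSON route can be as simple as a jsonb column while things are in flux (the table and field names here are made up):

create table events (
    id bigserial primary key,
    created timestamptz not null default now(),
    payload jsonb not null -- schemaless for now; promote hot fields to real columns later
);

insert into events (payload) values ('{"kind": "signup", "plan": "free"}');

-- ->> pulls a field out of the document as text
select payload ->> 'plan' as plan
from events
where payload ->> 'kind' = 'signup';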
But if I do have to use SQL, I agree: put all of your changes in numbered .sql files, keep a "database version", and find or create tooling that can run the scripts automatically as part of your CI/CD.
> when you don't know what your schema is going to look like, why not use a NoSQL solution
Because one day soon, you will know what your schema is going to look like. And the next 10 years of your business will be a lot more pleasant if you optimize for them rather than for the first week of greenfield development.
And there are all sorts of well-known solutions that let you run separate stores for OLTP and OLAP. You don't have to have one type of data store to rule them all.
That makes it a lot harder to deploy and roll back. Your business logic ends up in two separate places: the stored procedures and the code. Then you see stuff like…
Not sure what you mean about deploy/rollback difficulties. It is of course something to consider, but generally speaking modern RDBMSs have the most sophisticated tooling around this in the ecosystem.
Versioning and compatibility are of course concerns but that is true of any scheme you come up with.
If you think you haven't exported "business logic" to your data store, you are wrong in any case.
If all of your business logic is in your code, there is one source of truth. I know when I deploy code from a certain branch or tag, all of the logic is right there.
My build process creates an artifact, and that artifact gets deployed through the entire pipeline: Dev/QA/UAT/Prod. There is one source of truth.
But when half your code is in stored procedures and the other half is in code, how do you keep them in sync? If I need to redeploy an older version, it's a simple matter of redeploying the artifact from v1. How do you make sure the code stays in sync with the stored procedure?
I can honestly say that any system I've designed from scratch has no business logic in the data store.
Do you have types in your schema? That's a business rule. Do you have columns? Business rule.
Those might be business rules you are ok with having in your data store, but they are a point of sync between your code and the store that you have to manage.
If your stored procedures are there to abstract your table structure for write operations, they act as a way to make that sync easier, not harder.
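A minimal sketch of what that can look like, with made-up names; the check constraint and the function are both "business rules" living in the store, and the function keeps callers off the raw table:

create table accounts (
    id serial primary key,
    email text unique not null check (position('@' in email) > 1),
    created timestamptz not null default now()
);

create function create_account(p_email text) returns int
language sql as $$
    insert into accounts (email) values (lower(p_email))
    returning id
$$;

-- callers never mention the table, so its layout can change behind the function
select create_account('Alice@Example.com');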
How often do types change? Changing a schema usually involves adding columns, not deleting them. I'm not aware of a realistic scenario where adding a column isn't backwards compatible.
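For example, with a throwaway table just to show the point:

create table customers (
    id serial primary key,
    email text not null
);

-- existing code inserts like this:
insert into customers (email) values ('old@example.com');

-- a new column with a default breaks neither old inserts nor selects by name
alter table customers add column plan text not null default 'free';

insert into customers (email) values ('new@example.com'); -- still valid
select email, plan from customers;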
As far as writing your domain model to a relational schema goes, you should be doing that in a single module (in process) or microservice (out of process) that all other code calls. That way, changing the representation of a domain model should only require a change in one place. For a monolith that means redeploying one artifact; for a microservice, it means deploying that microservice.
It's even better with a NoSQL solution like Mongo if you are using a strongly typed language: your domain model is your schema. You change your domain model and the data model changes with it.
With some ORMs you can just change your object and they generate the alter table statements for you. I personally don't trust them, but it is an alternative.