When working on a new app I don't always know the exact DB schemas I need. Maybe my mind is mush, but I have a feeling lots of devs deal with this. What's your preferred way of working in this scenario? Use NoSQL and just throw everything into objects? Just write a bunch of `ALTER TABLE` commands? Let an ORM handle it?
Well first of all, I agree with a book called Databases Demystified, which recommends that you sort out your user interface before you design your database. To a programmer that might sound backwards: isn't the database the foundation of your interface? The reason it works is that your user interface helps you decide what is important to store. There are countless facts you could store about your subject, and if you design your database first, you will likely choose the wrong facts or format them the wrong way.
After that, I don't change my database much. But when I do, I prefer vi and psql:
> vi tables.sql
begin;
drop schema stuff cascade;
create schema stuff;
set search_path to stuff;
create table blah (
    id serial primary key,
    name text unique not null,
    kind int not null references kinds
        on update cascade,
    color int not null references colors
        on update cascade,
    note text
);
rollback;
> psql
psql=> \i tables.sql # yay, no errors
psql=> \q
> vi tables.sql # change rollback to commit
> psql
psql=> \i tables.sql # table created
Whenever I want to make a change, I edit tables.sql in vi, then run psql and use its \i command to execute the file, which drops the schema with all its tables and recreates everything. I don't know whether other databases treat schemas the way Postgres does, though: as folders for tables.
By the way, you only want to do this on your development database, while you're still building it and before users have put data in it, because it drops every table along with its data. After you've gone live, it's alter table, baby.
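For example, once the blah table above is live, a change becomes a couple of targeted statements instead of a drop-and-recreate (the specific changes here are just for illustration):

begin;
alter table stuff.blah rename column note to description;
alter table stuff.blah alter column name type varchar(200);
commit;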
Even after go-live the drop-and-recreate technique is still useful for things you can safely drop and recreate, like views and functions. But instead of the drop-schema-cascade command, you put commands at the top to drop each of the views and functions, usually in the reverse of the order you create them, in case there are dependencies.
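Here's a sketch of that shape, with a made-up line_total function and order_totals view (the order_lines table is hypothetical):

begin;
-- drop in reverse order of creation; "if exists" keeps the script rerunnable
drop view if exists order_totals;
drop function if exists line_total(int, numeric);

create function line_total(qty int, price numeric) returns numeric
language sql as $$ select qty * price $$;

create view order_totals as
    select id, line_total(qty, price) as total
    from order_lines;
commit;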
If you like GUIs or are new to SQL, this way can seem daunting. I think it is the happiest way long term, though.
I would say just the opposite. Model your domain first. Why tie your database to your user interface? Your user interface may change. You may write a separate API to be consumed by another team, etc.
This is the answer that I would give. Over time, my development has shifted to modeling the domain first, including simple APIs that exercise those domain models well before I intend to write a single table.
To answer the question as asked, I use pgAdmin or the corresponding management tool for the database at hand to update it. My first actual migration is a stable version of the database.
100% right. Trying to design the perfect database schema and only then building the website is a trap almost every intermediate programmer falls into at least once. To add to the above, it is also a good idea to have the drop-and-recreate script insert dummy data; there are modules for all major web frameworks that generate data like that. It lets you quickly get a rough idea of whether the UI design will work or not.
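In the tables.sql style above, that can be as simple as a few inserts at the bottom of the script, assuming the kinds and colors tables it references each have a serial id and a name column:

-- serial ids restart at 1 because the schema was just recreated
insert into kinds (name) values ('widget'), ('gadget');
insert into colors (name) values ('red'), ('blue');
insert into blah (name, kind, color) values
    ('sample one', 1, 1),
    ('sample two', 2, 2);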
Never use an ORM that wants you to define your schema anywhere except the database. Never use NoSQL just to avoid writing SQL. (There are lots of valid reasons to use NoSQL, but that is not one of them).
First week or so: Change the DB as much as you like, directly, using the visual tools or create/alter statements as you see fit.
Once you're ready to do things right: use change scripts. Save every create/alter statement, as well as data population and modification queries, into .sql files in a directory, named sensibly so that you can run them in order to exactly reproduce your schema at any point, on any machine.
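A minimal sketch of the idea; the file naming and the version table are just one possible convention:

-- 000_schema_version.sql: track which scripts have been applied
create table schema_version (
    version int primary key,
    applied_at timestamptz not null default now()
);

-- 001_create_users.sql: each later script is one transaction
begin;
create table users (
    id serial primary key,
    email text unique not null
);
insert into schema_version (version) values (1);
commit;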
Important to note: this is not hard. SQL is not hard. Avoid tooling written by people who think SQL is hard and who therefore build elaborate workflows, which are in fact hard, in an effort to shield you from having to touch the database.
He never said anything about SQL being hard. In fact, I think MongoQuery is much more obtuse than SQL.
That being said, when you don't know what your schema is going to look like, why not use a NoSQL solution: either a pure NoSQL store or the JSON support built into modern relational databases? If you write your code correctly and have one module handle retrieval for each domain, you should be able to isolate changes.
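In Postgres, for instance, the JSON route can be as simple as a jsonb column while things are in flux (the table and field names here are made up):

create table events (
    id bigserial primary key,
    created timestamptz not null default now(),
    payload jsonb not null -- schemaless for now; promote hot fields to real columns later
);

insert into events (payload) values ('{"kind": "signup", "plan": "free"}');

-- ->> pulls a field out of the document as text
select payload ->> 'plan' as plan
from events
where payload ->> 'kind' = 'signup';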
But if I do have to use SQL, I agree: put all of your changes in numbered .sql files, keep a "database version", and find or create tooling that can run the scripts automatically as part of your CI/CD.
> when you don't know what your schema is going to look like, why not use a NoSQL solution
Because one day soon, you will know what your schema is going to look like. And the next 10 years of your business will be a lot more pleasant if you optimize for them rather than for the first week of greenfield development.
And there are all sorts of well-known solutions that let you run separate stores for OLTP and OLAP. You don't have to have one type of data store to rule them all.
That makes it a lot harder to deploy and roll back. Your business logic ends up in two separate places: the stored procedures and the code. Then you see stuff like…
Not sure what you mean about deploy/rollback difficulties. It is of course something to consider, but generally speaking modern RDBMSs have the most sophisticated tooling around this in the ecosystem.
Versioning and compatibility are of course concerns but that is true of any scheme you come up with.
If you think you haven't exported "business logic" to your data store, you are wrong in any case.
If all of your business logic is in your code, there is one source of truth. I know when I deploy code from a certain branch or tag, all of the logic is right there.
My build process creates an artifact, and that artifact gets deployed through the entire pipeline: Dev/QA/UAT/Prod. There is one source of truth.
But when half your code is in stored procedures and the other half is in code, how do you keep them in sync? If I need to redeploy an older version, it's a simple matter of redeploying the artifact from v1. How do you make sure the code stays in sync with the stored procedure?
I can honestly say that any system I've designed from scratch has no business logic in the data store.
Do you have types in your schema? That's a business rule. Do you have columns? Business rule.
Those might be business rules you are ok with having in your data store, but they are a point of sync between your code and the store that you have to manage.
If your stored procedures are there to abstract your table structure for write operations, they act as a way to make that sync easier, not harder.
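A minimal sketch of what that can look like, with made-up names; the check constraint and the function are both "business rules" living in the store, and the function keeps callers off the raw table:

create table accounts (
    id serial primary key,
    email text unique not null check (position('@' in email) > 1),
    created timestamptz not null default now()
);

create function create_account(p_email text) returns int
language sql as $$
    insert into accounts (email) values (lower(p_email))
    returning id
$$;

-- callers never mention the table, so its layout can change behind the function
select create_account('Alice@Example.com');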
How often do types change? Changing a schema usually involves adding columns, not deleting them. I'm not aware of a realistic scenario where adding a column isn't backwards compatible.
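For example, with a throwaway table just to show the point:

create table customers (
    id serial primary key,
    email text not null
);

-- existing code inserts like this:
insert into customers (email) values ('old@example.com');

-- a new column with a default breaks neither old inserts nor selects by name
alter table customers add column plan text not null default 'free';

insert into customers (email) values ('new@example.com'); -- still valid
select email, plan from customers;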
As far as writing your domain model to a relational schema goes, you should be doing that in a single module (in process) or microservice (out of process) that all other code calls. That way, changing the representation of a domain model should only require a change in one place. For a monolith that means redeploying one artifact; for a microservice, it means deploying that microservice.
It's even better with a NoSQL solution like Mongo if you are using a strongly typed language: your domain model is your schema. You change your domain model and the data model changes with it.
With some ORMs you can just change your object and they generate the alter table statements for you. I personally don't trust them, but it is an alternative.