
A database is probably better if the schema doesn't evolve and the present state of the data fits neatly into a relational schema, because then you are just extending it with versioning and the transformation to a version tracking database is fairly mechanical.

But if the data is already in a textual format where each line is an atomic unit (or can be canonicalized into one), Git + existing common diff tools get you a lot with very little work and are resilient to schema changes.
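To make the "canonicalize so each line is an atomic unit" idea concrete, here is a rough sketch, assuming the records are flat dicts with a unique key field. The function names and the `id` key are made up for illustration; in practice you would write the canonical lines to a file and let Git do the diffing.

```python
import difflib
import json


def canonicalize(records, key="id"):
    """Return one compact JSON line per record, sorted by key, with sorted field names."""
    ordered = sorted(records, key=lambda r: str(r[key]))
    return [json.dumps(r, sort_keys=True, separators=(",", ":")) for r in ordered]


def line_diff(old_records, new_records, key="id"):
    """Return only the added ('+') and removed ('-') lines between two snapshots."""
    old_lines = canonicalize(old_records, key)
    new_lines = canonicalize(new_records, key)
    diff = difflib.unified_diff(old_lines, new_lines, lineterm="", n=0)
    return [
        line
        for line in diff
        # Keep content lines; drop the '---'/'+++' file headers and '@@' hunk markers.
        if line[:1] in ("+", "-") and not line.startswith(("+++", "---"))
    ]


old = [{"id": 2, "owner": "A"}, {"id": 1, "owner": "B"}]
# The new snapshot arrives in a different order and has gained a field.
new = [{"id": 1, "owner": "B", "model": "C172"}, {"id": 2, "owner": "A"}]

for line in line_diff(old, new):
    print(line)
```

Because ordering and field order are normalized first, reordering alone produces no diff, and a new field just shows up as a changed line without any schema migration.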

You could also reduce the data to EAV triples and do a change-tracking DB for that, which would get you immunity to schema changes at the cost of losing all the value that a tracking DB has for known-schema data.
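A rough sketch of the EAV approach, assuming flat records with hashable values and a unique `id` field. The names here are hypothetical, and the `(entity, attribute, value, tx, added)` tuple shape is only loosely modeled on EAVT-style stores; it is not any particular database's API.

```python
def to_triples(records, key="id"):
    """Flatten records into a set of (entity, attribute, value) triples."""
    return {
        (r[key], attr, val)
        for r in records
        for attr, val in r.items()
        if attr != key
    }


def changes(old_records, new_records, tx, key="id"):
    """Return retractions, then assertions, needed to go from old to new at transaction tx."""
    old = to_triples(old_records, key)
    new = to_triples(new_records, key)
    # Sort by repr so that mixed value types do not break ordering.
    retracted = [(e, a, v, tx, False) for (e, a, v) in sorted(old - new, key=repr)]
    asserted = [(e, a, v, tx, True) for (e, a, v) in sorted(new - old, key=repr)]
    return retracted + asserted


snapshot_1 = [{"id": 1, "owner": "B"}]
# An attribute changed and a brand-new attribute appeared; no schema change is needed.
snapshot_2 = [{"id": 1, "owner": "C", "model": "C172"}]

for fact in changes(snapshot_1, snapshot_2, tx=2):
    print(fact)
```

The log never needs to know the set of columns in advance, which is the schema immunity; the cost is that nothing here enforces types or constraints the way a known-schema table would.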

So, really, I'd say Git is the easy & general solution, though a DB might be worth the extra work if the schema was known to be fixed.



I’ve put a git-scraped history of 12 years of the FAA registration database into an EAVT database (Datomic) and it worked reasonably well.



