
If you have 20TB of data that fits on a single machine, you're better off with just Postgres 90% of the time. If you predict you're going to have more data than fits on a single machine by the end of the year, then it makes sense to invest in distributed systems.



Okay, why? If you can sell me on that, I'd be eager to change my workflow.

For reference - this is entirely timeseries financial data, stored with PyTables. For basically everything else I use Postgres.


Definitely stick with HDF5 and Python for what you're doing. Postgres doesn't lend itself to timeseries joins and queries the way a time-series-specific database like KDB+ does. The end result is that you'd most likely be pulling the data out of the database into Python anyway, probably caching it in HDF5 before using whatever Python libs you want. You could alternatively bring your code/logic to the data using Q in KDB+, but there will be a learning curve and you'll have to write a lot of functionality yourself that just isn't available in library form. The performance will be a lot better, though.
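
To make that concrete, here's a rough sketch of the workflow I mean (the connection string and the trades/quotes table schemas are made up for illustration): pull the data once from Postgres, cache it locally in HDF5 via pandas' HDFStore (which is backed by PyTables), then do the timeseries join in pandas. merge_asof is roughly the pandas analogue of the as-of join (aj) that KDB+ is built around.

    # Sketch only: pull once from Postgres, cache in HDF5, join locally.
    # The connection string and table/column names here are hypothetical.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("postgresql://user:pass@localhost/marketdata")

    # One round trip to the database per table.
    trades = pd.read_sql("SELECT ts, sym, price, size FROM trades",
                         engine, parse_dates=["ts"])
    quotes = pd.read_sql("SELECT ts, sym, bid, ask FROM quotes",
                         engine, parse_dates=["ts"])

    # Cache locally so reruns never touch the database.
    # pandas' HDFStore sits on top of PyTables.
    with pd.HDFStore("cache.h5") as store:
        store.put("trades", trades, format="table")
        store.put("quotes", quotes, format="table")

    # As-of join: match each trade to the most recent quote at or
    # before its timestamp, per symbol -- what KDB+'s aj does natively.
    merged = pd.merge_asof(trades.sort_values("ts"),
                           quotes.sort_values("ts"),
                           on="ts", by="sym")

If the data only ever lives in HDF5 in the first place, you'd skip the Postgres step entirely, which is more or less the argument for staying where you are.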



