
If you have 20TB of data that fits on a single machine, you're better off with just Postgres 90% of the time. If you predict you're going to have more data than fits on a single machine by the end of the year, then it makes sense to invest in distributed systems.



Okay, why? If you can sell me on that, I'd be eager to change my workflow.

For reference - this is entirely timeseries financial data, stored with PyTables. For basically everything else I use Postgres.


Definitely stick with HDF5 and Python for what you're doing. Postgres doesn't lend itself to timeseries joins and queries the way a time-series-specific database like KDB+ does. The end result is that you'd most likely be pulling the data out of the database into Python anyway, probably caching it in HDF5 before using whatever Python libs you want. You could alternatively bring your code/logic to the data using Q in KDB+, but there will be a learning curve and you'll have to write a lot of functionality yourself that just isn't available in library form. The performance will be a lot better, though.
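
To make that concrete, here's a rough sketch of the workflow I mean (the connection string and the trades/quotes table schemas are made up for illustration): pull the data once from Postgres, cache it locally in HDF5 via pandas' HDFStore (which is backed by PyTables), then do the timeseries join in pandas. merge_asof is roughly the pandas analogue of the as-of join (aj) that KDB+ is built around.

    # Sketch only: pull once from Postgres, cache in HDF5, join locally.
    # The connection string and table/column names here are hypothetical.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("postgresql://user:pass@localhost/marketdata")

    # One round trip to the database per table.
    trades = pd.read_sql("SELECT ts, sym, price, size FROM trades",
                         engine, parse_dates=["ts"])
    quotes = pd.read_sql("SELECT ts, sym, bid, ask FROM quotes",
                         engine, parse_dates=["ts"])

    # Cache locally so reruns never touch the database.
    # pandas' HDFStore sits on top of PyTables.
    with pd.HDFStore("cache.h5") as store:
        store.put("trades", trades, format="table")
        store.put("quotes", quotes, format="table")

    # As-of join: match each trade to the most recent quote at or
    # before its timestamp, per symbol -- what KDB+'s aj does natively.
    merged = pd.merge_asof(trades.sort_values("ts"),
                           quotes.sort_values("ts"),
                           on="ts", by="sym")

If the data only ever lives in HDF5 in the first place, you'd skip the Postgres step entirely, which is more or less the argument for staying where you are.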



