
People make the mistake of thinking it's a single 100GB dataset.

More often you're joining lots of smaller datasets together in a way that needs more than 100GB of working space, and that's the situation where you really want Spark.




Can you elaborate? I had no problem with DuckDB and a few TB of data. But it depends on what exactly you do with it, of course.


few hundred TBs here - no issues :)



