
People make the mistake of thinking it's a single 100GB dataset.

More often you're joining lots of smaller datasets together in a way that needs more than 100GB of working space, and that's the situation where you really want Spark.




Can you elaborate? I had no problem with DuckDB and a few TB of data. But it depends on what exactly you do with it, of course.


few hundred TBs here - no issues :)



