
I think the point is that if you really have big data then it makes sense, but many shops add huge cost and complexity to projects where simpler tools would be more than adequate.


It's the tooling, not the size of the data. Using "big data ecosystem" tools gives you all kinds of useful things: Airflow for pipeline orchestration, Presto to query the data, Spark for enrichment and machine learning, and so on, all without moving the data. That greatly simplifies metadata management, which you have to do if you're serious about things like data provenance and quality.
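
To make the "without moving the data" point concrete, here's a minimal sketch of a Spark enrichment step over Parquet files on shared storage (the paths, dataset names, and join key are made up for illustration); Presto or Spark SQL could then query the output right where it sits, no export/import step:

    # Minimal PySpark enrichment sketch; all paths and column names are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("enrich-events").getOrCreate()

    # Raw events and a reference table already live on the shared data lake.
    events = spark.read.parquet("s3a://datalake/raw/events/")
    countries = spark.read.parquet("s3a://datalake/ref/countries/")

    # Enrichment: attach reference attributes to each event row.
    enriched = events.join(countries, on="country_code", how="left")

    # Write back to the lake; Presto/Trino or Spark SQL query it in place.
    enriched.write.mode("overwrite").parquet("s3a://datalake/curated/events_enriched/")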


A SQL db + Tableau is vastly more powerful and mature than those tools; it just can't do "big data", that's all.


The data preparation work involved in doing even trivial things with that setup is mind-boggling. I'm going to assume that this very cavalier claim comes from a place of relatively little hands-on experience. Do some of this work and you'll realize very quickly that you'd rather shoot yourself than do complex data prep on a constantly evolving schema using schema-on-write tools.
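
For what it's worth, the contrast being pointed at looks roughly like this: with a schema-on-read tool (Spark over raw JSON in this sketch; the path and field names are hypothetical), a field that only exists in newer events simply shows up as a nullable column at read time, whereas a schema-on-write database would need an ALTER TABLE plus a backfill before the new data could even be loaded:

    # Schema-on-read sketch with PySpark; the path and the "device" field are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("schema-on-read-demo").getOrCreate()

    # Old and new events live side by side; "device" only exists in the newer files.
    # Spark infers a merged schema at read time, so older rows simply carry NULLs.
    events = spark.read.json("s3a://datalake/raw/events/")

    events.groupBy("device").count().show()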


...and how often do you need to do all that to a dataset?


That's definitely true too. Being able to accurately assess early on whether that need will ever exist is invaluable.



