
Yes, this and also parallelizing disk I/O. For example, you could fit a 5TB table on a single machine, but an operation that requires a full scan (e.g. a uniqueness count over arbitrary dates) will take a very long time on one disk. Yes, you could partition the table across multiple disks yourself, but Hadoop offers a nice generalized solution.
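
To make that concrete, here is a rough sketch of the idea in Python, not how Hadoop itself does it. It assumes the table has already been split into one plain-text file per disk with one key per line; the names `distinct_in_partition`, `distinct_count`, and `scan_hours` are made up for illustration, and the 150 MB/s sequential read rate is an assumed figure, not a measurement.

```python
from concurrent.futures import ThreadPoolExecutor


def distinct_in_partition(path):
    """Scan one partition file and return the set of distinct keys in it."""
    seen = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            key = line.strip()
            if key:
                seen.add(key)
    return seen


def distinct_count(partition_paths, max_workers=4):
    """Scan all partitions concurrently, then union the partial sets.

    Threads are enough for a sketch because the work is dominated by
    disk reads; each partition is assumed to live on its own disk.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = pool.map(distinct_in_partition, partition_paths)
        return len(set().union(*partials))


def scan_hours(table_bytes, bytes_per_sec, n_disks=1):
    """Back-of-envelope full-scan time, assuming perfectly even partitions."""
    return table_bytes / (bytes_per_sec * n_disks) / 3600


# Assumed numbers: 5 TB table, 150 MB/s sequential read per disk.
one_disk = scan_hours(5e12, 150e6)               # about 9.3 hours
ten_disks = scan_hours(5e12, 150e6, n_disks=10)  # about 0.93 hours
```

The per-partition scan plays the role of the map step and the set union plays the role of the reduce step. Doing this by hand means also handling partition placement, failed disks, and uneven partitions, which is the part a generalized framework takes off your plate.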


