
727374 on May 23, 2017 | on: Don't use Hadoop when your data isn't that big (20...

Yes, this and also parallelizing disk I/O. For example, you could fit a 5TB table on a single machine, but an operation that requires a full scan (e.g. a distinct count over arbitrary dates) will take a very long time on one disk. Yes, you could partition the table across multiple disks yourself, but Hadoop offers a nice generalized solution.
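
To put rough numbers on that, here is a back-of-envelope sketch. The 150 MB/s sequential throughput and the 10-disk split are assumptions picked for illustration, not measurements, and `scan_hours` and `distinct_count` are hypothetical helpers, not Hadoop APIs. Threads stand in for separate disks or machines here; the point is only the shape of the computation: scan each partition independently, then merge the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_hours(table_bytes, disk_mb_per_s, n_disks=1):
    """Idealized full-scan time, assuming every disk reads its share in parallel."""
    bytes_per_s = disk_mb_per_s * 1e6 * n_disks
    return table_bytes / bytes_per_s / 3600


def distinct_count(partitions, start, end):
    """Distinct users between start and end (inclusive), one scan per partition.

    Each partition is an iterable of (day, user) rows. Days are ISO date
    strings, so plain string comparison orders them correctly.
    """
    def scan(rows):
        return {user for day, user in rows if start <= day <= end}

    with ThreadPoolExecutor() as pool:
        partial_sets = list(pool.map(scan, partitions))
    # Merge step: union the per-partition sets, then count.
    return len(set().union(*partial_sets))


if __name__ == "__main__":
    five_tb = 5e12
    print(f"1 disk:   {scan_hours(five_tb, 150):.1f} h")
    print(f"10 disks: {scan_hours(five_tb, 150, n_disks=10):.1f} h")
```

Under those assumptions a single disk needs several hours for one pass over the table, and the time drops roughly linearly with the number of disks. Doing this by hand means writing the partitioning, the scheduling, and the merge step yourself, which is the part Hadoop generalizes.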