Hope the parent comment gets more visibility. In addition to parallelizing IO an...

virtuabhi on May 23, 2017 | parent | context | favorite | on: Don't use Hadoop when your data isn't that big (20...

Hope the parent comment gets more visibility. In addition to parallelizing IO and automatic management of failures, Hadoop also provides hooks to implement complex data partitioning schemes - think dynamic range partitoning, compound group partitioning, etc. Unix tools, MPI, scatter-gather are convenient only for embarrassingly parallel jobs.