This also isn't a straight either/or proposition. I build local command line pip...

azylman · on Jan 18, 2015

What toolset are you using that you can run both locally and on a Hadoop cluster?

mdaniel · on Jan 18, 2015

Almost all of them?

The vocabulary of the grandparent comment implies they are using Hadoop's streaming mode, so one can use a MapReduce streaming abstraction such as MRJob, or just plain stdin/stdout; both work locally and in cluster mode.
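A minimal sketch of the plain stdin/stdout approach, assuming Python and a word-count job. The script name `wc.py`, its `map`/`reduce` arguments, and the `run_locally` helper are all hypothetical. The point is that the mapper and reducer only read lines and print tab-separated key/value lines, so nothing in them is Hadoop-specific.

```python
#!/usr/bin/env python3
"""Hypothetical word count written as a streaming mapper/reducer pair."""
import sys
from itertools import groupby


def mapper(lines):
    # Emit one tab-separated "word\t1" record per word.
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines):
    # Input must arrive sorted by key (Hadoop's shuffle does this on a
    # cluster; `sort` does it locally), so equal words are adjacent.
    records = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(records, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    # Simulate map -> sort -> reduce in a single process.
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    # Choose the stage from the first argument; default to the mapper.
    stage = sys.argv[1] if len(sys.argv) > 1 else "map"
    step = mapper if stage == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

Locally this runs as `cat input.txt | ./wc.py map | sort | ./wc.py reduce`. On a cluster, the same commands would be passed as the `-mapper` and `-reducer` of a Hadoop streaming job. The exact jar path and shipping options depend on the installation.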

Or, if static typing is more agreeable to your development process, running Hadoop in "single machine cluster" mode is relatively painless. The same goes for other distributed processing frameworks like Spark.

arjie · on Jan 18, 2015

I believe he mentioned it: Hadoop streaming mode.