For "medium data", my company has found a lot of success using dask [0], which mimics the pandas API, but can scale across multiple cores or machine.
The community around dask is quite active and there's solid documentation to help you learn the library. I cannot recommend dask enough for medium data projects for people who want to use Python.
They have a great rundown of dask vs PySpark [1] to help you understand why you'd use it.
I've been converting all of my Luigi pipeline tasks from pandas to dask so that I can push a lot more data through. It's been a straightforward process so far, and I like how easy it makes parallel computing.
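For anyone curious what the swap looks like inside a Luigi task, here's a rough sketch (the task name, paths, and columns are invented for illustration):

    import luigi
    import dask.dataframe as dd

    class AggregateSales(luigi.Task):
        def output(self):
            return luigi.LocalTarget("region_totals.csv")

        def run(self):
            # dd.read_csv is lazy; compute() materializes the result
            # in parallel and returns a plain pandas object
            df = dd.read_csv("sales-*.csv")
            totals = df.groupby("region").revenue.sum().compute()
            totals.to_csv(self.output().path)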
[0] http://dask.pydata.org/en/latest/
[1] http://dask.pydata.org/en/latest/spark.html