I've ended up using "big data" tools like Spark for only 32GB of (compressed) data before, because those 32GB represented 250 million records that I needed to use to train 50+ different machine learning models in parallel.
For that particular task I used Spark in standalone mode on a single node with 40 cores, so I don't consider it Big Data. But I think it does illustrate that you don't have to have a massive dataset to benefit from some of these tools -- and you don't even need to have a cluster.
I think Spark is a bit unique in the "big data" toolset, though: it's more flexible and more performant than most big data tools, it solves a wide variety of problems (including streaming and ML), and the overhead of setting it up on a single node is low enough that the parallelism it offers still pays off. It's also a beast at working with the Apache Parquet format.
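For anyone curious what that looks like in practice, here's a loose sketch of the pattern rather than the exact pipeline: local[40] stands in for a proper standalone master/worker on the same box (the driver-side code is the same either way), and the Parquet path, the column names ("segment", "features", "label"), and the choice of SGDClassifier are all made up for illustration.

    # Loose sketch: Spark on one box with 40 cores, reading compressed
    # Parquet and fitting one independent scikit-learn model per segment.
    # Path, column names, and model choice are hypothetical.
    import numpy as np
    from pyspark.sql import SparkSession
    from sklearn.linear_model import SGDClassifier

    spark = (SparkSession.builder
             .master("local[40]")              # one machine, 40 cores, no cluster
             .appName("many-small-models")
             .getOrCreate())

    df = spark.read.parquet("/data/records.parquet")   # columnar, compressed

    def fit_segment(segment, rows):
        # Runs on an executor: train one model on one segment's rows.
        rows = list(rows)
        X = np.array([r["features"] for r in rows])
        y = np.array([r["label"] for r in rows])
        return segment, SGDClassifier().fit(X, y)

    # 50+ segments become 50+ independent tasks spread over the 40 cores.
    # groupByKey is only reasonable because each segment fits in memory.
    models = dict(df.rdd
                    .keyBy(lambda r: r["segment"])
                    .groupByKey()
                    .map(lambda kv: fit_segment(kv[0], kv[1]))
                    .collect())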
Same here. Some problems in ML are embarrassingly parallel, like cross-validation and some ensemble methods. I would love to see better support for Spark in scikit-learn, as well as better Python deployment to cluster nodes.
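Until then, the embarrassingly parallel part is easy enough to wire up by hand: broadcast the (small) training data to the executors and fan the hyperparameter grid out as Spark tasks. Rough sketch below with a toy dataset and a made-up grid; it assumes scikit-learn is installed on every worker, which is exactly the deployment pain I mean.

    # Hand-rolled embarrassingly parallel model selection on Spark:
    # broadcast a small dataset, fan a hyperparameter grid out as tasks,
    # and run full cross-validation for each grid point independently.
    from pyspark.sql import SparkSession
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import ParameterGrid, cross_val_score

    spark = SparkSession.builder.master("local[8]").getOrCreate()
    sc = spark.sparkContext

    X, y = load_breast_cancer(return_X_y=True)     # toy stand-in dataset
    data = sc.broadcast((X, y))                    # small enough to ship everywhere

    grid = list(ParameterGrid({"n_estimators": [100, 300],
                               "max_depth": [None, 5, 10]}))

    def evaluate(params):
        X, y = data.value
        clf = RandomForestClassifier(n_jobs=1, **params)
        return params, cross_val_score(clf, X, y, cv=5).mean()

    # Each grid point (with its 5 folds) is one independent Spark task.
    scores = sc.parallelize(grid, numSlices=len(grid)).map(evaluate).collect()
    best_params, best_score = max(scores, key=lambda kv: kv[1])

If I remember right, the joblib-spark package wraps roughly this pattern as a joblib backend so scikit-learn's own GridSearchCV can dispatch its fits to executors, but the manual version above only needs core Spark.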