Glad our post sparked some pretty deep discussions on the future of spark-on-k8s ! The OS community is working on several projects to help this problem. You've mentioned NFS (by Google) but there's also the possibility to use object storage. Mappers would first write to local disks, and then the shuffle data would be async moved to the cloud.
Sources: - end of presentation https://www.slideshare.net/databricks/reliable-performance-a... - https://issues.apache.org/jira/browse/SPARK-25299