Redis Data Tiering - Redis Enterprise and AWS ElastiCache for Redis support data tiering: the hot working set stays in RAM and colder values spill to local SSD, with roughly 80% of the dataset living on SSD. On AWS, a cache.r6gd.4xlarge with ~100GB of memory can handle roughly 500GB of data.
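On ElastiCache, data tiering is an opt-in flag when you create the replication group (it needs an r6gd node type and Redis 6.2+). A minimal boto3 sketch, with a hypothetical group name; check the current CreateReplicationGroup reference for the full parameter set:

```python
import boto3

client = boto3.client("elasticache")

client.create_replication_group(
    ReplicationGroupId="tiered-cache",                      # hypothetical name
    ReplicationGroupDescription="Redis with SSD data tiering",
    Engine="redis",
    EngineVersion="6.2",                                    # data tiering needs 6.2+
    CacheNodeType="cache.r6gd.4xlarge",                     # r6gd = NVMe SSD node type
    NumCacheClusters=2,
    DataTieringEnabled=True,  # hot keys stay in RAM, cold values move to SSD
)
```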
Local Files
> I can also cache it on disk, but then I need to think about cache expiration myself.
Is the challenge that you need it shared among many machines? On a single machine you can put 20 million files in a directory hierarchy and let the fs cache keep things hot in memory as needed. Or use SQLite, which will load only the pages needed for each query and also relies on the fs cache.
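A minimal sketch of the directory-hierarchy approach, with the expiration handled by checking mtime on read. The cache root, fan-out depth, and TTL here are all assumptions, not anything prescribed:

```python
import hashlib
import time
from pathlib import Path

CACHE_ROOT = Path("/var/cache/mycache")  # hypothetical location
TTL_SECONDS = 24 * 3600                  # hypothetical expiry window

def _path_for(key: str) -> Path:
    # Hash the key, then fan out two directory levels (ab/cd/abcd...)
    # so no single directory ends up holding millions of entries.
    digest = hashlib.sha256(key.encode()).hexdigest()
    return CACHE_ROOT / digest[:2] / digest[2:4] / digest

def put(key: str, value: bytes) -> None:
    path = _path_for(key)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_bytes(value)
    tmp.replace(path)  # atomic rename: readers never see a partial write

def get(key: str) -> bytes | None:
    path = _path_for(key)
    try:
        if time.time() - path.stat().st_mtime > TTL_SECONDS:
            path.unlink(missing_ok=True)  # expired; treat as a miss
            return None
        return path.read_bytes()
    except FileNotFoundError:
        return None
```

A periodic `find -mtime ... -delete` sweep works just as well for expiry if you'd rather keep the read path simple.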
S3 - An interesting solution is one of the SQLite S3 VFSes. These can query a large dataset on S3 fairly efficiently, fetching only the specific pages each query touches.
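One such VFS is uktrade's sqlite-s3vfs package, used through apsw. A sketch along the lines of its documented usage, with a hypothetical bucket, database key, and table; the library stores the database as fixed-size page objects, so reads become ranged GETs for just the pages a query needs:

```python
import apsw
import boto3
import sqlite_s3vfs

# Hypothetical bucket holding the database pages.
bucket = boto3.Session().resource("s3").Bucket("my-sqlite-bucket")
s3vfs = sqlite_s3vfs.S3VFS(bucket=bucket)

with apsw.Connection("datasets/large.sqlite", vfs=s3vfs.name) as db:
    cursor = db.cursor()
    # Only the index/table pages this query touches are fetched from S3.
    cursor.execute("SELECT value FROM cache WHERE key = ?", ("some-key",))
    print(cursor.fetchone())
```

Note it's a poor fit for concurrent writers; it shines for read-mostly lookups against a large dataset you'd rather not copy to every machine.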