How to Partition Data for Linear Scalability in Geospatial Queries?
7 points by ninjakeyboard on July 17, 2016 | 4 comments
How do you partition geospatial data for horizontal scalability? It seems the best option is less about partitioning and more about defining geographic regions and then duplicating the data so you can query within a region. Otherwise you end up with awkward borders: a query can land right at the corner of 4 tiles, so a single geospatial query would have to hit 4 nodes to collect the data from all 4 tiles. I wonder how the Google Places API etc. handle this sort of problem.
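
To make the border problem concrete, here's a rough sketch (Python; the 1-degree tile size and the function name are just assumptions for illustration) of how a bounding-box query near a tile corner fans out to 4 tiles:

    import math

    TILE_SIZE_DEG = 1.0  # hypothetical tile size: 1 x 1 degree

    def tiles_for_bbox(min_lat, min_lon, max_lat, max_lon, tile=TILE_SIZE_DEG):
        """Return the set of (lat_idx, lon_idx) tiles a bounding box touches."""
        return {(la, lo)
                for la in range(math.floor(min_lat / tile),
                                math.floor(max_lat / tile) + 1)
                for lo in range(math.floor(min_lon / tile),
                                math.floor(max_lon / tile) + 1)}

    # A small query box centered on a tile corner touches 4 tiles, so a
    # single query has to fan out to up to 4 partitions/nodes:
    print(tiles_for_bbox(42.95, -78.05, 43.05, -77.95))
    # {(42, -79), (42, -78), (43, -79), (43, -78)}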

The other potential solution is to overlap data, so that each node also holds the edge tiles of its neighboring nodes. I'm not 100% sure how to handle this, or what the best technology for it is.

Any recommendations are welcome. I'm probably looking at the problem wrong. E.g., in a columnar database (e.g. Cassandra), the partition key could be the floored lat/long integers, with the less significant digits queried as a column range. But maybe there is another way of looking at the data/problem space?
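
For what it's worth, here's a sketch of that key scheme, assuming a Cassandra-style layout where the floored integer degrees form the partition key and the fractional digits become clustering columns for range slices (the function names and the 4-digit precision are arbitrary choices):

    import math

    def partition_key(lat, lon):
        """One partition per 1x1-degree cell: the floored integer degrees."""
        return (math.floor(lat), math.floor(lon))

    def clustering_cols(lat, lon, precision=4):
        """The fractional part quantized to `precision` decimal digits,
        so a slice query can select a sub-range within the cell."""
        scale = 10 ** precision
        return (round((lat - math.floor(lat)) * scale),
                round((lon - math.floor(lon)) * scale))

    lat, lon = 43.6532, -79.3832
    print(partition_key(lat, lon))    # (43, -80)
    print(clustering_cols(lat, lon))  # (6532, 6168)

The catch is still the border problem: a query box that straddles cell boundaries has to hit several partitions.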



This podcast talks about scaling Second Life, which has a strong geographic component:

http://www.se-radio.net/2009/07/episode-141-second-life-and-...

My naive intuition is that sharding on two or more axes with some denormalization makes sense: e.g. sharding on both geospatial location and information layers. Infrequently modified elements that overlap several geospatial regions could be stored alongside each region. This implies eventual consistency and high availability. On the other hand, some elements might need higher consistency and therefore have lower availability.
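
As a sketch of that denormalization (all names here are hypothetical, and regions are 1x1-degree cells for simplicity), an element is simply written to every region it overlaps:

    import math

    def overlapping_regions(min_lat, min_lon, max_lat, max_lon):
        """All 1x1-degree regions an element's bounding box touches."""
        return [(la, lo)
                for la in range(math.floor(min_lat), math.floor(max_lat) + 1)
                for lo in range(math.floor(min_lon), math.floor(max_lon) + 1)]

    def write_element(shards, element, bbox):
        """Denormalized write: a copy goes to each overlapping region's shard.
        An update must touch every copy, which is why this fits infrequently
        modified elements and eventual consistency."""
        for region in overlapping_regions(*bbox):
            shards.setdefault(region, []).append(element)

    shards = {}  # region -> list of elements; stands in for per-shard storage
    write_element(shards, "road-123", (42.9, -78.1, 43.1, -77.9))
    print(sorted(shards))  # the element lands in 4 region shards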

Which is to say: the right architecture is one that exposes accurate metrics and allows a high degree of tuning based on actual usage and application requirements.

Good luck.


Having to query 2 or 4 nodes is not bad, because you can (and should) run the queries concurrently, so you still get roughly the latency of a single query. I wouldn't want to overlap data, because that opens a new door for inconsistencies to occur.
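
A minimal sketch of that fan-out (Python; query_fn is a hypothetical per-node query function):

    from concurrent.futures import ThreadPoolExecutor

    def fan_out_query(query_fn, tiles, bbox):
        """Run the same bbox query against every tile's node concurrently
        and merge the results; wall-clock latency is roughly that of the
        slowest single node, not the sum of all of them."""
        with ThreadPoolExecutor(max_workers=len(tiles)) as pool:
            futures = [pool.submit(query_fn, tile, bbox) for tile in tiles]
            return [row for f in futures for row in f.result()]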


Yeah, that was my fear with the duplication as well.


I found this; it may be relevant: http://arxiv.org/abs/1509.00910



