I find this really confusing, and feel like I'm missing some things. Overall the impression I get from this post is of an application optimized in very weird places, while being very unoptimized in others.
A few thoughts:
1) RAW Hollow sounds like it has very similar properties to Mnesia (the distributed database that comes with the Erlang Runtime System), although Mnesia is more focused on fast transactions while RAW Hollow seems more focused on read performance.
2) It seems some of this architecture was influenced by the presence of a 3rd-party CMS. I wonder how much impact this had on the overall design, and I would like to know more about the constraints it imposed.
> The write part of the platform was built around the 3rd-party CMS product, and had a dedicated ingestion service for handling content update events, delivered via a webhook.
> ...
> For instance, when the team publishes an update, the following steps must occur:
>
> 1) Call the REST endpoint on the 3rd party CMS to save the data.
> 2) Wait for the CMS to notify the Tudum Ingestion layer via a webhook.
What? You call the CMS and then the CMS calls you back? Why? What is the actual function of this 3rd-party CMS? I had the impression it may be some kind of editor tool, but then why would Tudum be making calls out to it?
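To make concrete what I find odd, here is roughly how I picture that publish flow. The endpoint, payload shape, and ingest stub below are my own guesses, not anything from the post:

```typescript
// Hypothetical sketch of the publish round trip as I read it. The endpoint,
// payload shape, and ingest stub are my own invention, not from the post.

type CmsEvent = { articleId: string; payload: unknown };

// Stand-in for whatever the Tudum Ingestion layer actually does.
async function ingestIntoHollow(event: CmsEvent): Promise<void> {
  console.log("ingesting", event.articleId);
}

// Step 1: Tudum pushes the edit out to the 3rd-party CMS over REST...
async function publishUpdate(articleId: string, body: unknown): Promise<void> {
  await fetch(`https://cms.example.invalid/api/articles/${articleId}`, {
    method: "PUT",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(body),
  });
  // ...and then waits, because the authoritative "content changed" signal
  // only arrives later, out of band, via the webhook handler below.
}

// Step 2: the CMS calls Tudum back with the change it was just handed, and
// only then does ingestion (and everything downstream of it) kick off.
async function handleCmsWebhook(event: CmsEvent): Promise<void> {
  await ingestIntoHollow(event);
}
```

If the CMS is just an editing tool, the webhook alone would make sense; it's the write-then-wait-for-your-own-write round trip that I can't place.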
3) What is the actual daily active user count, and how much customization is possible for users? Is it just basic theming and interests, or something more? When I look through the Tudum site, it seems like it is just connected to the user's Netflix account. I'm assuming the personalization is fairly simple: theming, favorited shows, etc.
> Attracting over 20 million members each month, Tudum is designed to enrich the viewing experience by offering additional context and insights into the content available on Netflix.
It's unclear to me whether this counts users signing up for Tudum, unique monthly visitors, page views, or something else. I'm assuming it is monthly active users, and that those users generally already have Netflix accounts.
4) An event-driven architecture feels odd for this sort of access and use pattern. I don't understand what prevents using a single driving database, like Postgres, in a more traditional pattern. By my count, the data from the CMS is duplicated in the Hollow datastore and, implicitly, again in the generated pages. Of course, when you duplicate data you create synchronization problems and latency. That is the nature of computing. I have always preferred to stick with just a few active copies of relevant data when practical.
> Storing three years’ of unhydrated data requires only a 130MB memory footprint — 25% of its uncompressed size in an Iceberg table!
Compressed or uncompressed, this is a comically small amount of data. High-end Ryzen processors have almost this much L3 CACHE!!
As near as I can tell, writes only flow one way in this system, so I don't even know if RAW Hollow needs strong read-after-write consistency. It seems like writes flow from the CMS, into RAW Hollow, and then on to the Page Builder nodes. So how does this provide anything that a Postgres read replica wouldn't?
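For contrast, here is a rough sketch of the kind of traditional setup I have in mind, under the assumption that the one-way flow described above is accurate. The connection strings and table layout are invented:

```typescript
// Rough sketch of the "boring" alternative: one Postgres primary takes the
// writes coming off the CMS webhook, and the page builders read from a
// streaming replica. Connection strings and the table layout are invented.
import { Pool } from "pg";

const primary = new Pool({ connectionString: "postgres://primary.internal/tudum" });
const replica = new Pool({ connectionString: "postgres://replica.internal/tudum" });

// Write path: driven by the CMS webhook, lands on the primary.
async function saveContent(articleId: string, body: unknown): Promise<void> {
  await primary.query(
    `INSERT INTO content (id, body) VALUES ($1, $2)
     ON CONFLICT (id) DO UPDATE SET body = EXCLUDED.body`,
    [articleId, JSON.stringify(body)],
  );
}

// Read path: page builders hit the replica. With writes flowing one way as
// described, asynchronous replication lag is the only consistency cost, and
// for a dataset this small it is typically a fraction of a second.
async function loadContent(articleId: string): Promise<unknown | null> {
  const res = await replica.query("SELECT body FROM content WHERE id = $1", [articleId]);
  return res.rowCount ? res.rows[0].body : null;
}
```

That's the whole thing; the Hollow/Kafka/page-builder pipeline has to beat something this simple to justify itself.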
5) Finally, the most confusing part - are they pre-building every page for every user? That seems ridiculous, but it is difficult to square some of the requirements without such a thing. If you can render a page in 400 milliseconds then congratulations, you are in the realm of most good SSR applications. Why not just render pages on demand? That would immediately save a ton of computation, because there would be no need to pre-build anything.
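Here is a minimal sketch of the on-demand alternative I mean, with a short-lived cache in front so repeated hits don't pay the render cost. The server, renderer, and TTL are all my own strawman, not anything from the post:

```typescript
// Minimal sketch of render-on-demand with a short-lived cache in front of it,
// instead of pre-building pages. The server, renderer, and TTL are a strawman.
import http from "node:http";

// Stand-in for whatever assembles a page from content plus user preferences.
async function renderPage(path: string): Promise<string> {
  return `<html><body><h1>${path}</h1></body></html>`;
}

const cache = new Map<string, { html: string; expires: number }>();
const TTL_MS = 60_000; // keep rendered pages for a minute

http
  .createServer(async (req, res) => {
    const key = req.url ?? "/";
    const hit = cache.get(key);
    if (hit && hit.expires > Date.now()) {
      res.writeHead(200, { "content-type": "text/html" });
      res.end(hit.html);
      return;
    }
    // Even a 400ms render only has to be paid on a cache miss, not for every
    // page in advance, and most SSR renders are far cheaper than that.
    const html = await renderPage(key);
    cache.set(key, { html, expires: Date.now() + TTL_MS });
    res.writeHead(200, { "content-type": "text/html" });
    res.end(html);
  })
  .listen(3000);
```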
Overall this is a perplexing post. I don't understand why a lot of these decisions were made, and the solution seems very over-complicated for the problem as described.