> Typical offers a new solution ("asymmetric" fields) to the classic problem of how to safely add or remove fields in record types without breaking compatibility. The concept of asymmetric fields also solves the dual problem of how to preserve compatibility when adding or removing cases in sum types.
That's a nice idea... But I believe the design direction of proto buffers was to make everything `optional`, because `required` tends to bite you later when you realize it should actually be optional.
My understanding is that asymmetric fields provide a migration path in case that happens, as stated in the docs:
> Unlike optional fields, an asymmetric field can safely be promoted to required and vice versa.
> [...]
> Suppose we now want to remove a required field. It may be unsafe to delete the field directly, since then clients might stop setting it before servers can handle its absence. But we can demote it to asymmetric, which forces servers to consider it optional and handle its potential absence, even though clients are still required to set it. Once that change has been rolled out (at least to servers), we can confidently delete the field (or demote it to optional), as the servers no longer rely on it.
> My understanding is that asymmetric fields provide a migration path in case that happens, as stated in the docs:
If you can assume you can churn a generation of fresh data soonish, and never again read the old data. For RPC sure, but someone like Google has petabytes of stored protobufs, so they don't pretend they can upgrade all the writers.
This seems interesting. Still not sure if `required` is a good thing to have (for persistent data like log you cannot really guarantee some field's presence without schema versioning baked into the file itself) but for an intermediate wire use cases, this will help.
I've never heard of Typical but the fact they didn't repeat protobuf's sin regarding varint encoding (or use leb128 encoding...) makes me very interested! Thank you for sharing, I'm going to have to give it a spin.
Varint encoding is something I've peeked at in various contexts. My personal bias is towards the prefix-style, as it feels faster to decode and the segregation of the meta-data from the payload data is nice.
But, the thing that tends to tip the scales is the fact that in almost all real world cases, small numbers dominate - as the github thread you linked relates in a comment.
The LEB128 fast-path is a single conditional with no data-dependencies:
if ! (x & 0x80) { x }
Modern CPUs will characterize that branch really well and you'll pay almost zero cost for the fastpath which also happens to be the dominant path.
Seems like a lot of effort to avoid adding a message version field. I’m not a web guy, so maybe I’m missing the point here, but I always embed a schema version field in my data.
The point is that its hard to prevent asymmetry in message versions if you are working with many communicating systems. Lets say four services inter-communicate with some protocol, it is extremely annoying to impose a deployment order where the producer of a message type is the last to upgrade the message schema, as this causes unnecessary dependencies between the release trains of these services. At the same time, one cannot simply say: "I don't know this message version, I will disregard it" because in live systems this will mean the systems go out of sync, data is lost, stuff breaks, etc.
There's probably more issues I haven't mentioned, but long story short: in live, interconnected systems, it becomes important to have intelligent message versioning, i.e: a version number is not enough.
> Lets say four services inter-communicate with some protocol, it is extremely annoying to impose a deployment order where the producer of a message type is the last to upgrade the message schema
i don't know how you arrived at this conclusion
the protocol is the unifying substrate, it is the source of truth, the services are subservient to the protocol, it's not the other way around
also it's not just like each service has a single version, each instance of each service can have separate versions as well!
what you're describing as "annoying" is really just "reality", you can't hand-wave away the problems that reality presents
I think I see what you’re getting at? My mental model is client and server, but you’re implying a more complex topology where no one service is uniquely a server or a client. You’d like to insert a new version at an arbitrary position in the graph without worrying about dependencies or the operational complexity of doing a phased deployment. The result is that you try to maintain a principled, constructive ambiguity around the message schema, hence asymmetrical fields? I guess I’m still unconvinced and I may have started the argument wrong, but I can see a reasonable person doing it that way.
Yes thats a big part, but even bigger is just the alignment of teams.
Imagine team A building feature XYZ
Team B is building TUV
one of those features in each team deals with messages, the others are unrelated.
At some point in time, both teams have to deploy.
If you have to sync them up just to get the protocol to work, thats an extra complexity in the already complex work of the teams.
If you can ignore this, great!
It becomes even more complex with rolling updates though: not all deployments of a service will have the new code immediately, because you want multiple to be online to scale on demand. This creates an immediate necessary ambiguity in the qeustion: "which version does this service accept?" because its not about the service anymore, but about the deployments.
Ah, I see. Team A would like to deploy a new version of a service. It used to accept messages with schema S, but the new version accepts only S’ and not S. So the only thing you can do is define S’ so that it is ambiguous with S. Team B uses Team A’s service but doesn’t want to have to coordinate deployments with Team A.
I think the key source of my confusion was Team A not being able to continue supporting schema S once the new version is released. That certainly makes the problem harder.
> one cannot simply say: "I don't know this message version, I will disregard it" because in live systems this will mean the systems go out of sync, data is lost, stuff breaks, etc.
You already need to deal with lost messages, rejected messages, so just treat this case the same. If you have versions surely you have code to deal with mismatches and e.g. fail back to the older version.
Idk I generally think “magic numbers” are just extra effort. The main annoyance is adding if statements everywhere on version number instead of checking the data field you need being present.
It also really depends on the scope of the issue. Protos really excel at “rolling” updates and continuous changes instead of fixed APIs. For example, MicroserviceA calls MicroserviceB, but the teams do deployments different times of the week. Constant rolling of the version number for each change is annoying vs just checking for the new feature. Especially if you could have several active versions at a time.
It also frees you from actually propagating a single version number everywhere. If you own a bunch of API endpoints, you either need to put the version in the URL, which impacts every endpoint at once, or you need to put it in the request/response of every one.
I think this is only a problem if you’re using a weak data interchange library that can’t use the schema number field to discriminate a union. Because you really shouldn’t have to write that if statement yourself.
https://github.com/stepchowfun/typical
> Typical offers a new solution ("asymmetric" fields) to the classic problem of how to safely add or remove fields in record types without breaking compatibility. The concept of asymmetric fields also solves the dual problem of how to preserve compatibility when adding or removing cases in sum types.