Agreed: API calls to China are indeed not necessary. My impression is that the GP was referring to the model being tuned during training to give subtly nudging or wrong answers that benefit Chinese industrial or intelligence operations. As a hypothetical (and probably unworkable) example, imagine the prompt "Write me a cryptographically secure PRNG algorithm." One could imagine R1 being trained to give a very subtly non-random reply to that, one whose output the Chinese intelligence services know how to predict. A similar but subtler move would be generating code that uses cryptographic primitives in ways that are vulnerable to timing attacks, etc. And of course there are simple but effective propaganda tactics, such as subtly preferring Chinese companies or products when asked to compare them, and similar.
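To make the timing-attack point concrete, here's a generic sketch of that class of bug (nothing to do with R1's actual output; `naive_compare` and `safe_compare` are made-up names for illustration). An early-exit byte comparison runs longer the more of the secret a guess gets right. Below, the number of bytes examined stands in for running time. Python's `hmac.compare_digest` is the standard-library comparison meant to avoid that leak.

```python
import hmac


def naive_compare(secret: bytes, guess: bytes):
    """Early-exit comparison (hypothetical helper). Returns (match, steps),
    where steps counts bytes examined as a proxy for running time."""
    if len(secret) != len(guess):
        return False, 0
    steps = 0
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:
            return False, steps  # exits sooner the earlier the mismatch
    return True, steps


def safe_compare(secret: bytes, guess: bytes) -> bool:
    # compare_digest is designed not to short-circuit on the first mismatch
    return hmac.compare_digest(secret, guess)


secret = b"s3cr3t-token"
print(naive_compare(secret, b"X" * 12))          # (False, 1)
print(naive_compare(secret, b"s3c" + b"X" * 9))  # (False, 4)
print(safe_compare(secret, b"s3c" + b"X" * 9))   # False
```

An attacker who can time enough requests can use that difference to recover the secret byte by byte, and a plain `==` on a token in otherwise reasonable-looking generated code would be easy to miss in review.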
Considering that 300 light-nanoseconds is about 90 m, getting a response (or even just a one-way signal) in that time is essentially running right at the limits of physics/causality.
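The arithmetic, as a quick sketch: distance = c × time. The 0.67 velocity factor for optical fiber is an assumed typical value (roughly 1/1.47 for silica glass), not a measured one. If the 300 ns has to cover a round trip, the far end can be at most half that distance away.

```python
C = 299_792_458  # speed of light in vacuum, m/s (exact by definition)


def reach_m(budget_ns, velocity_factor=1.0, round_trip=False):
    """Farthest distance a signal can get within a time budget.
    velocity_factor: fraction of c in the medium (1.0 = vacuum)."""
    d = C * velocity_factor * budget_ns * 1e-9
    return d / 2 if round_trip else d


print(f"{reach_m(300):.1f} m")                         # 89.9 m, one way in vacuum
print(f"{reach_m(300, round_trip=True):.1f} m")        # 45.0 m, there and back
print(f"{reach_m(300, 0.67, round_trip=True):.1f} m")  # ~30.1 m with the assumed fiber factor
```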
Measure the round trip and divide by two for the approximate one-way time. It'd be really neat to measure the time it takes for a packet to travel in one direction, but it's somewhere between hard and impossible[1]; a very short path has less room to be asymmetric, though.
[1] If the clocks are synchronized, you can record the send time on one end and the receive time on the other. But synchronizing clocks involves estimating how long signals take to pass in each direction, typically by assuming each direction takes half the round trip.
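The footnote's point can be shown with the standard NTP-style calculation from four timestamps: client send t1, server receive t2, server send t3, client receive t4. The function names and simulated delays here are made up for illustration. The offset estimate silently assumes both directions take the same time. A true offset of 5 with 15/5 one-way delays produces exactly the same four timestamps as a true offset of 10 with symmetric 10/10 delays, so nothing in the measurements can tell the two cases apart.

```python
def simulate(theta, fwd, back, proc, t1=0):
    """Timestamps for one exchange. theta: server clock minus client clock;
    fwd/back: true one-way delays; proc: server processing time."""
    t2 = t1 + fwd + theta      # read on the server's clock
    t3 = t2 + proc             # still the server's clock
    t4 = t3 - theta + back     # read on the client's clock
    return t1, t2, t3, t4


def offset_and_delay(t1, t2, t3, t4):
    delay = (t4 - t1) - (t3 - t2)          # round trip minus server time
    offset = ((t2 - t1) + (t3 - t4)) / 2   # only correct if fwd == back
    return offset, delay


print(offset_and_delay(*simulate(theta=5, fwd=10, back=10, proc=1)))  # (5.0, 20)
print(offset_and_delay(*simulate(theta=5, fwd=15, back=5, proc=1)))   # (10.0, 20)
print(simulate(5, 15, 5, 1) == simulate(10, 10, 10, 1))               # True
```

In general the estimate comes out as theta + (fwd - back) / 2, i.e. off by half the asymmetry, which is why a short, symmetric path (or equal-length cables) helps.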
You can use something like White Rabbit (https://en.wikipedia.org/wiki/White_Rabbit_Project) to keep clocks in sync. That still involves estimates, but a dedicated time sync network can do things like make sure all the cables are the same length.
I.e. ReLU is _piecewise_ linear. The kink separating the two pieces (the point where the slope jumps from 0 to 1; the function itself stays continuous) is precisely what makes it nonlinear, and that is what enables the actual universal approximation.
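A small sketch of both halves of that argument (the helper names are made up for illustration). ReLU fails additivity, so it isn't linear. And a weighted sum of shifted ReLUs, i.e. a one-hidden-layer ReLU network, reproduces the piecewise-linear interpolant of a function through chosen knots, which is the usual constructive route to approximating continuous functions on an interval. The target x² on [0, 1] with four pieces is an arbitrary choice.

```python
def relu(x):
    return max(0.0, x)


# Additivity fails, so ReLU is not a linear map:
print(relu(1.0 + -1.0), relu(1.0) + relu(-1.0))  # 0.0 1.0


def relu_interpolant(f, knots):
    """Piecewise-linear interpolant of f through sorted knots, written as
    f(k0) + sum_i c_i * relu(x - k_i): a one-hidden-layer ReLU network.
    Valid for x in [knots[0], knots[-1]]."""
    slopes = [(f(b) - f(a)) / (b - a) for a, b in zip(knots, knots[1:])]
    # Each hidden unit switches on at a knot and adds the change in slope there.
    coeffs = [slopes[0]] + [s1 - s0 for s0, s1 in zip(slopes, slopes[1:])]
    bias = f(knots[0])

    def g(x):
        return bias + sum(c * relu(x - k) for c, k in zip(coeffs, knots))

    return g


def square(x):
    return x * x


g = relu_interpolant(square, [0.0, 0.25, 0.5, 0.75, 1.0])
xs = [i / 1000 for i in range(1001)]
print(max(abs(g(x) - square(x)) for x in xs))  # about 0.0156, i.e. h**2/4 for h = 0.25
```

More knots (more hidden units) shrink the error; with no kinks at all, any sum of these units would collapse back to a single straight line.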
Followed by "in some sense it's [ReLU] still even MORE linear than tanh or sigmoid functions are". There's no way you misunderstood that sentence, or took it as my "definition" of linearity...so I guess you just wanted to reaffirm I was correct, again, so thanks.
Not sure about GP, but very recently: an open-addressing hashtable that could be traversed in both reverse-modification and reverse-insertion order. There are actually some interesting subtleties when doing this with open addressing, since entries (and any pointers to them) move around when the table rehashes.
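Here's a rough sketch of one way to handle that subtlety. It's a hypothetical take, not the implementation described above. It uses linear probing, threads every entry onto an insertion-order list and a modification-order list whose links are slot indices, and skips deletion to stay short. Because a rehash moves entries to new slots, every stored link goes stale, so the rehash records both orders as old slot indices first and then relinks through an old-to-new slot map.

```python
EMPTY = object()  # sentinel marking an unused slot


class OrderedProbeTable:
    """Linear-probing table traversable in reverse insertion and reverse
    modification order. Links are slot indices (-1 = none). No deletion."""

    def __init__(self, capacity=8):
        self._alloc(capacity)

    def _alloc(self, capacity):
        self.cap, self.size = capacity, 0
        self.keys = [EMPTY] * capacity
        self.vals = [None] * capacity
        self.ins_prev = [-1] * capacity  # previous slot in insertion order
        self.mod_prev = [-1] * capacity  # doubly linked modification order
        self.mod_next = [-1] * capacity
        self.ins_tail = self.mod_tail = -1  # newest slot in each order

    def _probe(self, key):
        # Return the slot holding key, or the first empty slot on its path.
        i = hash(key) % self.cap
        while self.keys[i] is not EMPTY and self.keys[i] != key:
            i = (i + 1) % self.cap
        return i

    def _mod_unlink(self, i):
        p, n = self.mod_prev[i], self.mod_next[i]
        if p != -1:
            self.mod_next[p] = n
        if n != -1:
            self.mod_prev[n] = p
        else:
            self.mod_tail = p

    def _mod_append(self, i):
        self.mod_prev[i], self.mod_next[i] = self.mod_tail, -1
        if self.mod_tail != -1:
            self.mod_next[self.mod_tail] = i
        self.mod_tail = i

    def put(self, key, value):
        i = self._probe(key)
        if self.keys[i] is not EMPTY:  # update: becomes newest in mod order
            self.vals[i] = value
            self._mod_unlink(i)
            self._mod_append(i)
            return
        if 2 * (self.size + 1) > self.cap:  # keep load factor <= 1/2
            self._rehash(2 * self.cap)
            i = self._probe(key)
        self.keys[i], self.vals[i] = key, value
        self.ins_prev[i], self.ins_tail = self.ins_tail, i
        self._mod_append(i)
        self.size += 1

    def _walk_back(self, tail, prev):
        while tail != -1:
            yield tail
            tail = prev[tail]

    def reverse_insertion(self):
        return [(self.keys[i], self.vals[i])
                for i in self._walk_back(self.ins_tail, self.ins_prev)]

    def reverse_modification(self):
        return [(self.keys[i], self.vals[i])
                for i in self._walk_back(self.mod_tail, self.mod_prev)]

    def _rehash(self, new_cap):
        # Entries will land in new slots, so capture both orders as old slots.
        ins_order = list(self._walk_back(self.ins_tail, self.ins_prev))[::-1]
        mod_order = list(self._walk_back(self.mod_tail, self.mod_prev))[::-1]
        old_keys, old_vals = self.keys, self.vals
        self._alloc(new_cap)
        new_slot = {}
        for old in ins_order:  # reinsert oldest first: rebuilds insertion links
            j = self._probe(old_keys[old])
            self.keys[j], self.vals[j] = old_keys[old], old_vals[old]
            self.ins_prev[j], self.ins_tail = self.ins_tail, j
            new_slot[old] = j
            self.size += 1
        for old in mod_order:  # relink modification order via the slot map
            self._mod_append(new_slot[old])


t = OrderedProbeTable(4)
for k in "abcde":  # forces two rehashes (4 -> 8 -> 16 slots)
    t.put(k, ord(k))
t.put("b", 0)
print([k for k, _ in t.reverse_insertion()])     # ['e', 'd', 'c', 'b', 'a']
print([k for k, _ in t.reverse_modification()])  # ['b', 'e', 'd', 'c', 'a']
```

Storing indices instead of object pointers keeps the links compact, but it's exactly what makes the rehash step need the remapping pass.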