
>So I don't think your tinystories example qualifies for the PRH, since it's not enough data and it's not representative of the whole Internet. And RNA data is (I would guess) something very different altogether.

My thought there was that you'd be comparing tinystories to a model trained on the entire internet. The RNA-related information would be a subset of the second representation with no comparable encoding in the tinystories space. Can you detect that? If both models have to be of sufficient scale for this to work, the question becomes: what is that scale, and is it a sliding scale or a threshold?
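A crude way to test that, sketched below: fit a linear map between the two spaces on paired embeddings of general text, then check whether RNA-related texts reconstruct much worse than the general-text baseline. All the names here (embed_big, embed_tiny, the text lists) are hypothetical:

    import numpy as np

    # Hypothetical encoders: embed_big is the internet-scale model,
    # embed_tiny the tinystories model. Each returns a 1-D vector.
    E_big  = np.stack([embed_big(t)  for t in general_texts])   # (n, d_big)
    E_tiny = np.stack([embed_tiny(t) for t in general_texts])   # (n, d_tiny)

    # Least-squares linear map from the big space into the tiny space.
    W, *_ = np.linalg.lstsq(E_big, E_tiny, rcond=None)

    def residual(texts):
        """Per-text error when forcing big-model embeddings into the tiny space."""
        eb = np.stack([embed_big(t) for t in texts])
        et = np.stack([embed_tiny(t) for t in texts])
        return np.linalg.norm(eb @ W - et, axis=1)

    # If RNA content has no comparable encoding in the tinystories space,
    # its residuals should sit well above the general-text baseline.
    print(residual(rna_texts).mean(), residual(general_texts).mean())

And the sliding-vs-threshold question falls out of the same setup: repeat it across a ladder of model scales and see whether the residual gap shrinks smoothly or collapses at some size.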

>> Assuming that is true. If you can detect when you are trying to put a square peg into a round hole, does this mean you have the ability to remove square holes from a system?

>Not sure I follow this part.

Perhaps the metaphor doesn't work so well. If you can detect whether something is encodable in one embedding model but not another, can you then leverage that detection ability to modify an embedding model so that it cannot represent an idea?




As I read the paper, you would be able to detect it in a couple of ways:

1. possibly high loss where the models don't have compatible embedding concepts

2. given a sufficient "sample" of vectors from each space, projecting them to the same backbone would show clusters where they have mismatched concepts (rough sketch below)
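Rough sketch of that second check, assuming you already have the two sets of vectors (Za, Zb) for the same texts projected onto the shared backbone; the adapters producing them are assumed to exist, e.g. learned on paired data:

    import numpy as np
    from sklearn.cluster import KMeans

    # Za, Zb: (n, d) projections of the same n texts from each space.
    Za = Za / np.linalg.norm(Za, axis=1, keepdims=True)
    Zb = Zb / np.linalg.norm(Zb, axis=1, keepdims=True)

    # Cosine agreement per text: low values mean the spaces disagree here.
    agreement = (Za * Zb).sum(axis=1)

    # Cluster the worst-agreeing texts to see which concepts are mismatched.
    mismatched = Za[agreement < np.quantile(agreement, 0.05)]
    labels = KMeans(n_clusters=5, n_init=10).fit_predict(mismatched)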

It's not obvious to me how you'd use either of those to tweak the vector space of one to not represent some concept, though.

But if you just wanted to make an embedding that is unable to represent some concept, presumably you could already do that by training the disjoint "unrepresentable concepts" onto a single point.
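Something like this, sketched as a fine-tuning penalty in PyTorch; model, task_loss, loader, and embed_dim are all assumed, not real APIs:

    import torch

    anchor = torch.zeros(embed_dim)  # the single point to collapse onto
    opt = torch.optim.Adam(model.parameters(), lr=1e-5)

    for normal_batch, forbidden_batch in loader:
        emb = model(forbidden_batch)
        # Pull every "unrepresentable" text onto the same point, so no
        # direction in the space distinguishes these texts from each other.
        collapse = (emb - anchor).pow(2).sum(dim=1).mean()
        loss = task_loss(model, normal_batch) + 10.0 * collapse
        opt.zero_grad()
        loss.backward()
        opt.step()

The task_loss term is doing real work here: without it the model could satisfy the penalty by collapsing everything, not just the forbidden concept.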



