I always thought it seemed likely that most needle in a haystack tests might run...

		whereismyacc on May 15, 2024 \| parent \| context \| favorite \| on: GPT-4o's Memory Breakthrough – Needle in a Needles... I always thought it seemed likely that most needle in a haystack tests might run into the issue of the model just encoding some idea of 'out of place-ness' or 'significance' and querying on that, rather than actually saying something meaningful about generalized retrieval capabilities. Does that seem right? Is that the motivation for this test?