> Having the 4-byte prefix directly accessible (without indirection through an offset into a separate data buffer) can substantially improve the performance of comparisons that return false. This prefix can be encoded into multi-column hash keys to accelerate aggregations and joins. Sorts would likely also be significantly faster with this representation (experiments would tell for certain).
> Certain algorithms (for example “prefix of string” or “suffix of string”, e.g. PREFIX(“foobar”, 3) -> “foo”) can be executed by manipulating only the StringView values, without copying any large strings.
This document was an early proposal for adding what is now called the StringView (and ByteView) types to the Arrow format itself.
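To make the two quoted claims concrete, here is a minimal Python sketch. It assumes the 16-byte view layout that Arrow's columnar specification eventually adopted for Utf8View/BinaryView: a 4-byte length followed either by the whole value inlined (at most 12 bytes, zero-padded) or by a 4-byte prefix, a 4-byte buffer index, and a 4-byte offset into a data buffer. The helpers `append_string`, `view_bytes`, `view_equals`, and `view_slice` are hypothetical illustrations, not Arrow APIs; a real implementation would work on packed buffers in compiled code.

```python
import struct

INLINE_MAX = 12  # assumed threshold: values up to 12 bytes live inside the view


def append_string(views, buffer, value):
    """Append bytes `value` as a 16-byte view; long values are copied into `buffer`."""
    if len(value) <= INLINE_MAX:
        # length | up to 12 inline bytes (struct's "s" format zero-pads)
        views.append(struct.pack("<i12s", len(value), value))
    else:
        # length | 4-byte prefix | buffer index (always 0 in this sketch) | offset
        offset = len(buffer)
        buffer.extend(value)
        views.append(struct.pack("<i4sii", len(value), value[:4], 0, offset))


def view_bytes(view, buffers):
    """Materialize the full value: the slow path that dereferences a buffer."""
    length = struct.unpack_from("<i", view)[0]
    if length <= INLINE_MAX:
        return view[4:4 + length]
    _, _, index, offset = struct.unpack("<i4sii", view)
    return bytes(buffers[index][offset:offset + length])


def view_equals(a, b, buffers):
    # Bytes 0..8 hold the length plus the first four bytes of the value, so
    # most unequal pairs are rejected here without touching any data buffer.
    if a[:8] != b[:8]:
        return False
    if struct.unpack_from("<i", a)[0] <= INLINE_MAX:
        # Same length and both inline: zero padding makes a 16-byte compare exact.
        return a == b
    return view_bytes(a, buffers) == view_bytes(b, buffers)


def view_slice(view, buffers, start, length):
    """Hypothetical prefix/suffix kernel returning a view of value[start:start + length]."""
    total = struct.unpack_from("<i", view)[0]
    if not (0 <= start and 0 <= length and start + length <= total):
        raise ValueError("slice out of range")
    if total <= INLINE_MAX:
        # An inline source always yields an inline result.
        return struct.pack("<i12s", length, view[4 + start:4 + start + length])
    _, _, index, offset = struct.unpack("<i4sii", view)
    begin = offset + start
    if length <= INLINE_MAX:
        # Short results must be inlined, so at most 12 bytes are copied.
        return struct.pack("<i12s", length, bytes(buffers[index][begin:begin + length]))
    # Long results point into the same buffer; only the new 4-byte prefix is copied.
    prefix = bytes(buffers[index][begin:begin + 4])
    return struct.pack("<i4sii", length, prefix, index, begin)


# Usage: one inline value and three out-of-line values sharing one data buffer.
buffers = [bytearray()]
views = []
for s in [b"foobar", b"a rather long string value",
          b"a rather long string VALUE", b"long string value"]:
    append_string(views, buffers[0], s)

first3 = view_slice(views[0], buffers, 0, 3)   # first 3 bytes of b"foobar"
suffix = view_slice(views[1], buffers, 9, 17)  # a new view at offset 9, no bulk copy
```

Slicing the 26-byte value writes only a new 16-byte view that reuses the existing buffer, and `buffers[0]` does not grow; a short result costs at most a 12-byte inline copy, which the layout requires anyway.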
The first n bytes are likely, by far, the most frequently accessed in practice, especially for sorting and filtering. Storing them inline is likely a huge optimization at little cost.
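The sorting case can be sketched the same way. Under the same assumed 16-byte layout (with hypothetical helper names `to_view`, `full_value`, and `compare_views`), a three-way comparator can order two views by their zero-padded 4-byte prefixes and fall back to the data buffer only on a tie. Zero padding keeps the shortcut consistent with plain lexicographic order: a padding byte never exceeds a real byte, so whenever two prefixes differ they order the values the same way a full comparison would.

```python
import functools
import struct

INLINE_MAX = 12  # assumed inline threshold, as in the Utf8View-style layout


def to_view(value, buffer):
    """Encode bytes `value` as a 16-byte view; long values are appended to `buffer`."""
    if len(value) <= INLINE_MAX:
        return struct.pack("<i12s", len(value), value)  # zero-padded inline data
    offset = len(buffer)
    buffer.extend(value)
    return struct.pack("<i4sii", len(value), value[:4], 0, offset)


def full_value(view, buffer):
    """Slow path: recover the complete value, reading `buffer` for long values."""
    length = struct.unpack_from("<i", view)[0]
    if length <= INLINE_MAX:
        return view[4:4 + length]
    offset = struct.unpack_from("<i", view, 12)[0]  # offset lives in bytes 12..16
    return bytes(buffer[offset:offset + length])


def compare_views(a, b, buffer):
    """Three-way lexicographic comparison of two views."""
    # Bytes 4..8 hold the first four bytes of the value (zero-padded when
    # shorter), so a differing prefix decides the order with no indirection.
    pa, pb = a[4:8], b[4:8]
    if pa != pb:
        return -1 if pa < pb else 1
    va, vb = full_value(a, buffer), full_value(b, buffer)
    return (va > vb) - (va < vb)


buffer = bytearray()
words = [b"pear", b"apple", b"watermelon slices", b"applesauce jar lid", b"fig"]
views = [to_view(w, buffer) for w in words]
views.sort(key=functools.cmp_to_key(lambda a, b: compare_views(a, b, buffer)))
sorted_words = [full_value(v, buffer) for v in views]
```

Only the tie between `b"apple"` and `b"applesauce jar lid"` (shared prefix `b"appl"`) needs the buffer. In compiled code the four prefix bytes could be loaded as one integer, byte-swapped to big-endian, so the fast path becomes a single integer comparison; Python's `bytes` comparison stands in for that here.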