But in both cases you face the problem of aggregating preferences of many into o...

random314 · on Sept 17, 2021

But search engines aren't voting to maximize the chances that their preferred candidate shows up on top. The mixed ranker has no requirement to satisfy Arrow's integrity constraints. It has to satisfy the end user, which is quite possible in theory.

Conditions the mixed ranker doesn't have to satisfy: "ranking while also meeting a specified set of criteria: unrestricted domain, non-dictatorship, Pareto efficiency, and independence of irrelevant alternatives".

PaulHoule · on Sept 17, 2021

Sure, but the problem that conventional IR ranking functions are meaningful only as an ordering leads you to the dismal world of political economy, where you can't aggregate people's utility functions. (Thus you can't say anything about inequality, only about Pareto efficiency.)

Hypothetically you could treat these functions as meaningful but when you try you find that they aren't very meaningful.

For instance IBM Watson aggregated multiple search sources by converting all the relevance scores to "the probability that this result is relevant".

A conventional search engine will do horribly in that respect: you can fit a logit curve to make a probability estimator, but you might get p=0.7 at the most, and very rarely even that. In fact, you rarely get p>0.5.
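To make "fit a logit curve" concrete, here is a minimal sketch in plain Python. It assumes you have raw engine scores paired with human relevance judgments; the data, the function names (`fit_logit`, `relevance_probability`) and the learning-rate settings are all made up for illustration, not anything Watson or a particular engine actually uses.

```python
import math

def fit_logit(scores, labels, lr=0.1, steps=5000):
    """Fit p(relevant | score) = sigmoid(a * score + b) by gradient descent on log loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s
            grad_b += (p - y)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def relevance_probability(score, a, b):
    """Map a raw engine score to an estimated probability of relevance."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))

# Hypothetical judged data: raw scores and labels (1 = judged relevant).
# The labels are deliberately noisy, as they tend to be for keyword-style scoring.
scores = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
labels = [0, 0, 0, 1, 0, 1, 0, 1]

a, b = fit_logit(scores, labels)
for s in (1.0, 2.5, 4.0):
    print(s, round(relevance_probability(s, a, b), 2))
```

With labels this noisy the curve is shallow, so even the top-scoring documents don't get a probability anywhere near 1.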

If you are combining search results from search engines that use similar approaches, you know those p's are not independent, so you can't take a large number of p=0.7's and turn that into a higher p.
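A rough illustration of why independence matters. The sketch below uses a noisy-OR combination, which is just one simple way to pool probabilities and is an assumption on my part, not a description of what Watson did. It treats each engine's p as independent evidence, which is exactly the step that fails when the engines share a matching strategy.

```python
def noisy_or(probs):
    """Combine probabilities assuming each source is independent evidence."""
    miss = 1.0
    for p in probs:
        miss *= (1.0 - p)
    return 1.0 - miss

engine_probs = [0.7, 0.7, 0.7]  # hypothetical estimates from three engines

# Only valid if the engines really are independent: 1 - 0.3**3
independent = noisy_or(engine_probs)

# If the engines are effectively the same matcher, the extra votes add nothing
# and you are stuck with the single best estimate.
fully_correlated = max(engine_probs)

print(independent, fully_correlated)
```

Real engines sit somewhere between those two extremes, and with similar approaches they sit much closer to the second one.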

If you are using search engines that use radically different matching strategies (say they return only p=0.99 results with low recall) the Watson approach works, but you need a big team to develop a long tail of matching strategies.

If you had a good p-estimator for search you could do all sorts of things that normal search engines do poorly, such as "get an email when a p>0.5 document is added to the collection."
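The alerting rule itself is trivial once a trustworthy estimator exists. A sketch, where `make_estimator`, its coefficients, and the document IDs are all hypothetical stand-ins for a calibrated model and a real ingest feed:

```python
import math

def make_estimator(a, b):
    """Return a function mapping a raw score to p(relevant); a and b are assumed pre-fitted."""
    def estimate(score):
        return 1.0 / (1.0 + math.exp(-(a * score + b)))
    return estimate

def documents_to_alert(new_docs, estimate, threshold=0.5):
    """Return IDs of newly added documents whose estimated relevance exceeds the threshold."""
    return [doc_id for doc_id, score in new_docs if estimate(score) > threshold]

estimate = make_estimator(a=2.0, b=-5.0)          # made-up coefficients
new_docs = [("doc-a", 2.0), ("doc-b", 3.0)]       # (id, raw score against the saved query)
alerts = documents_to_alert(new_docs, estimate)
print(alerts)
```

The hard part is not this loop but getting an `estimate` whose p>0.5 actually means something.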

For now, alerting features are either absent or useless, and most people have no idea why.