Hacker News

The search engine doesn't do any type of re-ordering or synonym stuff, it only tries to construct different N-grams from the search query.

So if you compare, for example, "SDL tutorial" with "SDL tutorials": on Google you'd get the same stuff for both; on this search engine, for better or worse, you don't.
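To make the N-gram point concrete, here is a minimal sketch of what "constructing N-grams from the query" could look like. It is not the engine's actual code: the function name `query_ngrams`, the lowercasing, and the `max_n` cutoff are all assumptions made for illustration.

```python
def query_ngrams(query, max_n=3):
    """Return every contiguous word n-gram of the query, longest first.

    Hypothetical helper: no stemming, no synonyms, only lowercasing.
    """
    words = query.lower().split()
    grams = []
    for n in range(min(max_n, len(words)), 0, -1):
        for i in range(len(words) - n + 1):
            grams.append(" ".join(words[i:i + n]))
    return grams


print(query_ngrams("SDL tutorial"))   # ['sdl tutorial', 'sdl', 'tutorial']
print(query_ngrams("SDL tutorials"))  # ['sdl tutorials', 'sdl', 'tutorials']
```

With no stemming step, the two queries share only the "sdl" term, which is why they would return different results.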

This is a design decision, for now anyway, mostly because I'm incredibly annoyed when algorithms are second-guessing me. On the other hand, it does mean you sometimes have to try different searches to get relevant results.



I like this design decision. It pays you back for choosing your search terms carefully.


I’m not against a stemmer, actually, just against the aggressive concordances (?) that Google now employs, like when it shows me X in Banach spaces (the classical, textbook case) when I’m specifically searching for X in Fréchet spaces (the generalization I want to find but am not sure exists); of course Banach spaces and Fréchet spaces are almost exclusively encountered in the same context, but it doesn’t mean that one is a popular typo for the other! (The relative rarity of both of these in the corpus probably doesn’t help. The farcical case is BRST, or Becchi-Rouet-Stora-Tyutin, in physics, as it is literally a single key away from “best” and thus almost impossible to search for.)

On the other hand, Google’s unawareness of (extensive and ubiquitous) Russian noun morphology is essentially what allowed Yandex to exist: both 2011 Yandex and 2021 Google are much more helpful for Russian than 2011 Google. I suspect (but have not checked) that the engine under discussion is utterly unusable for it. English (along with other Germanic and Romance languages to a lesser extent) is quite unusual in being meaningfully searchable without any understanding of morphology, globally speaking.


I thought you could fix that by enclosing "BRST" in quotes, but apparently not. DuckDuckGo (which uses Google) returns a couple of results that do contain "BRST" in a medical context, but most results don't contain this string at all. What's going on?


I’m not certain what DDG actually uses (wasn’t it Bing?), but in my experience from the last couple of months it ignores quotes substantially more eagerly than Google does. For this particular term, a little bit of domain knowledge helps: even without quotes, brst becchi, brst formalism, brst quantization or perhaps bv brst will get you reasonable results. (I could swear Google corrected brst quantization to best quantization a year ago, but apparently not anymore.) Searching for stuff in the context of BRST is still somewhat unpleasant, though.

I... don’t think anything particularly surprising is happening here, except for quotes being apparently ignored? I’ve had it explained to me that a rare word is essentially indistinguishable from a popular misspelling by NLP techniques as they currently exist, except by feeding the machine a massive dictionary (and perhaps not even then). BRST is a thing that you essentially can’t even define satisfactorily without at the very least four years of university-level physics (going by the conventional broad approach—the most direct possible road can of course be shorter if not necessarily more illuminating). “Best” is a very popular word both generally and in searches, and the R key is next to E on a Latin keyboard. If you are a perfect probabilistic reasoner with only these facts for context (and especially if you ignore case), I can very well believe that your best possible course of action is to assume a typo.
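The "perfect probabilistic reasoner" argument can be sketched as a toy noisy-channel model. Every number below is invented for illustration (the word frequencies and the slip probability are assumptions, not measurements), and no real search engine is claimed to work this way.

```python
# Hypothetical relative frequencies; a real system would estimate them from a corpus.
WORD_FREQ = {"best": 1e-3, "brst": 1e-8}
# Assumed probability of hitting a neighbouring key instead of the intended one.
P_ADJACENT_SLIP = 1e-3
# Only the E/R neighbour pair is modelled here.
ADJACENT = {("e", "r"), ("r", "e")}


def channel_prob(typed, intended):
    """P(typed | intended), allowing at most one adjacent-key substitution."""
    if len(typed) != len(intended):
        return 0.0
    diffs = [(a, b) for a, b in zip(intended, typed) if a != b]
    if not diffs:
        return 1.0 - P_ADJACENT_SLIP
    if len(diffs) == 1 and diffs[0] in ADJACENT:
        return P_ADJACENT_SLIP
    return 0.0


def most_likely_intent(typed):
    """Pick the word maximising P(word) * P(typed | word)."""
    typed = typed.lower()  # ignoring case, as in the argument above
    scores = {w: f * channel_prob(typed, w) for w, f in WORD_FREQ.items()}
    return max(scores, key=scores.get)


print(most_likely_intent("BRST"))  # best
```

Under these made-up numbers the prior for "best" is so much larger that it outweighs the typo penalty, so the model "corrects" a correctly typed rare term.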

How to permit overriding that decision (and indeed how to recognize you’ve actually made one worth worrying about without massive human input—e.g. Russian adjectives can have more than 20 distinct forms, can be made up on the spot by following productive word-formation processes, and you don’t want to learn all of the world’s languages!) is simply a very difficult problem for what is probably a marginal benefit in the grand scheme of things.

I just dislike hitting these margins so much.


It would not be a difficult problem if they allowed the " " operator to work as they claim it does, or revived the + operator.


In English, maybe; in Russian, I frequently find myself reaching for the nonexistent “morphology but not synonyms” operator (as the same noun phrase can take a different form depending on whether it is the subject or the object of a verb, or even on which verb it is the object of); even German should have the same problem AFAIU, if a bit milder. I don’t dare think about how speakers of agglutinative languages (Finnish, Turkish, Malayalam) suffer.

(DDG docs do say it supports +... and even +"...", but I can’t seem to get them to do what I want.)


Ah, OK. I don’t know anything about Russian. This is a hard problem. I think the solution is something like what you suggest: more operators allowing different transformations. Even in English, I would like a "you may pluralize but nothing else" operator.
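A "you may pluralize but nothing else" operator could be sketched roughly as below. The operator does not exist in any engine mentioned in this thread; `plural_variants` and `matches` are hypothetical names, the rules are naive English suffix rules that ignore irregular plurals, and the query term is assumed to be typed in the singular.

```python
def plural_variants(term):
    """Return the term plus one naively formed English plural, nothing else."""
    t = term.lower()
    variants = {t}
    if t.endswith(("s", "x", "z", "ch", "sh")):
        variants.add(t + "es")           # box -> boxes
    elif t.endswith("y") and len(t) > 1 and t[-2] not in "aeiou":
        variants.add(t[:-1] + "ies")     # query -> queries
    else:
        variants.add(t + "s")            # tutorial -> tutorials
    return variants


def matches(query_term, doc_term):
    """True if doc_term is the query term or its naive plural."""
    return doc_term.lower() in plural_variants(query_term)


print(sorted(plural_variants("tutorial")))  # ['tutorial', 'tutorials']
print(matches("tutorial", "tutoring"))      # False
```

The point of the sketch is that the expansion set stays tiny and predictable, unlike a stemmer or a synonym table.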


Well it’s not that alien, it (along with the other Eastern Slavic languages, Ukrainian and Belarusian) is mostly a run-of-the-mill European language (unlike Finnish, Estonian or Hungarian) except it didn’t lose the Indo-European noun case system like most but instead developed even more cases. That is, where English or French would differentiate the roles of different arguments of a verb by prepositions or implicitly by position, Russian (like German and Latin) has a special axis of noun forms called “case” which it uses for that (and also prepositions, which now require a certain case as well—a noun form can’t not have a case like it can’t not have a number).

There are six of them (nominative [subject], genitive [belonging, part, absence, “of”], dative [indirect object, recipient, “to”], accusative [direct object], instrumental [device, means, “by”], prepositional [what the hell even is this]), so you have (cases) × (numbers) = 6 × 2 = 12 noun forms, and adjectives agree in number and gender with their noun, but (unlike Romance languages) plurals don’t have gender, so you have (cases) × (numbers and genders) = 6 × (3 + 1) = 24 adjective forms.
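The counting above can be checked by enumerating the grid of grammatical slots. This is only a sketch of the arithmetic: it counts slots, not distinct spellings, and the list names are made up for illustration.

```python
from itertools import product

CASES = ["nominative", "genitive", "dative",
         "accusative", "instrumental", "prepositional"]
NUMBERS = ["singular", "plural"]
# Adjectives: three genders in the singular, one genderless plural.
ADJECTIVE_SLOTS = ["masculine singular", "feminine singular",
                   "neuter singular", "plural"]

noun_forms = list(product(CASES, NUMBERS))
adjective_forms = list(product(CASES, ADJECTIVE_SLOTS))

print(len(noun_forms), len(adjective_forms))  # 12 24
```

An exact-match index would have to treat each of these slots as a separate search term unless it knows the morphology.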

None of this would be particularly problematic, except these forms work like French or Spanish verbs: they are synthetic (case, number and gender are all a single fused ending, not orthogonal ones) and highly convoluted with a lot of irregularities. And nouns and adjectives are usually more important for a web search than verbs.


> BRST, or Becchi-Rouet-Stora-Tyutin is literally a single key away from “best” and thus almost impossible to search for.

Hmm I seem to be getting only relevant results, no "best", not sure what you mean. Are you not doing verbatim search?

https://www.google.com/search?q=brst&tbs=li:1


English is more the outlier in regard to Germanic languages, try German or Finnish, with their wonderful compounds :)

https://e.humanities.uva.nl/publications/2004/kamp_lang04.pd...


Well yeah, English is kind of weird, but Finnish isn’t a Germanic language at all? It’s not even Indo-European, so even Hindi is ostensibly closer to English than Finnish. I understand Standard German (along with Icelandic) is itself a bit atypical in that it hasn’t lost its cases when most other Germanic languages did.

Re compounds, I expected they would be more or less easy to deal with by relatively dumb splitting, similar to greedy solutions to the “no spaces” problem of Chinese and Japanese, and your link seems to bear that out. But yeah, cheers to more language-specific stuff in your indexing. /s
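For reference, "relatively dumb splitting" could look like the greedy longest-match sketch below. It is not taken from the linked paper; the function name, the tiny vocabulary, and the `min_len` cutoff are assumptions, and it ignores German linking elements such as the connecting "s".

```python
def split_compound(word, vocab, min_len=3):
    """Greedily split `word` into the longest known parts, left to right.

    Returns the list of parts, or None if the greedy walk gets stuck.
    """
    word = word.lower()
    parts = []
    i = 0
    while i < len(word):
        # Try the longest remaining candidate first, then shrink it.
        for j in range(len(word), i + min_len - 1, -1):
            if word[i:j] in vocab:
                parts.append(word[i:j])
                i = j
                break
        else:
            return None  # no known part starts at position i
    return parts


VOCAB = {"donau", "dampf", "schiff", "fahrt"}  # toy vocabulary
print(split_compound("Donaudampfschifffahrt", VOCAB))
# ['donau', 'dampf', 'schiff', 'fahrt']
```

Being greedy, it never backtracks, so a long wrong prefix can make it fail on a word that a smarter splitter would handle.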


Gaaah, brain fart - you're right, of course, dunno why I included it.


Maybe list the synonyms under the query, so it's easier to try different formulations.


Oh this sounds like it could be a really cool idea! This way it could also be subtly teaching users that the engine doesn't do automatic synonym substitution, so it's worth experimenting; also kinda like giving the synonyms feature while still keeping the user in full control.


Don't change it. It's good this way.


It could simply become an option.



