I think you need to approach it more like grep than Google. It's a forgotten art, dealing with this kind of dumb search engine.
Like if you search for "How do I make a steak", you aren't going to get very good results. But a better query is "Steak Recipe", as that is at least a conceivable H1 tag.
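To make the "conceivable H1" point concrete, here is a toy all-terms matcher over page headings, the kind of literal matching a grep-style engine does. The function names and the sample heading are made up for illustration; real engines of that era also ranked and tokenized differently.

```python
import re

def tokens(text: str) -> set[str]:
    # Lowercased alphanumeric words only: no stemming, synonyms or intent guessing.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matches_heading(query: str, heading: str) -> bool:
    # grep-style AND match: every query word must literally appear in the heading.
    return tokens(query) <= tokens(heading)

for q in ["How do I make a steak", "Steak Recipe"]:
    print(q, "->", matches_heading(q, "Steak Recipe"))
```

The question-style query fails because words like "how", "do" and "make" never show up in the heading, while the two-word query is a subset of it.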
That switch happened some years ago. I've been unlearning and relearning how to use Google for what feels like at least three or four years now.
The main pain point, though, is that a lot of long-tail searches that used to surface different results now seem to funnel you to the same set of results based on your apparent intent. At least, that's how it has felt; I'm not entirely sure how the modern Google algorithm works.
I realized this a few years ago when I noticed my wife finding things on Google faster than I could.
I appreciate that it is easier for newcomers, but personally I still hate it after all these years, especially since they can't stop meddling with my queries even when I try to accept the new system and use the verbatim option.
> I think you need to approach it more like grep than google. It's a forgotten art
A search engine that accepted regex as the search parameter would be amazing.
I actually used this approach as a field filter in a bunch of simple internal tools for searching info. Originally people were asking for individual search capabilities, but I didn't want it to turn into a giant project with me implementing everyone's unique search feature request. So I just gave them regex, encoded the inputs into the URL query string so they can save searches, and gave them a bunch of examples to get going. Now people are slowly learning regex and coming up with their own "new features" :P
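A rough sketch of that setup, in Python rather than front-end JavaScript for brevity. The parameter name `q`, the example URL and the host names are assumptions for illustration, not what the tools above actually use.

```python
import re
from urllib.parse import parse_qs, urlencode, urlparse

def search_url(base: str, pattern: str) -> str:
    # Encode the regex into the query string so the search can be bookmarked/shared.
    return f"{base}?{urlencode({'q': pattern})}"

def filter_from_url(url: str, items: list[str]) -> list[str]:
    # Read the saved regex back out of the URL; a missing or empty "q" matches everything.
    params = parse_qs(urlparse(url).query)
    pattern = params.get("q", [""])[0]
    try:
        rx = re.compile(pattern, re.IGNORECASE)
    except re.error:
        # An invalid regex filters nothing out instead of breaking the page.
        return items
    return [item for item in items if rx.search(item)]

hosts = ["db-prod-01", "db-staging-02", "web-prod-01"]
url = search_url("https://tools.example/hosts", r"^db-(prod|staging)")
print(filter_from_url(url, hosts))
```

Because the whole "feature" lives in the pattern, a new search capability is just a new regex someone saves as a URL.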
But this made sense because it's a relatively small amount of data, small enough that it's searched in the front end, which is why it's more of a filter. I don't think pure regex would scale as a query against a massive DB; it would still need some kind of hierarchy so it only bothers parsing a subset of relevant text... unless there's some clever functional regex caching algorithm that could be used.
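One well-known way to get that hierarchy is a trigram index, the approach behind Google Code Search: an inverted index narrows the corpus to candidate documents, and only those are run through the regex engine. This is a simplified sketch that assumes the caller passes a literal substring every match must contain; real implementations derive that from the regex itself. The class and method names are hypothetical.

```python
import re
from collections import defaultdict

def trigrams(s: str) -> set[str]:
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}

class TrigramIndex:
    def __init__(self, docs: list[str]):
        self.docs = docs
        # Inverted index: trigram -> ids of documents containing it.
        self.postings: dict[str, set[int]] = defaultdict(set)
        for doc_id, text in enumerate(docs):
            for tri in trigrams(text):
                self.postings[tri].add(doc_id)

    def search(self, pattern: str, required_literal: str) -> list[int]:
        # Assumption: every match of `pattern` contains `required_literal`.
        tris = trigrams(required_literal)
        if tris:
            # Candidates must contain every trigram of the literal.
            candidates = set.intersection(*(self.postings.get(t, set()) for t in tris))
        else:
            # Literal shorter than 3 chars can't be indexed: scan everything.
            candidates = set(range(len(self.docs)))
        rx = re.compile(pattern, re.IGNORECASE)
        # Only the candidate subset pays for the full regex match.
        return sorted(i for i in candidates if rx.search(self.docs[i]))

docs = ["steak recipe with garlic", "salmon recipe", "Steakhouse reviews", "how to grill"]
idx = TrigramIndex(docs)
print(idx.search(r"(salmon|steak) recipe", "recipe"))
```

The index can return false positives (which the regex then rejects) but never drops a true match, as long as the required literal really is required.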
So, you are re-implementing AltaVista, Lycos and other old search engines.
They used the naive approach: you searched for "steak", and they would bring the pages which included the word "steak".
The problem is that people could fool these engines by adding a long sequence like "steak, steak, steak, steak, steak, steak" to their site -- to pretend that they were the most authoritative page about steaks.
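A toy version of that naive ranking shows why stuffing worked: if relevance is just "how often the word appears", the spam page wins outright. The URLs and page text are invented for illustration, and real engines of the time were not quite this simple.

```python
import re

def term_count(term: str, page: str) -> int:
    # Naive relevance: number of whole-word occurrences of the term.
    return len(re.findall(rf"\b{re.escape(term)}\b", page, re.IGNORECASE))

def rank(term: str, pages: dict[str, str]) -> list[str]:
    # Highest count first; sorted() is stable, so ties keep insertion order.
    return sorted(pages, key=lambda url: term_count(term, pages[url]), reverse=True)

pages = {
    "https://example.com/recipe": "Pan-seared steak recipe. Season the steak, sear the steak, rest it.",
    "https://spam.example/": "Buy pills! steak, steak, steak, steak, steak, steak",
}
print(rank("steak", pages))
```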
Google's big innovation was to count the referrers -- how many pages used the word "steak" to link to that particular page.
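A minimal sketch of counting referrers by anchor text: each target page scores one point per distinct source page that links to it using the word. The link data is hypothetical, and Google's actual ranking (PageRank and much else) also weights each link by the importance of the linking page, so treat this as the simplest version of the idea.

```python
from collections import Counter

# (source page, target page, anchor text); hypothetical sample data.
Link = tuple[str, str, str]

def anchor_scores(term: str, links: list[Link]) -> Counter:
    term = term.lower()
    voters: dict[str, set[str]] = {}
    for source, target, anchor in links:
        if term in anchor.lower().split():
            # Use a set so one source page repeating the link only counts once.
            voters.setdefault(target, set()).add(source)
    return Counter({target: len(sources) for target, sources in voters.items()})

links = [
    ("a.example", "recipes.example/steak", "best steak recipe"),
    ("b.example", "recipes.example/steak", "steak"),
    ("b.example", "recipes.example/steak", "steak again"),
    ("c.example", "spam.example", "steak"),
]
print(anchor_scores("steak", links).most_common())
```

The catch is that this moves the cheating from your own page to everybody else's: enough coordinated links with the same anchor text will push any page up.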
> The problem is that people could fool these engines by adding a long sequence like "steak, steak, steak, steak, steak, steak" to their site -- to pretend that they were the most authoritative page about steaks.
I don't see a lot of people investing in SEO to boost their Marginalia results.
> Google's big innovation was to count the referrers -- how many pages used the word "steak" to link to that particular page.
Then people fooled Google into showing the White House as top result when searching for "a miserable failure".
At the moment Marginalia's approach of sorting pages into quality buckets based on lack of JS seems to be working extremely well, but of course it will be gamed if it gets popular.
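I don't know how Marginalia actually computes its buckets. Purely as an illustration of the idea, here is a hypothetical classifier that counts `<script>` tags with Python's standard HTML parser; the bucket names and thresholds are made up.

```python
from html.parser import HTMLParser

class ScriptCounter(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.scripts = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names lowercased, so <SCRIPT> is counted too.
        if tag == "script":
            self.scripts += 1

def quality_bucket(html: str) -> str:
    parser = ScriptCounter()
    parser.feed(html)
    parser.close()
    # Hypothetical thresholds: no JS is best, a little is tolerated.
    if parser.scripts == 0:
        return "high"
    if parser.scripts <= 3:
        return "medium"
    return "low"

print(quality_bucket("<html><body><p>Plain text</p></body></html>"))
```

It also shows how easy it would be to game: a page only has to ship its tracking through some channel other than `<script>` tags to land in the top bucket.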
However, I'd rather have SEO-crafting concern itself with minimizing JS than with spamming links into every comment field on every blog across the globe ;-)