
Good stuff. I've also been toying with doing some homegrown search engine indexing (as an exercise in scalable systems), and this is a fantastic result and great inspiration.

Definitely want to see more people doing that kind of low-level work instead of falling back to either 'use elasticsearch' or 'you can't, you're not google'.



Well, just crunching the numbers should indicate what is possible and what isn't.

For the moment I have just south of 20 million URLs indexed.

1 x 20 million bytes = 20 MB.

10 x 20 million bytes = 200 MB.

100 x 20 million bytes = 2 GB.

1,000 x 20 million bytes = 20 GB.

10,000 x 20 million bytes = 200 GB.

100,000 x 20 million bytes = 2 TB.

1,000,000 x 20 million bytes = 20 TB.

This is still within what consumer hardware can deal with. It's getting expensive, but you don't need a datacenter to store 20 TB worth of data.

How many bytes do you need, per document, for an index? Do you need 1 MB of data to store index information about a page that, in terms of text alone, is perhaps 10 kB?
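
For anyone who wants to play with the numbers above, here is a minimal sketch of the same back-of-envelope calculation. The function names and the choice of decimal (1000-based) units are assumptions for illustration, not anything taken from the search engine itself.

```python
def index_size_bytes(num_docs: int, bytes_per_doc: int) -> int:
    """Total index size if every document costs the same number of bytes."""
    return num_docs * bytes_per_doc


def human(n: int) -> str:
    """Format a byte count using decimal units (1 kB = 1000 B)."""
    units = ["B", "kB", "MB", "GB", "TB", "PB"]
    size = float(n)
    for unit in units[:-1]:
        if size < 1000:
            return f"{size:g} {unit}"
        size /= 1000
    return f"{size:g} {units[-1]}"


if __name__ == "__main__":
    NUM_DOCS = 20_000_000  # "just south of 20 million URLs"
    for exponent in range(7):
        per_doc = 10 ** exponent  # 1 B up to 1 MB of index data per document
        total = index_size_bytes(NUM_DOCS, per_doc)
        print(f"{per_doc:>9,} B/doc -> {human(total)}")
```

Plugging in a realistic per-document budget (somewhere between the 10 kB of page text and the 1 MB upper bound) shows quickly which side of "consumer hardware" you land on.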


What crawler are you using and what kind of crawling speeds are you achieving?

How do you rank the results? Is it based on content only, or do you have external factors too?

What is your personal preferred search option of the 7 and why?

Thanks for making something unique and sorry that despite all the hype this got, you got only $39/month on Patreon. It is telling in a way.


> What crawler are you using and what kind of crawling speeds are you achieving?

Custom crawler, and I seem to get around 100 documents per second at best, maybe closer to 50 on average. Depends a bit on how many crawl-worthy websites it finds, and there are definitely diminishing returns as it goes deeper.
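
To put those rates in perspective, a rough sketch of how long a crawl would take at a steady rate. This assumes a constant rate and that every fetched document counts, which the diminishing returns mentioned above would make optimistic; `crawl_days` is a made-up helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def crawl_days(num_docs: int, docs_per_second: float) -> float:
    """Days needed to fetch num_docs at a constant docs_per_second."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return num_docs / docs_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    for rate in (50, 100):  # the average and best-case figures quoted above
        days = crawl_days(20_000_000, rate)
        print(f"{rate} docs/s -> {days:.1f} days for 20 million documents")
```

At these rates, 20 million documents is a matter of days rather than months.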

> How do you rank the results? Is it based on content only, or do you have external factors too?

I rank based on a pretty large number of factors, incoming links weighted by the "textiness" of the source domain, and similarity to the query.
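
Purely as an illustration of how two such signals could be combined, here is a toy sketch. It is not the actual ranking: the real factors and formula are not described in this thread. The Jaccard word overlap stands in for "similarity to the query", the 0-to-1 textiness values are hypothetical inputs, and the way the two are multiplied together is an assumption.

```python
import math
from typing import Sequence


def link_score(source_textiness: Sequence[float]) -> float:
    """Sum of incoming links, each weighted by its source domain's
    textiness (hypothetical value between 0 and 1)."""
    return sum(source_textiness)


def query_similarity(query: str, text: str) -> float:
    """Jaccard overlap between query words and document words.
    A stand-in measure, chosen only for simplicity."""
    q = set(query.lower().split())
    d = set(text.lower().split())
    if not q or not d:
        return 0.0
    return len(q & d) / len(q | d)


def rank_score(query: str, text: str, source_textiness: Sequence[float]) -> float:
    """Toy combination: similarity, boosted by a dampened link score.
    The log keeps heavily linked pages from dominating outright."""
    return query_similarity(query, text) * (1.0 + math.log1p(link_score(source_textiness)))
```

With this toy formula, a page with no overlap with the query scores zero regardless of links, and two equally relevant pages are ordered by how many text-heavy domains link to them.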

> What is your personal preferred search option of the 7 and why?

I honestly use Google for a lot. My search engine isn't meant as a replacement, but a complement.

> Thanks for making something unique and sorry that despite all the hype this got, you got only $39/month on Patreon. It is telling in a way.

Are you kidding? I think the Patreon is a resounding success! I'm still a bit stunned. I've gotten more support and praise, not just in terms of money but also emails and comments here than I could have ever dreamed possible.

And this is just the start, too. I only recently got the search engine working this well. I have no doubt it can get much better. The fact that I have 11 people with me on that journey, even if they "just" pay my power bill, that's amazing.

I'm honestly a bit at a loss for words.


You have a great attitude!

And I am not kidding. I think for something that got so much attention on HN, where realistically this kind of product can only exist for now, the 'conversion' rate was very low. Billion-dollar companies were made from HN threads with a lot less engagement. Makes me wonder: do we really want a search engine like this, or do we just like the idea of it?

And what are the barriers to using something like this? You say yourself that you are using Google most of the time. Is jumping to check results on this engine going to be too much friction for most uses?

Can something like this exist in isolation? What kind of value would it need to provide for users to remember using it en masse as an additional/primary vertical search like they do for Amazon?

Just thinking out loud as I am also interested in the space (through http://teclis.com).


I think in part it may just be because I'm not trying to found a start-up, and I'm not trying to get rich quick. If I were, I would have dealt with this very differently. My integrity and the integrity of this project is far more important than my bank balance. Not everyone feels that way, and I can respect that, but I do.

Ultimately I think running something like this for profit would create really unhealthy incentives to make my search engine worse. Any value it brings, right now, it brings because it isn't trying to cater to every taste and every use case.

I also hate the constant "don't forget to slap the like and subscribe buttons" shout-outs of modern social media, even though I'm aware they are extremely effective. If I went down that route, I would become part of the problem I'm trying to help cure. I do feel the sirens' call though, it's intoxicating getting this sort of praise and attention.

I want this to be a long-term project, not some overnight Cinderella story.

In the end, my search engine is never going to replace Google. It isn't trying to, it's trying to complement it. It's barely able to now, but hopefully I can make it much better in the months and years to come.


I think it's good not to have to depend on financial compensation for every single thing in your life, if you can be comfortable or do well otherwise.

This allows quite a bit of its own kind of freedom even if maximum financial opportunity is not fully exploited. Perhaps even because you are not grasping for every dollar on the table at all times.

You can do things without having to know if they will pay off, and if it turns out big anyway you can make money as a byproduct of what you do rather than having pure financial pursuit be the root of every goal.


I agree with everything you say. The 'subscribe and like buttons' would not help your conversion with HN readers, on the contrary. Trying to run this for profit also would not help your conversion with this audience.

So given your setup is already ideal for 'conversions' for this population (low profile, high integrity, no BS), I was simply genuinely surprised that only 11 people converted given the enormous visibility and interest this thread had. Hope that makes sense.


I think it simply takes time to build trust. The threshold for sending someone money is high. I probably wouldn't send someone money based on a proof of concept and lofty ambitions alone.

I'd absolutely consider sending someone money if they kept bringing something of value into my life. If I want more people to join the Patreon, I'll just have to earn their trust and support.


The day Google first appeared on the full internet it was excellent of course because it had no ads.

Plus another excellent feature was you would get the same search results no matter who or where you were for quite some period of calendar time.

If something new did appear it was likely to be one of the new sites that was popping up all the time and it was likely to be as worthwhile as its established associates on the front page.

You shouldn't need to crawl nearly as fast if you can compensate by treading more suitably where those have gone before.



