Hacker News | y4mi's comments

As the beginning of the article says: it all depends on what qualifies as a book.

If you're including EPUB novels that have never had a print run... then yeah, "at least" is gonna be the operative word, likely several times over, considering just how many fiction books are being produced on web novel sites and then published as EPUBs for their fans to buy to support the author.


It also depends on what qualifies as 'published'. If a bot generates 100 nonsensical children's books with an LLM, lists them all on Amazon, and then takes them down a day later before anyone notices any of them (see https://news.ycombinator.com/item?id=40779643), then were they ever really published?


I (ignorantly) wonder what the whole point of that might be? They cannot possibly sell that many in the intervening time...


Padding numbers? ‘Our shitty bot has successfully* published* hundreds of books!’


That is awful. (Though you are probably entirely correct ...)

(The only ones benefiting from that are the purveyors of slop "shovels" to the AI-slop gold rush...)


> Methods: git merge --ff-only or git rebase && git merge (extreme clean freaks add the --squash option)

>> Pros: Linear history, git log is easy to read, git revert requires no thought.

Does it require no thought because it's fundamentally impossible? If you're fast-forwarding without squashing, it's gonna get hard to figure out which commits you'll have to revert, I think. All histories get merged into a single stream, after all.


I sometimes wish companies were required to report their opening hours when registering, and that the state made this information available through a public API.

It's not really possible, as not all places have scheduled opening and closing times... But a dude can dream, right?


Just a thought... opening hours can usually be scraped from their website (totally unfounded guess: that's how Google does it), so what seems to be missing here is a free tool -- and probably a free central database of pre-scraped data -- to collect this information for others to use together with OSM data. The process will be a bit different from OSM because data is scraped, not edited.

Edit: To add, if such a tool needs site-specific configuration to work, then that configuration could again be community-edited in a style similar to OSM data.
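As a rough illustration of the "no site-specific configuration" case: some business websites embed schema.org `LocalBusiness` markup as JSON-LD, whose `openingHours` strings (e.g. `Mo-Fr 09:00-18:00`) are close to OSM's `opening_hours` syntax. The sketch below assumes a page publishes that markup; many sites don't, and it ignores the richer `openingHoursSpecification` form. The function and class names are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the bodies of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so accumulate them.
        if self._in_jsonld:
            self.blocks[-1] += data


def extract_opening_hours(html: str) -> list[str]:
    """Return schema.org `openingHours` strings found in the page, if any."""
    parser = JsonLdExtractor()
    parser.feed(html)
    hours = []
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed markup instead of failing the whole page
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict):
                continue
            value = item.get("openingHours")
            if isinstance(value, str):
                hours.append(value)
            elif isinstance(value, list):
                hours.extend(v for v in value if isinstance(v, str))
    return hours


page = (
    '<html><head><script type="application/ld+json">'
    '{"@type": "LocalBusiness", "openingHours": ["Mo-Fr 09:00-18:00", "Sa 10:00-14:00"]}'
    "</script></head></html>"
)
print(extract_opening_hours(page))
```

Sites without such markup are where the community-edited, per-site configuration would come in.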


> totally unfounded guess: that's how Google does it

I'm pretty sure Google doesn't do that. At least not as the primary information source.

I don't work for Google, so I could be mistaken, but I've submitted several information updates before. Google made me a "Local Guide" after my first accepted update and gave me gamified incentives to update and verify local business information afterwards, until I disabled them again (like notifications when I'm at a store asking whether there is a parking lot, questions regarding accessibility, etc.).

The questions are usually really innocuous and quick to answer, which pretty much makes the player into free labor for Google, one minute at a time.

I was required to provide proof for some information, like opening times, and that's probably how they link it back to the official website. But it happens through user input, not automated scraping, I believe.


> C/Rust/Go version

Minor nitpick: JS/V8 beats Go in most benchmarks that are relevant for search.

JS is the outlier here, however, because of the insane number of optimizations that make it perform so unreasonably well despite being an interpreted language.


I've never seen such a benchmark. Which one is it?


It obviously depends on how you've implemented the feature, but there are a lot of comparative benchmarks if you bother to put "go vs JS performance benchmark" into the search bar and press enter. The language itself doesn't determine code performance, however, so you'll have good and bad implementations in any language you go with.

E.g., regex specifically, which is highly relevant when searching through strings: https://github.com/mariomka/regex-benchmark

And the always interesting TechEmpower project, which leaves the implementation to the participants of each round: https://www.techempower.com/benchmarks/#section=data-r21&tes...

Choose whatever category you wish there; JS is faster than Go in almost all of them.

Even though I said it before, I'm going to repeat myself, as I expect you to ignore my previous message: the language doesn't make any implementation fast or slow. You can have a well-performing search engine in Go and in JS. With these two choices, the performance difference will most likely not be caused by the language, and the same applies to C/Rust. The language won't make the engine performant, and creating a maximally performant search engine is hard. But a theoretically perfect implementation would likely be fastest in C/Rust, followed by the usual suspects such as Go/Java/C#/JS, and finally all the other interpreted languages such as Ruby and Python.


Regex doesn't feature in this kind of library, at least not in mine. It's a token by token comparison/weighing, for which regular expressions are unsuitable. Mine looks at string prefixes and does a Levenshtein comparison on the results, and I think OP's project does that too.
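A minimal sketch of that token-by-token approach, assuming a plain word list as the index: each query token is narrowed to vocabulary words sharing its first few characters, and the survivors are ranked by Levenshtein distance. This is not the commenter's actual library; `search`, `prefix_len` and `max_distance` are hypothetical names, and a real engine would use a trie or sorted array instead of a linear scan.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def search(query: str, vocabulary: list[str], prefix_len: int = 2, max_distance: int = 2) -> list[str]:
    """Prefix filter first, then rank the candidates by edit distance."""
    results = []
    for token in query.lower().split():
        prefix = token[:prefix_len]
        candidates = [w for w in vocabulary if w.startswith(prefix)]
        scored = sorted((levenshtein(token, w), w) for w in candidates)
        results.extend(w for dist, w in scored if dist <= max_distance)
    return results


print(search("seach", ["search", "seance", "sea", "banana"]))
```

The prefix filter is what keeps this cheap: the quadratic edit-distance computation only runs on the handful of words that share a prefix with the token.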

The techempower benchmarks seem geared towards http backend server frameworks. There's only one Javascript framework that scores well, "just-js", but that's a bit low-level for a framework. I don't think it says much about text search performance.

> the language doesn't make any implementation fast or slow

Not ignoring that, but I think it's half true (you can't have a well performing app in native Python, basically, but a bad implementation will also cost a lot of performance). JS is fast enough for most tasks, that's true.


> the language doesn't make any implementation fast or slow

I think some languages are much easier to make things fast in than others. Even if the theoretical limit is the same (or nearly the same) in all languages.

Messing up somewhere in Javascript with async isn’t unlikely.


It's usually only a "beautiful extendable maintainable paragon of elegance and purity" in the eyes of the original architect. Everyone else sees it as leaking abstraction with bolts everywhere to keep the original idea somewhat working... Or just a massive pile of technical debt.

Code quality is always dependent on how well the person making this judgment understands the software. While I'm sure that everyone will agree that there are some clearly better ways of doing things, they sure as hell won't all agree on what these clearly better ways are.

One person's pile of garbage is the next person's perfect implementation with easy to understand procedural logic.

Please take note that I'm explicitly not saying that any implementation is better than another. I'm just trying to convey that the term "technical debt" very much depends on the mindset of the person looking at the implementation.


Also "knowing" and actually knowing are different things.

Research had been done as far back as that which came to the conclusion that it causes climate change. But that doesn't mean that the governing body at the time was aware of the research, or, if it was, that it really believed it rather than thinking the researchers were massively exaggerating.

I know that the research was done on the behalf of the government, but that doesn't mean that everyone governing/deciding on these issues had access to/were aware of the results.


> really believed it or just thought that the researchers are massively exaggerating.

Particularly in a period of time, the '60s/'70s, where a lot of things were happening and lots of stuff was shouted in the street as "inevitable" (the proletariat rising, the age of Aquarius, the arrival of aliens, or even good ol' biblical apocalypse...).


I was reminded this week of the "Peak Oil" craze around ten years ago. No one really talks about that anymore.


Aren't card PINs only 4 digits long? That's 10,000 possible combinations (0000 to 9999), pretty trivial to put together.

Checking which one corresponds to which card is the hard step, because you need access to an acquirer, to my knowledge, and you'll lose that access quite quickly if you attempt too many incorrect combinations.


You can use more than 4 digits if you want a more secure PIN.

EMV (the card standard used by all modern chip/contactless cards) supports PINs between 4-12 digits in length.
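To put numbers on the keyspace: a 4-digit PIN has exactly 10,000 possible values, and every extra digit within EMV's 4 to 12 range multiplies that by ten. A short sketch (the `keyspace` helper is just for illustration); note that it says nothing about how fast guesses can actually be tested, since issuers lock cards after a few wrong attempts.

```python
from itertools import product

# Every 4-digit PIN, "0000" through "9999", enumerated explicitly.
four_digit_pins = ["".join(digits) for digits in product("0123456789", repeat=4)]
print(len(four_digit_pins))  # 10000


def keyspace(length: int) -> int:
    """Number of possible numeric PINs of the given length."""
    return 10 ** length


# EMV allows PIN lengths from 4 to 12 digits.
for length in range(4, 13):
    print(f"{length:2d} digits: {keyspace(length):,} combinations")
```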


I have been wanting to try a PIN longer than 4 digits, but I expect payment terminals to go bonkers because they won't accept it. Do any of you have a card PIN longer than 4 digits?


Six-digit pin works well for me in European countries - Czechia, Germany, Austria, Spain, Italy.


My girlfriend used a 5-digit PIN for over 10 years in the UK and never had any issues that I can recall.

I’d change mine too, except I use the PIN so infrequently (99% contactless nowadays) that I’m worried I’d forget the new one!


Just try and be surprised - no issues.


Just don't try to use it in some remote, exotic place: 4 digits are often hard-coded there. You may still be able to withdraw/pay, or not.


Mine is eight digits, never tried it outside Canada yet.


All PINs in my country are 5 digits, which can be annoying for visitors with four-digit PINs (it depends on the bank, and I haven't heard of problems for a while).


Damn, with 5 digits, the cost of storage alone is a deterrent for a rainbow table.
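The arithmetic behind the joke, assuming purely for illustration that PINs were stored as unsalted SHA-256 digests (real card systems verify PINs through PIN blocks and hardware security modules, not plain hashes). With only 100,000 inputs you don't even need a true rainbow table, which trades space for time using hash chains; a complete lookup table fits in a few megabytes.

```python
import hashlib

PIN_LENGTH = 5
ENTRIES = 10 ** PIN_LENGTH                     # 100,000 possible 5-digit PINs
DIGEST_BYTES = hashlib.sha256().digest_size    # 32 bytes per SHA-256 digest

# Complete digest -> PIN lookup table (assumed unsalted SHA-256, see above).
table = {
    hashlib.sha256(f"{pin:0{PIN_LENGTH}d}".encode()).digest(): f"{pin:0{PIN_LENGTH}d}"
    for pin in range(ENTRIES)
}

# Raw key/value payload only, ignoring dict overhead.
raw_bytes = ENTRIES * (DIGEST_BYTES + PIN_LENGTH)
print(f"{len(table):,} entries, ~{raw_bytes / 1_000_000:.1f} MB of raw key/value data")
```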


Because I struggled to understand what you meant, I'd like to rephrase it:

It only matches if the regex is applied to a single word. It won't match a sentence, or anything containing an apostrophe, etc., even though such input is implied to be valid, since the regex supposedly matches "all words".
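A small sketch of that failure mode. The original regex isn't quoted here, so the pattern below is a hypothetical stand-in: letters only, with the whole input required to match.

```python
import re

# Hypothetical stand-in for the pattern under discussion: letters only,
# and the entire input must match (fullmatch), not just a substring.
WORD_ONLY = re.compile(r"[A-Za-z]+")


def matches(text: str) -> bool:
    return WORD_ONLY.fullmatch(text) is not None


for sample in ["hello", "hello world", "don't"]:
    print(repr(sample), matches(sample))
```

If the pattern is instead written with `^...$` and used with `re.match`, a single trailing newline is also accepted, because `$` matches just before a final newline; `fullmatch` avoids that.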


That’s easy to misunderstand. And if humans have trouble with it, then I won’t blame the poor AI.

Getting an AI to ask clarifying questions would be useful, of course…


Please don't advocate for violence, even if it's only in jest.


Why not? If you want someone to (not) do something, you need to provide a compelling rationale. Otherwise you're trying to compel someone based on your personal preference.

Personally I think wild hyperbole is funny and effective at making a point, so my preference is that the poster continue.


To me, calls for violence are exactly the dividing line between free speech and restricted speech. Exercise extreme judgement when invoking violent imagery.


> If you want someone to (not) do something, you need to provide a compelling rationale.

OK. Advocating violence is a crime. And text-based communication is a notoriously difficult medium in which to express sarcasm and irony. Crimes are bad, mmmmkay?


Nitpick: advocating violence, especially as a personal view, is not a crime in most countries; promoting violence is, and it depends on the violence. Death penalties and stonings are state-sponsored acts of violence, perfectly legal within their applicability in many parts of the world. Also, crimes aren't necessarily "bad", nor do they require violence: abortion in some American states is a crime, and stealing cars is also a crime.


Advocating violence is not necessarily a crime. For example, in the US that violence must be imminent and likely (see Brandenburg v. Ohio).


> Advocating violence is a crime

The Dunning-Kruger effect is real.

You're probably thinking of assault. Assault requires an imminent threat of physical harm and the apparent ability to inflict the harm.

Mike Tyson threatening to punch someone in a bar is assault. Saying that you're going to set all patent trolls on fire is not, unless you have Human Torch-like superpowers.


> I prefer to rely on a proxy to perform TLS instead of the browser.

That's one step forward and about 30 steps backwards if you're actually doing it for security. Proxies silently accept broken TLS configurations all the time and serve them to you as HTTPS-secured. You're unlikely to encounter invalid HTTPS configurations nowadays, so you likely won't ever notice, but it's definitely less secure to terminate the TLS connection in the proxy.


> Proxies silently accept broken TLS configuration all the time

I don't want the browser to enforce TLS configuration; the proxy could be configured to accept or reject broken TLS configurations however I want.
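Whether a TLS-terminating proxy "silently accepts broken TLS" mostly comes down to how its upstream client context is configured. A minimal sketch using Python's standard `ssl` module (an assumption, since no particular proxy or language is named) contrasts a verifying context with one that accepts anything. It covers only the proxy-to-server leg; the browser-to-proxy leg is a separate trust configuration.

```python
import ssl

# Strict context: certificate chain and hostname are both verified,
# which is what ssl.create_default_context() sets up by default.
strict = ssl.create_default_context()

# Lenient context: what "silently accepting broken TLS" amounts to.
# check_hostname must be disabled before verify_mode can be set to CERT_NONE.
lenient = ssl.create_default_context()
lenient.check_hostname = False
lenient.verify_mode = ssl.CERT_NONE

for name, ctx in [("strict", strict), ("lenient", lenient)]:
    print(name, ctx.check_hostname, ctx.verify_mode == ssl.CERT_REQUIRED)
```

Both commenters have a point: the proxy can be as strict as a browser, but nothing forces it to be, and the lenient setting produces no visible warning downstream.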


Would be interested to see a list of those "about 30" steps. Surely, the number is neither made-up nor arbitrary.

