OCR works well on regular text because it can use language stats for error corre...

dheera · on June 3, 2023

Ping the URL in the background. If it returns a 404, iteratively change the least-confident character recognitions to the next-best guess until it returns a 200, then return that to the user. For example, if it doesn't know whether one character is "I" or "l" and another is "0" or "O", try all 4 combinations and show the correct one before the user even realizes.
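A rough sketch of that retry loop, assuming the OCR engine exposes per-character alternatives with confidences. The `guesses` structure, `candidates`, `first_reachable`, and the `is_reachable` callback are made up for illustration, and the reachability check is passed in so the enumeration can be tried offline:

```python
import itertools
import math
from typing import Callable, Iterator, List, Optional, Tuple

# One entry per character position; each entry lists (character, confidence)
# alternatives. This shape is an assumption, not any real OCR library's output.
Guesses = List[List[Tuple[str, float]]]


def candidates(guesses: Guesses) -> Iterator[str]:
    """Yield every combination of alternatives, most probable first."""
    scored = []
    for combo in itertools.product(*guesses):
        text = "".join(char for char, _ in combo)
        probability = math.prod(conf for _, conf in combo)
        scored.append((text, probability))
    scored.sort(key=lambda item: item[1], reverse=True)
    for text, _ in scored:
        yield text


def first_reachable(
    guesses: Guesses,
    is_reachable: Callable[[str], bool],
    limit: int = 16,
) -> Optional[str]:
    """Try at most `limit` candidates and return the first one that resolves."""
    for url in itertools.islice(candidates(guesses), limit):
        if is_reachable(url):
            return url
    return None
```

With two ambiguous characters that have two alternatives each, this tries the 4 combinations in order of joint confidence. The `limit` is there because the number of combinations grows exponentially with the number of uncertain characters.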

If a piece of a URL looks like a word, factor that into your confidence scores as well, e.g. "somefoo.com/illusion/" is far more likely to be the correct URL, especially for something printed on paper, than "somefoo.com/iIIusion/", "somefoo.com/iIlusion/", or "somefoo.com/ilIusion/".
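One way this could look, as a sketch only: split the URL on separators and boost candidates whose pieces appear in a word list. The tiny `WORDS` set, the `word_bonus` function, and the 2x bonus are placeholders; a real system would presumably use a proper dictionary and combine this with the OCR confidences.

```python
import re
from typing import Iterable, List

# Stand-in word list; a real one would be far larger.
WORDS = {"illusion", "menu", "about"}


def word_bonus(url: str, bonus: float = 2.0) -> float:
    """Return a multiplier that grows with each URL piece found in WORDS."""
    score = 1.0
    for piece in re.split(r"[/.\-_]+", url):
        # Case-sensitive on purpose: "iIIusion" must not match "illusion".
        if piece in WORDS:
            score *= bonus
    return score


def rank(urls: Iterable[str]) -> List[str]:
    """Order candidate URLs so the most word-like ones come first."""
    return sorted(urls, key=word_bonus, reverse=True)


print(rank([
    "somefoo.com/iIIusion/",
    "somefoo.com/illusion/",
    "somefoo.com/iIlusion/",
])[0])
```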

You don't even need to do a full GET request; you can just use HEAD to save on mobile bandwidth.
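For illustration, a HEAD probe with Python's standard library might look like this. The `exists` name and the 3-second timeout are arbitrary choices, and some servers answer HEAD differently from GET, so treat a failure as "unknown" rather than proof that the URL is wrong:

```python
import urllib.request


def exists(url: str, timeout: float = 3.0) -> bool:
    """Return True if a HEAD request for `url` gets a 2xx response."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # OSError covers HTTP errors such as 404, DNS failures, and timeouts;
        # ValueError covers strings that are not valid URLs at all.
        return False
```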

For URLs that are printed on paper (e.g. at a restaurant table), have the backend do this and cache the correct URLs based on geolocation.

Taywee · on June 3, 2023

That's a pretty fragile and fault-prone method, and is extremely prone to typo squatting. I'm glad something like this isn't popular today, because it would very quickly be capitalized on by scammers, who wouldn't even have to rely on humans making typos anymore.

If it doesn't have error checking and checksums, it shouldn't be relied upon by the general public.

dheera · on June 3, 2023

Why would typo squatting be part of the threat model?

Let's say you're at a restaurant, they have one of those stupid QR codes to view the menu (I really wish they would print the damn menus, but that's another story ...). Why not just a URL of "myrestaurant.com/menu" and you just scan that URL? Who would typo-squat an alternate menu and paste it on the restaurant tables?

If there's a venue in which typo-squatting is a potential threat, QR-squatting would be an even bigger threat because even if only 5% of people have sharp enough eyes to spot a typo squat, probably only 0.0001% of people are good at decoding QR codes in their brain and would spot a QR squat. The typo squat is easier to spot.

If you really wanted, we could also establish an optional checksum protocol for URLs, e.g. http://myrestaurant.com/menu#cs=123

If the checksum fails, the phone can (a) try all possible 1-2 character edits for error correction, which should take only a couple milliseconds, (b) auto-Google the URL to see if there is a fixed version of the URL, and (c) error out if both of the above fail.
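A sketch of step (a), with everything about the scheme assumed: the checksum here is CRC32 of the URL modulo 1000 (just one way to get a 3-digit value like `#cs=123`), and edits are limited to swaps between a few look-alike characters in a made-up `CONFUSABLE` table rather than arbitrary edits.

```python
import zlib
from typing import Iterator, List, Optional, Tuple

# Hypothetical table of characters an OCR engine tends to mix up.
CONFUSABLE = {"I": "l1", "l": "I1", "1": "Il", "0": "O", "O": "0"}


def checksum(url: str) -> int:
    """Assumed scheme: CRC32 of the URL, reduced to three digits."""
    return zlib.crc32(url.encode("utf-8")) % 1000


def split_checksum(scanned: str) -> Tuple[str, Optional[int]]:
    """Separate 'url#cs=123' into the URL and its checksum, if present."""
    base, sep, value = scanned.partition("#cs=")
    return base, int(value) if sep and value.isdigit() else None


def single_swaps(text: str) -> Iterator[str]:
    """Yield every string that differs by one look-alike substitution."""
    for i, char in enumerate(text):
        for alt in CONFUSABLE.get(char, ""):
            yield text[:i] + alt + text[i + 1:]


def repair(scanned: str, max_edits: int = 2) -> List[str]:
    """Return URLs within `max_edits` swaps that match the checksum,
    fewest edits first. An empty list means nothing matched."""
    base, expected = split_checksum(scanned)
    if expected is None:
        return [base]  # no checksum printed, nothing to verify against
    matches: List[str] = []
    frontier, seen = {base}, {base}
    for _ in range(max_edits + 1):
        matches.extend(sorted(u for u in frontier if checksum(u) == expected))
        frontier = {v for u in frontier for v in single_swaps(u)} - seen
        seen |= frontier
    return matches
```

A 3-digit checksum gives only 1000 possible values, so `repair` can return more than one match. In that case the phone would still need another signal (or a longer checksum) to pick one.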

TimothyBJacobs · on June 3, 2023

They’re saying that an attacker would register variations of similar looking domains relying on the OCR misreading the correct URL. They aren’t suggesting that an attacker would paste the invalid URL in real life menus.

dheera · on June 5, 2023

The whole point of QR codes is to paste URLs in real life. For non-real-life use cases even QR codes are unnecessary and people would just use links. For that, yes, typo-squatting and phishing are issues.

I'm suggesting that for real life we could just use URLs and OCR instead of QR codes.

jraph · on June 5, 2023

> I'm suggesting that for real life we could just use URLs and OCR

No, because that's not reliable, and way more complicated and error-prone.

It's the same reason bar codes are used instead of printing the ID they encode directly.

Your earlier solution doesn't convince me. It's interesting, but not practical, and reading the URL should work even if internet is off or the server is down.

(Having a checksum next to the URL may start making OCR practical though)