Typically by looking at "runs" of black and white pixels (after the image has been binarized/thresholded). For a QR code marker, you would look for a 1-1-3-1-1 pattern, which would indicate that this image row crosses a QR code finder pattern.
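For illustration, that row scan can be sketched in a few lines. This is a toy version, assuming a pre-binarized row of 0s (black) and 1s (white); real decoders also cross-check candidates vertically and diagonally:

```python
def find_finder_candidates(row):
    """Scan one binarized row (0 = black, 1 = white) for the
    1:1:3:1:1 run pattern of a QR finder pattern.
    Returns (start, end) pixel spans of candidate crossings."""
    # Run-length encode the row: list of [value, length] pairs.
    runs = []
    for px in row:
        if runs and runs[-1][0] == px:
            runs[-1][1] += 1
        else:
            runs.append([px, 1])

    candidates = []
    # A finder crossing is black-white-black-white-black with
    # widths in ratio 1:1:3:1:1 (within tolerance).
    for i in range(len(runs) - 4):
        window = runs[i:i + 5]
        if [v for v, _ in window] != [0, 1, 0, 1, 0]:
            continue
        widths = [w for _, w in window]
        module = sum(widths) / 7.0  # the whole crossing spans 7 modules
        if all(abs(w - e * module) <= module / 2
               for w, e in zip(widths, [1, 1, 3, 1, 1])):
            start = sum(w for _, w in runs[:i])
            candidates.append((start, start + sum(widths)))
    return candidates
```

The half-module tolerance is a guess; production scanners tune these thresholds (and the sampling strategy) heavily.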
But do they do this at different angles? If so, which? How do they deal with (possibly partially) obscured alignment markers? How do they determine the angle and size of squares? How does one sample each square (center point, average of square, min, max...)? Should I convert my image to grayscale first? If so... which of the RGB-to-grayscale methods should I use for best results? Should I pre-process the picture to enhance sharpness and reduce focus artifacts? Should I brute-force each frame of the video stream looking for the alignment markers? Etc. etc.
I think I have the general idea on how all that works, but... isn't there an official spec/algorithm on how one is supposed to do this? The devil is in the details.
To take a "draw the rest of the fucking owl" approach to explaining this: picking up QR codes is just like picking up normal barcodes, or decoding any sort of structured signal from a noisy environment (the kind of thing that's at the heart of all RF comms).
Basically the entire subject area of electronics and signal processing exists to find clever ways of solving all the problems you mention. So for people who do this for a living, there's a huge set of standard algorithms and approaches you can apply to solve the problem of "detecting QR codes", in the same way there are standard approaches to sorting a list, building a B-tree, or creating a query planner.
It's realistically not possible to summarise the process in an HN comment, because that would require explaining decades of signal processing research in a handful of sentences.
To give a flavour of solving this problem: QR code alignment markers are designed to be easy to detect, regardless of angle or being partially obscured. Their large, simple pattern means you can use crude algorithms to find all alternating black and white patterns in your image, then analyse those patterns to detect which ones are noise and which ones might be alignment markers. Stuff like the regular spacing of the alignment patterns, plus the timing strips between them, gives you anchors to rapidly check whether you're looking at a set of actual alignment markers, or just things that happen to have the same shape as alignment markers. At which point you can start the expensive process of attempting to decode the QR data.
As for which algorithms to use to go from colour to greyscale, well, that's up to you to figure out. Building a crude QR code decoder is "easy"; building a fast, robust one is hard, and quite valuable. Nobody is gonna give you that kind of secret sauce for free, not when they can charge you for the hundreds of hours of R&D involved.
There's an official spec for QR codes, and it might even give you basic guidance on how to build a very basic QR code decoder. But asking for a detailed spec on a fast, robust QR decoder is like asking the IEEE for a detailed spec on how to build a 10 Gb/s Ethernet controller. It ain't gonna happen, because building such a thing is hard, even once you know what signals you're decoding.
I get what you mean, but the same situation applies when describing the algorithm to decode QR codes (it involves "deep" stuff like Reed-Solomon) and yet info on that is far more prevalent.
Why is that the case?
Nobody expects a detailed tutorial, but it's hard to even come by a list of techniques that are known to work (or maybe I'm just trash at search).
You can find plenty of information out there on various standard image and signal processing algorithms. Basic stuff like 2D convolutions form the backbone of many image processing techniques, and they’re well documented.
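As an example of the kind of well-documented building block meant here, a naive 2D convolution (strictly speaking cross-correlation, which is what most image-processing code actually computes) fits in a few lines:

```python
def convolve2d(img, kernel):
    """Naive 'valid'-mode 2D cross-correlation.
    img and kernel are 2D grids (lists of lists) of numbers."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img) - kh + 1, len(img[0]) - kw + 1
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = sum(
                img[y + j][x + i] * kernel[j][i]
                for j in range(kh) for i in range(kw))
    return out
```

With a normalized 3x3 box kernel this is a blur for suppressing sensor noise before thresholding; with other kernels it does edge detection, sharpening, and so on. Real libraries use separable kernels or FFTs instead of this O(n^2 k^2) loop.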
But just as Reed-Solomon encoding is one very specific algorithm with many different applications, of which QR codes are one, what you're looking for is similar: a huge set of very specific algorithms with a huge set of applications, of which one is QR codes.
QR codes are simply too niche for anyone to have put together a public document on what exact image processing techniques you might use to decode them. If you want to teach people about 2D signal processing, then QR codes are probably not a good starting point.
I would also argue that Reed-Solomon encoding is not "deep" anything. It's one of many different 1D signal error correction algorithms that exist. For people who work on signal processing as a day job, Reed-Solomon is about as "deep" as quick-sort is to programmers.
As for why quick-sort is well documented with many open implementations, and QR codes aren't: I would argue it's simply due to the industry they developed in. QR codes started in manufacturing, being used for inventory management, almost certainly by embedded and electronics engineers. All of those industries pre-date the open source movement by decades, and trade secrets are still important to them, so they're not naturally inclined to share IP.
What you can do is read a paper that's relevant to your problem (let's say QR code perspective correction), and then dig through the references. Good papers usually have plenty of good references.
If you have a rough idea how it works, include these in your google queries to get results that go into details ;)
Some more things to include:
- perspective correction may be needed (the surface with the QR code is not parallel to the image plane, so pixels further away get smaller)
- if the algorithm takes parameters, such as color filters / black-vs-white thresholds, try with different parameter sets in sequence. The user is pointing the camera at the code for a relatively long time, compared to the time it takes to process an image.
- pixely images should produce a recognizable pattern in the Fourier transform of the image. This gives you the size AND rotation at once.
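To make the perspective-correction point concrete: once you have four corner correspondences (the detected code corners mapped to the corners of an upright square), you can solve for a 3x3 homography and then map every sampling point through it. A bare-bones sketch with no libraries (so it carries its own tiny linear solver); a real implementation would use something like OpenCV's getPerspectiveTransform:

```python
def homography(src, dst):
    """Solve for the 3x3 homography H mapping src[i] -> dst[i], given
    four point correspondences. Standard DLT setup with h33 fixed to 1."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = solve(A, b)
    return [[h[0], h[1], h[2]], [h[3], h[4], h[5]], [h[6], h[7], 1.0]]

def solve(A, b):
    """Tiny Gaussian elimination with partial pivoting for A x = b."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c]
                              for c in range(r + 1, n))) / M[r][r]
    return x

def apply_h(H, x, y):
    """Map (x, y) through H, including the perspective divide."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```

Sampling each module then becomes: take the module's center in the idealized upright grid, push it through the inverse homography, and read (or average) the pixels there.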
Yes, exactly. Do I need to look for 1-1-3-1-1 markers in _all_ 359 directions? What about warping/distortion of the surface, etc.? All those details are what I'm interested in seeing.
The 1-1-3-1-1 check (with tolerance) works at angles as well, as long as the code is not perspective-warped.
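That angle-invariance is easy to demonstrate: because the finder pattern is concentric squares, any straight line through its center crosses the rings in the same 1:1:3:1:1 ratio, just uniformly scaled. A small synthetic experiment (pattern drawn from scratch, sampled at 0 and 45 degrees; the sampling step and tolerance are arbitrary choices for the demo):

```python
import math

def sample_line(img, cx, cy, angle, length, step=0.25):
    """Sample pixels along a line through (cx, cy) at the given angle,
    skipping samples that fall outside the image."""
    vals = []
    dx, dy = math.cos(angle), math.sin(angle)
    t = -length / 2
    while t <= length / 2:
        x, y = int(round(cx + t * dx)), int(round(cy + t * dy))
        if 0 <= y < len(img) and 0 <= x < len(img[0]):
            vals.append(img[y][x])
        t += step
    return vals

def run_lengths(vals):
    """Run-length encode a sample sequence into [value, length] pairs."""
    runs = []
    for v in vals:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs
```

Running this on a synthetic 7x7-module finder pattern gives five runs in roughly 1:1:3:1:1 proportion at both angles, which is exactly the property the detector exploits.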
You are right in assuming that there's a lot more to it than that. Localization and decoding of 2D codes (usually these are considered separately) is an active research field, and there are a lot of different approaches.
Disclaimer: I am the author of STRICH (https://strich.io) and have dived relatively deep into the topic.