The only annoying thing with QR codes is that they're limited to text formats, so a while back I came up with a little tweak to support encoding ad-hoc hierarchical binary data into QR codes: https://www.technicalsourcery.net/posts/qr-superpowers/
QR codes do support arbitrary binary data in 8 bit mode. Programs such as qrencode also allow the creation of 8 bit QR codes. It's almost always the QR decoder that's limited to text formats because they assume the data is text.
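To make the 8 bit mode point concrete, here is a rough sketch of how a byte-mode segment is laid out before error correction: a 4-bit mode indicator (0100), a character count field (8 bits for versions 1 to 9, 16 bits for versions 10 to 40), then the payload bytes unchanged. The function name `byte_mode_segment` is made up for illustration, and terminator, padding and error correction are left out, so this is not a full encoder.

```python
def byte_mode_segment(payload: bytes, version: int = 1) -> str:
    """Return the bit string of one byte-mode segment (no terminator or padding)."""
    if not 1 <= version <= 40:
        raise ValueError("QR versions run from 1 to 40")
    # Character count field width for byte mode depends on the symbol version.
    count_bits = 8 if version <= 9 else 16
    if len(payload) >= 1 << count_bits:
        raise ValueError("payload too long for the character count field")
    bits = "0100"  # mode indicator for byte mode
    bits += format(len(payload), f"0{count_bits}b")
    # Every byte value 0x00..0xFF is written as-is, 8 bits each.
    bits += "".join(format(b, "08b") for b in payload)
    return bits


# Bytes that are not printable text still fit without any escaping.
segment = byte_mode_segment(bytes([0x00, 0xFF, 0x80]))
print(segment)
```

Nothing at this level cares whether the bytes are text; any interpretation happens later, in the decoder.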
I submitted some patches to ZBar to improve this. See also my answer to this Stack Overflow question:
> The default interpretation for QR Code is ECI 000020 representing the JIS8 and Shift JIS character sets.
> 8.3.4 8-bit Byte Mode
> The 8-bit byte mode handles the 8-bit Latin/Kana character set in accordance with JIS X 0201 (character values 00HEX to FFHEX).
> In this mode data is encoded at a density of 8 bits/character.
> 8.4.4 8-bit Byte Mode
> In this mode, one 8 bit codeword directly represents the JIS8 character value of the input data character as shown in Table 6, i.e. a density of 8 bits/character.
> In ECIs other than the default ECI, it represents an 8-bit byte value directly.
So it's just an 8 bit value that may or may not represent a JIS8 character. The referenced table 6 is a lot like the table in the article linked above but for JIS8 instead. There are reserved/unused characters but I didn't get the impression that they were invalid. It seems contradictory to me to prohibit the use of bytes which may be in use in the future. Is this incorrect?
Gives me the same impression. Plenty of "should not"s and even historical examples where people did something with those characters anyway. So it didn't seem like a big deal to me that binary data might contain those bytes. It even says many unassigned characters were assigned in newer standards which also happens often in Unicode.
> Even in Byte mode, a typical QR code reader tries to interpret a byte sequence as text encoded in UTF-8 or ISO/IEC 8859-1.
> Thus, QR codes cannot be used to encode arbitrary binary data directly.
The implication being that it would work if only we could fix those decoders so they stop trying to convert data to text.
That's been my exact experience. The binary decoding problems in zbar were due to character encoding conversion attempts. I just made it output the bytes unchanged and it worked perfectly with arbitrary binary data. I also studied zxing-cpp and it does both: it converts the binary data to text and keeps a copy of the original bytes. There are even comments saying the standard isn't clear.
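A small sketch of why the conversion step is the culprit. This is not ZBar's or zxing-cpp's actual code, just a generic illustration of the two text interpretations mentioned in the quote above (UTF-8 and ISO/IEC 8859-1) next to plain passthrough. The sample payload is an arbitrary assumption chosen to be invalid UTF-8.

```python
# Arbitrary binary payload; 0x80, 0xC3 0x28 and 0xFF are not valid UTF-8.
payload = bytes([0x00, 0x80, 0xC3, 0x28, 0xFF])

# Path 1: treat the bytes as UTF-8 text, replacing invalid sequences.
# Each invalid byte becomes U+FFFD, so the original values are lost.
lossy = payload.decode("utf-8", errors="replace").encode("utf-8")

# Path 2: treat the bytes as ISO/IEC 8859-1 and emit UTF-8 text.
# Nothing is lost, but every byte >= 0x80 turns into two bytes,
# so the consumer no longer sees the original byte stream.
transcoded = payload.decode("latin-1").encode("utf-8")

# Path 3: hand the decoded bytes through untouched.
raw = bytes(payload)

print(lossy == payload)       # False
print(transcoded == payload)  # False
print(raw == payload)         # True
```

Only the passthrough path round-trips, which matches what happened once the conversion was taken out.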
Base45 is an inefficient way to store binary data, as are all of the BaseXYZ encodings that by all rights should already be relegated to the dustbin of history in the 21st century...
Unlike many baseXX encodings, base45 has the advantage that its overhead is almost entirely undone by the alphanumeric mode. Granted, it is still unfortunate that base45 has its uses (because many QR decoders do not properly handle 8-bit data), but inefficiency is not its deficiency.
Oh yes, sorry, you're right. When using alphanumeric mode you only get a small amount of binary encoding bloat (about 3%) with base45.
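For anyone who wants to check the figure: base45 (RFC 9285) turns every 2 bytes into 3 characters, and QR alphanumeric mode packs every 2 characters into 11 bits (a leftover single character takes 6). So 16 bits of data cost 16.5 bits in the symbol. Below is a minimal sketch; the helper names are my own, and the segment header (mode indicator and character count) is ignored.

```python
# The 45-character alphabet shared by base45 and QR alphanumeric mode.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def base45_encode(data: bytes) -> str:
    """Encode bytes as base45: 2 bytes -> 3 chars, a trailing byte -> 2 chars."""
    out = []
    for i in range(0, len(data) - 1, 2):
        n = data[i] * 256 + data[i + 1]
        n, c = divmod(n, 45)  # least significant digit comes first
        e, d = divmod(n, 45)
        out += [ALPHABET[c], ALPHABET[d], ALPHABET[e]]
    if len(data) % 2:
        d, c = divmod(data[-1], 45)
        out += [ALPHABET[c], ALPHABET[d]]
    return "".join(out)


def alphanumeric_bits(num_chars: int) -> int:
    """Data bits used by QR alphanumeric mode, excluding the segment header."""
    return 11 * (num_chars // 2) + 6 * (num_chars % 2)


payload = bytes(range(256)) * 4  # 1024 bytes covering every byte value
encoded = base45_encode(payload)
overhead = alphanumeric_bits(len(encoded)) / (8 * len(payload)) - 1
print(f"{overhead:.3%}")  # 3.125%
```

Compare that with putting base64 into byte mode, where every 3 bytes become 4 full 8-bit characters, roughly 33% bloat.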
But even with binary data encoded to baseXX, you still need to decide what the byte stream means. I was mainly looking at how to efficiently encode a hierarchical document containing the most common data types (that anyone can decode and inspect) into a QR code.