The only annoying thing with QR codes is that they're limited to text formats, s...

matheusmoreira · on Sept 14, 2022

QR codes do support arbitrary binary data in 8-bit mode. Programs such as qrencode can also create 8-bit QR codes. It's almost always the QR decoders that are limited to text, because they assume the data is text.
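To make that concrete, here's a minimal Python sketch of how a byte-mode segment is laid out in the QR bit stream, assuming the ISO/IEC 18004 layout: a 4-bit mode indicator `0100`, a character count field (8 bits for versions 1 to 9, 16 bits for versions 10 to 40), then each input byte copied as one 8-bit codeword. Nothing in the payload is translated, so any byte value fits. `byte_mode_segment` is a hypothetical helper; ECI headers, the terminator, padding and error correction are left out.

```python
def byte_mode_segment(data: bytes, version: int) -> str:
    """Return the bits of one byte-mode segment as a '0'/'1' string.

    Sketch only: no ECI header, terminator, padding or error correction.
    """
    if not 1 <= version <= 40:
        raise ValueError("QR versions run from 1 to 40")
    # Length of the character count field for byte mode depends on the version.
    count_bits = 8 if version <= 9 else 16
    if len(data) >= 1 << count_bits:
        raise ValueError("too many bytes for the character count field")
    bits = "0100"  # mode indicator for 8-bit byte mode
    bits += format(len(data), f"0{count_bits}b")
    # Every byte, including 0x00 and 0xFF, becomes one 8-bit codeword unchanged.
    bits += "".join(format(b, "08b") for b in data)
    return bits
```

For example, `byte_mode_segment(b"\x00\xff", 1)` gives `0100`, then the count `00000010`, then `00000000` and `11111111`.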

I submitted some patches to ZBar to improve this. See also my answer to this stack overflow question:

https://stackoverflow.com/a/60518608/512904

I don't understand what you mean by this:

> Although the QR format does have an encoding called “byte mode”, each byte in this mode represents an ISO 8859-1 character, not binary data.

I've read the standard, but I didn't get the impression that the 8-bit encoding mode was assumed to be text. Can you cite the part that says this?

tiagod · on Sept 15, 2022

base45 is preferred (it was used for the EU COVID certificates) as the binary mode isn't reliable.

matheusmoreira · on Sept 15, 2022

It seems okay to me, at least.

I looked up what the standard says:

> The default interpretation for QR Code is ECI 000020 representing the JIS8 and Shift JIS character sets.

> 8.3.4 8-bit Byte Mode

> The 8-bit byte mode handles the 8-bit Latin/Kana character set in accordance with JIS X 0201 (character values 00HEX to FFHEX).

> In this mode data is encoded at a density of 8 bits/character.

> 8.4.4 8-bit Byte Mode

> In this mode, one 8 bit codeword directly represents the JIS8 character value of the input data character as shown in Table 6, i.e. a density of 8 bits/character.

> In ECIs other than the default ECI, it represents an 8-bit byte value directly.

So it's just an 8-bit value that may or may not represent a JIS8 character. The referenced Table 6 is a lot like the table in the article linked above, but for JIS8 instead. There are reserved/unused characters, but I didn't get the impression that they were invalid. It seems contradictory to me to prohibit the use of byte values that may be assigned in the future. Is this incorrect?
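Read that way, a decoder only has to copy each 8-bit codeword out as a byte value. A minimal sketch of that, assuming the bit stream is given as a string of '0'/'1' characters and starts with a byte-mode segment; `parse_byte_segment` is a hypothetical helper, not any real decoder's API.

```python
from typing import Tuple


def parse_byte_segment(bits: str, version: int) -> Tuple[bytes, str]:
    """Parse one byte-mode segment at the start of `bits`.

    Returns (payload, remaining_bits).
    """
    if bits[:4] != "0100":
        raise ValueError("not a byte-mode segment")
    # Character count field: 8 bits for versions 1-9, 16 bits for versions 10-40.
    count_bits = 8 if version <= 9 else 16
    header_end = 4 + count_bits
    if len(bits) < header_end:
        raise ValueError("truncated character count")
    count = int(bits[4:header_end], 2)
    data_end = header_end + 8 * count
    if len(bits) < data_end:
        raise ValueError("truncated payload")
    # Each 8-bit codeword is copied out as one byte value, with no charset conversion.
    payload = bytes(int(bits[i:i + 8], 2) for i in range(header_end, data_end, 8))
    return payload, bits[data_end:]
```

Any JIS8 (or other) interpretation can then be layered on top of `payload` without losing the original bytes.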

https://en.wikipedia.org/wiki/JIS_X_0208#Unassigned_code_poi...

It gives me the same impression. There are plenty of "should not"s, and even historical examples where people did something with those characters anyway. So it didn't seem like a big deal to me that binary data might contain those bytes. It even notes that many previously unassigned code points were assigned in later standards, which also happens often in Unicode.

The RFC for base45 is interesting too:

https://datatracker.ietf.org/doc/rfc9285/

> Even in Byte mode, a typical QR code reader tries to interpret a byte sequence as text encoded in UTF-8 or ISO/IEC 8859-1.

> Thus, QR codes cannot be used to encode arbitrary binary data directly.

The implication being that it would work if only we could fix those decoders so they stop trying to convert data to text.

That's been my exact experience. The binary decoding problems in zbar were due to character encoding conversion attempts. I just made it output the bytes unchanged and it worked perfectly with arbitrary binary data. I also studied zxing-cpp, and it does both: it converts the binary data to text and keeps a copy of the original bytes. There are even comments saying the standard isn't clear.
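A rough Python sketch of the "keep both" approach described above. `DecodedSymbol` and `make_result` are hypothetical names, not zxing-cpp's actual types, and defaulting to UTF-8 for the text view is an assumption; the point is only that the raw bytes survive even when the text conversion fails.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DecodedSymbol:
    raw: bytes           # payload bytes exactly as read from the symbol
    text: Optional[str]  # best-effort text view, or None if conversion failed


def make_result(raw: bytes, charset: str = "utf-8") -> DecodedSymbol:
    """Attach a text view to the raw payload without ever discarding the bytes."""
    try:
        text: Optional[str] = raw.decode(charset)
    except (UnicodeDecodeError, LookupError):
        text = None  # not valid in this charset (or unknown charset): treat as binary
    return DecodedSymbol(raw=raw, text=text)
```

A caller that wants binary data reads `.raw`; one that wants text reads `.text` and falls back when it is `None`.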

lifthrasiir · on Sept 14, 2022

You probably want to learn about base45 [1] instead, which is exactly designed for the alphanumeric mode of QR code.

[1] https://www.rfc-editor.org/rfc/rfc9285.html
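For reference, base45 as described in RFC 9285 is small enough to sketch in a few lines of Python: every 2 bytes form a number n = c + d·45 + e·45² written as the three characters c, d, e, and a trailing single byte becomes two characters. This is an illustrative sketch, not a vetted library; the function names are my own.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"  # the 45 QR alphanumeric characters


def b45encode(data: bytes) -> str:
    out = []
    # Each pair of bytes becomes n = c + d*45 + e*45*45, emitted as c, d, e.
    for i in range(0, len(data) - 1, 2):
        n = data[i] * 256 + data[i + 1]
        n, c = divmod(n, 45)
        e, d = divmod(n, 45)
        out += [ALPHABET[c], ALPHABET[d], ALPHABET[e]]
    # A trailing single byte becomes two characters c, d with n = c + d*45.
    if len(data) % 2:
        d, c = divmod(data[-1], 45)
        out += [ALPHABET[c], ALPHABET[d]]
    return "".join(out)


def b45decode(text: str) -> bytes:
    try:
        vals = [ALPHABET.index(ch) for ch in text]
    except ValueError:
        raise ValueError("character outside the base45 alphabet") from None
    if len(vals) % 3 == 1:
        raise ValueError("invalid base45 length")
    out = bytearray()
    for i in range(0, len(vals), 3):
        chunk = vals[i:i + 3]
        if len(chunk) == 3:
            n = chunk[0] + chunk[1] * 45 + chunk[2] * 45 * 45
            if n > 0xFFFF:
                raise ValueError("base45 triple out of range")
            out.extend(divmod(n, 256))  # high byte, then low byte
        else:
            n = chunk[0] + chunk[1] * 45
            if n > 0xFF:
                raise ValueError("base45 pair out of range")
            out.append(n)
    return bytes(out)
```

With this, `b45encode(b"AB")` gives `"BB8"` and `b45encode(b"Hello!!")` gives `"%69 VD92EX0"`, and every output character is valid in QR alphanumeric mode.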

kstenerud · on Sept 15, 2022

Base45 is an inefficient way to store binary data, as are all of the BaseXYZ encodings that by all rights should already be relegated to the dustbin of history in the 21st century...

lifthrasiir · on Sept 15, 2022

Unlike many baseXX encodings, base45 has the advantage that its overhead is actually undone by the alphanumeric mode. Granted, it is still unfortunate that base45 has its uses (because many QR decoders do not properly handle 8-bit data), but inefficiency is not its deficiency.

kstenerud · on Sept 15, 2022

Oh yes, sorry, you're right. When base45 is stored in alphanumeric mode, you only get a small amount of encoding bloat (about 3%).
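The 3% figure falls out of the packing rules: base45 turns 2 bytes into 3 characters, and alphanumeric mode stores 2 characters in 11 bits (45 × 45 = 2025 fits in 2048) and a leftover character in 6 bits. So 4 bytes (32 bits) become 6 characters, i.e. 3 pairs, i.e. 33 bits. A small Python check of that arithmetic, counting data bits only (mode indicator and count field ignored); the helper names are hypothetical.

```python
def alnum_data_bits(num_chars: int) -> int:
    """Data bits for a run of characters in QR alphanumeric mode (headers excluded)."""
    # Two characters pack into 11 bits; a leftover single character takes 6 bits.
    return 11 * (num_chars // 2) + 6 * (num_chars % 2)


def base45_chars(num_bytes: int) -> int:
    """Length of the base45 encoding: 3 characters per 2 bytes, 2 for a trailing byte."""
    return 3 * (num_bytes // 2) + 2 * (num_bytes % 2)


def base45_overhead(num_bytes: int) -> float:
    """Ratio of alphanumeric-mode bits to the raw 8 bits per byte."""
    return alnum_data_bits(base45_chars(num_bytes)) / (8 * num_bytes)
```

For 1000 bytes this gives 1500 characters, 8250 bits versus 8000, a ratio of 33/32 = 1.03125.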

But even with binary data encoded to baseXX, you still need to decide what the byte stream means. I was mainly looking at how to efficiently encode a hierarchical document containing the most common data types (that anyone can decode and inspect) into a QR code.

matheusmoreira · on Sept 15, 2022

> because many QR decoders do not properly handle 8-bit data

We should fix that! Which QR decoders still lack support for binary data decoding?

I submitted patches to zbar to add a binary decoding mode; it's already available in new versions. ZXing and zxing-cpp also have support.

Maybe it's the camera apps that lack support for this.

arcastroe · on Sept 14, 2022

How does this compare against simply encoding the binary data in base64?

pkulak · on Sept 14, 2022

30% more efficient, I'd guess.
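The thread doesn't say exactly what "this" is, so as a rough yardstick, here's a Python sketch comparing base64 against the raw payload stored in 8-bit byte mode. It counts data bits only (mode indicators, counts and error correction are ignored). Base64 output contains lowercase letters, which are outside the alphanumeric character set, so it has to go in byte mode at 8 bits per character: 4 characters per 3 bytes, about 33% more bits than the raw bytes (or, put the other way, raw byte mode needs 25% fewer bits). `fits_alphanumeric` and `base64_byte_mode_bits` are hypothetical helpers.

```python
import base64

# Characters allowed in QR alphanumeric mode.
QR_ALNUM = frozenset("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")


def fits_alphanumeric(text: str) -> bool:
    """True if every character can be stored in QR alphanumeric mode."""
    return set(text) <= QR_ALNUM


def base64_byte_mode_bits(payload: bytes) -> int:
    """Data bits needed to store the base64 text of `payload` in byte mode."""
    return 8 * len(base64.b64encode(payload))


payload = bytes(i % 256 for i in range(3000))
encoded = base64.b64encode(payload).decode("ascii")
print(fits_alphanumeric(encoded))                           # base64 uses lowercase, so False
print(base64_byte_mode_bits(payload) / (8 * len(payload)))  # 4000 chars for 3000 bytes, about 1.33
```

Compare that with the roughly 3% overhead of base45 in alphanumeric mode mentioned above.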