base45 is preferred (it was used for the EU COVID certificates) as the binary mo...

matheusmoreira · on Sept 15, 2022

It seems okay, at least to me.

I looked up what the standard says:

> The default interpretation for QR Code is ECI 000020 representing the JIS8 and Shift JIS character sets.

> 8.3.4 8-bit Byte Mode

> The 8-bit byte mode handles the 8-bit Latin/Kana character set in accordance with JIS X 0201 (character values 00HEX to FFHEX).

> In this mode data is encoded at a density of 8 bits/character.

> 8.4.4 8-bit Byte Mode

> In this mode, one 8 bit codeword directly represents the JIS8 character value of the input data character as shown in Table 6, i.e. a density of 8 bits/character.

> In ECIs other than the default ECI, it represents an 8-bit byte value directly.

So it's just an 8-bit value that may or may not represent a JIS8 character. The referenced Table 6 is a lot like the table in the article linked above, but for JIS8 instead. There are reserved/unused characters, but I didn't get the impression that they were invalid. It seems contradictory to me to prohibit the use of bytes which may be assigned in the future. Is this incorrect?

https://en.wikipedia.org/wiki/JIS_X_0208#Unassigned_code_poi...

This gives me the same impression. There are plenty of "should not"s and even historical examples where people did something with those characters anyway. So it didn't seem like a big deal to me that binary data might contain those bytes. It even says many unassigned characters were assigned in newer standards, which also happens often in Unicode.

The RFC for Base45 is interesting too:

https://datatracker.ietf.org/doc/rfc9285/

> Even in Byte mode, a typical QR code reader tries to interpret a byte sequence as text encoded in UTF-8 or ISO/IEC 8859-1.

> Thus, QR codes cannot be used to encode arbitrary binary data directly.

The implication being that it would work if only we could fix those decoders so they stop trying to convert data to text.
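
For contrast, here is roughly how Base45 sidesteps the problem instead of fixing the decoders: it only ever emits the 45 characters of QR alphanumeric mode, so a text conversion in the reader has nothing to mangle. This is a minimal Python sketch of the encoding as I read RFC 9285, not a reference implementation; the function names `b45encode` and `b45decode` are my own.

```python
# Minimal Base45 sketch following RFC 9285. The function names are
# illustrative and not taken from any particular library.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def b45encode(data: bytes) -> str:
    out = []
    # Each pair of bytes is a 16-bit number written as three base-45
    # digits, least significant digit first.
    for i in range(0, len(data) - 1, 2):
        n = data[i] * 256 + data[i + 1]
        n, c = divmod(n, 45)
        e, d = divmod(n, 45)
        out += [ALPHABET[c], ALPHABET[d], ALPHABET[e]]
    # A trailing single byte becomes two base-45 digits.
    if len(data) % 2:
        d, c = divmod(data[-1], 45)
        out += [ALPHABET[c], ALPHABET[d]]
    return "".join(out)


def b45decode(text: str) -> bytes:
    # str.index raises ValueError for characters outside the alphabet.
    vals = [ALPHABET.index(ch) for ch in text]
    if len(vals) % 3 == 1:
        raise ValueError("invalid Base45 length")
    out = bytearray()
    for i in range(0, len(vals) - 2, 3):
        n = vals[i] + vals[i + 1] * 45 + vals[i + 2] * 45 * 45
        if n > 0xFFFF:
            raise ValueError("Base45 triplet out of range")
        out += n.to_bytes(2, "big")
    if len(vals) % 3 == 2:
        n = vals[-2] + vals[-1] * 45
        if n > 0xFF:
            raise ValueError("Base45 pair out of range")
        out.append(n)
    return bytes(out)
```

The cost is size: every 2 bytes become 3 characters, which is the overhead you would avoid if readers just handed back the raw bytes.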

That's been my exact experience. The binary decoding problems in zbar were due to character encoding conversion attempts. I just made it output the bytes unchanged and it worked perfectly with arbitrary binary data. I also studied zxing-cpp and it does both: it converts the binary data to text and keeps a copy of the original bytes. There are even comments saying the standard isn't clear.
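
To illustrate the difference, here is a small Python sketch. It is not zbar's or zxing-cpp's actual code or API; `DecodeResult`, `decode_byte_segment` and `lossy_text_only` are hypothetical names, and the forced UTF-8 conversion is just one example of what a text-only decoder might do. The point is only that a text conversion can destroy bytes, while keeping the raw bytes next to a best-effort text copy loses nothing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecodeResult:
    """Hypothetical result type: raw bytes plus an optional text view."""
    raw: bytes
    text: Optional[str]


def decode_byte_segment(raw: bytes) -> DecodeResult:
    # Keep the original bytes untouched; the text copy is best effort
    # and is simply absent when the bytes are not valid UTF-8.
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = None
    return DecodeResult(raw=raw, text=text)


def lossy_text_only(raw: bytes) -> bytes:
    # What a caller gets back after a forced text conversion: invalid
    # sequences are replaced with U+FFFD, so the bytes no longer match.
    return raw.decode("utf-8", errors="replace").encode("utf-8")


# Arbitrary binary payload containing bytes that are not valid UTF-8.
payload = bytes([0x00, 0x7F, 0x80, 0xC3, 0x28, 0xFF])
result = decode_byte_segment(payload)
```

With this payload, `result.raw` is identical to the input and `result.text` is `None`, while `lossy_text_only(payload)` returns different bytes than it was given.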