
tbh it's lame for any program that reads text files not to support a BOM. It's just one if.
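Here is a minimal Python sketch of that "one if", assuming the input is meant to be UTF-8. The function name `decode_text` is hypothetical. It checks for the three-byte UTF-8 BOM and skips it before decoding.

```python
import codecs


def decode_text(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading byte order mark if present."""
    # The "one if": skip EF BB BF (codecs.BOM_UTF8) when the data starts with it.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.decode("utf-8")


print(decode_text(b"\xef\xbb\xbfhello"))  # hello
print(decode_text(b"hello"))              # hello
```

Python's built-in `utf-8-sig` codec does the same thing, so `data.decode("utf-8-sig")` is the shorter route. The explicit check just makes the single branch visible.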




There isn't really any one "text file", though. The kernel just checks whether the first two bytes are the ASCII encoding of "#!".
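The following is a simplified Python model of that check, not the kernel's actual code. The name `looks_like_script` is hypothetical, and the real loader does more work once the prefix matches. Only the first two bytes are compared against "#!", so the same script saved with a UTF-8 BOM does not match.

```python
import codecs


def looks_like_script(data: bytes) -> bool:
    # Simplified model: the first two bytes must be 0x23 0x21 ("#!" in ASCII).
    # No encoding detection happens at this point.
    return data[:2] == b"#!"


plain = b"#!/bin/sh\necho hi\n"
with_bom = codecs.BOM_UTF8 + plain  # same script saved as "UTF-8 with BOM"

print(looks_like_script(plain))     # True
print(looks_like_script(with_bom))  # False: the file starts with EF BB, not "#!"
```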

https://www.youtube.com/watch?v=J8nblo6BawU is well worth watching on why "Plain text isn't that simple"


UTF-8 is a text format with no BOM, just as ASCII has no BOM. The BOM is a UTF-16 or UTF-32 thing, so "UTF-8 with BOM" is a binary file that happens to contain some UTF-8 strings as well. Since it's not a text file, it makes sense that utilities expecting text files don't handle it.

Eh? A utf8 file starting with ZERO WIDTH NO-BREAK SPACE is not a text file? How do you figure that?

If it starts with 0xFE 0xFF, but is otherwise UTF-8 instead of UTF-16, it's a binary file. If it starts with 0xEF 0xBB 0xBF, it's a text file with a ZERO WIDTH NO-BREAK SPACE at the start.
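To make the byte-level distinction concrete, here is a small Python sketch that reports which BOM signature, if any, a byte string begins with. The function name `leading_signature` is hypothetical. The byte constants come from the standard `codecs` module.

```python
import codecs


def leading_signature(data: bytes) -> str:
    """Name the byte order mark a byte string starts with, or 'none'."""
    # Check the 4-byte UTF-32 marks before the 2-byte UTF-16 ones, because
    # the UTF-32LE mark (FF FE 00 00) begins with the UTF-16LE mark (FF FE).
    signatures = [
        (codecs.BOM_UTF32_LE, "UTF-32LE"),
        (codecs.BOM_UTF32_BE, "UTF-32BE"),
        (codecs.BOM_UTF8, "UTF-8"),
        (codecs.BOM_UTF16_LE, "UTF-16LE"),
        (codecs.BOM_UTF16_BE, "UTF-16BE"),
    ]
    for bom, name in signatures:
        if data.startswith(bom):
            return name
    return "none"


print(leading_signature(b"\xef\xbb\xbfhi"))  # UTF-8
print(leading_signature(b"\xfe\xff\x00h"))   # UTF-16BE
print(leading_signature(b"hi"))              # none
```

The ordering matters, and it is still only a heuristic. A UTF-16LE file whose first character is U+0000 also starts with FF FE 00 00.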

> If it starts with 0xFE 0xFF, but is otherwise UTF-8 instead of UTF-16, it's a binary file

Sure, but who does this? All the Microsoft tooling writes 0xEF 0xBB 0xBF if you output utf8 with a BOM.
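The two byte sequences in this exchange are the same character, U+FEFF, in different encodings. The quick Python check below uses only built-in codecs. It shows that UTF-8 turns U+FEFF into EF BB BF, and FE FF only appears when it is encoded as UTF-16BE. Python's `utf-8-sig` codec, its name for UTF-8 with a BOM, also emits EF BB BF.

```python
# U+FEFF (ZERO WIDTH NO-BREAK SPACE, the character used as the BOM).
zwnbsp = "\ufeff"
print(zwnbsp.encode("utf-8").hex(" "))      # ef bb bf
print(zwnbsp.encode("utf-16-be").hex(" "))  # fe ff

# "utf-8-sig" prepends the BOM on encode and strips it again on decode.
data = "echo hi".encode("utf-8-sig")
print(data[:3].hex(" "))                    # ef bb bf
print(data.decode("utf-8-sig"))             # echo hi
```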



