
How is Unicode in any way related to JSON? JSON should just encode whatever dumb data someone wants to transport.

Unicode validation/cleanup should be done separately because it's needed in multiple places, not just JSON.



The contents of JSON strings don't admit arbitrary binary data. You need to use an encoding like Base64 for that purpose.
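A minimal sketch of what I mean, using only Python's standard library. The `raw` payload and the `"blob"` key are made up for illustration:

```python
import base64
import json

# Hypothetical payload: arbitrary bytes that are not valid UTF-8.
raw = bytes([0x00, 0x89, 0xFF, 0xFE])

# Base64 maps the bytes onto plain ASCII, which is safe inside a JSON string.
doc = json.dumps({"blob": base64.b64encode(raw).decode("ascii")})

# The receiver reverses both steps to recover the original bytes.
decoded = base64.b64decode(json.loads(doc)["blob"])
```

The cost is roughly a one-third size increase, plus both sides having to agree out of band that the field is Base64.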


JSON is text. If you're not going to use Unicode in the representation of your text, you'll need some other way.


The current JSON spec (RFC 8259) mandates UTF-8 for JSON exchanged between systems, but practically speaking encoding is a higher-level concept. I suspect there are many server implementations that will respect the charset parameter of the Content-Type header in a POST request containing JSON.


So?

All the letters in this string are “just text”:

    "\u0000\u0089\uDEAD\uD9BF\uDFFF"

JSON itself allows escape sequences in a string that don't unescape to valid Unicode (here, `\uDEAD` is an unpaired surrogate). That's fine, because the strings aren't required to be valid in any particular encoding: it's up to a layer higher than JSON to be opinionated about that.
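For what it's worth, here is a quick check against one parser, Python's standard `json` module. Other parsers may behave differently, so treat this as one data point, not a statement about the spec:

```python
import json

# The string from the comment above, as raw JSON text.
text = r'"\u0000\u0089\uDEAD\uD9BF\uDFFF"'

# Python's json module parses this without complaint.
s = json.loads(text)
code_points = [hex(ord(c)) for c in s]

# The objection only surfaces one layer up, when something tries to
# encode the result as UTF-8 and hits the lone surrogate.
try:
    s.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False
```

The parse succeeds and `\uD9BF\uDFFF` is combined into a single code point, while the lone `\uDEAD` is passed through as-is. It is the later UTF-8 encoding step that rejects it.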

I wouldn’t want my shell’s pipeline buffers to reject data they don’t like, so why should a JSON serializer?


I actually agree, now that I understand what you're talking about.


JSON (unfortunately) requires strings to be Unicode. (JSON has other problems too, but Unicode is one of them.)



