
> except things that do require you to assume it's valid UTF-8

That's the point.



But no one has demonstrated an actual operation that requires valid UTF-8. The reasoning is always circular: "I require valid UTF-8 because someone else requires valid UTF-8".

At the bottom of that chain there should be some underlying operation that can only work on valid UTF-8, but no such operation exists. UTF-8 was designed so that invalid data can be detected and handled without affecting the meaning of valid subsequences in the same string.


> UTF-8 was designed such that invalid data can be detected and handled, without affecting the meaning of valid subsequences in the same string.

But there is no canonical response to invalid data. So literally every operation that might have to decide what to do when presented with invalid data should either (a) accept a parameter specifying what to do on error, and potentially fail, or (b) take a parameter type that forces errors to be handled in advance.
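A rough Python sketch of the two API shapes. `char_count` and `Utf8Text` are hypothetical names made up for illustration. The only real API used is `bytes.decode` with its standard `errors` policies (`"strict"`, `"replace"`, `"ignore"`).

```python
from typing import Literal


# Option (a): the operation takes an error-policy parameter and may fail.
# `char_count` is a hypothetical example operation.
def char_count(data: bytes,
               errors: Literal["strict", "replace", "ignore"] = "strict") -> int:
    # "strict" raises UnicodeDecodeError; "replace"/"ignore" pick a fallback.
    return len(data.decode("utf-8", errors=errors))


# Option (b): validate once, up front, into a type that only ever holds
# decoded valid UTF-8. Operations on it need no error policy.
# `Utf8Text` is a hypothetical wrapper, not a library class.
class Utf8Text:
    def __init__(self, data: bytes) -> None:
        # The single point where invalid input is rejected: the
        # UnicodeDecodeError propagates to the caller.
        self._text = data.decode("utf-8")

    def char_count(self) -> int:
        # No error parameter: the data is already known to be valid.
        return len(self._text)


raw = b"caf\xc3\xa9\xff"  # "café" followed by an invalid byte
print(char_count(raw, errors="replace"))      # 5: bad byte becomes U+FFFD
print(char_count(raw, errors="ignore"))       # 4: bad byte is dropped
print(Utf8Text(b"caf\xc3\xa9").char_count())  # 4
try:
    Utf8Text(raw)
except UnicodeDecodeError:
    print("rejected at the boundary")
```

The trade-off is where the decision lives: (a) repeats it at every call site, while (b) makes it once, when the bytes first become text.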



