
The main problem is using text as a common format between different applications.

First: text is not well defined. Is it ASCII? Is it UTF-8? Some programs can spew UTF-32 with the proper locale configured. It's a mess.

Second: encoding and decoding of objects to text is not defined at all. The problems with filenames are just one example. Using newline as a separator is natural and easy to implement, yet it is wrong.

In my opinion two things should be done:

1. Standardise on UTF-8. No other encodings allowed.

2. Standardise on JSON. It is good enough to serve as a universal exchange format, and tools like `jq` have existed for some time now.

So any utility must read and write JSON objects with some standard env set. And shells can be developed with better syntax to deal with JSON. This way you can write something like

`ps aux | while read row; do echo ${row.user} ${row.pid}; done`



> It is good enough to serve as a universal exchange format, and tools like `jq` have existed for some time now.

Please don't use that underdefined joke of a spec. Define "PosixJson" and use that instead. Right now it's not even clear what the result of parsing {"a": 1234678901234567890} is. Is this a parse error? A bigint? A float/double? Quiet wraparound? Something else? I've seen all these behaviors in real world JSON implementations across different languages.
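To make the ambiguity concrete, here is a minimal Python sketch. It only shows two of the behaviors listed above (bigint vs. double), using the standard `json` module; the `parse_int=float` setting is my stand-in for parsers that store every number as an IEEE 754 double, not a claim about any particular implementation.

```python
import json

doc = '{"a": 1234678901234567890}'

# Python's json module parses integer literals into arbitrary-precision ints.
as_int = json.loads(doc)["a"]

# parse_int lets the caller pick another interpretation; float imitates
# parsers that keep every JSON number as a double.
as_float = json.loads(doc, parse_int=float)["a"]

print(as_int)                   # the exact integer from the document
print(int(as_float) == as_int)  # False: the double cannot hold every digit
```

Same input, two different values, and both parsers can claim to be correct JSON parsers.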


POSIX does actually define what a "text file" is, but the definition is a bit unusual:

See https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1...

> 3.387 Text File

> A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character.

So, if you have some non-printable characters like BEL/␇/ASCII 0x07, that's still a text file.

(and I believe what bytes count as a valid character depends on your `LC_CTYPE`).

But the moment you have a line longer than {LINE_MAX} bytes (which can depend on which POSIX environment you have), suddenly your text file is now a binary file.
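A rough Python sketch of the two rules quoted above (no NUL, no line longer than {LINE_MAX} including the newline). The helper name is made up, and the default of 2048 is an assumption: it is the minimum value POSIX permits for {LINE_MAX}, while the real limit on a given system is whatever `getconf LINE_MAX` reports. It deliberately ignores the `LC_CTYPE` question of which bytes form valid characters.

```python
# Assumption: 2048 is the smallest {LINE_MAX} a conforming system may have;
# query the real value with `getconf LINE_MAX`.
ASSUMED_LINE_MAX = 2048


def looks_like_posix_text(data: bytes, line_max: int = ASSUMED_LINE_MAX) -> bool:
    """Hypothetical helper: checks only the NUL and line-length rules."""
    if b"\x00" in data:
        return False
    pieces = data.split(b"\n")
    # Every piece except the last was followed by a <newline>,
    # and that <newline> counts toward the limit.
    lengths = [len(p) + 1 for p in pieces[:-1]] + [len(pieces[-1])]
    return all(n <= line_max for n in lengths)


print(looks_like_posix_text(b"bell\x07\n"))          # BEL is allowed
print(looks_like_posix_text(b"x" * 2048 + b"\n"))    # one byte too long
```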


Kind of a weird definition indeed. One edge case: the definition states the file must contain characters, so presumably zero length files are out. But then how could you have zero lines?


POSIX defines a line as:

> 3.185 Line

> A sequence of zero or more non-<newline> characters plus a terminating <newline> character.

So a file with some characters but no trailing newline is reported by `wc -l` as having zero lines.
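In other words, `wc -l` counts newline characters rather than "things that look like lines". A tiny Python sketch of that counting rule (the function name is just for illustration):

```python
def count_lines(data: bytes) -> int:
    """Count <newline> bytes, which is the number `wc -l` reports."""
    return data.count(b"\n")


print(count_lines(b"no trailing newline"))  # 0
print(count_lines(b"one\ntwo"))             # 1: "two" is not a complete line
print(count_lines(b"one\ntwo\n"))           # 2
```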


An empty file is not hard to make. It's just a matter of creating the file and not writing to it.


Yes, obviously. But the POSIX specification for a "text file" as above is that it contains characters, which an empty file by definition does not. So an empty file cannot be a text file if you read that specification strictly, and therefore you cannot have zero lines in a text file. As soon as you have a single character there is at least one line, and the number of lines can only stay the same or grow from there.

The definition should read "one or more lines" instead, or (probably better) specify that a text file contains "zero or more characters".


Ahh I see what you're saying. I misunderstood at first.


What cursed madness have you hit that spits out UTF-32 under normal conditions?! That can only be a bug - UTF-32/UCS-4 never saw external use, and has only ever been used for in-memory fixed-width character representation, e.g. runes in Go.

You never have to worry about whether you're dealing with ASCII vs. UTF-8, but rather if you're dealing with UTF-8 vs. ISO-8859-1, or worse, Shift JIS or similar.


I think that I hit that with Java:

    % java -Dfile.encoding=UTF-32 Test | hexdump -C
    00000000  00 00 00 48 00 00 00 65  00 00 00 6c 00 00 00 6c  |...H...e...l...l|
    00000010  00 00 00 6f 00 00 00 2c  00 00 00 20 00 00 00 77  |...o...,... ...w|
    00000020  00 00 00 6f 00 00 00 72  00 00 00 6c 00 00 00 64  |...o...r...l...d|
    00000030  00 00 00 0a                                       |....|
    00000034

From quick googling it seems that glibc does not support it, so it should not happen.
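For reference, the dump above is big-endian UTF-32 with no byte order mark. A short Python sketch that produces the same byte layout; the string "Hello, world\n" is read off the hexdump, since `Test` itself isn't shown:

```python
text = "Hello, world\n"

# "utf-32-be" writes four bytes per code point, big-endian, without a BOM.
encoded = text.encode("utf-32-be")

print(encoded[:8])   # b'\x00\x00\x00H\x00\x00\x00e'
print(len(encoded))  # 52, i.e. 0x34, the final offset in the dump

# The plain "utf-32" codec prepends a 4-byte BOM, so it is 4 bytes longer.
with_bom = text.encode("utf-32")
print(len(with_bom) - len(encoded))  # 4
```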


> it seems that glibc does not support it

`iconv` does, and that is common enough. Along with tons of eerie EBCDIC/whatever...


> That can only be a bug - UTF-32/UCS-4 never saw external use

I regularly use `iconv -t utf-32be | hd` to look at which weird symbol, like an itchy hedgehog, some bizarre sequence denotes.

And what is a real reason to disallow this?


Don't even assume UTF-something is the only character encoding. There are many character encodings that predate Unicode, and they're still widely used.


I think a lot of tools should support json as well as plain text. Probably the latter by default, and the former with a "-o json" or similar option. I'm fine with wc giving me `5`, I'd prefer that to `{ "characters": 5 }`.


True, but this would be immensely difficult to pull off, because how do you convince other people to write programs that produce actual working JSON?


The primary purpose of command line program output is to convey information to a human, not to other programs.

Command line scripting is supposed to be ad hoc and hacky.


There are exchange formats that are well-defined enough to be useful to many computers while also being readable enough to be traversed by human eyes. There's no reason to make everything ad hoc; you don't gain much by that. You also control the shell itself, so there's no reason you can't display object representations in a pretty way.


I disagree that it is supposed to be ad hoc and hacky. Look at PowerShell.


That's under limited OSes such as DOS. Under Unix, piping has been the philosophy.


JSON itself is bad for a streaming interface, as is common with CLI applications. You can't easily consume a JSON array without first reading it in its entirety. JSONL would be a better fit.
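A minimal sketch of why JSONL streams well, in Python. The `iter_jsonl` helper and the `ps`-like sample rows are invented for illustration; the point is only that each row can be parsed and handled the moment its line arrives, without waiting for a closing `]`.

```python
import io
import json
from typing import Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed object per non-empty line, as each line arrives."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical `ps`-like output, one JSON object per line.
sample = io.StringIO(
    '{"user": "root", "pid": 1}\n'
    '{"user": "alice", "pid": 4242}\n'
)

for row in iter_jsonl(sample):
    print(row["user"], row["pid"])
```

In a real pipeline you would pass `sys.stdin` instead of the `StringIO` sample. A single JSON array, by contrast, needs either the whole input in memory or an incremental parser.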

But then, how well would it work for ad-hoc usage, which is probably one of the biggest uses of shells?


> The main problem is using text as a common format between different applications.

If you can't get the immensity of the cleverness of Unix foundations, you should not talk about them.

That idea is what made it possible for you to type that sentence in the first place.



