The main problem is using text as a common format between different applications.
First: text is not well defined. Is it ASCII? Is it UTF-8? Some programs can even spew UTF-32 with the right locale configured; it's a mess.
Second: encoding and decoding of objects to text is not defined at all. The problems with filenames are just one example. Using newline as a separator is a natural thing that is easy to implement, yet it is wrong.
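To make the filename problem concrete, here's a minimal sketch (POSIX shell, run in an empty scratch directory; GNU `ls` prints raw names when output is piped):

```sh
# Filenames may contain any byte except NUL and "/", including newlines.
# Create one file whose name embeds a newline:
touch 'report
2024.txt'

# Any line-oriented consumer now sees two names where there is only one:
ls | while IFS= read -r name; do printf 'got: %s\n' "$name"; done
# got: report
# got: 2024.txt
```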
In my opinion two things should be done:
1. Standardise on UTF-8. No other encodings allowed.
2. Standardise on JSON. It is good enough to serve as a universal exchange format, and tools like `jq` have existed for some time now.
So any utility must read and write JSON objects when some standard environment variable is set. And shells can be developed with better syntax for dealing with JSON. This way you could write something like
`ps aux | while read row; do echo ${row.user} ${row.pid}; done`
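No shell understands that `${row.user}` syntax today, but the idea can be approximated with `jq`, assuming the producer emitted JSON. A sketch; since no JSON-emitting `ps` exists, its output is simulated here with a here-document:

```sh
# Simulated output of a hypothetical ps that prints one JSON object
# per process; jq's string interpolation picks out the fields:
cat <<'EOF' | jq -r '"\(.user) \(.pid)"'
{"user": "root", "pid": 1}
{"user": "alice", "pid": 4242}
EOF
# root 1
# alice 4242
```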
> It is good enough to serve as a universal exchange format, and tools like `jq` have existed for some time now.
Please don't use that underdefined joke of a spec. Define "PosixJson" and use that instead. Right now it's not even clear what the result of parsing `{"a": 1234678901234567890}` is. Is this a parse error? A bigint? A float/double? Quiet wraparound? Something else? I've seen all of these behaviors in real-world JSON implementations across different languages.
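This is easy to demonstrate from the shell with two parsers you probably already have installed (a sketch assuming `python3` and `node` are on the PATH; the exact rounded digits depend on the engine's float formatting):

```sh
J='{"a": 1234678901234567890}'

# Python parses JSON numbers without a fraction as arbitrary-precision
# ints, so the value survives exactly:
python3 -c 'import json, sys; print(json.loads(sys.argv[1])["a"])' "$J"
# 1234678901234567890

# JavaScript parses every JSON number as an IEEE-754 double, so the
# low digits are silently rounded away:
node -e 'console.log(JSON.parse(process.argv[1]).a)' "$J"
# 1234678901234568000
```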
> A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character.
So, if you have some non-printable characters like BEL/␇/ASCII 0x07, that's still a text file.
(and I believe which bytes count as a valid character depends on your `LC_CTYPE`).
But the moment you have a line longer than {LINE_MAX} bytes (a limit that varies between POSIX environments), suddenly your text file is a binary file.
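You can ask your environment for the limit; POSIX only guarantees it is at least 2048. For example:

```sh
getconf LINE_MAX
# 2048 (a common value; it varies by implementation)

# A file with a single 1 MiB line exceeds any realistic LINE_MAX, so by
# the definition above it stops being a "text file":
printf '%1048575s\n' '' > longline.txt
wc -L longline.txt   # wc -L is a GNU extension: longest line length
```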
Kind of a weird definition indeed. One edge case: the definition states the file must contain characters, so presumably zero-length files are out. But then how could you have zero lines?
Yes, obviously. But the POSIX specification quoted above says a "text file" contains characters, which an empty file by definition does not. So an empty file cannot be a text file if you read that specification strictly, and therefore you cannot have zero lines in a text file. As soon as you have a single character there is at least one line, and the number of lines can only stay the same or grow from there.
The definition should read "one or more lines" instead, or (probably better) specify that a text file contains "zero or more characters".
What cursed madness have you hit that spits out UTF-32 under normal conditions?! That can only be a bug - UTF-32/UCS-4 never saw external use, and has only ever been used for in-memory fixed-width character representation, e.g. runes in Go.
You never have to worry about whether you're dealing with ASCII vs. UTF-8 (ASCII is a strict subset of UTF-8), but rather whether you're dealing with UTF-8 vs. ISO-8859-1, or worse, Shift JIS or similar.
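The distinction shows up at the byte level: every ASCII byte means the same thing in UTF-8, while legacy single-byte encodings reuse the high range. A small sketch with `iconv` (the exact error wording varies by implementation):

```sh
# 0xE9 is "é" in ISO-8859-1...
printf '\351\n' | iconv -f ISO-8859-1 -t UTF-8
# é

# ...but a lone 0xE9 byte is not valid UTF-8 at all:
printf '\351\n' | iconv -f UTF-8 -t UTF-8
# iconv: illegal input sequence at position 0
```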
I think a lot of tools should support JSON as well as plain text. Probably the latter by default, and the former with a `-o json` or similar option. I'm fine with `wc` giving me `5`; I'd prefer that to `{ "characters": 5 }`.
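For illustration only, here's what that could look like; `wc` has no `-o` option, so both the flag and the output shape below are made up:

```sh
printf 'hello' > notes.txt

# Today: terse, human-first output.
wc -c < notes.txt
# 5

# Hypothetical structured mode (not a real wc flag):
wc -c -o json < notes.txt
# {"characters": 5}
```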
There are exchange formats that are well-defined enough to be useful to machines while also being readable enough to be traversed by human eyes. There's no reason to do everything ad hoc; you don't gain much by it. You also control the shell itself - there's no reason you can't display object representations in a pretty way.
JSON itself is bad for a streaming interface, and streaming is the norm for CLI applications. You can't easily consume a JSON array without first reading it in its entirety. JSONL (one JSON object per line) would be a better fit.
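The difference is easy to see with `jq`, which can both flatten an array into JSONL (`jq -c '.[]'`) and consume the result line by line:

```sh
# An array must be read through the closing "]" before it parses;
# jq -c '.[]' turns it into one compact object per line:
printf '[{"pid": 1}, {"pid": 2}]' | jq -c '.[]'
# {"pid":1}
# {"pid":2}

# JSONL lets a consumer act on each record as it arrives:
printf '{"pid":1}\n{"pid":2}\n' | while IFS= read -r line; do
  printf '%s\n' "$line" | jq .pid
done
# 1
# 2
```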
But then, how well would it work for ad-hoc usage, which is probably one of the biggest uses of shells?