
I used the URL gemini://adele.pollux.casa/gemlog/2025-08-04_why_I_prefer_human-readble_file_formats.gmi (the one linked to directly does not work on my computer).

I prefer binary file formats (including DER) for many things; I will respond to the individual parts below and add my own comments.

> With human-readable formats, you're never locked out of your own data. Whether you're on a fresh Linux installation, a locked-down corporate machine, or troubleshooting a system with minimal tools, you can always inspect your configuration files, data exports, or documentation with nothing more than `cat`, `less`, or any basic text editor.

This is helpful especially for documentation.

However, not all data formats are that easy to inspect in this way, and the meaning is not always clear even when the format is text. Additionally, even if a file is clear to read, that does not necessarily mean it is convenient to modify.

Furthermore, the use of text formats means that escaping may be needed, which complicates both encoding and decoding and can also lead to "leaning toothpick syndrome".
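
As a small illustration (in Python, with a made-up value): storing a regular expression in JSON stacks one layer of backslash escaping on top of another, which is where the leaning toothpicks come from.

    import json

    # A regex that matches a literal backslash followed by "n" (three characters: \ \ n).
    pattern = r"\\n"

    # JSON escaping doubles every backslash again.
    encoded = json.dumps({"pattern": pattern})
    print(encoded)                                   # {"pattern": "\\\\n"}
    assert json.loads(encoded)["pattern"] == pattern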

> A JSON configuration file works the same way whether you're viewing it in VS Code, vim, or even a web browser. This universality means fewer barriers to collaboration and fewer "it works on my machine" moments.

Although it can be displayed the same way everywhere (especially if it is pure ASCII), it can be difficult to read when packed together and inefficient when formatted nicely, and there is the escaping issue I mentioned above. When editing JSON, there is also the fact that JSON does not allow comments and does not allow trailing commas, which can make it inconvenient to modify.
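
For example, Python's standard json module (like most strict parsers) rejects both, so a hand-edited file breaks easily:

    import json

    json.loads('{"a": 1, "b": 2}')                # fine
    for text in ('{"a": 1, "b": 2,}',             # trailing comma
                 '{"a": 1}  // a comment'):       # comment
        try:
            json.loads(text)
        except json.JSONDecodeError as e:
            print("rejected:", e.msg)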

JSON, CSV, etc. have their own limitations, which can be a problem for some uses (e.g. storing binary data together with text, or storing non-Unicode text). These limitations then extend to any format built on top of them, if that format did not account for them.
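
A common symptom, sketched in Python: raw bytes cannot go into JSON at all, so formats built on it end up adding a base64 (or similar) layer.

    import base64, json

    blob = bytes([0, 1, 2, 255])
    # json.dumps(blob) would raise a TypeError (bytes are not JSON serializable),
    # so an extra base64 wrapping layer is the usual workaround.
    wrapped = json.dumps({"data": base64.b64encode(blob).decode("ascii")})
    assert base64.b64decode(json.loads(wrapped)["data"]) == blob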

> Digital archaeology is real, and proprietary formats are its enemy. How many documents from the 1990s are now trapped in obsolete file formats?

It is not quite that simple. I have sometimes found such formats easier to figure out than some modern text-based formats.

> Ok, sometimes, there are some character encoding conversions needed (you see, CP-1252, EBCDIC, IBM-850, ISO8859-15, UTF-8), but, these operations are easy nowadays.

That is also often done badly. There are good ways to handle character encodings, including ways that do not involve conversion at all, but they are not commonly supported by modern programs.
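
A small example of why the conversions matter (Python): the same bytes decode to different text, or fail outright, depending on which encoding you assume.

    raw = bytes([0x80, 0x9C])
    print(raw.decode("cp1252"))            # '€œ'
    print(repr(raw.decode("latin-1")))     # '\x80\x9c' (C1 control characters)
    # raw.decode("utf-8") would raise UnicodeDecodeError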

> Need to bulk-update settings? Write a simple script or use standard text processing tools.

This is not always a good approach, depending on the format and other factors. (SQL might work better for many kinds of bulk updates.)
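
A rough sketch of what I mean, using SQLite from Python (the table and column names are invented for the example):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT)")
    con.executemany("INSERT INTO settings VALUES (?, ?)",
                    [("timeout", "30"), ("retries", "3")])
    # One declarative statement updates every matching row, with no
    # guessing about quoting or the surrounding file layout.
    con.execute("UPDATE settings SET value = '60' WHERE key = 'timeout'")
    print(con.execute("SELECT * FROM settings ORDER BY key").fetchall())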

> You don't need expensive software licenses, proprietary APIs, or vendor-specific tools to work with your data.

You don't need those things for many binary formats either. Sometimes you might, but if it is designed well then you shouldn't need it.

> The entire Unix toolchain, `grep`, `sed`, `awk`, `sort`, `cut`, becomes your toolkit. Want to extract all email addresses from a CSV? `grep` has you covered.

For some simple formats, especially TSV, it might work, but not all text-based formats are like that.
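
For example (a sketch in Python, with made-up data): a quoted field containing a comma already defeats naive comma splitting, which is roughly what a grep/cut pipeline does.

    import csv, io

    line = '"Doe, Jane",jane@example.com'
    print(line.split(","))                      # ['"Doe', ' Jane"', 'jane@example.com']
    print(next(csv.reader(io.StringIO(line))))  # ['Doe, Jane', 'jane@example.com']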

> JSON, XML, CSV, these formats have multiple independent implementations, comprehensive specifications, and broad community support.

Yes, but so do many binary formats, such as DER (I wrote my own implementation in C).
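
DER is also simple enough that a reader fits in a few lines; a minimal sketch in Python (not the C implementation mentioned above; it handles only single-byte tags and definite lengths, which is all DER allows):

    def read_tlv(buf, offset=0):
        """Read one DER tag-length-value triple (single-byte tags only)."""
        tag = buf[offset]
        length = buf[offset + 1]
        if length & 0x80:                        # long form: low bits give the number of length bytes
            n = length & 0x7F
            length = int.from_bytes(buf[offset + 2:offset + 2 + n], "big")
            offset += n
        value = buf[offset + 2:offset + 2 + length]
        return tag, value, offset + 2 + length

    # 0x02 = INTEGER, length 1, value 5
    print(read_tlv(bytes([0x02, 0x01, 0x05])))   # (2, b'\x05', 3)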

> Version control systems like Git are optimized for text, and human-readable formats take full advantage of this optimization. Line-by-line diffs become meaningful, showing exactly what changed between versions. Merge conflicts are resolvable because you can actually read and understand the conflicting content.

It is true that such version control systems are made to work with text formats, but that does not mean it has to be that way. It also does not mean that they handle the details of every possible text-based format.

For CSV it might work, but in other formats, whether text or binary, merge conflicts cannot always be resolved so easily, nor the differences identified so usefully, for various reasons: the way blocks work in a format, indentation-oriented syntax, the lack of trailing commas in JSON, etc. Automatic merging does not necessarily produce a correct result either, and must then be corrected manually, regardless of whether the format is text or binary.

Also, when showing what changed, you might want to know which section the change belongs to, in a text-based format that is organized into sections. And when diff and merge tools are made for specific formats, they can be made for binary formats as well.
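
Git already has a hook for exactly this: a textconv driver runs a converter before diffing. A rough sketch of such a converter for DER (the script name and the dump format are just for illustration):

    # derdump.py - print the top-level DER structure as text.
    # Wire it up with:
    #   *.der diff=der                                     (in .gitattributes)
    #   git config diff.der.textconv "python3 derdump.py"
    import sys

    buf = open(sys.argv[1], "rb").read()
    offset = 0
    while offset < len(buf):
        tag, length = buf[offset], buf[offset + 1]
        if length & 0x80:                                  # long-form length
            n = length & 0x7F
            length = int.from_bytes(buf[offset + 2:offset + 2 + n], "big")
            offset += n
        print(f"tag=0x{tag:02x} len={length}")
        offset += 2 + length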

> Text-based formats are often surprisingly compact, especially when compressed. They parse quickly, require minimal memory overhead, and can be streamed and processed incrementally. A well-structured JSON file can be more efficient than a complex binary format with similar information density.

Although compression can help, it adds yet another layer of complexity to the format. The parsing can require handling escaping, and whether streaming works depends on the specific use. A binary format does not have to be complex, and it can store binary data directly.
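
One small, unscientific comparison in Python: a flat array of integers is already smaller as fixed-width binary than as JSON text, before any compression is involved.

    import json, struct

    values = list(range(1000))
    as_json = json.dumps(values).encode("utf-8")
    as_binary = struct.pack("<1000i", *values)    # 4 bytes per integer
    print(len(as_json), len(as_binary))           # 4890 vs 4000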

> Command-line processors like `jq` for JSON or standard Unix tools can handle massive files with minimal resource consumption.

Sometimes that applies, although such programs can be made for other formats as well. I have had some ideas about making such a tool for the DER format, and for some formats such tools already exist.

> These formats also represent a philosophy: that technology should serve human understanding rather than obscure it.

Using JSON or XML will not solve that. What helps is to have better documentation.


