
Except that the comma was a poor choice of separator, CSV is just plain text that can be trivially parsed from any language or platform. That's its biggest value. There is essentially no format, library, or platform lock-in. JSON comes close to this level of openness and ease, but YAML is already too complicated as a file format.


The notion of a "platform" caught my attention. Funny story: About five years ago, I got a little nostalgic and wanted to retrieve some data from my Atari XL computer (8-bit) from my preteen years. Back then, I created primitive software that showed a map of my village with notable places, like my friends' homes. I was able to transform all the BASIC files (stored on cassette tape) from the Atari computer to my PC using a special "SIO2PC" cable, but the mapping data were still locked in the BASIC files. So I had the idea to create a simple BASIC program that would run in an Atari 8-bit PC emulator, linearize all the coordinates and metadata, and export them as a CSV file. The funny thing is that the 8-bit Atari didn't even use ASCII, but an unusual ATASCII encoding. But it's the same for letters, numbers, and even for the comma. Finally, I got the data and wrote a little Python script to create an SVG map. So yes, CSV forever! :)


And the best thing about CSV is that it is a text file with a standardized, well known, universally shared encoding, so you don't have to guess it when opening a CSV file. Exactly in the same way as any other text file. The next best thing with CSV is that separators are also standardized and never positional, you never have to guess.


Technically there is a CSV standard in IETF RFC 4180, although compliance isn't required and of course many implementations are broken.

https://www.ietf.org/rfc/rfc4180.txt


Almost missed the sarcasm :)


Try exporting things from Excel to CSV on a Mac with a non-US locale.

Some genius at Microsoft decided that exporting to CSV should follow the locale convention, which means I get a "semicolon-separated values" file instead of a comma-separated one unless I change my locale to US.

Line breaks are also fun...
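
For anyone stuck reading such exports, here is a minimal Python sketch that guesses the delimiter instead of assuming a comma. The sample data and the read_rows helper are made up for illustration; csv.Sniffer is a heuristic from the standard library and can guess wrong on short or irregular files, so treat this as a starting point rather than a fix.

```python
import csv
import io

# Hypothetical export from a locale that uses ';' as the list separator
# and ',' as the decimal mark.
SAMPLE = "name;city;amount\nAlice;Berlin;1,5\nBob;Paris;2,25\n"

def read_rows(text, candidates=";,\t"):
    """Guess the delimiter among the candidates, then parse the text with it."""
    dialect = csv.Sniffer().sniff(text, delimiters=candidates)
    rows = list(csv.reader(io.StringIO(text), dialect))
    return dialect.delimiter, rows

delimiter, rows = read_rows(SAMPLE)
print(repr(delimiter))
print(rows)
```

Note that the amounts still come back as strings with a decimal comma; converting them is a separate, locale-dependent step.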


JSON has the major annoyance that grep doesn't work well on it. You need tooling to work with JSON.


As soon as you encounter any CSVs where field values may contain double quotes, commas, or newlines, you need tooling to work with CSV as well.

(TSV FTW)
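
To make that concrete, here is a small Python sketch comparing naive splitting with the standard csv module. The sample record is invented; it contains a comma, doubled double quotes, and an embedded newline in the RFC 4180 quoting style.

```python
import csv
import io

# One header row plus one record whose fields contain a comma,
# doubled double quotes, and an embedded newline.
DATA = 'name,quote\r\n"Doe, Jane","She said ""hi""\nand left"\r\n'

# Naive approach: split into lines, then split each line on commas.
naive = [line.split(",") for line in DATA.splitlines()]

# csv.reader understands quoting; newline="" keeps line endings untouched.
parsed = list(csv.reader(io.StringIO(DATA, newline="")))

print(len(naive), "rows from naive splitting")
print(len(parsed), "rows from csv.reader")
print(parsed[1])
```

The naive version sees three rows and cuts "Doe, Jane" in half, while csv.reader returns the two rows that were actually written.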


TSV is superior to CSV, and it still angers me that Excel doesn't offer it as a standard input option, but your examples are fairly easily handled by eye in a text file.

Tools definitely make it faster and more reliable.


One of my first tasks as a junior dev was replacing an incorrect/incomplete "roll your own" CSV parsing regex (which broke in production) with a library.


ASCII FS GS RS US ... just make decent font entries for them.


And keys on the keyboard.


Yes! But nobody ever came up with decent font entries that would look snappy on keys. Not even IBM (or Data General or Burroughs or whoever) I guess.


For this I use gron [0]. It's very convenient.

[0]: https://github.com/tomnomnom/gron
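
If gron isn't available, the underlying idea (one line per leaf value, prefixed with its full path) can be roughly sketched in Python. This is not gron's implementation or its exact output format; the flatten function and the sample document are illustrative only, and the sketch skips empty containers and does not escape unusual keys.

```python
import json

def flatten(value, path="json"):
    """Yield one 'path = value' line per leaf so grep can match path and value together."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        # json.dumps keeps strings quoted and renders null/true/false as JSON.
        yield f"{path} = {json.dumps(value)}"

doc = json.loads('{"user": {"name": "ann", "tags": ["a", "b"]}}')
lines = list(flatten(doc))
print("\n".join(lines))
```

Each output line is self-describing, so a plain grep for "tags" shows both where the value lives and what it is.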


grep is a tool. jq is a good tool for json.


grep is POSIX and you can count on it being installed pretty much anywhere. That’s not the case for jq.


Do people confine themselves to a POSIX-conformant grep subset in practice, or do you mean GNU grep, which probably doesn't behave according to spec unless POSIXLY_CORRECT is set?


"Anywhere" does not include Windows environments, which are over half the work computers out there.


If a workstation has Git installed on it, which I’d think would be the case for a substantial number of engineers out there (…not just software engineers), grep is there due to Git BASH.


Arguably, "comma as a separator" is close enough to comma's usage in (many) written languages that it makes it easier for less technical users to interact with CSV.


Easier as long as they don't try to put any of those written languages in the CSV

Commas and quotation marks suddenly make it complicated


100%. XML also worked here.

YAML is a pain because it has ever so slightly different versions that sometimes don't play nice.

CSV and TSV files are almost always portable.


I'd say that is not its biggest issue. The way to escape things is by far its biggest issue; a passwd-like scheme of \, \", \\ would have been far easier.
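
A rough Python sketch of what such backslash escaping could look like, as opposed to the quote-doubling CSV actually uses. The escape and split_record functions are hypothetical, and the choice to write a newline as \n is my own assumption; no existing format is being described here.

```python
def escape(field, sep=","):
    """Backslash-escape backslashes and separators; write newlines as \\n."""
    return (field.replace("\\", "\\\\")
                 .replace(sep, "\\" + sep)
                 .replace("\n", "\\n"))

def split_record(line, sep=","):
    """Split one line on unescaped separators and undo the escapes."""
    fields, current = [], []
    chars = iter(line)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")          # the escaped character
            current.append("\n" if nxt == "n" else nxt)
        elif ch == sep:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields

record = ["Doe, Jane", 'said "hi"', "C:\\temp", "two\nlines"]
line = ",".join(escape(field) for field in record)
print(line)
assert split_record(line) == record
```

Because a record never contains a raw newline, the file can still be split into records line by line, and quotes need no special treatment at all.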


What separator would be better?


The comma makes it more human-readable. What separator would you suggest?


So ASCII actually had dedicated characters for this, 0x1C-0x1F. The problem is that they are non-printing.

Unicode has rendered analogs, U+241C-U+241F, but they take more bytes to encode, which can significantly increase file size in large USV files.

So my ideal would be to use ASV files rendered as USV in editors.

https://github.com/SixArm/usv
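
A small Python sketch of the "store as ASV, render as USV" idea, plus the byte cost in UTF-8. The render and unrender helpers are hypothetical names, not part of any tool; the only facts relied on are the code points mentioned above.

```python
# ASCII control characters FS, GS, RS, US (0x1C-0x1F) and the
# Unicode control pictures U+241C-U+241F that depict them.
TO_PICTURES = {0x1C + i: 0x241C + i for i in range(4)}
TO_CONTROLS = {picture: control for control, picture in TO_PICTURES.items()}

def render(asv_text):
    """Replace the invisible separators with their visible pictures."""
    return asv_text.translate(TO_PICTURES)

def unrender(usv_text):
    """Turn the visible pictures back into the ASCII separators."""
    return usv_text.translate(TO_CONTROLS)

US, RS = "\x1f", "\x1e"
asv = f"a{US}b{RS}c{US}d{RS}"
usv = render(asv)

print(usv)
print(len(asv.encode("utf-8")), "bytes as ASV")
print(len(usv.encode("utf-8")), "bytes as USV")
```

Each ASCII separator is one byte in UTF-8, while each control picture takes three, which is where the size difference comes from.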


The benefits are that ASV / USV files are trivial to parse with simple string splitting since you don't have to worry about nesting and quoting.

Here's an example of what a USV looks like:

Folio1␟␞
Sheet1␟␞
a␟b␟␞
c␟d␟␞
␝
Sheet2␟␞
e␟f␟␞
g␟h␟␞
␝␜
Folio2␟␞
Sheet3␟␞
a␟b␟␞
c␟d␟␞
␝
Sheet4␟␞
e␟f␟␞
g␟h␟␞
␝␜
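
The "simple string splitting" claim can be sketched in Python like this. I'm assuming, based on the example above, that every unit ends with US and every record ends with RS; I haven't checked this against the USV spec, and the sketch ignores the group and file levels as well as any whitespace between records. The function names are made up.

```python
US, RS = "\x1f", "\x1e"  # ASCII unit and record separators

def split_terminated(text, mark):
    """Split text whose items each END with mark; drop the empty tail."""
    parts = text.split(mark)
    if parts[-1] == "":
        parts.pop()
    return parts

def parse_asv(text):
    """Records end with RS, units inside a record end with US."""
    return [split_terminated(record, US)
            for record in split_terminated(text, RS)]

def dump_asv(rows):
    return "".join("".join(unit + US for unit in row) + RS for row in rows)

rows = [["name", "quote"], ["Doe, Jane", 'She said "hi"\nand left']]
encoded = dump_asv(rows)
assert parse_asv(encoded) == rows
print(parse_asv(encoded))
```

Commas, quotes, and newlines pass through untouched because none of them is a separator. The one thing this cannot represent is a field that itself contains 0x1E or 0x1F.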


The comma is too prevalent in the data to be a suitable separator. A semicolon would be a better choice.


"|" looks pretty good (and is relatively rarely-used).


|| separated for life



