
They claim a 3GB/s improvement versus the previous version of sep on equal hardware, and unlike “marketing” benchmarks, they include the actual speed achieved and the hardware used.


Do note that this speed, even before the 3GB/s improvement, exceeds the bandwidth of most disks, so the bottleneck is loading data into memory. I don't know of many applications where CSV is produced and consumed in memory, so I wonder what the use is.


"We can parse at x GB/s" is more or less the reciprocal of "we need y% of your CPU capacity to saturate I/O".

Higher x -> lower y -> more CPU for my actual workload.
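
To put rough numbers on that, here is a minimal sketch of the reciprocal relationship, assuming a single core and a parsing cost that scales linearly with bytes. The function name and the figures (a 7 GB/s disk, parsers at 10 and 20 GB/s) are made up for illustration and are not taken from the benchmark.

```python
def cpu_fraction_to_saturate_io(io_gb_per_s: float, parse_gb_per_s: float) -> float:
    """Fraction of one core's time spent parsing when input arrives at io_gb_per_s.

    A value above 1.0 means the parser, not the I/O, is the bottleneck.
    """
    if parse_gb_per_s <= 0:
        raise ValueError("parse throughput must be positive")
    return io_gb_per_s / parse_gb_per_s


# Hypothetical figures: a 7 GB/s disk feeding parsers of different speeds.
for parse_speed in (10.0, 20.0):
    share = cpu_fraction_to_saturate_io(7.0, parse_speed)
    print(f"parser at {parse_speed:.0f} GB/s -> {share:.0%} of one core")
```

Doubling the parser speed halves the share of the core spent parsing; the rest is left for the actual workload.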


Disks are slower than the network! In-memory processing of OLAP tables, streaming splitters, large data set division… but also, the faster the parser, the less time you spend parsing and the more you spend doing actual work.


This is honestly something that caught me off-guard a bit. If you have good internal network connectivity, small queries and your relational database has the data in memory, it can be faster to fetch data from the DB via the network than reading it from disk.

Like, sure, I can give you an application server with faster disks and more memory, and you or I are certainly capable of implementing an application server that could load the data from disk faster than all of that. And then we build caching to keep the hot data in memory, because that's faster.

But then we've spent very advanced development resources to build a relational database with some application code at the edge.

This can make sense in some high frequency trading situations, but in many more mundane web-backends, a chunky database and someone capable of optimizing stupid queries enable and simplify the work of a much bigger number of developers.


You can also get this with Infiniband, although it is less surprising, and basically what you’d expect to see.

I did once use a system where the network bandwidth was in the same ballpark as the memory bandwidth, which might not be surprising for some of the real HPC-heads here but it surprised me!


Decompression is your friend. Usually CSV compresses really well.

Multiple cores decompressing LZ4-compressed data can achieve crazy bandwidth. More than 5 GB/s per core.
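
A rough sketch of why this helps, using Python's standard-library zlib as a stand-in (LZ4 is not in the standard library, and its ratio and speed will differ). The synthetic CSV, the helper name, and the 2 GB/s disk figure are assumptions for illustration. The idea is only that effective read bandwidth is roughly disk bandwidth times compression ratio, capped by how fast you can decompress.

```python
import zlib


def effective_read_gb_per_s(disk_gb_per_s: float, ratio: float,
                            decompress_gb_per_s: float) -> float:
    """Uncompressed GB/s delivered to the parser.

    Compressed bytes leave the disk at disk_gb_per_s and expand by `ratio`;
    the decompressor's output rate caps the result.
    """
    return min(disk_gb_per_s * ratio, decompress_gb_per_s)


# Synthetic CSV (made-up data): id, a repetitive label, a numeric reading.
rows = [f"{i},sensor-{i % 8},{(i * 37) % 1000 / 10:.1f}\n" for i in range(50_000)]
raw = "".join(rows).encode("ascii")

packed = zlib.compress(raw, 1)        # level 1: favour speed over ratio
assert zlib.decompress(packed) == raw  # round-trip check
ratio = len(raw) / len(packed)

print(f"compression ratio: {ratio:.1f}x")
# Hypothetical 2 GB/s disk, 5 GB/s decompression on one core.
print(f"effective read rate: {effective_read_gb_per_s(2.0, ratio, 5.0):.1f} GB/s")
```

With several cores decompressing independent chunks, the cap scales up accordingly, which is where the large aggregate numbers come from.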



