Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I agree. From here: https://news.ycombinator.com/item?id=38865942

<< Just clarified this in the README: * Input value ranges are as follows:

- Station name: non null UTF-8 string of min length 1 character and max length 100 characters

- Temperature value: non null double between -99.9 (inclusive) and 99.9 (inclusive), always with one fractional digit >>

If we assume worst case entries: -99.9;<100 char station name>

That is 106 chars plus newline (1 or 2 chars). That could be 108 GB of data! And, I want to be so picky here: The README says "UTF-8" and "max length 100 characters". Eh, we could start a holy war here arguing about what does a character mean in UTF-8. Let's be generous and assume it means one (8-bit) byte (like C). I have good faith in the original author!

Some Googling tells me that the current fastest PCIe Gen5 NVMe drive is Crucial T700 w/ Sequential read (max., MB/s) @ 12,400. That is 8.7 seconds to read 108 GB of data. A few other comments mentioned that the sample file is only 12 GB, so that is 1 second, and 100% can be cached by Linux kernel for 2nd and later reads. However, if the test file is greater than 32 GB, you are definitely screwed for file system caching. You will have to pay a multi-second "I/O tax" to read the file each time. This seems to far outweigh any computation time.

Thoughts?

Last, I still think this is a great challenge. I think I just found my next interview question. This question is easy to understand and program a basic implementation. Further discussion can be fractally complex about the myriad of optimizations that could be applied.

No trolling: Can you imagine if someone tries this exercise in Python? There are a goofy amount of optimizations to deal with huge data in that language thanks to SciPy, Pandas and other friends. I am sure someone can write an insanely performant solution. (I know, I know: The original challenge says no external dependencies.)



Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: