Hacker News new | past | comments | ask | show | jobs | submit login

I would say C makes this sort of thing far more likely because it's usually a ton of effort to obtain suitable containers. In C++ or Rust they have plenty of things like `unordered_set`/`HashSet` built in, so people are much more likely to use it and not go "eh, I'll use a for loop".

In this case Git already had a string set, but it's still not standard so there's a good chance the original author just didn't know about it.






> In this case Git already had a string set, but it's still not standard so there's a good chance the original author just didn't know about it.

The original commit was made in January 2009 (https://github.com/git/git/commit/b2a6d1c6868b6d5e7d2b4fa912...), strmap was added in November 2020 (https://github.com/git/git/commit/ae20bf1ad98bdc716879a8da99..., strset was added a few days later: https://github.com/git/git/commit/1201eb628ac753af5751258466...). It was first proposed in 2018 (https://lore.kernel.org/git/[email protected]... the proposal specifically mentions it fixing possibly quadratic sites).

As noted in the comment, git did have a sorted string list with bisection search, and that's from 2008 (and it actually dates back to 2006 as the "path list" API, before it was renamed following the realisation that it was a generalised string list). Though as the hashmap proposal notes, it's a bit tricky because there's a single type with functions for sorted and functions for unsorted operations, you need to know whether your list is sorted or not independent of its type.


I've seen this exact problem in C code many times in my life, especially in kernel space where data structures and memory allocations are fun.

Ironically, this is much _faster_ for small sets. Sometimes the error is intentional, because the programmer believes that all inputs will be small. IME, those programmers were wrong, but that's the inverse of survival bias.


And in fairness I can understand not considering someone might eventually be bundling a repository with tens of thousands of refs back in early 2009.

Even now, the contrast between repository sizes is wide. Most repos contain 1000s of references, which while not the best to run O(N^2) algorithm, is still okay. But as a Git forge, you also see a share of repositories which contain millions of references.

Yeah, not something that WG14 will ever care about.



Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: