
I don't disagree with you at all. My point was more like what another commenter said: software adheres to a strict and very finite set of rules, while the real world is way more complicated than that. It's so trivially easy to find real world counterexamples to just about any software that it's a barely interesting exercise (IMO). So you define a reasonable subset and work with that. And the reasonable subset is probably defined by positive/negative outcomes.

It would have been cool if the blog post discussed those outcomes so we could reason about it properly; otherwise it's just a list of claims taken at face value. If the programmer making an assumption means a screen at a gate says the wrong boarding time when there's a human there controlling the boarding, then it's not the end of the world. But if the programmer making an assumption causes 1/10000 flights to crash, then that's interesting and worth calling out. It's just endless speculation without a proper outcome to tie it down.






At a general level I think these lists make developers more aware of uniqueness and constraints.

When designing data I think these questions (skepticisms) should be front of mind:

1) natural values are not unique.

2) things identified by number are best stored as a string. If you're not going to do math on it, it's not a number. That "customer number" should be treated as "customer id" and as a string.

3) be careful constraining data. Those "helpful checks" to make sure the "zip code is valid" are harmful not helpful.

4) those tiny edge cases may "almost never happen" but they will end up consuming your support department. Challenge your own assumptions at every possible opportunity. Never assume anything you "know" is true.

It's hard to measure time saved, and problems avoided, with good design. But it's easy to see bad design as it plays out over decades.

And (especially today) never optimize design for "size". Y2K showed that folly once and for all.
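
To make point 2 concrete, here is a rough Python sketch of what can go wrong when an identifier is stored as a number. The customer IDs are made up for illustration, and the only assumption is that leading zeros are significant in the real identifier.

```python
# Hypothetical customer IDs, invented for this example.
raw_ids = ["00123", "0123", "123"]

# Stored as numbers, the three distinct IDs collapse into one value
# because the leading zeros are discarded.
as_numbers = {int(raw) for raw in raw_ids}

# Stored as strings, each ID keeps its identity.
as_strings = set(raw_ids)

print(len(as_numbers), len(as_strings))
```

The same applies to anything else you never do arithmetic on, such as postal codes or account numbers.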


> 2)

This implies denormalization, which is rarely needed for performance, despite what so many believe. Now you’ve introduced referential integrity issues, and have taken a huge performance hit at scale.

> 3)

I mean, maybe don’t try to use a regex on an email address beyond “is there a local and domain portion,” but a ZIP code, as in U.S. only, seems pretty straightforward to check. I would much rather have to update a check constraint if proven wrong than to risk bad data the rest of the time.
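
For what it's worth, a U.S.-only ZIP check along those lines might look like the sketch below. It assumes the only accepted shapes are five digits or ZIP+4 (five digits, a hyphen, four digits); `looks_like_us_zip` is a hypothetical helper name, and in a database the same pattern would go into a check constraint instead.

```python
import re

# Five ASCII digits, optionally followed by a hyphen and four more (ZIP+4).
ZIP_RE = re.compile(r"[0-9]{5}(?:-[0-9]{4})?")

def looks_like_us_zip(value: str) -> bool:
    """Return True if value has the shape of a U.S. ZIP or ZIP+4 code."""
    # Surrounding whitespace is tolerated; anything else must match exactly.
    return ZIP_RE.fullmatch(value.strip()) is not None
```

This only checks the shape. It does not tell you whether the ZIP code is actually assigned, and it should not be applied to non-U.S. addresses.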

> never optimize for size

Optimize for size when it doesn’t introduce other issues. Anyone working on 2-digit years could have and likely did see that issue, but opted to ignore it for various reasons (“not my problem,” etc.). But for example, _especially_ since Postgres has a native type for IP addresses, there is zero reason to store them as strings in dotted quad. Even if you have MySQL, store them as an unsigned 32-bit integer (INT UNSIGNED), and use its built-in functions to cast back and forth.
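
As an illustration of that round trip, here is a small Python sketch using the standard `ipaddress` module. The helper names `ip_to_int` and `int_to_ip` are made up for this example; in MySQL the equivalent conversion would be done with its `INET_ATON()` and `INET_NTOA()` functions.

```python
import ipaddress

def ip_to_int(dotted: str) -> int:
    """Convert a dotted-quad IPv4 address to its 32-bit integer value."""
    # IPv4Address raises ValueError for anything that is not a valid address.
    return int(ipaddress.IPv4Address(dotted))

def int_to_ip(value: int) -> str:
    """Convert a 32-bit integer back to dotted-quad notation."""
    return str(ipaddress.IPv4Address(value))

print(ip_to_int("192.168.1.1"))
print(int_to_ip(3232235777))
```

The integer form takes 4 bytes instead of up to 15 characters, and range queries become plain numeric comparisons.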


>It's so trivially easy to find real world counterexamples to just about any software that it's a barely interesting exercise (IMO).

These lists hopefully make programmers aware that a lot of their assumptions about the real world might be wrong, or at least questionable.

Examples are assumptions about the local part of email addresses made without checking the appropriate RFCs, which then get enshrined in e.g. JavaScript libraries that everyone copies. I've been annoyed for the last 30 years by websites that expect the local part to consist only of [a-z0-9_-], although the plus sign (and many other characters) is valid in a local part.
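
A deliberately permissive check, in the spirit of "is there a local and domain portion" mentioned upthread, could be sketched like this in Python. `has_local_and_domain` is a hypothetical helper, and it makes no attempt to enforce the RFC grammar.

```python
def has_local_and_domain(address: str) -> bool:
    """Return True if address has a non-empty part on both sides of an '@'."""
    # Split on the LAST '@', so the local part is left untouched
    # whatever characters it contains.
    local, at, domain = address.rpartition("@")
    return bool(at) and bool(local) and bool(domain)

print(has_local_and_domain("first.last+news@example.com"))
```

Anything stricter than this is probably better confirmed by actually sending a message to the address.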

Or assumptions about telephone numbers, including the various ways (depending on local culture) of structuring their notation, e.g. "123 456 789" versus "12-3456-89", where software is too dumb to just ignore spaces or dashes, or even a stray whitespace character copied by accident with the mouse.
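
Ignoring those separators takes very little code. A minimal Python sketch, assuming the only noise to strip is whitespace and dashes (`normalize_phone` is a made-up name):

```python
import re

def normalize_phone(raw: str) -> str:
    """Remove whitespace and dashes from a phone number as typed or pasted."""
    # \s covers spaces, tabs and newlines, so a stray whitespace character
    # picked up while copying is dropped as well.
    return re.sub(r"[\s-]", "", raw)

print(normalize_phone("123 456 789"))
```

Whatever remains can then be validated or stored, while the user keeps typing the number the way they are used to.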

And those forms where you have to enter a credit card number (or bank account number) in separate fields of n characters each, which makes cut/copy/paste difficult because your notes contain it in the "wrong" format.
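
If a form insists on separate fields, it could at least accept a pasted value in any grouping and distribute the digits itself. A rough Python sketch of that idea, with a made-up helper name and a placeholder number:

```python
def split_into_fields(pasted: str, field_len: int = 4) -> list[str]:
    """Drop separators from a pasted number and cut it into fixed-size groups."""
    # Keep only the digit characters; spaces, dashes and the like are ignored.
    digits = "".join(ch for ch in pasted if ch.isdigit())
    # Slice the digits into consecutive groups of field_len characters.
    return [digits[i:i + field_len] for i in range(0, len(digits), field_len)]

print(split_into_fields("1234-5678 9012 3456"))
```

The grouping then becomes a display concern only, and the user can paste the number however their notes happen to store it.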

So while some examples may count as "just usability", it all stems from naive assumptions by programmers who think one size fits all (it doesn't).



