Unfortunately GitHub doesn't have a way to render symbols for whitespace, but you can tell by selecting the spaces that the previous version had leading tabs. Linus changed it so that the token `default` and the number, e.g. `12`, are also separated by a tab. This is tricky: because the token `default` is seven characters long, the added tab always ends up one column wide, so the line lays out exactly as if the tab were a space, whether you use a tab width of 1, 2, 4, or 8.
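The claim about the added tab can be checked with a minimal Python sketch. It assumes the line is a leading tab, `default`, a tab, then `12` (the exact Kconfig line is not quoted here, so the sample strings are hypothetical), and it relies on `str.expandtabs`, which advances to the next multiple of the tab size.

```python
# Hypothetical Kconfig-style line: leading tab, "default", tab, value.
with_tab = "\tdefault\t12"
with_space = "\tdefault 12"


def added_tab_width(line: str, tabsize: int) -> int:
    """Columns occupied by the tab that follows the token 'default'."""
    prefix = line[: line.index("default") + len("default")]
    col = len(prefix.expandtabs(tabsize))
    # A tab advances the cursor to the next multiple of tabsize.
    return tabsize - (col % tabsize)


for tabsize in (1, 2, 4, 8):
    same_layout = with_tab.expandtabs(tabsize) == with_space.expandtabs(tabsize)
    print(tabsize, added_tab_width(with_tab, tabsize), same_layout)
```

Under the same assumption, a tab width of 3 would make that tab two columns wide, so the equivalence with a space holds for the widths listed above rather than for every width.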
Indentation by steps of 3 spaces is common in old Fortran 77 code. This is LAPACK for example. Scroll down to about line 400 to see actual code, not comments.
Tab width of 3 is actually really nice in languages like Lua or Ruby that have `end` as a delimiter. It makes nested blocks "step" down very aesthetically.
Do all text editors render "tab" with variable width, taking you to the next tab stop? For some reason up until now I was thinking that was always a word processor thing and I incorrectly assumed (perhaps because we usually don't use tabs for their original purpose) that a tab character was always rendered as constant width.
It's a typewriter thing and meant for writing tables, so tab has always advanced the cursor to the next "tabular" stop. What would be the use case of having them fixed width?
I don't know of any text editors that don't render variable-width tabs based on the tab stop position. In programming environments we usually only use tabs at the beginning of the line for indentation; using them for alignment is rarer because tab stop settings are inconsistent.
If everybody used a sufficiently smart editor, and used sufficiently smart codegen, and we removed all the existing breakage from our tools and infrastructure, yes. I wish I lived in that world.
TBH my money is on that never happening, but maybe we will skip to an even higher level, where the canonical format for source code is the AST rather than plain text - indentation, braces, line length, all the things that humans care about but the computer doesn’t become render settings.
This gets to a dimension of the problem that is often overlooked: a Git web viewer, like every other code viewer we use, carries its own notion of the position of the tab stops.
Notably, this includes CLI shells, connected to a "terminal emulator", where what is being emulated is an ancient piece of hardware:
A far-downstream consequence of this is that source code formatted on the assumption of tab stops at other than 8-column intervals, as is not uncommon in JavaScript, produces unreadable CLI output from diff, git-diff, ...
Parser fails to parse data --> Fixed by modifying ingested data
Just that the ingested data is a part of the Linux kernel codebase.
Quite some hubris to proceed and apply such a "fix" by making a commit to the Linux kernel...
Arguably it would be fine if the community benefited from the parser. After all it's just a custom, undocumented format that lives specifically in this one repo.
But I totally sympathise with Linus' annoyance when the issue is with an external tool, and the author didn't explain which tool or give any reason why it's hard to fix that tool.
You can't use “hubris” as a pejorative and go on to claim that Linus is the good guy. There’s clearly hubris on both sides, and only one of the sides made a big deal about it.
Developers change things for parsers all the time. That’s, like, coding.
It's both Linus's plain arbitrary right, and his plain job, his defined role and office, to make exactly such decisions for this project. That's not hubris. It's just a role that affects a lot of people.
What makes it hubris on one side is "Who do you think you are making such a change to the Linux kernel that everyone else will have to accept?"
The reason things like that are phrased as questions is to allow for the possibility that there might be an answer.
For one of these parties, the question is rhetorical.
For one of these parties, the question is not rhetorical.
You're misunderstanding the statement (probably because of hubris). Linus has a long tail of hubris-informed acts that have been blessed off, accepted, and rationalized. No one is saying this particular rejection is one of those, just that if it's OK for Linus to be full of hubris (and it incontrovertibly is), then it can't be a criticism of this guy either.
Hubris is overreaching. Linus is not overreaching. His reach actually is farther than yours or mine, and this act and most others are not hubris. It's not even being full of himself.
It's hubris to think that a parser's failure to identify whitespace properly warrants changing code in the KERNEL of arguably the most widespread operating system in the world.
And THEN not even providing more justification for this in the description.
I'm not claiming anywhere that Linus is the good guy. Or bad guy.
In this case I agree with his stance that whatever this parser is, it had better fail harder so that it gets fixed.
And if you know how he reacts when he's making a big deal of something, you know that this one isn't one of those times...
There are only two ways to write code where the lines of text stay legible regardless of tab size configuration:
1. Use all spaces
2. Use tabs for indentation and spaces for alignment
Unfortunately, only individual developers seem competent on their own to do #2, so everyone who cares about readability inevitably practices #1 by default.
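As a rough illustration of what option 2 demands of every line, here is a minimal Python sketch of a checker. The function names are hypothetical, and it only verifies the order of characters (tabs first, then spaces, no tabs after the first visible character); it cannot tell whether the spaces really are alignment rather than indentation.

```python
import re

# Leading whitespace must be zero or more tabs followed by zero or more spaces.
LEADING_OK = re.compile(r"^\t* *(?=\S|$)")


def follows_convention(line: str) -> bool:
    """True if the leading whitespace is tabs first, then spaces only."""
    return LEADING_OK.match(line) is not None


def has_inner_tab(line: str) -> bool:
    """True if a tab appears after the first non-whitespace character."""
    return "\t" in line.lstrip(" \t")


for sample in ("\t\tfoo(a,", "\t\t    b)", "  \tfoo", "\tdefault\t12"):
    print(repr(sample), follows_convention(sample), has_inner_tab(sample))
```

A check like this would have to run in every contributor's editor or in CI to be effective, which is presumably why the convention tends to survive only with individual developers.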
A special case of (2) that is easy to do is to use tabs* for indentation and not do column alignment at all. To be clear, by "column alignment" I think we are both referring to patterns like this:
This is, e.g., what Go uses for struct fields, and what some Python style guides use for hanging function definitions. Regardless of tabs/spaces preference, both of these are independently bad because they churn diffs unnecessarily: if you change `afterward` to `afterward2` then you need to change all the nearby lines, and likewise if you change `my_long_function` to `my_longer_function`. Some formatters, like Black, Prettier, and (mostly) Rustfmt, avoid this pattern entirely, and they are better for it.
* You can do this and still use spaces if you prefer, too.
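The diff churn described above can be measured with a minimal Python sketch. The field names `x` and `y` are made up for illustration (only `afterward` comes from the comment), and "changed lines" here means lines removed plus lines added according to `difflib.SequenceMatcher`.

```python
import difflib


def align(pairs):
    """Render assignments with the '=' signs column-aligned."""
    width = max(len(name) for name, _ in pairs)
    return [f"{name.ljust(width)} = {value}" for name, value in pairs]


def plain(pairs):
    """Render assignments with a single space around '='."""
    return [f"{name} = {value}" for name, value in pairs]


def changed_lines(before, after):
    """Count removed plus added lines between two versions."""
    sm = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    return sum(
        (i2 - i1) + (j2 - j1)
        for tag, i1, i2, j1, j2 in sm.get_opcodes()
        if tag != "equal"
    )


before = [("x", "1"), ("afterward", "2"), ("y", "3")]
after = [("x", "1"), ("afterward2", "2"), ("y", "3")]

print("aligned:", changed_lines(align(before), align(after)))
print("plain:  ", changed_lines(plain(before), plain(after)))
```

In the aligned layout the rename touches every line in the group, while the unaligned layout touches only the renamed one.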
What? How would it? Do you understand what is meant by that?
The problem being referred to here is that a change on one line causes changes to unrelated lines. It makes it harder to pinpoint which line was changed intentionally in a diff and which ones were just caused by tab issues.
Again, this makes absolutely no sense to me. I've never experienced any of the problems you're facing and I can't imagine how they'd even cause problems.
It's not clear why you're still defending this position. Feels like you made a not well considered, off the cuff remark about something. Rather than realize you were wrong and take the L, you're still doubling down? Is it that important to you?
That doesn't help the problem here. Part of the revision is still relevant (say, adding a new, longer field name to a struct) and the rest is just extra spam. But you wouldn't want to ignore the entire revision.
> Use tabs for indentation and spaces for alignment
I'm not surprised that this isn't something that projects have been able to adopt successfully very often because I've never found it very intuitive that those are separate things. In what way is "indentation" not also a form of "alignment"?
The other problem with it is that it assumes that people have visible whitespace on, and that their tools even have that option to show whitespace, otherwise it’s like navigating the Fuchsia Gym.
I don’t mind if people use tabs but mixing the two is not great.
The worst mixing I've seen: aligning on 4 spaces, but using tabs as a placeholder for 8 spaces. So first stop is 4 spaces, then tab, then tab and 4 spaces. Chaotic evil version of tabs vs spaces battle.
I’m not sure what relevance this has to files that use a mix of tabs and spaces because I’m not aware of any formatting tool that can reproduce that convention.
Knowing whether a space in a space-formatted document is for alignment or indentation is AI complete.
Aligning a character on one line with an arbitrary character on another line is purely a choice of style, not a requirement.
It is perfectly doable to do only tabs, but many end up mixing in spaces.
The curse of space-only files is people who manage to commit indentation errors, which break auto-detection in some editors, which in turn propagates to even more indentation errors... All it takes is an inattentive reviewer, or a review-less merge.
Readability of the code is not mere style, and can directly translate into errors being more visible. Compare:
a_variable = (
'lorum ipsum dolor sit amet ' +
'my poor memory has left me quite upset ' +
'for i cannot remember what word comes next '
'in this long descriptive text' +
'surely this is bound for the incinerator ' +
'but remember any haiku can end, refrigerator.'
)
But now if I choose to align certain characters:
a_variable = (
'lorum ipsum dolor sit amet'
+ ' my poor memory has left me quite upset'
* ' for i cannot remember what word comes next'
' in this long descriptive text'
+ 'surely this is bound for the incinerator '
+ ' but remember any haiku can end, refrigerator.'
)
… the errors in the first version are now plainly obvious. (Both the missed space, as well as the missed +.)
(This is an example. Yes, there are languages for which you don't need the +. There are some for which you do, however. There are also some that resist having the + moved about: for example, in JavaScript, the parens become required, or you'll trigger the horrid auto-semicolon "feature".)
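Python is one of the languages where the + is optional, which makes the missed + silent there. A minimal sketch, reusing three of the strings from the example above:

```python
# Adjacent string literals are concatenated by the Python parser, so a
# forgotten "+" is not a syntax error and the result is the same string.
text = (
    'what word comes next '
    'in this long descriptive text' +
    'surely this is bound for the incinerator '
)

# Only the missing space between "text" and "surely" shows up in the output.
print(text)
```

In a language that requires the operator, the same omission would typically be rejected by the compiler, so which of the two bugs needs visual help depends on the language.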
Hmm? You have not changed alignment, as all your lines of code start on the same integer indentation boundary. You just moved the operator to the beginning of the line, and changed the bug between the two examples.
Either you misunderstood what we mean by alignment (single whitespace separating operators and operands are not alignment), or maybe you tried to indent the first string literal further to align the string literal start boundaries. I sincerely hope it wasn't the latter...
Putting the operator on the second line, at the same indentation level, is a perfectly fine, indentation- and alignment-agnostic stylistic choice, and has its uses.
I have changed the alignment of both ' ' and '+' characters between the two examples, from not being aligned to being aligned.
> as all your lines of code start on the same integer indentation boundary
Indentation is not the only thing "style" guides enforce.
> and changed the bug between the two examples.
The bug should be the same in both examples, though I do see I transposed a '+' into an '*'. That wasn't intentional; the two are supposed to form the same AST, but with purposeful alignment in the second making bugs in both far more visible.
(You seem to be using "alignment of characters" to mean "alignment with indentation", which is more limiting than I would take "Aligning a character on one line with an arbitrary character on another line" to be.)
Monospace is weird legacy technology and a path dependency that people are holding on to for not so great reasons.
If monospace was actually a good idea and not a technical limitation of computers from the 80s, books, newspapers, and your comment on this website would be displayed in monospace.
I dunno man, sometimes things are good in different contexts. Monospace makes stuff really easy to align when programming, and while I do like the idea of elastic tabstops and have tried it out before, in practice they don't work so well.
Frankly, I've never desired for things to be aligned in source code. Are you, like, embedding tables of numbers in your code, or just explicitly aligning the equals signs in short = 42 and much_longer = 7?
> Use tabs for indentation and spaces for alignment
The pain is that most websites and editors never handled that well enough. You end up with mixed tabs and spaces at unexpected positions and never know about it.
Just banning the tab is probably not the most 'correct' option to fix it. But it is the most feasible one to get the job done, because fixing all the tools, editors and websites is nearly impossible for an average man.
Thankfully there's a trend of excessive merge conflicts caused by reformatters.
Reformat-on-every-commit really only works for highly-centralized, tightly-coupled, monorepo-using monolithic organizations. Basically the exact opposite of kerneldev. For those folks reformat-on-every-commit works great.
What do you mean by reformat? Any decent code formatter keeps a consistent style. Getting conflicts only happens if you misconfigure your editor or don't have checks to catch invalid formatting before merging to remote.
"Our Python codebase is old af and has inconsistent styling. Should we start using black/ruff to format it?"
"Hmm that's maybe a bit too intrusive. Could autopep8 be enough? Since it follows the recommendations of the PSF, so it's a more 'official' way of formatting, and it should be more future-proof [famous last words]."
"Hey, I heard this pyxkcd927 formatter just came out, and it also automatically fixes linter warnings [meaning, different output]. Any thoughts on migrating to this?"
“If you want to change the formatting, warn people to merge any big PRs that are ready. Then, you fix any merge conflicts that result in the outstanding feature branches. Do this twice; once as a dry run where you get the tests to keep passing, and once for real while the team pauses merges for a short window.”
In general, any software engineering thing that causes multiplicative work for the rest of the org should be handled this way.
For instance, at a prior job, it cost 100+ engineers a week or so of productivity to rename master to main because tooling and automation had hardcoded master all over the place.
It cost the person that initiated the change an hour or so, and I’m sure it helped their promotion case.
Of course, accounting didn’t compute the cost of this to the business ahead of time. If they had, I’d hope they would have insisted we spend the money on recruiting and hiring more diverse candidates instead.
It's much worse for monolithic organizations, because they develop complex software and reformatting scrambles code history and it becomes difficult to untangle business logic.
Your comment gave me a mental image of looking through file change history in 3d, each version of the document layered on top of another translucent in 3d and scrolling through them with the wheel
actually, what we want is code management tools that work with tokenized code and do not depend on formatting. i want a diff tool that shows me exactly which tokens have changed, and which haven't, regardless of how they are laid out. when we get that, then we should get even less merge conflicts.
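A minimal Python sketch of such a token-level diff follows. The tokenizer is a crude, language-agnostic assumption made for illustration (word characters form one token, every other non-space character is its own token); a real tool would need the language's actual lexer, not least because whitespace inside string literals is significant.

```python
import difflib
import re

# Crude tokenizer (assumption): runs of word characters, or single symbols.
TOKEN = re.compile(r"\w+|[^\w\s]")


def tokens(source: str) -> list[str]:
    return TOKEN.findall(source)


def token_changes(old: str, new: str):
    """Return (tag, old_tokens, new_tokens) for every non-equal region."""
    a, b = tokens(old), tokens(new)
    sm = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in sm.get_opcodes()
        if tag != "equal"
    ]


old = "if (x == 1) { foo(a, b); }"
reformatted = "if (x == 1)\n{\n\tfoo(a, b);\n}"
edited = "if (x == 1)\n{\n\tfoo(a, c);\n}"

print(token_changes(old, reformatted))  # layout-only change
print(token_changes(old, edited))       # layout change plus one real edit
```

The reformatted version produces no changes at all, and the edited version reports only the one token that differs.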
The fact that diffs can be used to drive a 3-way merge is in fact an accidental property that arises due to the sheer crudeness of the line-based diff format. As soon as you start using more-sophisticated diff formats, solutions to "the diff problem" no longer lead directly to solutions to "the merge problem".
And I'd prefer a system with an AST as the canonical representation, allowing each programmer to set their own display formatting options entirely independently of the underlying codebase.
Realistically, either of these options would only really work for tools and codebases designed with these workflows in mind, as they'd be useless for inputs to inherently text-based tools like the C preprocessor.
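For a language that ships its own parser, the AST-as-canonical-form idea can be sketched in a few lines. This Python example uses the standard `ast` module (`ast.unparse` needs Python 3.9 or later); the sample function `f` is made up for illustration.

```python
import ast


def same_program(a: str, b: str) -> bool:
    """Compare two Python sources by syntax tree, ignoring layout and comments."""
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


compact = "def f(a,b):\n  return a+b\n"
spread = (
    "def f(\n"
    "        a,\n"
    "        b,\n"
    "):\n"
    "    # add the operands\n"
    "    return a + b\n"
)

print(same_program(compact, spread))
# Re-rendering from the tree is one possible "display setting".
print(ast.unparse(ast.parse(compact)))
```

Note that this tree discards comments entirely, so a real system would need a richer representation than the one sketched here.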
> allowing each programmer to set their own display formatting options entirely independently of the underlying codebase.
That's the easy case.
The hard case is situations where the formatting (whitespace, etc) is not simply some deterministic function applied to the AST. For example: comment placement and tabular layout of repetitive constant definitions. Or situations where vertical alignment is used to call out parallel structure.
Mathematicians have been using horizontal alignment to call out important structure in equations (both written and typeset) for hundreds of years. Here are two examples if you don't know what I'm talking about:
The people who stubbornly insist that programmers have no need for this sort of expressiveness seem like innumerate cavemen, frankly. They cheapen our field, reducing it from informatics to mere keyboard labor.
The urge to run code formatters is a "language smell" indicating that your language has way, way, way too much odd and irregular syntax. See Haskell for a counterexample. Sure, there are a few code formatters for Haskell, but nobody really uses them. The language's syntax is so minimal and clean that there just aren't all that many preference-based choices to be made.
That's only one formatter; the latter is a fork of the former... which is the in-house formatting tool of one particular consulting firm. It has only existed since 2018, and is said to be mainly a response to golang/rust people trying to impose their culture on the Haskell world.
Pointing at ormolu is kinda like saying that formatters are popular in the Java world because google-java-format exists. The fact that that project exists says more about the company that created it than the community around the language it operates on.
My bad, I should have added "/s" (because Cunningham's Law). It was a reference to Futurama, where a problem was not solved at all.
But on a more serious note, in my experience I've not had any issues with Go or Rust codebases (for example). Not using their formatters is heavily frowned upon, so I haven't really seen any reformat happen at all; not in my bubble at least[1].
Other languages, on the other hand? Yeah, good luck with trying to have consistent formatting. Even if a project has formatting rules "enforced", there's always (always) going to be an exception, bikeshedding, etc.
[1]: Unless it's someone obviously very junior. The few times I've noticed badly formatted code in Go have been in random repos from someone who clearly didn't have that much programming experience in general (judging by how the code was written).
Indent with spaces: When the ASCII art needs to line up.
Indent with tabs: To save time adding/deleting characters or squinting at exact boundaries.
Better than both: Files are saved with spaces, but the code-editor is smart-enough to act as if the multiple spaces are a tab during editing or when setting a custom tab-width.
I have yet to see an editor that will truly work with spaces that way. Sure, the tab key will insert spaces, and the backspace will remove multiple spaces if you’re at a whitespace boundary, but I still have a much smaller click target when selecting the beginning of code (i.e. 1/4 the amount of space to click to get the cursor directly before the first non whitespace character, or when dragging to select the code on a line).
I always find that I’m clicking a bit too early in the “tab,” not selecting where I want, and then backspace is treating it like normal spaces again so things aren’t aligned.
I’d also love if it could render “tabs” of spaces at different widths, so that if one person wants to look at 2-space code, and another finds it more readable as 8-space, they have that option.
And anyway, all of this is only to help when a codebase is spaces when you want tabs, because I don’t see much benefit to saving out spaces if you’re going to go to all of this effort to emulate tab behavior.
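The "render spaces at a different width" wish can be sketched as a display-time transformation. This is a minimal Python sketch under a loud assumption: whole multiples of the stored indent width are indentation and any remainder is alignment. The function name and parameters are hypothetical, not taken from any editor.

```python
def rerender_indent(line: str, stored: int, shown: int) -> str:
    """Display a space-indented line with `shown` columns per indent level.

    `stored` is the indent width used in the file. Leftover spaces that do
    not make up a whole level are treated as alignment and kept unchanged.
    """
    body = line.lstrip(" ")
    leading = len(line) - len(body)
    levels, alignment = divmod(leading, stored)
    return " " * (levels * shown + alignment) + body


print(repr(rerender_indent("        return x", stored=4, shown=2)))
print(repr(rerender_indent("      b)", stored=4, shown=8)))
```

The heuristic misreads any alignment of `stored` or more spaces as extra indentation levels, which is the information a tab character would have carried explicitly.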
While you can definitely get a speed up from going keyboard only and mastering hotkeys, you are way faster at selecting some random block of code with the pointer and retargeting to somewhere else to paste etc until you have them solid.
For everything I write, that marginal speed improvement isn’t even close to worth the learning cost of 100% ditching the mouse. I’m bottlenecked on deciding what I want to express, almost never how quickly I can enter it into the computer. I just want it to not be a “clunky” experience, i.e. once I decide to select a block of code to move it somewhere, I want that to be painless.
For that, tabs give a nice large margin for error on the click targets, which means I can move faster.
Edit: sorry someone else already said it in another part of the thread
The reason to use spaces is it keeps any alignment of multiline statements the same. You're not supposed to be able to adjust it because adjusting it means you can't do alignment, unless you use mixed tabs and spaces.
In my opinion, you can generally write very nice looking code even if the indentation of the multiline statement isn’t exactly matching the brace or whatever.
But as you say, you can still use spaces for alignment. I find it to be kinda meh because if the alignment has to change (like adding a longer variable name) you end up touching lines you didn’t otherwise have to in your diff.
Also, if we’re talking about editors doing nice things with tabs, I think you could make an editor look at the tabs and align things exactly how you want when rendering a multiline statement.
I found one attempt to quantify the issue [0] which found differences which I don't think are very compelling, especially when you consider how insanely cheap disk space is compared to even a tiny time-savings in developer workflow.
Here, some napkin math just to put the orders-of-magnitude in perspective:
1. Assume the entire kernel repo is 1.5GB.
2. Assume your format shrinks it to 1.0GB.
3. Assume a $50 SSD with 1024 GB capacity.
4. ((1.5-1.0) / 1024) * $50 ≈ $0.025
5. Assume an *underpaid* developer at $20/hr.
6. $0.025/$20 * (60*60) = 4.5 seconds
So the disk-space benefit of reformatting to shrink the kernel repo by a (huge) 33% is wiped out if the (underpaid) developer ever spends more than about 5 seconds dealing with the formatting.
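The arithmetic in the steps above can be reproduced with a short Python sketch. All inputs are the assumed figures from the list, not measurements.

```python
repo_gb, shrunk_gb = 1.5, 1.0      # steps 1 and 2 (assumed sizes)
ssd_cost, ssd_gb = 50.0, 1024.0    # step 3 (assumed SSD price and capacity)
hourly_wage = 20.0                 # step 5 (assumed wage)

# Step 4: dollar value of the disk space saved.
saved_dollars = (repo_gb - shrunk_gb) / ssd_gb * ssd_cost

# Step 6: developer time that costs the same amount, in seconds.
break_even_seconds = saved_dollars / hourly_wage * 60 * 60

print(f"storage saved: ${saved_dollars:.4f}")
print(f"break-even developer time: {break_even_seconds:.1f} s")
```

Without the intermediate rounding to $0.025 the break-even comes out slightly under 4.5 seconds, which does not change the order of magnitude.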
Because the prevailing convention in many languages is to use spaces. In a vacuum, I slightly prefer tabs to spaces for indentation. But I think being consistent with most other code in a language or project is more important than my personal preference.
Because spaces are guaranteed to lay out the text in a predictable format, whereas tabs are not, since the size can be adjusted and tab-stops don't necessarily line up with obvious word boundaries on the lines above.
The correct way to use tabs is to indent with tabs and align with spaces, that way everyone can adjust the tab size to their liking without breaking alignment.
Then you've got mixed tabs and spaces, which seems like it's going to inevitably lead to confusion when someone who isn't aware of the convention starts to make changes. Two different invisible characters in a single file is a recipe for madness.
I've been writing (and reading) a lot of Lisp lately, and have seen some projects adopt indentation idioms which offset blocks from the regular tab-stop intervals. A standard tab-stop might be +4, but then arguments might be lined up with each-other at +6 (+1.5 stops):
(list 'a
      'b
      'c)
Any of those arguments could be forms, which might return to the standard tab-stop of +4 but off by 2 (let's call that `+4(+2)`):
(list 'a
(with-foo quz
(bar quz))
'c)
Lists written with the `quote` syntax might do the same thing, but produce an offset of only 1:
'(a
(b
(c)))
The end result is that changing a code's indentation level (like when raising an anonymous function into a top-level definition) means adjusting the spaces (which were supposed to only be for alignment) if any preceding forms had introduced an offset. The alignment doesn't change, but its relation to regular tab-stops does. Sometimes, this is an adjustment of zero, but you still need to think about it. In an all-spaces environment you could use vim's visual mode to select the block as a rectangle and paste it into the top level, but with "indent/align" semantics this same situation regularly requires maintenance (by inserting or removing spaces, in addition to manipulating indentation via tabs) to preserve both proper alignment and the integrity of the semantic distinction[1].
I think that elastic tab-stops fix this by correctly modeling indentation as it should be conceptualized, platonically divorced from the characters encoding it. I think that spaces are better than tabs when white-space isn't significant (and personal style might disregard tab-stops), but recognize that tabs are more accessible and worth using when tab-stops are enforced as a byproduct of significant white-space. I repeat that elastic tab-stops are awesome.
1: Someone correct me if vim can actually just do that with tabs just fine, i dont really know, it late~
That doesn't work when alignment depends on the tab size, such as aligning an indented line below an unindented line in a long assignment:
Type myvar = something_very_long +
             that_needs_alignment;
The only way to guarantee such complex alignment is using spaces. It renders everywhere exactly the same. So reviewing stuff on web-based environments gets easier.
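A minimal Python sketch of that assignment shows the difference. It assumes the continuation line is meant to start under `something_very_long`, and compares a continuation made of spaces with one made of tabs plus a space that was tuned for a tab width of 4 (the tuned variant is a hypothetical example).

```python
first = "Type myvar = something_very_long +"
target = first.index("something_very_long")  # column to align under


def start_column(line: str, tabsize: int) -> int:
    """Column of the first non-whitespace character after tab expansion."""
    expanded = line.expandtabs(tabsize)
    return len(expanded) - len(expanded.lstrip(" "))


with_spaces = " " * target + "that_needs_alignment;"
with_tabs = "\t\t\t" + " " + "that_needs_alignment;"  # tuned for tabsize 4

for tabsize in (2, 4, 8):
    print(tabsize, target,
          start_column(with_spaces, tabsize),
          start_column(with_tabs, tabsize))
```

The space-only continuation lands on the target column at every width, while the tab-based one matches only at the width it was written for.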
“correct way”? This is just your opinion. Everyone has one. Want to read mine? “A text editor should never replace what I type”. Want another opinion? “An editor should always distinctively display white-space characters”.
If someone finds it more readable to have their tabs smaller to fit on a small screen, or larger to fit on a large screen, why deprive them of that?
It’s perfectly fine to still have a canonical width if you want to use it for other reasons, like Linux with their “tab is 8 spaces when deciding whether a line is too long” thing.
May as well mandate a specific font and color scheme if the goal is to control how other people view code.
Tabs are self-consistent, which is usually all that matters. If for some reason you need greater control (E.G. ASCII-art spanning multiple indentation levels), then that should be documented instead.
Honestly, my reason for liking spaces is "I know what every invisible character in my code is". Since I obviously cannot stop using spaces altogether, I use them everywhere.
Also at work we all use 4 spaces for indentation. Nobody wants a smaller number. We have huge screens, we're not in the 80s anymore
> We have huge screens, we're not in the 80s anymore
I would like to fit more text on my modern screens, and 4-space indentation and 120+ char lines limit how many files I can view side-by-side while maintaining legibility.
I still try to keep my lines around 80 chars long in 2024 for this reason.
> We have huge screens, we're not in the 80s anymore
Huge screens are useful to show more content e.g. multiple buffers, ancillary information (like docs).
I’m not interested in wasting that additional work space in emptiness because you can’t keep to reasonable line lengths.
I use 4 spaces and short lines because it naturally limits how far I can drift rightwards before I need to refactor or rethink what I’m doing.
Though I’ll readily confess that Rust can be mildly annoying there: lots of things require additional blocks, and the affordances of function scope around split borrows can make extracting utility functions awkward.
> "I know what every invisible character in my code is"
Wouldn't that be possible even when (exclusively) using tabs for indenting? Admittedly it's been a hot minute since I last programmed and I'm far more familiar with borland's ide with turbo c++ than VS code (so I don't know what the latest formatting works/features are), but if you only indent with tabs isn't that good enough? Also, do multiple spaces not take much more time to type out? Or have you remapped your tab key to multiple spaces?
Ha, ha. When I first sat at a (then new) 24" wide screen (I remember when 19" CRTs were huge), I actually experienced something akin to vertigo. It never happened again, but I haven't tried one of those concavely curved monitors yet ...
Replace the third guy with "tabs with editor configured to three-space width" and you have an accurate representation of my experience. I don't get why it gets so many people upset, even though it's obviously the sweet spot for visually distinct indentation and minimal actual indentation (not my own idea, I think the elastic tab-stops guy suggested it in one of their blogs).
The last time I mentioned this on Mastodon, someone responded with what I consider the best possible answer to all of this:
> The optimal tab width is e. It's wide enough to be obvious, but not so wide as to reduce line length too much. Plus, it discourages the use of tabs for alignment by being the number most difficult to approximate with any integer number of spaces.
... and when I say "best possible way" I mean "trolls everyone while making a solid point"
The guy could've just said it was a minor formatting change and had a better chance of getting the PR approved, instead of presenting the nonsensical reason (the parser)?
It's good that Linus is really exercising those 3rd party tools! They should send some money his way for helping them test their code.