> But after closely studying Git I'm a little bit awestruck; Torvalds is a frickin' genius, a true visionary, and somehow managed to just "get it" and instantly, in a flash of insight, come up with "the solution" for version control.
Wait, what? Git was the response from Torvalds to BitKeeper being proprietary and its author being upset with some kernel developers. It's basically taking what was then the state of the art for DVCS and reimplementing it, not coming up with some new paradigm. It's a good program, and to my knowledge it was engineered well, but let's not rewrite history so that Torvalds is the father of DVCS.
Linus was moving off of BitKeeper, which obviously was some of the inspiration for git - but the underlying models are nothing alike. Git's data model is much simpler than CVS, RCS, Subversion, BitKeeper and basically anything else that existed at the time, and it was also much faster. The command line interface doesn't reflect this properly, but it's the underlying model that counts.
Git is nothing like a reimplementation of BitKeeper - the closest inspiration and model is actually Monotone, which had the same basic concepts but was slow and clunky. IIRC, Torvalds credited Monotone/Hoare for using Merkle trees (and other things).
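To see why the model is so simple, here's a minimal sketch of the content-addressed, Merkle-tree idea. It's a toy for illustration only (real git hashes a "<type> <size>\0<content>" header, classically with SHA-1, and uses a binary tree encoding), but the property it demonstrates is the real one:

```python
import hashlib

def obj_id(kind, payload):
    # Content addressing: an object's id is the hash of what it contains.
    # (Toy scheme; real git uses a typed header and, classically, SHA-1.)
    return hashlib.sha256(kind.encode() + b"\0" + payload).hexdigest()

def blob(content):
    return obj_id("blob", content)

def tree(entries):
    # A tree's id depends on the ids of everything under it: a Merkle tree.
    listing = "\n".join(f"{name} {oid}" for name, oid in sorted(entries.items()))
    return obj_id("tree", listing.encode())

def commit(tree_id, parents, message):
    body = "\n".join([f"tree {tree_id}", *[f"parent {p}" for p in parents], "", message])
    return obj_id("commit", body.encode())

# Change one byte of one file and the blob id, the tree id and the commit id
# all change with it: history is tamper-evident by construction.
t1 = tree({"README": blob(b"hello"), "main.c": blob(b"int main(){}")})
t2 = tree({"README": blob(b"hello!"), "main.c": blob(b"int main(){}")})
assert commit(t1, [], "v1") != commit(t2, [], "v1")
```

Everything else - cheap branching, cheap distribution, integrity checking - falls out of ids being pure functions of content.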
Furthermore, Linus wasn't just moving off of BitKeeper; he had to, and quickly.
Tridge (another kernel dev) reverse-engineered the BK protocol and repo structure, trivial as they were. BK people got understandably upset about this, in part because they were providing BK to the Linux Kernel Project for free. So Tridge's little stunt basically killed any and all goodwill between BK and the LKP, and Linus needed a replacement. All alternatives had issues and everyone just kept going in circles, bitching and moaning. So, in true programmer fashion, Linus sat down and wrote what he wanted. The end.
The first mistake was for a free software project to require a proprietary one to participate in development.
BitKeeper's sole claim of ownership, moral or legal, was to the BitKeeper software, which nobody violated. What Tridge did was come up with another way for the owners of the relevant data to access their own information, something wholly reasonable and justified.
What you dismiss as a stunt proved in a stroke how much of a mistake the relationship with Bitkeeper was.
This is approximately like Sony Records telling you that you can't take your CD and put it in a Samsung stereo, and you defending Sony, except that no reasonable person supposes that the creators of material objects acquire moral rights to control their use.
What Tridge did was done to prove a point rather than to make a feasible BK alternative, so it was ultimately a stunt. I don't dismiss its importance, and the LKP indeed should not have depended on a proprietary tool, but it was one of the less civil ways to go about the issue.
Stunt? Here is the description of the oh-so-cunning reverse engineering: Tridgell telnetted to the port and typed 'help': https://lwn.net/Articles/132938/
This does not matter one bit. BitKeeper offered the use of a product under a condition, in other words, a license. And it offered the product for free, expecting the license to be honoured. Every open source developer should respect and uphold that agreement, once entered into by both sides.
Your personal opinion of the license terms is irrelevant. You respect other people's licenses, because you want them to follow yours. If developers cannot grasp that, how do we expect users to?
Using a license to limit others' freedom to access and use their own data, in order to maximize the developer's opportunity to profit, is a bug, not a feature, and it's nothing anyone, least of all developers, ought to respect. Not only is the world composed of 99.99% users and 0.01% developers, but even developers are net consumers of software, consuming 1000x what they will ever create and profit from.
Everyone's freedom is vastly more important than a trivial few's wholly imaginary "right" to profit. Software is by now the building block of civilization, culture, and business. Valuing the right to restrict others over everyone's interests is a strange inversion of priorities.
Why should anyone expect people to worry about politeness when it comes to accessing their own data? The conflict existed solely because of a badly thought-out expectation on BitKeeper's part. If you look at the reaction, this wasn't something that could have been handled delicately in any case.
> The first mistake was for a free software project to require a proprietary one to participate in development.
This isn't true at all. Anyone could download the source code, tweak it, and submit patches. You can even use your own version control system to track your changes.
The project managers may have used bitkeeper, but to everyone else the only thing that bitkeeper provided was, at best, convenience.
This is extremely disingenuous. Pretending that the primary workflow of a group, and the metadata about its work, amount to mere convenience seems poorly thought out on your part.
It's also entirely irrelevant whether people use different tools to do their job. No one was hindered by anyone else's personal choice of tools. Anyone in the world was free to download the source code, change it, and contribute patches if they saw fit. BitKeeper did not hinder this, nor did the Linux development process change once BitKeeper was replaced.
BitKeeper was better than the alternatives. This was literally Linus' rationale for using it.
Nothing you say in the second paragraph contradicts what I said, but it also doesn't address the point at all. Do you understand why a difference in tooling and ease of working reduces openness?
The BitKeeper developer was an ass, no need to be polite about it. He issued DMCA requests to websites that were analyzing his license, claiming that it was copyrighted.
This just supports what I said. Not sure you looked closely enough. BitKeeper uses SCCS format on the wire, which didn't need to be reverse engineered as it was a known format.
Content-addressing (i.e. hashes) enables decentralization (hashes are the same wherever you are) and integrity (changing content will change the hash). (Perhaps inspired by Tridge, who wrote rsync, which uses hashes to test for content changes?) It's incredibly fast because commits are almost as simple as copying, and because Linus knows how to C.
Linus left out some features, like renames (which bitkeeper has), so this simple idea would be enough.
Some people rave about the index/cache, though it's separable from the core idea.
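For what it's worth, the index is conceptually just a flat map from path to the blob id you intend to commit, frozen into tree objects at commit time. A toy sketch of that idea (real git's index is a binary file that also records file modes and stat data, which is where the "cache" name comes from):

```python
import hashlib

def blob_id(content):
    # Content-addressed id for a file's contents (toy scheme, not git's format).
    return hashlib.sha256(b"blob\0" + content).hexdigest()

# The index, reduced to its essence: a flat map from path to the blob id
# you intend to commit. "git add" updates one entry; committing freezes
# the whole map into a tree object.
index = {}

def add(path, content):
    index[path] = blob_id(content)

def write_tree():
    listing = "\n".join(f"{path} {oid}" for path, oid in sorted(index.items()))
    return hashlib.sha256(b"tree\0" + listing.encode()).hexdigest()

add("README", b"hello")
add("main.c", b"int main(){}")
snapshot_id = write_tree()   # what a new commit would point at
```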
> The command line interface doesn't reflect this properly, but it's the underlying model that counts.
Could you explain why, for a tool meant to be used by humans, the underlying model (which I agree is very elegant) counts more than the user interface?
A source code version control system has to be in control of the source code. The model clearly is important to whether git is in reliable control of the source we entrust to it.
Function matters more than anything else... and exactly what "function" means and encompasses can be discussed, but it's difficult to define function such as to exclude git's model from git's function.
In any software program, how you model the data tends to be at the root of all else in the program. If you screw that up and make it complex unnecessarily, you will have an inferior program. You can put a bad UI on a good model, but you cannot expect much stability from any UI on a bad model.
It's just like the movies. You can make a bad movie from a good script, but a bad script never, ever made a good movie.
> the underlying model (which I agree is very elegant) counts more than the user interface?
It does not. But git exposes this all the time, so committed git users feel that's a good thing, because understanding it helps them make sense of some of the worst parts of the UX.
Git is the reference implementation of a protocol, and much like HTTP, you must learn it the hard way if you intend to be a professional in the domain.
The mistake is to believe there is a U<I|X> in the first place.
It does not, however, stop you from using one of the many graphical tools, which are fine for simple usage but will all eventually fall short.
You could use a GUI to make complex HTTP routes, through proxies and DNS records via drag&drop, and then one day you'll have some weird DNSSEC error that the maintainer does not care about, and you will have to explain that to whoever is losing money.
Acknowledging that a problem is not as trivial as it first seemed is critical in our line of work, in my opinion.
In reality this "reference implementation" is the sole interface 99% of the users use, so the lack of UX is an issue.
There are other tools that do it better and right, so insisting that there are any redeeming qualities to git here is being blind to the obvious. There's actual academic research describing git's problems here, for crying out loud.
> There's actual academic research describing git's problems here, for crying out loud.
And interestingly, all the academic research talks about is the high level UI. Not the underlying data model.
> In reality this "reference implementation" is the sole interface 99% of the users use
/That/ is the mystery to me. I don't see a reason why a better UX around the same underlying data model, talking to the same server (and thus interacting natively with e.g. github) hasn't picked up.
Heck, one could implement mercurial's UX on top of the git data model. In fact, that even has been prototyped. I can't find the repository anymore, but iirc it was somewhere on bitbucket.
There are at least five such front ends (easygit comes to mind) and extensions (like legit from Kenneth Reitz). But none is popular, and I suspect the reason is that, despite all the complaining, Git's default UI is not actually that bad in day to day use.
Sort of - DVCS did not have nearly as much general uptake before Git and Mercurial caught public attention, and none of the options available met Torvalds' featureset requirements.
BitKeeper existed, a few projects used darcs, GNU arch and monotone existed but I don't think I ever encountered a project using either (the Canonical arch fork aside), and that's mostly a wrap for free or Free options.
He may not be the father of a novel concept which he could write a paper about, but Git and Mercurial existing certainly coincided with the enormous uptick in DVCS usage and visibility.
I suppose the difference in opinion about the term "father" just comes down to being about the theory behind the problem space or wildly successful implementation, and IMO "taking known solutions and putting them together in a more successful way" is novel enough to claim credit.
BitKeeper could have become GitHub + git in one entity if they made the client and hosting for FOSS free and charged for on-premises servers or private repos.
> Due to the ubiquity of git (...) we (the users) are in the same case today.
Github aside, in what ways does git fall short of your expectations as a user? Maybe there are alternatives for you to use, but we need some clarity to decide whether they exist or not.
I used git for many years, and actually still use it extensively to collaborate with git projects, but in the last year or so Mercurial and several of the modern plugins for it have opened my eyes that git is stuck in 10 year old ideas of how DVCS should work.
I used to criticize Mercurial for their half-assed branching, lack of history editing and the refusal to admit that git got this right. But Mercurial has improved (bookmarks, histedit, etc) and git has not, so my feelings have entirely reversed now.
It's again due to the ubiquity. Even if I get stuck in a git UX hellhole, I can do a google search and be 99% sure the top result is a StackOverflow post explaining the steps needed to recover.
i don't think he called torvalds the father of dvcs, merely that he was the one who came up with the simplest and correct solution to handling merges (which is to let the user resolve most of the complex ones)
and what about this line on the wikipedia page?
> This [community] version of BitKeeper also required that certain meta-information about changes be stored on computer servers operated by BitMover, an addition that made it impossible for community version users to run projects of which BitMover was unaware.
so not storing some changes locally, and being unable to access them if you don't have a paying version of the client, seems really archaic compared to the simplicity of git. i could be misinterpreting what they mean by meta-information though.
The author doesn't know anything about VCS. And Bram is almost right in this case, Linus wrote Git in 10 days.
The thing is, Git is pretty much the most horribly over complicated user tool in history. Learning how to use Git properly is more complicated than learning how to use Unix. Several times more complicated. I honestly would be more comfortable giving a modern team who had never seen a VCS before just plain old CVS, because even if it's horrible, at least it's simple and you can just work around the stupidity. Git will leave you trapped in a hellscape of reading manuals and howtos and bloating personal repositories, and not really do anything particularly great except feature branches.
DVCS is great when you need it, but when you don't need it, it's annoying as hell.
FWIW, I learned git much faster than I learned svn and cvs, and found it far easier to use and understand the underpinnings of.
Svn in particular was so awkward to port diffs between branches and carry diffs forward in time (git equivalent of rebase) that I built a system of shell scripts around patch files. Instead of creating a branch and committing changes (so painful to create or switch branches in svn), I saved the working state diff as a patch file and reset the working directory whenever I had to switch to a different task. I had a couple of shell scripts, one to save the current diff and reset, and another to apply a diff to the current working tree. And I had a third one to do a three-way merge to resolve conflicts when the tree had been updated since the patch file had been created.
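For the curious, that workflow boils down to two tiny wrappers. A rough sketch of the idea, assuming plain `svn diff`, `svn revert` and GNU `patch`; the function names are made up and the originals were shell scripts (the three-way-merge helper is left out here):

```python
import subprocess

def stash(patch_file):
    """Save the working-copy diff to a patch file, then reset the tree."""
    diff = subprocess.run(["svn", "diff"], check=True,
                          capture_output=True, text=True).stdout
    with open(patch_file, "w") as f:
        f.write(diff)
    # Throw away local modifications so the tree is clean for the next task.
    subprocess.run(["svn", "revert", "--recursive", "."], check=True)

def unstash(patch_file):
    """Re-apply a previously saved patch to the current working copy."""
    # svn diff paths are relative to the working copy root, hence -p0.
    subprocess.run(["patch", "-p0", "-i", patch_file], check=True)

# e.g. stash("feature-x.patch") before switching tasks,
#      unstash("feature-x.patch") to pick the work back up later.
```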
Yeah, my SVN mode of operations ended up being multiple checkouts of the repository in directories alongside each other, one for each simultaneous task.
Ditto. Switching between branches in SVN was painful. We would use branches, but it was definitely a necessity to have a few separate instances with the main branches I might have to use pre-loaded. (Also a lot more work just got done in the "trunk" than ideal, just out of expediency.) Git is a huge improvement.
The interface for git is a bit of a mess, but the simplicity of the model (I tend to think of it as "a DAG of commits", YMMV), and the wizardry that makes that model consistent with the underlying nastiness of files and merging and whatnot, more than make up for the interface badness IMO. It almost always does exactly what I want it to do, or halts in a state I can easily move forward from, which is more than I can say for other VCSs I've used (mostly subversion) in all but the simplest use cases.
Without the right mental model, I expect git would be very difficult to use. I might not recommend it for a team I didn't think could grasp the concept of a DAG. Then again, such a team will basically be cargo culting every moderately complex technology they use, so what does it matter if their "committing_notes.txt" file contains git commands or something else?
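Concretely, the "DAG of commits" model is just: every commit points at its parent commits, and operations like finding a merge base are plain graph walks. A minimal sketch, not git's actual code:

```python
# History as a DAG: each commit id maps to the ids of its parent commits.
history = {
    "a": [],          # root commit
    "b": ["a"],
    "c": ["a"],       # "c" branched off from "a"
    "d": ["b", "c"],  # a merge commit has two parents
}

def ancestors(commit):
    """Every commit reachable from `commit`, itself included."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(history[c])
    return seen

def merge_base(x, y):
    """Common ancestors that aren't ancestors of another common ancestor --
    what a 3-way merge uses as its base."""
    common = ancestors(x) & ancestors(y)
    return {c for c in common
            if not any(c != d and c in ancestors(d) for d in common)}

assert merge_base("b", "c") == {"a"}
```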
A bad argument for what? I'm responding to a guy who said that git is harder to learn than UNIX and that teams new to version control should prefer CVS over git.
I wouldn't, and I'd tried using Mercurial for my personal random projects for a while. I gave up not because hg is inferior, but because when python, the language hg's written in, chose to migrate from hg to git, the war was truly over and lost.
At the end of the day, a VCS is just a tool, one of many that I need to learn and use to do my job, and it's not worth the effort to learn both git and hg.
Somewhat related anecdote: years ago I decided to learn Dvorak, but eventually switched back, because by the time I'd become proficient, my ability to type in QWERTY was completely gone. From what I'd read online at the time this was unusual: most people who learned Dvorak could switch back (& forth) within a few seconds to a few minutes. It took me probably a week or two to be able to touch type in QWERTY again, and maybe a month to get back to my original speed. And by then it was as if I'd never learned Dvorak at all.
Anyway, the point of the story is: maybe I just have shit memory :-)
I used Mercurial very briefly just to evaluate it; it's fine. I'm criticizing git. And I'm also saying DVCS in general is more complicated than the most typical use case of VCS.
Using Git for everything is like riding a bicycle with four derailleurs to pick up milk from the corner store. Granted, this is what I do right now; just because some technology is complicated or annoying doesn't mean I don't use it. But I wouldn't recommend it to others.
git and mercurial are a million times more clever than a state-of-the-art DVCS like BitKeeper. It's a new data structure. It's a brilliant data structure. It's robust in ways even distributed databases aren't. It's fast. It's simple. It's clean. It's elegant.
git made several additional breakthroughs: working in terms of snapshots instead of diffsets, keeping much less data, and performing many computations late rather than early. In essence, these result in a system infinitely more flexible and expandable than prior version control systems. Adding new ways to change the source code in CVS required a whole new data format (SVN). In git, that's a minor change.
The full power of git hasn't been anywhere close to exploited yet. The data model is very general-purpose, fast, and robust. It can do much more than just source control.
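To make "snapshots instead of diffsets, computed late" concrete, here's a toy sketch: store whole file versions addressed by hash, and only derive a diff when someone asks for one. (Git does delta-compress packfiles under the hood, but that's a storage optimization, not the model.)

```python
import difflib
import hashlib

store = {}   # content-addressed snapshot storage: id -> list of lines

def save_snapshot(lines):
    """Store a whole snapshot of a file; the id is the hash of its content."""
    oid = hashlib.sha256("".join(lines).encode()).hexdigest()
    store[oid] = lines
    return oid

def diff_late(old_id, new_id):
    """Diffs are derived on demand from two snapshots, never stored."""
    return list(difflib.unified_diff(store[old_id], store[new_id], "old", "new"))

v1 = save_snapshot(["a\n", "b\n"])
v2 = save_snapshot(["a\n", "B\n"])
print("".join(diff_late(v1, v2)))
```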
Were a lot of programmers really hung up on this idea of having increasingly smart merge algorithms back then? What Linus says seems self-evident to me: if there's even a whiff of a conflict, I want it to be shown to me so I can make a human decision about it. A merge algorithm smart enough to resolve every case is easily proven to be impossible by the fact that it's possible for branches to merge cleanly with no conflict, and yet still have a logical conflict due to some implicit contract that was broken. These are the nastiest regressions to find, and one reason I insist on rebasing before merging topic branches, because then at least git-bisect will tell you where the implicit contract was broken precisely on the second-rebased-and-merged branch. Smarter merge algorithms would just paper over more problems and so you quickly hit diminishing and then negative returns by trying to be too clever.
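For reference, the decision rule at the heart of a 3-way merge is tiny; all the machinery is in aligning chunks against the common ancestor. A sketch of just that rule, which shows exactly where "punt to the human" comes in:

```python
def merge_chunk(base, ours, theirs):
    """Three-way decision for one aligned chunk of lines. Real merges first
    align chunks by diffing each side against the common ancestor; this
    sketch shows only the rule that decides when a human has to step in."""
    if ours == theirs:
        return ours            # both sides made the identical change
    if ours == base:
        return theirs          # only their side touched this chunk
    if theirs == base:
        return ours            # only our side touched this chunk
    # Both sides changed the same chunk differently: no algorithm can know
    # the intent, so surface the conflict instead of guessing.
    return (["<<<<<<< ours\n"] + ours +
            ["=======\n"] + theirs +
            [">>>>>>> theirs\n"])
```

Note that a chunk can pass this rule cleanly and still be wrong at the level of program semantics; that's exactly the implicit-contract case above, and no amount of merge-algorithm cleverness fixes it.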
I'm not sure this is entirely fair. I've run into a fair number of merges where git couldn't resolve the conflict but meld or p4merge was able to do it trivially. If git were able to better deal with these, I don't think anyone would question it. I suspect over time that git probably has gotten smarter with merges.
I surveyed a fair number of DVCS systems in the run-up to git. I'm hardly an expert, but gracefully dealing with merge conflicts was indeed something system authors were investigating. Darcs went so far as to have a formal theory of patch management [1], which guaranteed never to have a conflict (unfortunately, running the proof engine could take unduly long and in some cases, may never complete).
Anyway, I think it's a tricky balance to strike. You're right that if the SCM resolves the conflict improperly and introduces a logic error, that's really tedious to track down. However, I've encountered far too many cases of bad rebases or merges resolved incorrectly by humans as well. Sometimes it's a small change the developer didn't pick up in line that was edited by both. Sometimes it was a lack of understanding of incoming changes. Sometimes it was just a battle lost to the merge tool (e.g., seeing "<<<<<<<" in source files). In all those situations, I long for a smarter merge tool.
You can still verify the result of an automatic merge. It's simpler to mark a suggestion "y" than to resolve conflicts that could be solved automatically by hand.
So I don't think this reasoning holds any ground whatsoever.
What they're actually objecting to is the idea of not having a human in the loop. I suspect if you pitched these smart merge algorithms from the "click to accept auto merge" angle, they wouldn't have much of a problem. Instead they seem to be pushed as a way of avoiding interaction.
Nah, I'm not sure the comma is grammatical; it might actually make the last part a contradicting enumeration. The blame is on me, I quoted out of context and was a bit slow to understand. The sentence is just not written well, either.
On topic, however: isn't it easier to miss one such conflict if you have to spot it after the damage is done, instead of having to think about it yourself beforehand?
If Linus had insight into issues around merging it's because that's what he does for a living. Every change to the Linux Kernel is ultimately approved/merged by him. Think about that. More than any person on earth, Linus knew what the issues involved with DVCS were and he chose a solution that fit all the use cases he saw on a daily basis.
He's not even a top 20 approver. Greg KH takes those honors, Andrew Morton is number 3, and I forget the rest. The list comes out annually from the Linux Foundation. (Sorry, no link.) The foundation requires an email address and credentials before providing access to the pdf.
During the merge window, Linus takes in most of the changes for the next kernel release, all of which have been approved by someone else. With each RC release, the lieutenants still approve most of the new changes. It's only when a crucial fix is needed for Linus's mainline kernel that he'll accept a commit directly.
The merge algorithm http://wiki.monotone.ca/MarkMerge/ used by Monotone has a well defined user model for conflicts, with similarity to Codeville's merge algorithm iirc. I'm not sure if any later VCS used it, looks like perhaps not.
Mark-merge is specifically for standalone scalar values. We used it for tree structure (file name and parent directory), for file attributes, and as a first pass on file content to see if we needed to bother doing a 3-way merge.
Tree structure is overrated. The way git handles it works better in practice. Files are an implementation detail, rather than something fundamental about code structure.
Using it as a first pass at content merging was mostly a performance optimization, and also only works if you track individual files as objects.
It might be a useful building block for a system that tracks refactorings as fundamental operations and knows how to do merges on your AST instead of on the serialized text form of your choice. But as far as I know no such system exists yet, and merging complex data structures is far more complex than simply merging their individual building blocks.
Looking at the exchange a bit and reviewing the context in which it occurred: it's a great example of someone clearly deciding what problem they want to fix, followed by defining their priorities and sticking to them. As Linus has pointed out (IIRC), doing this pointed the way toward the underlying data structures, and the algorithms to manipulate this data followed from there.
Cohen, as remarkable as always, comes across in this instance as trying to "sell" his product (codeville) and taking it a little personally when Linus isn't swayed. Linus clearly defines the problem he was trying to solve, and believes that git solves it better than codeville. He would have backed codeville if he thought it met kernel dev needs better than git. After all, he had jumped on Bitkeeper before, against other people's wishes.
If you'd try to find the roots of that idea, it would certainly be at least hundreds of years old, reflecting the difference between the scientific and the religious approaches to the world.
"I am wiser than this man, for neither of us appears to know anything great and good; but he fancies he knows something, although he knows nothing; whereas I, as I do not know anything, so I do not fancy I do. In this trifling particular, then, I appear to be wiser than he, because I do not fancy I know what I do not know."
for "corrupting the youth of the city-state and asebeia (impiety) against the pantheon (gods) of Athens."
So my claim above that it's about religion vs. science still stands. "Science" is of course a relatively modern term; the older one was "natural philosophy."
Which is actually romantic nonsense, just like those people who say in the modern world that "we know nothing about how things work". We know plenty. We just don't know it all.
A man who says he knows something is also being less of a wanker than a man who says he knows nothing, especially if the latter then uses that comment as leverage to claim wisdom over the former.
Read more of the original; taking just a small quote out of it doesn't do it justice. Also see my other comment for the context. The context matters.
It's about claims based on faith versus acknowledging the scope of what we actually know, so that we can actually find out more. The former produces the absurdities in religions for which they have to be ashamed today, as these simply don't match what we now know for sure.
Socrates was sentenced to death for impiety, that's the part of his own, obviously unsuccessful, defense.
I indeed haven't read the original. In context, it may make sense, but certainly pulled out of context like this, it's romantic twaddle.
Whenever I hear a person say that 'we really know nothing', it always reminds me of Insane Clown Posse being angry that science has an explanation for how magnets work... as in, finding comfort in expressing ignorance :)
Well, what's true is that these particular sentences were often not only used out of context but even modified into the significantly different "I know that I know nothing."
"Evidence that Socrates does not actually claim to know nothing can be found at Apology 29b-c, where he claims twice to know something. See also Apology 29d, where Socrates indicates that he is so confident in his claim to knowledge at 29b-c that he is willing to die for it."
So he was what we'd today call "a scientist": ready to admit the changing but finite limits of knowledge while acquiring new knowledge, not "an ignoramus."
But it's also not surprising that not everybody even understood what that was about.
Yet another description of "Linus and his fireworks", and when you actually read the thread, there's a little bit of crankiness, yes, but mostly a lot of discussion. It's milder than what goes on in a lot of HN threads.
Also, Bram Cohen's diff algorithm made it into Canonical's Bazaar VCS. I think some of the Cohens' other VCS ideas may have made it into Bazaar, but I'm not sure.
This reminds me of that old discussion between Linus and ESR where ESR tries really hard and has a lot of really good arguments but Linus doesn't care because he is right.
The thread was on HN last year, I think the title was something about Linus being too smart for his own good or something similar.
Git is popular (nowadays), it's a decent DVCS that requires craftsmanship and tons of trial and error before you can consider yourself a "pro user", and it's funny that the go-to solution for all git problems is still "just hard reset your repo".