
Here's the short version:

  I am the original sentence.
Alice commits a change in her repo:

  I am a different sentence.
Bob commits a change in his repo:

  I am the original sentence.
  I am the original sentence.
Now Alice pulls Bob's commit. What should happen?

The argument is that in certain cases it can be known which of Bob's 2 sentences is the original and which is the copy (due to context provided by an intermediate commit) and that therefore a correct VCS will figure out that the original is on the bottom:

  I am the original sentence.
  I am a different sentence.
But git doesn't look at history so will always produce:

  I am a different sentence.
  I am the original sentence.
I don't care. If you force me to care then I actually prefer git's behavior. Git is consistent: a merge will always produce the same result for the same files. I don't want history to matter.
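
To make the "same files, same result" point concrete, here is a toy sketch in JavaScript. It is not git's merge algorithm; `applyReplace` is a hypothetical helper that rewrites the first line matching the patch context, which is the behavior described above. Its output depends only on the file contents passed in, never on how those contents came to be.

```javascript
// Toy patch application (an assumption for illustration, not git internals):
// replace the first line equal to `from` with `to`.
function applyReplace(lines, from, to) {
  const i = lines.indexOf(from);
  if (i === -1) throw new Error("conflict: context line not found");
  return [...lines.slice(0, i), to, ...lines.slice(i + 1)];
}

const ORIGINAL = "I am the original sentence.";
const DIFFERENT = "I am a different sentence.";

// Bob's file after his commit: the sentence appears twice.
const bobs = [ORIGINAL, ORIGINAL];

// Alice's change, replayed on Bob's file, hits the first match.
const merged = applyReplace(bobs, ORIGINAL, DIFFERENT);
console.log(merged.join("\n"));
// I am a different sentence.
// I am the original sentence.
```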

The problem is not actually solvable. So git doesn't try to solve it. I think that's why it's called "the stupid content tracker."

EDIT: Is there anything worse than "smart" features that only work, say, 80% of the time? The closer they get to 100% the worse it gets, because then you start relying on them and they break right when you stop paying attention.



> Git is consistent: a merge will always produce the same result for the same files

I thought the point was that if you pull the exact same commits in different order the merge will produce a different result for the same files, meaning that in git the history does matter. Whereas darcs/etc will always produce the same result, such that history does not matter?


> pull the exact same commits in different order

Sort of. The OP doesn't write clearly. He's also confused about how git works. What he means is:

Say Bob has 2 commits (B1-B2) and Alice has 1 (A1)

Scenario 1: Alice merges each of Bob's commits in sequence (i.e. she replays his commit history onto her repo: A1-B1-B2).

Scenario 2: Alice merges only B2 (A1-B2).

The point is that, with git, Alice's repo will be different in each scenario. Because in scenario 2 git doesn't examine commit B1 and use that info to try and figure out what the content in commit B2 "means".

With darcs, on the other hand, her scenario 1 repo will be identical to her scenario 2 repo.

The flip side is that in scenario 2 git will always produce the same result for the same B2, because B1 is irrelevant. With darcs a change in B1 will change the result.

NOTE: "git pull --rebase" actually does "replay commit history" instead of "merge" when pulling code into your repo (result: B1-B2-A1). I use it as my default. The outcome is the same as darcs, the difference is that everything is explicit.


That's what i don't get.

I don't understand where or how you could encounter a circumstance where this would matter. This complaint seems to be an abstract theoretical point (maybe to support git alternatives? dunno) that even esoteric usage of a DVCS would never come across.

I dunno, maybe i'm not being creative enough in my use of histories.

EDIT: Okay this explains everything in a considerably more concise fashion than the article does: http://news.ycombinator.com/item?id=2456529


It would matter in this situation:

In the beginning:

  function A(){
    return 1;
  }
Now commit this in one branch:

  function B(){
    return 1;
  }
  function A(){
    return 1;
  }
then this:

  function A(){
    return 1;
  }
  function B(){
    return 1;
  }
  function A(){
    return 1;
  }
And then this in another branch off the base:

  function A(){
    return 2;
  }
Now merge the two end points. Which is correct? This, assuming a purely line-based diff:

  function A(){
    return 2;
  }
  function B(){
    return 1;
  }
  function A(){
    return 1;
  }
or this, assuming knowledge of the history of events?

  function A(){
    return 1;
  }
  function B(){
    return 1;
  }
  function A(){
    return 2;
  }
In JavaScript, where such duplicate declarations are legal, the last definition wins, so `A()` returns 1 in the first result and 2 in the second.
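
A quick way to check this is to wrap each merge result in its own function, since a later function declaration in the same function body replaces an earlier one with the same name. The wrapper names `firstResult` and `secondResult` are made up for this sketch.

```javascript
// First merge result: the edited A() is on top, the untouched copy is last.
function firstResult() {
  function A() { return 2; }
  function B() { return 1; }
  function A() { return 1; } // last declaration wins
  return A();
}

// Second merge result: the untouched copy is on top, the edited A() is last.
function secondResult() {
  function A() { return 1; }
  function B() { return 1; }
  function A() { return 2; } // last declaration wins
  return A();
}

console.log(firstResult());  // 1
console.log(secondResult()); // 2
```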

In Git, or by applying patches manually, it depends on the order in which you merge. If you merge the `B()A()` branch with the `return 2` branch and then the `A()B()A()` one, you'll get the second result. But if you merge the `A()B()A()` directly with the `return 2` branch, you'll get the first one. The same set of changes produces different outcomes.
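
Here is a rough simulation of that order dependence. `merge3` is a toy three-way merge written for this example, not what git runs. It assumes one side only rewrites blocks in place, the other side only inserts blocks, and each base block is aligned to its leftmost possible match.

```javascript
// Toy three-way merge over "blocks" (one string per function definition).
// Assumptions for illustration: `ours` rewrites base blocks in place,
// `theirs` only inserts blocks, alignment picks the leftmost match.
function merge3(base, ours, theirs) {
  const result = [...theirs];
  let cursor = 0;
  base.forEach((block, i) => {
    const at = theirs.indexOf(block, cursor);
    if (at === -1) throw new Error("conflict: cannot align base with theirs");
    result[at] = ours[i]; // carry our rewrite over to the aligned position
    cursor = at + 1;
  });
  return result;
}

const A1 = "A returns 1", A2 = "A returns 2", B1 = "B returns 1";
const base = [A1];
const step1 = [B1, A1];     // the B()A() commit
const step2 = [A1, B1, A1]; // the A()B()A() commit
const edited = [A2];        // the `return 2` branch

// Merge the two end points directly.
const direct = merge3(base, edited, step2);

// Merge the intermediate commit first; it then becomes the base for the next merge.
const afterStep1 = merge3(base, edited, step1);
const stepwise = merge3(step1, afterStep1, step2);

console.log(direct);   // [ 'A returns 2', 'B returns 1', 'A returns 1' ]
console.log(stepwise); // [ 'A returns 1', 'B returns 1', 'A returns 2' ]
```

The only thing that differs between the two runs is which commit serves as the merge base for the final step.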

In Darcs, the history between `A()`, `B()A()`, and `A()B()A()` is checked, and it's seen that the second `A()` is the "original" one, so the `return 2` is applied to that one.
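
One way to picture "checking the history" is to give every block an identity when it first appears and to record edits against identities rather than against text or position. The sketch below is only that picture, in JavaScript. It is not Darcs's patch theory, and `block`, `edit`, and `applyEdit` are made-up names.

```javascript
// Each block carries an identity assigned when it was first added.
let nextId = 0;
const block = (text) => ({ id: nextId++, text });

const original = block("function A(){ return 1; }");
const base = [original];

// Commit 1 inserts B() above the original A().
const step1 = [block("function B(){ return 1; }"), ...base];

// Commit 2 inserts a copy of A() at the top; the copy gets a fresh identity.
const step2 = [block("function A(){ return 1; }"), ...step1];

// The other branch's edit targets an identity, not a position or matching text.
const edit = { target: original.id, text: "function A(){ return 2; }" };

const applyEdit = (file, e) =>
  file.map((b) => (b.id === e.target ? { ...b, text: e.text } : b));

const merged = applyEdit(step2, edit).map((b) => b.text);
console.log(merged.join("\n"));
// function A(){ return 1; }
// function B(){ return 1; }
// function A(){ return 2; }
```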

Which means that you won't necessarily get the same behavior merging two Darcs patches as you would merging it within the repository, where there is a history. Git behaves exactly as if you were dealing with patches. I side with Git on this, personally, but it's a valid point - you have history, why not use it?


You know, I suspect that in many production cases, neither merge is "correct". The example involves a lot of code duplication, and a change to the block of code which was duplicated.

The probable case is something like:

  function foo(){
    do_something_complex_but_not_correct();
  }
with one person making the change to:

  function foo(){
    something_else();
    do_something_complex_but_not_correct();
  }
and then:

  function foo(){
    do_something_complex_but_not_correct();
    something_else();
    do_something_complex_but_not_correct();
  }
in the stated two-step change, while another author makes the change to:

  function foo(){
    do_something_complex_and_also_correct();
  }
The correct "merge" is going to be to apply the second change to both blocks of code, not just the first or the second:

  function foo(){
    do_something_complex_and_also_correct();
    something_else();
    do_something_complex_and_also_correct();
  }
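
As a sketch of what "apply the change to both blocks" would mean mechanically, here is a hypothetical `rewriteAll` helper that rewrites every matching line instead of picking one. No VCS mentioned in this thread is claimed to do this automatically.

```javascript
// Hypothetical helper: apply a one-line rewrite to every occurrence, not just one.
function rewriteAll(lines, from, to) {
  return lines.map((line) => (line === from ? to : line));
}

// Body of foo() after the two-step change.
const duplicated = [
  "do_something_complex_but_not_correct();",
  "something_else();",
  "do_something_complex_but_not_correct();",
];

const merged = rewriteAll(
  duplicated,
  "do_something_complex_but_not_correct();",
  "do_something_complex_and_also_correct();"
);
console.log(merged.join("\n"));
// do_something_complex_and_also_correct();
// something_else();
// do_something_complex_and_also_correct();
```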


Which is why I side with explicit, patch-like behavior. Interpreting a `move-and-copy` as a `move` when there's a chunk of duplicate data that could mess things up means it's essentially doing a primitive semantic analysis of what you meant to do. It may be correct more of the time, but it can't be correct all of the time.

What I "meant to do" could have been as you stated, where both should have changed. Or I could have copied the internals of a function to a new one, and made minor changes around it, and actually do wish to use that new copy as the official version. There is no way to 100% accurately detect such intent without being explicit about it, so I'd prefer something dumb and therefore extremely predictable.


more precisely, I believe, the history of the merges counts, not the history of the files per their original edits.


commits are not commutative in the general case. darcs' algo is commutative for the merged commits, and git's is not, in this example.

the git people are arguing that the speed lost by gaining this commutative nature is just not worth it. i agree.


Based on the article and the Reddit discussion, that is not correct. It's more like this:

Scenario #1:

Alice and Bob both make changes. Alice pulls Bob's change and merges it. Bob makes a second change. Alice pulls the second change and merges it.

Scenario #2:

Alice and Bob both make changes. Bob makes a second change. Alice pulls Bob's changes and merges.

The final result, which is Alice's change merged with Bob's two changes, ends up different in the two cases, and there were no merge conflicts.


No, no, no.

History is EVERYTHING to a VCS. You ALWAYS want exact information of what changed at what time. This lets you do all sorts of cool things like examine the provenance of a file in detail, integrate a similar change across two different branches whose code may have diverged, etc.

Meticulous tracking of history as well as efficient handling of large binary blobs are why the pros almost always rely on Perforce for large projects.


As far as I know, the history in git doesn't change unless you explicitly ask it to (rebase). So you should still always be able to tell exactly what changed and at what time it was changed. Perhaps git doesn't employ this information to the liking of others, but it should all be there.


Joel Spolsky describes mercurial as storing lists of changes, rather than a series of file snapshots.

"And so, when we want to merge our code together, Mercurial actually has a whole lot more information: it knows what each of us changed and can reapply those changes, rather than just looking at the final product and trying to guess how to put it together.

"For example, if I change a function a little bit, and then move it somewhere else, Subversion doesn’t really remember those steps, so when it comes time to merge, it might think that a new function just showed up out of the blue. Whereas Mercurial will remember those things separately: function changed, function moved, which means that if you also changed that function a little bit, it is much more likely that Mercurial will successfully merge our changes."

http://hginit.com/00.html

I'd assumed git and mercurial worked the same way.


The short of it is that Joel is wrong. Git and Mercurial use similar data structures and neither of them stores changes in the way that Darcs stores changes. Maybe he knows that full well but is telling a white lie to get a teaching point across.

If you make a change to a file in Git and commit it, the new version will store the full updated contents of that file (delta compression is an orthogonal issue). Indeed, my use of the word "version" is revealing. That concept is secondary in Darcs; changes are what have primary ontological status.


Joel is half-right and half-wrong. Mercurial stores its version history as a series of deltas, yes. Git stores its version history as a series of snapshots. (Git does do delta compression, but the delta compression is done independently of the version history, which is why git can be highly efficient at storing its complete version history in its repositories.) This doesn't matter, though, since you can get from snapshots to deltas and vice-versa very easily; the two representations are dual to each other. In that way, he is also wrong: the reason why git and mercurial are smarter than svn is not how they store their commits, since that really is an implementation detail.
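
A small sketch of that duality, in JavaScript. The delta format here (trim the common prefix and suffix, store the middle) is an assumption chosen for brevity. It is neither Mercurial's revlog nor git's pack format, and `diff`, `patch`, `toDeltas`, and `toSnapshots` are made-up names.

```javascript
// Delta between two line arrays: skip the common prefix and suffix, keep the middle.
function diff(prev, next) {
  let start = 0;
  while (start < prev.length && start < next.length && prev[start] === next[start]) start++;
  let endPrev = prev.length;
  let endNext = next.length;
  while (endPrev > start && endNext > start && prev[endPrev - 1] === next[endNext - 1]) {
    endPrev--;
    endNext--;
  }
  return { start, deleteCount: endPrev - start, insert: next.slice(start, endNext) };
}

// Apply a delta to a line array without mutating the input.
function patch(prev, d) {
  const out = [...prev];
  out.splice(d.start, d.deleteCount, ...d.insert);
  return out;
}

// Snapshots -> deltas: diff each snapshot against its predecessor (the first against empty).
const toDeltas = (snapshots) =>
  snapshots.map((snap, i) => diff(i === 0 ? [] : snapshots[i - 1], snap));

// Deltas -> snapshots: replay the deltas from an empty file.
function toSnapshots(deltas) {
  const snapshots = [];
  let current = [];
  for (const d of deltas) {
    current = patch(current, d);
    snapshots.push(current);
  }
  return snapshots;
}

const history = [["A1"], ["B1", "A1"], ["A1", "B1", "A1"]];
const roundTrip = toSnapshots(toDeltas(history));
console.log(JSON.stringify(roundTrip) === JSON.stringify(history)); // true
```

Either representation can be rebuilt from the other, which is why the storage format alone does not decide how smart a merge can be.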

At least for git, it will start by doing a 3-way merge, and if that fails, only then will it try to resolve the merge conflict by looking at the intermediate history. This is much faster, and for Linus, who wants to encourage lots of branching and merging, merge speed is highly important. This is what makes git fundamentally better than svn or cvs: the fact that it can get many more merge cases right, and that it can do this quickly and painlessly. So the darcs folks who say that git only does 3-way merges are incorrect; git can do much more sophisticated things than just 3-way merges. However, it only pulls out these more sophisticated weapons when the simple approach doesn't work (and 95+% of the time, the simple approach works just great).

What Darcs did is it focused on the "get many, many, MANY more merge cases right", but it completely ignored the "quickly" part of the equation. That's partially because it's amazingly complicated. Just take a look at the Darcs "Theory of Patches", and its obsessive fixation on being able to determine whether or not different patches commute, etc., and that gives you a very strong hint of its complexity right here: http://en.wikibooks.org/wiki/Understanding_Darcs/Patch_theor...

The question is whether this complexity is necessary or not. It certainly does slow things down. And fundamentally, that's the question; is it worth it to slow down nearly every single SCM operation just so that a few corner cases can be handled automatically, instead of requiring minimal human intervention? Since people of good will can disagree on this, the controversy certainly continues to exist. But I think a very large number of people are quite happy with the engineering tradeoff made by systems such as Git and Mercurial.


Great summary!

I stopped using Darcs a few years ago, but I heard the current generation at least resolved the notorious exponential time slowdowns.

Git's speed is definitely a big selling point. More than that, the ecosystem and services like GitHub are what really sold me on it versus alternatives. But Mercurial has a lot to offer and its simpler user interface, better Windows support and extensions like BFiles make it a much better fit for certain use cases.

I shouldn't have been so hasty to say that Mercurial doesn't store changes. But I'd argue, and you seem to agree, that Mercurial's revlog does not reflect a difference from Git in the basic philosophy of merging and the status and role of versions. In both cases you're basically dealing with genealogically annotated purely functional trees. By comparison, Darcs's theory of patches represents a radical departure. At the very least I'm happy that someone is trying to think deep and different thoughts in this area.


No, you are totally wrong. Did you even try this in Git? Any sensible VC system will give you a conflict here. The article discusses auto-merge behaviour. You ABSOLUTELY can get auto-merge to work 100% of the time. When it doesn't you get a conflict that you need to manually resolve. BitKeeper does get this right (disclaimer: I am one of the developers of BitKeeper).


What a powerful insight. Only now do I see how truly wrong I was. I don't know how I could have been so blind.


This is not an accurate summary. The claim is that a "correct" VCS will detect a move, not a copy, which is an entirely different beast - if it were a copy, it might be correct to apply the patch to both lines in some situations.



