Posts tagged ‘mercurial’

A Treatise On The History Of Distributed Version Control

This is not yet another git vs Mercurial debate. I admit bias towards git, which I use whenever I have a choice. This is most of the time now that gitifyhg is awesome. However, I have been using Mercurial for my day job for about a year. I am more familiar with Mercurial and its extensions than many developers who prefer it. I consider myself an advanced git user (not an expert) and an intermediate Mercurial user.

I therefore have the background to claim that Mercurial and git are equally capable. Mercurial doesn’t have certain features of git that I miss, but those features are implementable with development time. Sometimes git’s interface isn’t as easy to use or teach as I would like, but aliases and projects like git extras alleviate this issue.

This article is about philosophy, not technology. Mercurial’s documentation, mailing lists, and Stack Overflow questions are littered with dire warnings that extensions that rewrite history are dangerous and best avoided. Git, on the other hand, takes a “consenting adults” approach to history rewriting. While it acknowledges that rewriting history can be dangerous and should be avoided in certain circumstances, it also allows the coder to choose when and how to apply this rule.

To avoid comparing the two systems, I’ll refer to the two styles as “permanent history” and “mutable history”. Both git and Mercurial are fully capable of maintaining both styles of history. However, Mercurial users tend to prefer “permanent,” while git users typically adopt a “mutable” approach.

The permanent history philosophy emphasizes that a changeset cannot be altered once it has been committed. It is important to record exactly what state the repository was in when the commit was made. If that state is not acceptable, then a new commit is made to correct it. Future readers of the history in question will see that a commit was made and that later, it was amended in another commit. Permanent history is analogous to a captain’s log or accountant’s general journal. Every action should be recorded separately.

The goal of committing in the permanent history paradigm is to record a specific state of the repository.

The mutable history philosophy, in contrast, sees changesets as individual paragraphs in a living story. It can and should be edited to ensure it tells the story as effectively and coherently as possible. Each changeset should have a topic sentence (the commit message) and supporting sentences (the patch). When a commit is initially made, the book is never assumed to be in the final draft that will go to the publisher.

The goal of committing in the mutable history paradigm is to record a related set of changes.

The different stages of code history

There are several stages that changesets go through as a program is written. These stages are perceived differently by the two styles.

  1. Working directory changes have not yet been committed.
  2. Local changesets have been committed but not yet pushed to any public repository.
  3. Public changesets have been published to a public repository and are available to other coders.
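The three stages can be made visible with ordinary git commands. This is a rough sketch, assuming a throwaway bare repository standing in for the public one; the paths, file name, and commit messages are invented for illustration.

```shell
set -e
work=$(mktemp -d)

# A bare repository plays the role of the public repository (hypothetical path)
git init -q --bare "$work/public.git"
git clone -q "$work/public.git" "$work/dev"
cd "$work/dev"
git symbolic-ref HEAD refs/heads/master   # use "master" regardless of local defaults
git config user.email "dev@example.com"
git config user.name "Dev"

# Stage 3: a public changeset (committed and pushed)
echo "one" > file.txt
git add file.txt && git commit -q -m "First change"
git push -q origin master

# Stage 2: a local changeset (committed, not pushed)
echo "two" >> file.txt
git commit -q -a -m "Second change"

# Stage 1: a working directory change (not committed)
echo "three" >> file.txt

git status --short                        # working directory changes
git log --oneline origin/master..master   # local changesets
git log --oneline origin/master           # public changesets
```

In Mercurial, hg status and hg outgoing answer roughly the same first two questions.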

The working directory stage is treated identically by the two philosophies. If it hasn’t been committed, both styles have an “anything goes” attitude. If you screw something up and fix it, you are not expected to commit the bad code for posterity. If you leave a debugging statement in the code but catch it in a git|hg diff command, just delete it before committing. If your tests aren’t passing during the uncommitted changes stage, edit the files to make sure they do pass.

The attitudes diverge slightly when it is time to commit the changes in the working directory. Neither style requires committing all of the changes that are in the working directory. For example, if you edit a test that is not related to the feature you are about to commit, you can separate the two diverse changes into separate commits. However, this practice is more common in mutable history circles than permanent, largely because permanent history coders want to record the current state of the repository while mutable history followers are focusing on related changes.

The two histories have polar opposite beliefs about the local changeset stage. Permanent history maintains that a committed changeset should not be altered in any way. Some proponents may allow amending or rolling back the most recent commit, provided it has not been pushed publicly. They frown upon editing the “second last” or earlier commits, even if they haven’t been published.

Mutable history, on the other hand, takes the same “anything goes” approach that applies to the working directory stage. If changes have never been pushed publicly, then the mutable historian will comfortably rearrange and reorder them, move patch hunks from one changeset to another, or squash relevant commits together.

Permanent history fans may be surprised to learn that their philosophy on public commits is the same as mutable history’s. Once a commit has been pushed to the permanent public repository, both philosophies consider that it should not be changed, ever. If mutable history is likened to writing a story with coherent chapters, then public commits are like a published book. Once it has been published, the book should not be altered.

Let me reiterate: altering permanent published history is considered a Bad Thing by both philosophies.

There is a fourth stage available in the mutable style that permanent history does not allow. This “temporary public” stage lies between the local changesets and public changesets phases. At this stage, other people can see your changesets before they are moved into permanent public history. They may be rearranged and edited as if they are local history, but there must be agreement between all viewers that this section of history is still considered mutable. This is akin to sharing a draft of the book with a proofreader or copy-editor before it is published.

The source code for git itself is managed in this way, as discussed in maintain-git.txt. While it contains permanent public branches that cannot be altered, it also describes a “pu” branch that is temporary public. This branch is used to share the state of upcoming changesets; other developers can provide feedback, not only on the quality of the code, but also on the quality of individual patches, commit messages, and ordering.

History Is Communication

There are several reasons to maintain code history. Some examples include:

  1. Preserve a record of past state in case you need to return to it.
  2. Compare two versions of a code base to find the specific code that introduced a new bug.
  3. Concurrent development via patching and merging is virtually impossible without it.

However, the primary purpose of code history is to communicate. Each changeset implicitly communicates that the developer had some reason to take a snapshot of the repository at that time. It communicates exactly what the state of the repository was when the snapshot was taken, and is even able to communicate what changed between that snapshot and the previous one. The commit message describes these changes in English, preferably with a one line summary followed by a complete description of what changed and why.

While the two ideologies agree that history is extremely useful for communication, mutable and permanent history disagree as to what should be communicated.

Permanent history’s main purpose is to communicate “honestly” what happened, for all posterity to see. Each snapshot shows exactly what occurred in the repository. Two developers created different changesets in parallel and then at some specified point, they merged them. Someone forgot to delete a debugging statement and made a second commit to fix it.

Mutable history prefers to communicate “effectively”. The goal is to make local changesets as readable as possible before pushing them. Each changeset ideally contains a single related set of changes. Related changesets are further grouped together on individual branches. If this is not the case, they are modified or moved before being made public.

If you catch a problem in mutable history after committing but before pushing publicly, fix the commit. If two distinct changesets actually communicate a single idea, squash them together. If a single changeset contains two ideas, split them apart or move one to a different branch so the current one only contains cogent changes.
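As a minimal sketch of fixing an unpushed commit in git, assume a throwaway repository in which the second of three commits accidentally contains a debugging line. The file names and commit messages are hypothetical, and GIT_SEQUENCE_EDITOR=true is only there to accept the generated rebase plan without opening an editor.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com"
git config user.name "Dev"

echo "base" > app.txt
git add app.txt && git commit -q -m "Add base file"

# This commit accidentally includes a debugging line
printf 'feature\nDEBUG\n' > feature.txt
git add feature.txt && git commit -q -m "Add feature"

echo "docs" > README
git add README && git commit -q -m "Add README"

# Fix the mistake and mark the fix as belonging to "Add feature"
echo "feature" > feature.txt
git commit -q -a --fixup=HEAD~1

# Fold the fixup into the commit it repairs
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~3

git log --oneline
```

The resulting history has three commits, and the debugging line never appears in any of them.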

The permanent history crowd suggests that this rewriting of local changes before they are pushed is dishonest, or lying. However, it is easy to lie at the working directory stage in the permanent history paradigm. If you run an hg|git diff and notice that you forgot to delete a debugging statement before committing, then it is perfectly acceptable to delete that line and “lie” about having forgotten it.

If they truly wanted to record what “honestly” occurred, permanent history tools would track every single change at the text editor or IDE level.

I think we all agree that this is ridiculous. In truth, permanent history shares mutable history’s desire to have clean, communicative commits. The primary difference is deciding when it’s “too late” to change them. In permanent history, once committed, you can’t change it. In mutable history, you can and should change it up until the point it is pushed to a permanent public repository.

The ability to change history before pushing allows the developer to separate the two distinct tasks of “coding” and “organizing”. Often, when coding, we encounter a separate issue that needs to be addressed: a missing feature, a bug, documentation that needs writing. In the strict permanent history paradigm, your only “honest” option is to commit both features in a single changeset. However, permanent history rules are relaxed before the first commit has been made, so two other available options are:

  • shelve/stash the existing changes, write and commit the second feature, and then unshelve/stash apply
  • write the two distinct features in the same working directory and use git’s index or Mercurial’s crecord extension to commit them as separate patches.
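The first option might look like this in git. It is only a sketch in a throwaway repository, with hypothetical file names and commit messages.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com"
git config user.name "Dev"

echo "v1" > feature.txt
echo "v1" > notes.txt
git add . && git commit -q -m "Initial commit"

# Half-finished work on the first feature
echo "feature work in progress" >> feature.txt

# Set it aside so the second change can be committed on its own
git stash

echo "fix typo" >> notes.txt
git commit -q -a -m "Fix typo in notes"

# Bring the unfinished feature work back into the working directory
git stash pop

git status --short
```

For the second option, git add -p stages individual hunks interactively, so two unrelated edits in the same working directory can still become two commits.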

These options are commonly used by mutable history developers, but they also have another option: Commit the two features and continue coding. Then reorganize or split the changesets into a sensible series of commits appropriate to good communication before pushing the features to a permanent public repository.

To create clean, well-ordered commits, the permanent history style demands that we think about one thing at a time and decide what the most relevant history communication path is before we start coding.

The mutable history style understands that programming doesn’t work this way. It is common and acceptable to begin work on one feature and discover a bad comment or FIXME you had forgotten and perform a psychological context switch to work on that.

One of my colleagues once informed me that “Mercurial is for people who don’t need to hide their mistakes.” This is bullshit for two reasons. First, Mercurial, like git, is perfectly capable of hiding mistakes. It’s easy to edit local unpublished changesets in Mercurial before pushing them live. There are numerous extensions — both third party and built in — that allow this kind of operation.

Second, this statement deliberately misrepresents the purpose of history rewriting. We don’t rewrite history to hide our mistakes. We do it for the benefit of future readers of our git|hg log. Reorganizing history when we write it greatly reduces the cognitive overhead for readers trying to understand what we did and why. History, like code, is meant to be read more often than it is written. Crafting it before pushing it publicly eases the amount of work for future readers of that history.

It is difficult to understand a series of cumulative changesets that keep undoing themselves or refactor large sections of code. It is better to order these changesets such that they make sense. I’m not saying that only the final product should be committed, if other changesets are able to communicate useful information. There are legitimate reasons, regardless of history ideology, to record mistakes in the permanent record of the repository. If the mistake has already been pushed publicly, the best thing to do is admit that you did not communicate as effectively as you had intended and make a new changeset to fix the problem. This is akin to publishing errata for a book that is already in print.

Another good reason is to record that some experiment you attempted was a failure. Perhaps it made the system unbearably slow or it arbitrarily deletes data. These commits can live in a branch in permanent history, forever documenting that this experiment was attempted, that it failed (so people don’t waste time trying it again), and how it failed (in case someone else wants to improve upon your design). Neither philosophy advocates the hiding of this kind of mistake. However, mutable history does expect that the failed commits be well ordered with commit messages that effectively communicate what you did and what went wrong.

History is a form of documentation. Like any documentation, it should be well-crafted and report the evolution of the system effectively. For example, one of the best early pieces of advice I received when I started using git (and inadvertently learned the mutable history philosophy) is to “never use the word ‘and’ in a commit message.” The word “and” is a sign that you are trying to communicate two different ideas or changes in a single changeset.

There are other motivators for this piece of advice in addition to disseminating useful information. If you ever want to revert certain related changes, it is easier to do so if those changes exist in a single patch or consecutive set of patches. There is no need to extract the changes that you want to keep from combined snapshots. Both DVCS’s provide commands to trivially reverse earlier changesets, but this is only useful if the changesets contain only the idea you wish to reverse. It also eases communication when a commit message says “reverse the changes from revision X,” compared to “reverse some of the changes that do this and this as committed in revision X, but allow the lines that perform an unrelated operation alone.”
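A small sketch of that reversal in git, assuming a throwaway repository where a hypothetical “Add cache layer” commit contains exactly one idea:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com"
git config user.name "Dev"

echo "core" > core.txt
git add core.txt && git commit -q -m "Add core"

# A single-idea changeset that later turns out to be unwanted
echo "cache" > cache.txt
git add cache.txt && git commit -q -m "Add cache layer"

echo "more core" >> core.txt
git commit -q -a -m "Extend core"

# Reverse only the cache changeset; the later work on core.txt is untouched
git revert --no-edit HEAD~1

git log --oneline
```

Because the unwanted commit touches nothing else, the revert applies cleanly and its generated message says everything a reader needs to know.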

Further, if you want to apply an individual set of changes to a different branch of the project without merging an entire branch, it is a trifling matter if those changes are part of a single coherent, cohesive, consecutive set of patches.
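In git, this is a cherry-pick. The sketch below assumes a throwaway repository with a hypothetical feature branch holding one self-contained fix and one unfinished commit; only the fix is copied to master.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com"
git config user.name "Dev"

echo "base" > base.txt
git add base.txt && git commit -q -m "Initial commit"

git checkout -q -b feature
echo "fix" > fix.txt
git add fix.txt && git commit -q -m "Fix date parsing"
echo "wip" > wip.txt
git add wip.txt && git commit -q -m "Start new report"

# Copy just the fix onto master, leaving the rest of the branch behind
git checkout -q master
git cherry-pick feature~1

git log --oneline
```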

Finally, if you don’t have write access to a project, single idea changesets make it easy for the person who is integrating your patches to see what you did and what you intended. Mutable history integrators will generally reject your patches if they do not communicate effectively. You are unlikely to ever get write access to an upstream project (if it follows the mutable history paradigm) if you do not prove that you adhere to the single concept per changeset guideline.

Admittedly, it takes more effort to alter history than to just take a snapshot when you feel the code is in a semi-acceptable state or you want to have a backup available. This effort has a huge long-term payoff. When people say that they don’t care about maintaining clean history, I get the same sense of distaste as “I don’t bother with writing tests” or “I tend to put documentation off to the last minute”. Not maintaining clean history is a sign of a lazy developer. You may be saving yourself time by just committing, but you are adding overhead to everyone who ever has to interpret your commit history, including yourself.

Code Review

Code review is an extremely useful tool for improving the state of a source code repository. It’s a simple concept: other members of the team review each changeset and make suggestions for future improvement.

Reviewers using the permanent history philosophy can improve the quality of the future codebase, but they cannot suggest improvements to patches already under review. They do not have an opportunity to improve the communication quality of those patches if they have been pushed publicly, which is normally the case if you want to share patches for review.

The mutable history style changes the point at which modifications are not allowed from “in the local directory” to “publicly pushed to the permanent repository”. Code can be pushed to a temporary public location for review purposes. Other team members can review it and comment on the quality, not only of the code, but also of the individual changesets. Once review is completed, and any suggestions integrated into the change history, the temporary repository can be safely deleted.

Thus, the code review phase becomes more than a review of the code; it is also a patch review, a history review. Code review gives other developers the chance to say, “This patch could communicate more effectively if…”.

Incremental Merging

The most dangerous moments in version control occur when two different branches of development that touch overlapping pieces of code have to be merged. Someone has to figure out what the two original sets of changes did and then figure out what the combined code has to do to accommodate both ideas. If the branches have been divergent for a long period of time, this job is nearly as difficult as rewriting both features entirely from scratch.

This is compounded by the fact that normally, the two different branches were written by different developers. While the person doing the merge may be intimately familiar with their own work, they have to become just as well-versed in the alternate branch before they can merge it safely.

Worse, all of these changes get combined into a large “merge commit” that basically includes the entire modified history of the two feature branches squished into a single gigantic diff. This is horrible for communication. If just one line of code was inappropriately merged, it becomes a nightmare to answer the question, “why did this work on the feature branch but fail after the merge”?

In the permanent history paradigm, it is common to attempt to alleviate this problem by merging frequently. This way, a smaller subset of changes can be covered in each merge. Unfortunately, this is a terrific way to introduce malicious or erroneous changes into the history. People tend to assume merge commits “did the right thing” and don’t review them as closely as the patches being merged.

Moreover, the history becomes much less readable as these unnecessary merge commits clutter up the intention of both feature branches. Such commits do not serve any communication purpose other than, “the developer of this branch was afraid that divergent changes would be too hard to merge at a later, more appropriate time.”

The cognitive overhead is greatly reduced in the mutable history paradigm. Mutable history encourages rebasing over unnecessary merging. Instead of merging two branches of commits, rebasing makes one branch appear linearly after another branch. When rebased, the branch contains a series of commits that make sense in a linear order with no confusing merges in the history.

When you rebase a branch, each individual changeset is applied against the upstream branch, one at a time. Because the changesets contain small units of change, they are less likely to conflict, and therefore apply cleanly. When there is a conflict, it is easier to tell (from both the commit message and the code) how the code needs to be written to apply that changeset against new changes. This “one change at a time” process is much easier to apply than a single large merge commit. In addition, if you have previously rebased a branch and reordered commits for optimal communication, you will find that future merges or rebases onto other branches are even easier.
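A minimal illustration in git, assuming a throwaway repository with a hypothetical feature branch of two small changesets and an upstream master that has moved on in the meantime:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com"
git config user.name "Dev"

echo "base" > base.txt
git add base.txt && git commit -q -m "Initial commit"

# Two small, single-idea changesets on a feature branch
git checkout -q -b feature
echo "parser" > parser.txt
git add parser.txt && git commit -q -m "Add parser"
echo "printer" > printer.txt
git add printer.txt && git commit -q -m "Add printer"

# Meanwhile, upstream gains a commit of its own
git checkout -q master
echo "upstream" > upstream.txt
git add upstream.txt && git commit -q -m "Upstream change"

# Replay the feature changesets, one at a time, on top of master
git rebase -q master feature

git log --oneline --graph
```

The feature branch now reads as a straight line that starts from the latest upstream commit, with no merge commits in between.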

This means that, compared to a merge, you do not need to be as intimate with upstream changes from other developers. You figure out how you would have written each of your own changesets if you had been applying them directly to the upstream branch. This is not nearly so mentally fatiguing as trying to unravel two parallel sets of changes a la “ok, they did this, and I did this so I need to do this to get things back into a sane state”.

While I am avoiding a git vs Mercurial debate here, I’d like to point out that the various Mercurial utilities for rebasing and history editing are not as effective as git’s tools. The rebase extension, histedit, Mercurial queues, and pbranch all do the job, but they require more effort than in git. They don’t have git’s famous rerere functionality, they have potential to lose or obliterate history altogether, and they are neither as well integrated nor maintained as git’s tools. I do not say this to convince you to switch to git, but to point out that if you have tried these tools and found them lacking, it is not because the concept of rebasing and mutable history is a bad thing, but because the tools require further development.

Note that while mutable history users avoid unnecessary merges whose sole purpose is to reduce merge fatigue, they are not averse to merge commits that communicate useful information. So it is perfectly sensible to create a merge commit that demonstrates that a feature branch (usually containing a linear set of related changesets) has been merged into default. However, before the merge occurs, the feature branch should have its history edited in such a way that the entire branch will apply cleanly and no merge conflicts will occur that require any diff to be committed with the merge.
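In git, such an informational merge can be forced with --no-ff. The sketch below assumes a throwaway repository and a hypothetical feature branch that already sits directly on top of master, so the merge commit records the event without carrying any changes of its own.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com"
git config user.name "Dev"

echo "base" > base.txt
git add base.txt && git commit -q -m "Initial commit"

# A linear feature branch that is already up to date with master
git checkout -q -b feature
echo "parser" > parser.txt
git add parser.txt && git commit -q -m "Add parser"
echo "printer" > printer.txt
git add printer.txt && git commit -q -m "Add printer"

# Record the merge explicitly instead of fast-forwarding
git checkout -q master
git merge -q --no-ff -m "Merge branch 'feature'" feature

git log --oneline --graph
```

The tree of the merge commit is identical to the tip of the feature branch, which is exactly the “no diff committed with the merge” property described above.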

Have your cake and eat it, too

Distributed version control systems allow us to have multiple copies of repos in different states. If you feel strongly that an “honest” permanent history is important, perhaps it would be a good idea to keep this honest copy of history in a separate repository or a different branch. But for the sake of effective, coherent communication, maintain the main history in a mutable style.

In git, this can be done with branches. Simply do the development work on one branch (maybe give it a name like permanent/branchname to identify it as such). Push all changes to this branch as they occur. However, keep your master branch clean for most effective communication. When a feature is ready to go live, create a new branch from the commits on the permanent branch and rebase them onto master in a well-ordered manner that communicates clearly.
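One way this could look, as a sketch only: the branch names permanent/search and search, the files, and the commit messages are all hypothetical, and the autosquash rebase is just one of several ways to tidy the copied commits.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "dev@example.com"
git config user.name "Dev"

echo "base" > base.txt
git add base.txt && git commit -q -m "Initial commit"

# The "honest" record: every commit exactly as it happened
git checkout -q -b permanent/search
echo "search DEBUG" > search.txt
git add search.txt && git commit -q -m "Add search"
echo "index" > index.txt
git add index.txt && git commit -q -m "Add index"
echo "search" > search.txt
git commit -q -a --fixup=HEAD~1   # repairs "Add search"

# The edited copy: same commits, tidied up on a new branch
git checkout -q -b search permanent/search
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash master

git log --oneline master..permanent/search   # three commits, mistake included
git log --oneline master..search             # two clean commits
```

The permanent branch is never rewritten; only the copy destined for master is.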

I’m not sure how viable this would be in Mercurial, since it’s not easy to copy commits between branches. More likely, it would be suitable to have two repositories; one that contains the permanent historical record, and one that contains the edited history.

I don’t personally believe this is necessary. The mutable history paradigm communicates everything I need it to. However, if you are unsure if you are ready to make the switch, I want to make it clear that it is possible to maintain both styles for a period while you experiment with the idea. If it turns out you don’t like the mutable history paradigm, you can always delete the offending mutable branches or repository… though, of course, this would be a mutation of history in itself.

I expect people who perform this experiment to realize that well-crafted history is worth the small amount of extra up-front effort required to maintain it.

Gitifyhg is now awesome.

Gitifyhg is a git client for pushing to and pulling from Mercurial repositories. I described the first implementation, which used hg-git internally, last month. However, I found that it didn’t work as well in practice as my initial tests had indicated, and I morosely reverted to using hg directly.

While researching ways to improve the app, I stumbled across git-remote-hg by Felipe Contreras. It claimed to be a cure-all to the git-hg bridging problem that “just worked”. So I downloaded and tried it, but it didn’t live up to the author’s hype. While it could handle basic cloning, pushing to and pulling from the default branch, it failed for me when working with named upstream branches, something I have to do regularly in my day job. I submitted several issues and pull requests, most of which were ignored. The more deeply involved I became with the code, the more I felt a complete rewrite was in order.

I had some free time during the holiday season, and started my version of a git remote as an exercise, with no intent to create anything useful. To my surprise, something useful emerged. In fact, I believe gitifyhg is the most robust and functional git to hg client currently available. More importantly, it is eminently hackable code: well tested and fairly well documented. I hope this will make it easy to contribute to, and that my inbox will soon be full of pull requests.

This is the real deal. The best part is that you don’t have to learn any new commands. It’s just basic git with mercurial as a remote. The only command that has changed from normal git usage is clone:

pip install gitifyhg
git clone gitifyhg::http://selenic.com/hg
cd hg

By adding gitifyhg:: before the mercurial url, you can git clone most mercurial repositories. If you can’t, it’s a bug. Other complex repositories I have successfully cloned include py.test and pypy.

You can easily use gitifyhg in the fashion of git-svn. All named branches are available as remote branches. default maps to master. Other branches map to branches/<branchname>.

If you want to commit to the master branch, I suggest a workflow like this:

git clone gitifyhg::<any mercurial url>
cd repo_name
git checkout -b working  # make a local branch so master stays pristine
# hack and commit, hack and commit
git checkout master
git pull  # Any new commits that other people have added to upstream mercurial are now on master
git rebase master working  # rebase the working branch onto the end of master
git checkout master
git merge --ff-only working  # fast-forward master to the end of working
git push

Working on a named mercurial branch, for example feature1, is easy:

git checkout --track origin/branches/feature1
git checkout -b working  # make a local branch so feature1 stays pristine for easy pulling and rebasing
# hack and commit, hack and commit
git checkout branches/feature1
git pull  # New commits from upstream on the feature1 branch
git rebase branches/feature1 working  # rebase the working branch onto the end of feature1
git checkout branches/feature1
git merge --ff-only working  # fast-forward feature1 to the end of working
git push  # push your changes back to the upstream feature1 branch

It is even possible to create new named branches (assuming my_new_branch doesn’t exist yet in Mercurial):

git checkout -b "branches/my_new_branch"
# hack, add, commit
git push --set-upstream origin branches/my_new_branch

These basic workflows have been working flawlessly for me all week. In contrast to my previous attempts to use git to hg bridges, I have found it easier to use gitifyhg than to work in the mercurial commands that I have become expert with, but not used to, in the past year.

Gitifyhg is not yet perfect. There are a few issues that still need to be ironed out. There are failing tests for most of these in the gitifyhg test suite if you would like to contribute with some low-hanging fruit:

  • Anonymous branches are dropped when cloned. Only the tip of a named branch is kept.
  • Tags can be cloned and pulled, but not pushed.
  • Bookmarks can be cloned and pushed, but not pulled reliably. I suspect this is related to the anonymous branch issue.

So give it a shot. Gitifyhg is just one easy_install gitifyhg away.

Four ways to do local lightweight (git-style) branches in Mercurial

One of the many git features that I miss in my day-to-day work using Mercurial is local lightweight branching. That is to say, branches that I don’t push to a public repository until I know they are in a sane state, and that do not take up any room in the public branch namespace.

Until recently, I thought the only ways to create a local branch that did not get pushed to a remote repository were to use multiple local clones or Mercurial Queues. Turns out there are at least four ways to do local lightweight branches in Mercurial.

This is a bug, not a feature. I’m a Python programmer and a huge proponent of the “There should be one– and preferably only one –obvious way to do it.” rule. In git, the answer to pretty much every question is either, “create a branch” or “step 1: create a branch…”. Git branches are simple and elegant. Mercurial branches are… well, it depends what kind of branch you want. You do know what kind of branch you want, right?

That said, I’m forced to work in Mercurial, and until gitifyhg is working well enough for daily use, I’m constantly looking for ways to work around the shackles Mercurial places on me. Ironically, I know more about Mercurial extensions than many of my pro-hg colleagues, simply because as a git user, I know what’s missing and am always studying to fill the gaps.

So, here are four methods of creating a lightweight local branch in Mercurial. I leave it to you to figure out which one is best for your workflow.

Local Clones

This is the most often suggested method of local branching in Mercurial, which is a shame because it’s ugly and reminiscent of subversion. Essentially, you simply make a new clone of the repository and work in there. If you like your changes, you push them; if not, you delete the directory.

I’ve experimented with a folder structure like this:

code/myproject
    - staging
    - feature1
    - bugfix1
    ...

Each of the subfolders is a mercurial clone of the project. Staging is a clone of upstream. feature1, bugfix1, and their siblings are clones of staging. I manage my commits from staging, pulling from upstream or from the local clones as needed.

How to make a “local branch”:

cd ..
hg clone staging featurename

How to commit changes to your “local branch”:

hg commit

How to delete a “local branch” if you don’t want it to see the light of day:

cd ..
rm -rf featurename

How to merge a finished feature and push upstream:

cd ../staging
hg pull -u # pull in upstream commits
hg pull ../featurename
hg hgview # see what's going on
hg merge # good luck

Queues

Most mercurial users I talk to recommend avoiding mercurial queues. I have no idea why; they are one of the most useful tools in the Mercurial toolbox. It does annoy me to be forced to learn an entirely different set of commands to manage patches before and after they are “made permanent”, but if you’re going to be working regularly in Mercurial, they are vital to maintaining the workflows you are used to from git.

Mercurial queues are simply a set of patches that have not yet been “committed”, with a collection of tools for managing them. Because the patches have not been committed, they can be reordered and rearranged for the best possible communication effect.

Steve Losh has written a great introduction to Mercurial Queues for git users. Chapters 12 and 13 of the hg book are more comprehensive guides, but can be hard to follow.

It is possible to push and pull queues (they are stored in a separate repository with the main mercurial repository), but if you actually want to share the unfinished code you’ve written, I think it’s better to either create a proper named branch and share it, or temporarily share your local repository with hg serve.

How to make a “local branch”:

hg qqueue --create feature

How to commit changes to your “local branch”:

hg qnew patchname # having to specify a name for each patch is annoying
hg qrefresh # update the most recent patch with your latest changes

You can also use hg qrecord instead of qrefresh to add specific changes, but it’s not as robust as git add -p (it doesn’t support splitting or editing hunks).

How to delete a “local branch” if you don’t want it to see the light of day:

hg qpop --all
hg qqueue patches # switch to the default queue
hg qqueue --delete feature

How to merge a finished feature and push upstream:

hg qfinish --applied

Phases

Mercurial users tend to recommend bookmarks when git users ask for lightweight branches. That doesn’t really work because all it does is give a temporary name to an anonymous head instead of a permanent one. If you push to the public repository, your branch will be pushed too, regardless of whether it was named, bookmarked, or anonymous.

It turns out, Mercurial has a concept called phases, which allow you to mark a changeset as public, draft, or secret. Anything that has been pushed or pulled is considered public. All other local patches are draft by default, which means they will be pushed publicly when you call hg push. But if you set the phase of a patch to secret, that patch is not allowed to be pushed publicly.

The hgview extension is good for indicating what phase various commits are in.
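If you would rather stay on the command line, something like the following sketch should give a similar overview. The function name show_phases is hypothetical, and it assumes a Mercurial version whose log templates support the {phase} keyword.

```shell
# Hypothetical helper, not a Mercurial command.
# Prints one line per changeset: revision number, phase, first line of the message.
# Any extra arguments (for example -r with a revset) are passed through to hg log.
show_phases() {
    hg log --template '{rev} {phase} {desc|firstline}\n' "$@"
}
```

For example, show_phases -r "not public()" would list only the changesets that are still draft or secret.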

How to make a “local branch”:

hg commit -m "initial commit to the feature"
hg bookmark feature
hg phase --force --secret feature

How to commit changes to your “local branch”:

hg commit # it's a normal mercurial commit, but now it's secret

How to delete a “local branch” if you don’t want it to see the light of day:

hg strip --rev "secret() and ancestors(feature)"

You’ll need the mercurial queue extension to get the strip command. You might want to double check that the revset I wrote is suitable for you; I’m not too clear on mercurial revsets yet.

How to merge a finished feature and push upstream:

hg phase --draft feature
hg push

Now, the interesting thing is that you can set secret phase on a named branch as well as a bookmark. So you can have locally modified named branches that don’t get pushed until you’re good and ready to push them, or never get pushed at all. Just replace bookmark with branch above.
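Spelled out, the named-branch variant might look like this sketch. The function name start_secret_branch is hypothetical; the hg commands are the same ones used in the bookmark version, with the phase set on the working directory parent (.) right after the first commit.

```shell
# Hypothetical helper, not a Mercurial command.
# Usage: start_secret_branch BRANCHNAME "commit message"
start_secret_branch() {
    # Name the branch first so the commit lands on it,
    # then mark that first commit secret; its descendants stay secret too.
    hg branch "$1" &&
    hg commit -m "$2" &&
    hg phase --force --secret -r .
}
```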

I learned about phases today. I haven’t really used them in practice, yet, but I believe they will become my new favourite way to do local changes in Mercurial.

push -r

It is possible to tell mercurial to only push certain revisions to the upstream repository, thus keeping your other branches local. I wasn’t too keen on this at first because it seemed far too easy to accidentally push stuff I wasn’t ready to push. However, Mercurial does warn you before you push a new branch (anonymous or named), so as long as you have actually created a branch (rather than putting commits on top of the upstream branch), it will be quite safe.

To do this, simply pass the -r or --rev parameter when you hg push. For example, if you only want to push the default branch, pass --rev default and commits on any other branches will remain local.
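If you are as nervous about this as I was, a small wrapper can preview what is about to go out before pushing. This is only a sketch: push_only is a name I made up, and it relies on hg outgoing accepting the same --rev argument as hg push.

```shell
# Hypothetical helper, not a Mercurial command.
# Usage: push_only REVISION_OR_BRANCH
push_only() {
    # List the changesets that would be sent; hg outgoing exits non-zero
    # when there is nothing to send, in which case the push is skipped.
    hg outgoing --rev "$1" &&
    hg push --rev "$1"
}
```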

How to make a “local branch”:

hg branch feature # name the branch first so the commit lands on it
hg commit -m "initial commit to the feature"

How to commit changes to your “local branch”:

hg commit # it's a normal mercurial commit

How to delete a “local branch” if you don’t want it to see the light of day:

hg strip --rev "branch(feature)"

How to merge a finished feature and push upstream:

hg commit -m "close feature branch" --close-branch
hg update default
hg merge feature
hg commit -m "merge feature" # the merge is not recorded until you commit it
hg push --new-branch

How to push changes on default without pushing the new branch:

hg update default
hg push --rev default

You’ll notice that I used a named branch here. It’s possible to do the same thing with bookmarks or anonymous branches, but you would have to specify revision numbers manually and would probably have trouble defining revsets.

Bonus: qimport

This isn’t strictly to do with local branching, but it addresses another issue git users have with mercurial: rearranging commits on a branch that has never been published.

Mercurial users point to the rebase extension when we complain about this, but that extension is woefully inadequate compared to the usage we are used to in git. It can also be somewhat dangerous, unlike git’s version of rebase. I believe the rebase extension was written as a proof of concept in Mercurial, but because the documentation says “this is dangerous” nobody ever actually used it or added features to make it actually usable or safe.

However, there is another option: convert the commits to a mercurial queue and then use the relatively robust mqueue features to craft the changesets the way you want them to look. This is an important tool because it means you don’t have to decide in advance whether you want to use commits or patch queues for a particular feature. Just create a named branch, make your commits, and if you need to convert it to a queue, do so. This isn’t the place to explain the intricacies of Mercurial queues, but here’s a basic example of how to use them to remove patches from a named branch and append (rebase) them to default:

hg init .
hg view &  # Watch the changes in realtime as you type
echo a >> a
hg add a
hg commit -m "a"
hg branch feature
echo b > b
hg add b
hg commit -m "b"
echo c >> b
hg commit -m "c"
hg update default
echo a >>a
hg commit -m "aa"
hg qqueue --create feature
hg qimport --rev "branch(feature)"
hg qpop --all
hg update default
hg qpush --all
hg qfinish --applied

Gitifyhg: Accessing Mercurial repos from GIT

My company uses Mercurial for internal hosting. Other than that, it’s an absolutely terrific place to work.

About three quarters of my colleagues are sick of hearing the rest of us complain about Mercurial’s inadequacies. They mention tutorials like this one that simply don’t work in real life. I’ve done my research, and I have not been able to find a viable git to hg pipeline anywhere. There is a git-hg repo, but it appears to be un-maintained and not overly well documented. It’s also written in bash, so I’m not eager to take over maintenance.

Instead, I’ve been spending some of my non-working hours trying to make a Python wrapper around hg-git named gitifyhg. It started out as an automated version of the hg-git tutorials, but is now expanding to provide additional support and tools.

Right now I’m following a git-svn style of workflow, where your git master branch can be synced up with the hg default branch. It seems to be working somewhat ok. I think at this point, the pain of using gitifyhg is about equal to the pain of using hg alone. Hopefully future improvements will reduce that pain, and as always, patches are most welcome!

The instructions are pretty clear and the unit tests provide a pretty good description of basic usage. Non-basic usage probably doesn’t work (yet). Here’s a tutorial of what does work:

Acme corporation uses Mercurial for all their repositories. Fred and Wilma are developing a project named AcmeAdmin together. Fred is content to use Mercurial for hosting, but Wilma is a git advocate who is going crazy with the restrictions Mercurial places on her. She decides she’s going to try gitifyhg on this new project to see how much trouble it is.

Fred starts the project by creating a Mercurial repo that will serve as the remote repo for both of them:

mkdir acmeadmin
cd acmeadmin
hg init

Fred and Wilma are doing a very strange sort of pair programming where they have separate repositories on the same machine. They are doing this so if you were reading along, you could type in the same commands they are typing and the demo would work. Rest assured that in spite of this queer practice, Fred and Wilma are hotshot programmers. So Fred now clones this remote repo and starts working. In the meantime, Wilma is browsing the gitifyhg README, so she hasn’t cloned anything yet.

cd ..
hg clone acmeadmin fredacme
cd fredacme
echo "Write Documentation" >> TODO
hg add TODO
hg commit -m "write the documentation for this project."
echo "Write Unit Tests" >>TODO
hg commit -m "unit tests are done"
hg push

Wilma’s caught up now and eager to try gitifyhg. She takes over the keyboard and clones the remote repository, which already contains Fred’s changes. Then she runs gitifyhg and verifies that a git repository exists with Fred’s two commits.

cd ..
hg clone acmeadmin wilmaacme
cd wilmaacme
gitifyhg
git log

Fred’s gone for coffee. Wilma now starts her own hacking and pushes the commits using gitifyhg.

echo "Implement code until tests pass" >>TODO
git commit -m "code implemented" -a
git hgpush

Wilma got kind of lucky here because she implemented her code on the master branch and there were no conflicts with Fred’s work. If there had been, I’m not sure what would have happened. Fred’s finished his coffee and pulls in Wilma’s change via Mercurial. He then commits some more changes.

cd ../fredacme
hg pull -u
echo "deploy" >> TODO 
hg commit -m "it's up and running"
hg push

Meanwhile, Wilma is working on a separate feature. She remembers to create a new git branch this time, which is good because she’s going to want to do some rebasing when Fred’s commit comes in.

cd ../wilmaacme
git checkout -b feature_branch
echo "new feature documentation" >> FEATURE
git add FEATURE
git commit -m "start new feature by documenting it"

Before pushing her changes, Wilma pulls to see if Fred has done anything. He has! But the hgpull merges all that into her master branch. She checks out her working branch and rebases it onto master. Then she pushes it upstream.

git hgpull
git checkout feature_branch
git rebase master
git checkout master
git merge feature_branch
git hgpush

And that’s the basic workflow! I’m probably going to add an hgrebase command to take care of that checkout rebase checkout merge step, since I expect it to be common. It should probably use git rebase -i, which is probably the most amazing thing that ever happened to version control.
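Until something like that exists, the four git steps can be wrapped in a shell function. This is just a sketch of the idea, not gitifyhg code: rebase_onto_master is a hypothetical name, and it assumes master is the branch synced with hg default.

```shell
# Hypothetical helper; gitifyhg does not ship this.
# Usage: rebase_onto_master BRANCHNAME
rebase_onto_master() {
    branch="$1"
    # Replay the feature branch on top of the freshly pulled master,
    # then merge it back into master; stop at the first failing step.
    git checkout "$branch" &&
    git rebase master &&
    git checkout master &&
    git merge "$branch"
}
```

Wilma’s last step would then shrink to git hgpull, rebase_onto_master feature_branch, and git hgpush.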

Like I said, this is how gitifyhg works when all goes well. When things don’t go well it’s still a bit of a mess. I’ve had to manually sync up the hg and git working directories using git reset --hard and hg update -C a couple times. Once, my hgpush ended up putting a bookmarked branch on my hg repo that I didn’t want to go public (luckily, hg push fails by default when there are multiple heads); I ended up having to use hg strip to clean it up. I’m hoping to automate or prevent some of this in the future. For now, try it out and submit patches or at least issues!

Learning hg as a git user

As my friend Jason Chu recently noted, I am primarily a git user who has discovered a need to understand and use Mercurial. I am trying to refrain from judgment on Mercurial, as I’m easily bored by bikeshed discussions and holy wars. I have a pragmatic “use what you like and let me use what I like” philosophy, but when you are interacting with other people’s code, you occasionally have to use what they like.

I have read several articles that I do not intend to link to discussing the differences, and cheat sheets of hg equivalents of common git commands. These are utterly useless. Mercurial and git have different design philosophies, as Jason noted, even though the end result of their usage is much the same. If you’re comfortable with git, and interested in learning Mercurial, you may find my own eureka moment helpful.

hg commit is not the same as git commit.

Most comparisons of git and hg do not notice this distinction, but I was really puzzled by how hg could be more powerful than Subversion and supposedly equally powerful to my beloved git until I came to this realization.

In git, when you make a commit, you are creating local history that can be easily changed, modified, or ratified. You can rebase over those changes as many times as you like. You can use git commit --amend to change the commit or add changes to it. The history is not remotely considered “permanent” until you push it to a public repo, and even then, there are times when it is acceptable to rewrite it.

Conversely, in hg, when you commit, you are doing what the word actually says: committing. You are saying “this commit looks the way I want it to, I am finished with it.” You may not be pushing the commit to a remote repo any time soon, you may not be publishing it, but you have written in sandstone that this commit is complete.

I say written in sandstone, rather than stone because there are a variety of hg commands and extensions that allow local history editing, rebasing, and rollbacks. I haven’t learned how fluid these extensions are compared to equivalent history modification in git, but the feeling I am getting is that such changes would be considered much more invasive in hg than in git. History editing is a third party extension; this says to me “not officially supported” (as compared to built-in extensions like Mercurial queues). Mercurial typically desires us to think of a commit as an object that is permanently in the history. Many of the other slightly-deeper-than-cosmetic differences between the two systems seem to stem from this same basic difference.

In git, I have gotten quite used to coding first, and then creating an appropriate history later. There are numerous other potential workflows with git, but that’s the one I like. At first, I thought this was impossible or very difficult with Mercurial. However, when I realized that “commit” falls somewhere between the commands “commit” and “push” in git, things started to fall into place.

Mercurial has a powerful tool called queues that allows you to manipulate history to your heart’s content before you call commit. I’ve been using these effectively to create a workflow that I am comfortable with. It’s not the same as what I’d do in git, not remotely, but the overall outcome is similar.

A related basic understanding that is a little better documented than the difference between hg commit and git commit is the following:

hg branch is not the same as git branch.

Once again, hg branch lies somewhere between git branch and git push origin. When you call hg branch, you are stating an intent that the branch will be public. In git, you can have as many unpublished branches as you want. In Mercurial, this behavior is better achieved by the use of bookmarks, although I’ve found that Mercurial queues are easier to work with.

There are many tutorials for new hg users coming from a svn background, and a few tutorials for those coming from a git background. If you are hoping to learn Mercurial effectively, I suggest avoiding most of those options. It is much better to study Mercurial from the perspective of a programmer who hasn’t seen version control before. Such coders don’t exist (I hope!), but this attitude allows you to learn how the new system should be used, not how to make it behave like a system you are already used to.

For mercurial basics, I strongly recommend http://hginit.com/, an irreverent and entertaining tutorial on the simpler concepts.

I had a lot of trouble understanding hg queues until I read the hg book chapter on the topic. I intend to read the entire red-bean book at some point, as it appears to be much more coherent than the official Mercurial documentation. Now that I’ve been playing with hg queues for a day or so, I have come to understand that they can cover several common git tasks that appear to be missing from hg, including stashing, rebase -i, and similar. The key takeaway is you don’t commit your queues until you are quite certain you want them to become permanent history.
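The stashing case, for instance, might look something like this sketch. The function names are my own, it assumes the mq extension is enabled, and it assumes a Mercurial recent enough that hg qnew picks up outstanding working directory changes on its own.

```shell
# Hypothetical helpers, not Mercurial commands.

# Usage: mq_stash PATCHNAME
mq_stash() {
    # Capture the uncommitted changes in a new patch, then unapply it,
    # which leaves the working directory clean.
    hg qnew "$1" &&
    hg qpop
}

# Usage: mq_unstash PATCHNAME
mq_unstash() {
    # Reapply the named patch on top of the current working directory.
    hg qpush "$1"
}
```

Unlike git stash pop, the changes come back as an applied patch rather than as uncommitted edits, so you would keep refining them with hg qrefresh.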

I haven’t yet figured out just when to choose hg queues over hg bookmarks, but a good read for getting used to hg bookmarks can be found here.

I strongly recommend enabling the hgk extension. Just add the following to your ~/.hgrc. This will enable an hg view command that is more similar to gitk than it should be, considering the basic differences between branches in the two systems.

[extensions]
hgk =
mq = 
bookmarks =

(The second line enables hg queues and the third enables bookmarks.)

Overall, I suspect that I will always prefer git to hg. However, unlike subversion, I think Mercurial does supply me with tools I need to work effectively. Different tools from git, but effective nonetheless.

One beef I have with both git and Mercurial is that they violate the “one best way to do things” principle, which makes learning, communicating about, and deciding how to use them more complicated than it needs to be.