Guido van Rossum Should Retire (and Focus on Python)

At the two previous PyCons I’ve attended (2009 and 2012), Guido van Rossum sounded bored and uninterested during his keynotes, even though the content was meaningful. I was actually wondering if this would be the year that he would step down as Python’s BDFL. Thankfully, I was dead wrong.

Instead, he presented a highly technical and very exciting addition to the Python language. Alfredo told me this started when he took a month off between leaving Google and starting at Dropbox. Now, when normal people take a month off, they relax or travel or visit friends and family. Not our BDFL. He writes a callback-free asynchronous event loop API and reference implementation that is expected to massively alleviate Python’s oft-maligned lack of a consistent, unhackish concurrency solution.

Let’s have more of that. What if Mr. Van Rossum could hack on Python full time? Would we see a quantum leap in Python every month?

Anyone who knows about the Gittip project likely thinks they can guess where this is going. We, the people, can each tip our BDFL a few cents or dollars per week so he can focus on whatever he deems worthy. It’s safe to assume that a man who spends his vacation time drafting a new Python library would choose to work on Python full time if we funded him.

This plan is great, and I actually think that Guido could easily earn enough to quit his day job if he endorsed Gittip and invited individuals to tip him. But I’d like to discuss a different plan: not individuals, but companies should tip Guido the maximum Gittip amount as a sort of “partial salary”. At $1248 per year, most companies wouldn’t even notice this expense, and they would get a better programming language and standard library in return. Development would accelerate even more if each of these companies chose to invest an entire developer salary, split among a hundred Python core and library developers. If a hundred companies chose to do this, those hundred people could work on Python full time. The language and library would improve so vastly and so rapidly that each company’s return on investment would be far greater than if it had paid that same salary to a single developer working full time on its in-house product.

It might take some convincing to justify such a strategy to these corporations. Companies like to know what is happening to their money, and simply throwing a hefty developer salary at Gittip would be hard to justify. Obviously “goodwill” could support some of it, in the same way that so many companies sponsored PyCon in exchange for exposure.

Cutthroat CEOs should perhaps consider not just the value of having Guido work exclusively on Python, but also the cost of having him work for the competition. I’m sure Box.com CEO Aaron Levie was a little nervous when he found out that the first and greatest Python programmer of all time had recently hired on at a major competitor. Perhaps Box.com can’t afford to steal Guido from Dropbox, but if all the companies currently involved in cloud storage were to tip Guido $24 per week on Gittip, this incredible programmer could be working full time on an open source product that directly and indirectly benefits their companies, rather than improving a competing product.

Most of the arguments that Gittip will fail are based on the premise that not enough money can be injected into the platform to sustain full time development by open source programmers. However, if an open and caring relationship can be built such that the corporate world is also funding the system, I think it can become extremely successful. Everyone will benefit: Open source projects will improve at a rapid pace. Exceptional developers will get to pursue their passions. End users will get better products. The overall level of happiness in the world will be higher.

I would like to see a world where brilliant young software engineers are not risking their mental health (and consequently, their lives) on startup ideas in the hopes of being bought out for a few billion dollars. I would like to see a world where those engineers are not working for large corporations that have neither their employees’ nor their end users’ interests at heart (but rather, their stockholders’ and advertisers’). I would like to see a world where those developers can choose to invest their passion in open source products that will change the world.

My Android phone no longer has a Google account

This week, I was finally able to fulfil a longstanding goal: to delete my Google account from my Android phone. This is one step in a series of progressions towards “completely” disappearing from Google’s radar. I have been comfortable with the state of my laptop, which avoids all Google spyware by using Ghostery to block Google Analytics, disabling cookies on all Google domains, and using Startpage.com for search. I’ve dropped Google Talk in favour of a Jabber server hosted by a friend. While I still actively monitor my Gmail account via IMAP, it is not my primary address and is largely only used for correspondence that is already public, such as mailing lists and Google Groups.

The three things that I have still been using Google for were:

  • Maps
  • Paid Apps From Google Play
  • Contact Backup

I still use Google Maps on occasion, though my main navigation equipment is an offline Garmin GPS device that, to the best of my knowledge, is not notifying anyone of my location at any time. I largely addressed the other two issues this past week.

I recently received my Cubieboard in the mail. It’s basically a specced-up Raspberry Pi. I installed Arch Linux by following the instructions in this thread.

I then set up ownCloud by following the instructions on the Arch wiki. Once it was set up, I realized that I personally don’t have much use for calendar sync or file sharing, but that the contact backup was crucial. I didn’t want a full LAMP stack running on my little ARM processor, so I uninstalled ownCloud and set up Radicale instead. Now my phone’s contacts are backed up and I no longer need my Google account to support that feature.

Then I was notified that AOKP, my current Android ROM of choice, had released an update. I thought “Hm, I wonder if I can get away with not installing the Google Apps package at all.”

I couldn’t. But I tried. The main issue is that there are two paid apps in my Google Play account (SwiftKey and SwipePad MoreSpace) that I do not want to live without, and do not want to purchase again from another vendor. In the case of SwipePad, I couldn’t even find another vendor. I toyed with backing up and restoring the .apk files, but I got certificate and signing errors. I’ve read that these can be circumvented with Titanium Backup, but I haven’t gotten around to trying it yet.

So I installed Google apps and reluctantly activated my Google Account to install these two paid apps. Then I disabled my Google Account.

I then installed Aptoide to replace Google Play. It had recent versions of all the free apps I use on a regular basis. It looks like it will be able to supply my app needs into the future.

I have logged into my Gmail account and deleted my pre-existing contact list. This means that even if I do have to enable Google Play in the future, I will no longer be spammed with “Your friends like this app” messages. It also means Google will not be able to track my future relationships unless they are with people who use Google services.

Now if only the Ghostery and Firefox developers would get Ghostery working on Android, I’d actually feel safe using my device!

Suicide is not a choice

There is a common misconception that those people who commit suicide have made a rational decision between two options and picked the one that they thought was most suitable for them. I’ve read this many times, often in the context of, “I really miss him, but I respect his choice.”

For those of you lucky enough to have never experienced suicidal thoughts, I want to make something clear:

Suicide is not a choice. It is a compulsion.

Obviously, I can’t speak individually for the million people a year who take their own lives, nor for the order of magnitude more who attempt it and survive. There are, in fact, reasons a mentally healthy person might choose (perhaps even rationally) to take their own life. However, I believe that most of the people who killed themselves last year did not have a choice.

Consider a different illness for a moment. Consider cancer. It is a horrible disease. When a patient is diagnosed with cancer, they know they may recover, or that they may die. They don’t have a choice in the matter. Many patients find reserves within them to battle the illness with every tool available to them. Others don’t. Some survive, many die. Some beat the illness for a period, only to have the disease attack them again several years later.

In the case of cancer deaths, the cells in the human body turn on the victim until the body can no longer sustain its vital systems.

Contrary to popular belief, mental illness works in much the same way. Instead of cells, it is the patient’s brain that turns on them. Their thoughts attack them repeatedly and incessantly until, eventually, they are compelled to destroy the body that houses them. Suicide is the result of an untreated psychological cancer.

Suicide ultimately arrives when the victim believes they do not have a choice. It becomes the only option. Suicidal thoughts begin as general thoughts about death. This leads to thoughts about the patient’s own death. They become obsessed, and begin to think about ways to actualize one or more of these scenarios. What options do they have? What tools can they use? Next, they are compelled to pick a time. If nothing changes, the time comes, and they die.

I speak from experience. Suicidal thoughts were my constant companion for twenty years, starting at the tender age of eight. At various times, I have reached the point where I believed I had no other options. I chose dates for my death. Luckily, phrases like, “I’d rather see you institutionalized than dead,” or “Do you need to be hospitalized?” helped me realize there was something I hadn’t tried yet. I survived. At the moment, I am in remission, and I am optimistic that my “cancer” will not return.

So now, when I hear someone say, “It’s hard to deal with, but I respect her choice,” I hear the truth: “I am pretending she had a choice rather than admit that I didn’t give her the choice she needed.” Saying a suicide case had a choice is as insulting to their memory as suggesting that a cancer victim should have chosen to fight harder or a rape victim should have dressed differently.

Excluding tests with py.test 2.3.4 using -k selection

When I use py.test, I often rely on the -k switch to swiftly select the test I want to run. Instead of having to type the full module, class, and test path as is required with unittest and nose, I can just type a few characters that uniquely match the name of the test.

For example, if I have a test file containing methods test_basic_clone and test_basic_clone_notes, I can run the latter test simply by calling py.test -k clone_no.

However, I often create multiple tests that have similar names. This can make it difficult to run just one test if its name is a prefix of a longer test name. If I want to run just test_basic_clone, any substring of its name will also be a substring of test_basic_clone_notes, so -k matches both tests.

Since pytest version 2.3.4, the -k option supports expressions. So I can build an expression like this:

py.test -k "basic_clone and not notes"

This selects all tests matching “basic_clone”, then excludes any containing the word “notes”. Thus, I run only the test I’m interested in without having to fix my crappy naming scheme. It’s more typing than is normally the case, but is still less cognitive load than trying to remember what module and class I’m editing and constructing a selector based on those attributes.
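To make the selection logic concrete, here is a small Python sketch of how a substring-based keyword expression can pick tests. It is not pytest’s implementation (pytest’s exact matching rules have changed across versions); SubstringMapping and selected() are hypothetical names, and the sketch assumes each bare word in the expression is true when it appears as a substring of the test name.

```python
# Hypothetical sketch of "-k"-style selection; NOT pytest's source code.
# Assumption: each bare word in the expression is True when it is a
# substring of the test name, and and/or/not combine those results.


class SubstringMapping:
    """Answers name lookups with 'is this word a substring of the test name?'."""

    def __init__(self, test_name):
        self.test_name = test_name

    def __getitem__(self, word):
        return word in self.test_name


def selected(test_name, expression):
    # eval() resolves unknown names through the locals mapping, so words like
    # basic_clone or notes become substring checks; builtins are disabled.
    return bool(eval(expression, {"__builtins__": {}}, SubstringMapping(test_name)))


tests = ["test_basic_clone", "test_basic_clone_notes"]

print([t for t in tests if selected(t, "clone_no")])
# ['test_basic_clone_notes']
print([t for t in tests if selected(t, "basic_clone")])
# ['test_basic_clone', 'test_basic_clone_notes']
print([t for t in tests if selected(t, "basic_clone and not notes")])
# ['test_basic_clone']
```

The plain substring "basic_clone" matches both tests, which is exactly the ambiguity described above; adding "and not notes" is what isolates the shorter name.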

A Treatise On The History Of Distributed Version Control

This is not yet another git vs Mercurial debate. I admit bias towards git, which I use whenever I have a choice. Now that gitifyhg is awesome, that is most of the time. However, I have been using Mercurial for my day job for about a year. I am more familiar with Mercurial and its extensions than many developers who prefer it. I consider myself an advanced git user (not an expert) and an intermediate Mercurial user.

I therefore have the background to claim that Mercurial and git are equally capable. Mercurial doesn’t have certain features of git that I miss, but those features could be implemented given enough development time. Sometimes git’s interface isn’t as easy to use or teach as I would like, but aliases and projects like git-extras alleviate this issue.

This article is about philosophy, not technology. Mercurial’s documentation, mailing lists, and Stack Overflow questions are littered with dire warnings that extensions that rewrite history are dangerous and best avoided. Git, on the other hand, takes a “consenting adults” approach to history rewriting. While it acknowledges that rewriting history can be dangerous and should be avoided in certain circumstances, it also allows the coder to choose when and how to apply this rule.

To avoid comparing the two systems, I’ll refer to the two styles as “permanent history” and “mutable history”. Both git and Mercurial are fully capable of maintaining both styles of history. However, Mercurial users tend to prefer “permanent,” while git users typically adopt a “mutable” approach.

The permanent history philosophy emphasizes that a changeset cannot be altered once it has been committed. It is important to record exactly what state the repository was in when the commit was made. If that state is not acceptable, then a new commit is made to correct it. Future readers of the history in question will see that a commit was made and that later, it was amended in another commit. Permanent history is analogous to a captain’s log or accountant’s general journal. Every action should be recorded separately.

The goal of committing in the permanent history paradigm is to record a specific state of the repository.

The mutable history philosophy, in contrast, sees changesets as individual paragraphs in a living story. It can and should be edited to ensure it tells the story as effectively and coherently as possible. Each changeset should have a topic sentence (the commit message) and supporting sentences (the patch). When a commit is initially made, the book is never assumed to be the final draft that will go to the publisher.

The goal of committing in the mutable history paradigm is to record a related set of changes.

The different stages of code history

There are several stages that changesets go through as a program is written. These stages are perceived differently by the two styles.

  1. Working directory changes have not yet been committed.
  2. Local changesets have been committed but not yet pushed to any public repository.
  3. Public changesets have been published to a public repository and are available to other coders.

The working directory stage is treated identically by the two philosophies. If it hasn’t been committed, both styles have an “anything goes” attitude. If you screw something up and fix it, you are not expected to commit the bad code for posterity. If you leave a debugging statement in the code but catch it in a git|hg diff command, just delete it before committing. If your tests aren’t passing during the uncommitted changes stage, edit the files to make sure they do pass.

The attitudes diverge slightly when it is time to commit the changes in the working directory. Neither style requires committing all of the changes that are in the working directory. For example, if you edit a test that is not related to the feature you are about to commit, you can put the two unrelated changes into separate commits. However, this practice is more common in mutable history circles than permanent, largely because permanent history coders want to record the current state of the repository while mutable history followers are focusing on related changes.

The two philosophies hold polar opposite beliefs about the local changeset stage. Permanent history maintains that a committed changeset should not be altered in any way. Some proponents may allow amending or rolling back the most recent commit, provided it has not been pushed publicly. They frown upon editing the “second last” or earlier commits, even if they haven’t been published.

Mutable history, on the other hand, takes the same “anything goes” approach that applies to the working directory stage. If changes have never been pushed publicly, then the mutable historian will comfortably rearrange and reorder them, move patch hunks from one changeset to another, or squash relevant commits together.

Permanent history fans may be surprised to learn that their philosophy on public commits is the same as mutable history’s. Once a commit has been pushed to the permanent public repository, both philosophies consider that it should not be changed, ever. If mutable history is likened to writing a story with coherent chapters, then public commits are like a published book. Once it has been published, the book should not be altered.

Let me reiterate: altering permanent published history is considered a Bad Thing by both philosophies.

There is a fourth stage available in the mutable style that permanent history does not allow. This “temporary public” stage lies between the local changesets and public changesets phases. At this stage, other people can see your changesets before they are moved into permanent public history. They may be rearranged and edited as if they are local history, but there must be agreement between all viewers that this section of history is still considered mutable. This is akin to sharing a draft of the book with a proofreader or copy-editor before it is published.

The source code for git itself is managed in this way, as discussed in maintain-git.txt. While it contains permanent public branches that cannot be altered, it also describes a “pu” branch that is temporary public. This branch is used to share the state of upcoming changesets; other developers can provide feedback, not only on the quality of the code, but also on the quality of individual patches, commit messages, and ordering.
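The stage-by-stage attitudes described above can be summarized as a small lookup. This is a hypothetical model written for illustration only: Stage, Philosophy, may_rewrite, and the amending_latest flag are invented names, not part of git or Mercurial, and the flag stands in for the permanent history proponents who tolerate amending the most recent unpushed commit.

```python
from enum import Enum


class Stage(Enum):
    WORKING_DIRECTORY = 1  # edits not yet committed
    LOCAL = 2              # committed, never pushed
    TEMPORARY_PUBLIC = 3   # pushed for review, agreed to still be mutable (like git's "pu")
    PUBLIC = 4             # pushed to the permanent public repository


class Philosophy(Enum):
    PERMANENT = "permanent"
    MUTABLE = "mutable"


def may_rewrite(philosophy, stage, amending_latest=False):
    """Hypothetical rule table: may changes at this stage be altered?"""
    if stage is Stage.PUBLIC:
        # Both philosophies agree: published history is never altered.
        return False
    if stage is Stage.WORKING_DIRECTORY:
        # Both take an "anything goes" attitude before committing.
        return True
    if philosophy is Philosophy.MUTABLE:
        # Local and temporary public changesets may be reordered, split, or
        # squashed (the latter only with agreement from everyone viewing them).
        return True
    # Permanent history has no temporary public stage; at most, some
    # proponents tolerate amending the most recent unpushed commit.
    return stage is Stage.LOCAL and amending_latest
```

The only rows where the two philosophies disagree are the local and temporary public stages, which is the point the section makes.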

History Is Communication

There are several reasons to maintain code history. Some examples include:

  1. Preserve a record of past state in case you need to return to it.
  2. Compare two versions of a code base to find the specific code that introduced a new bug.
  3. Concurrent development via patching and merging is virtually impossible without it.

However, the primary purpose of code history is to communicate. Each changeset implicitly communicates that the developer had some reason to take a snapshot of the repository at that time. It communicates exactly what the state of the repository was when the snapshot was taken, and is even able to communicate what changed between that snapshot and the previous one. The commit message describes these changes in English, preferably with a one line summary followed by a complete description of what changed and why.

While the two ideologies agree that history is extremely useful for communication, mutable and permanent history disagree as to what should be communicated.

Permanent history’s main purpose is to communicate “honestly” what happened, for all posterity to see. Each snapshot shows exactly what occurred in the repository. Two developers created different changesets in parallel and then at some specified point, they merged them. Someone forgot to delete a debugging statement and made a second commit to fix it.

Mutable history prefers to communicate “effectively”. The goal is to make local changesets as readable as possible before pushing them. Each changeset ideally contains a single related set of changes. Related changesets are further grouped together on individual branches. If this is not the case, they are modified or moved before being made public.

If you catch a problem in mutable history after committing but before pushing publicly, fix the commit. If two distinct changesets actually communicate a single idea, squash them together. If a single changeset contains two ideas, split them apart or move one to a different branch so the current one only contains cogent changes.

The permanent history crowd suggests that this rewriting of local changes before they are pushed is dishonest, or lying. However, it is easy to lie at the working directory stage in the permanent history paradigm. If you run an hg|git diff and notice that you forgot to delete a debugging statement before committing, then it is perfectly acceptable to delete that line and “lie” about having forgotten it.

If they truly wanted to record what “honestly” occurred, permanent history tools would track every single change at the text editor or IDE level.

I think we all agree that this is ridiculous. In truth, permanent history shares mutable history’s desire to have clean, communicative commits. The primary difference is deciding when it’s “too late” to change them. In permanent history, once committed, you can’t change it. In mutable history, you can and should change it up until the point it is pushed to a permanent public repository.

The ability to change history before pushing allows the developer to separate the two distinct tasks of “coding” and “organizing”. Often, when coding, we encounter a separate issue that needs to be addressed: a missing feature, a bug, or documentation that needs writing. In a strict permanent history paradigm, your only “honest” option is to commit both features in a single changeset. However, permanent history rules are relaxed before the first commit has been made, so two other options are available:

  • shelve/stash the existing changes, write and commit the second feature, and then unshelve/stash apply
  • write the two distinct features in the same working directory and use git’s index or Mercurial’s crecord extension to commit them as separate patches.

These options are commonly used by mutable history developers, but they also have another option: Commit the two features and continue coding. Then reorganize or split the changesets into a sensible series of commits appropriate to good communication before pushing the features to a permanent public repository.
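As a rough illustration of “commit now, organize later”, the Python sketch below folds correction commits back into the changeset they fix, so the final series carries one idea per changeset. It assumes a naming convention where a correction’s subject is “fixup! ” followed by the subject of the commit it amends; git’s interactive rebase has a similar convention (git rebase -i --autosquash), but this toy function, its Changeset class, and the list-of-strings hunks are hypothetical and do not reproduce git’s behavior.

```python
from dataclasses import dataclass, field

FIXUP_PREFIX = "fixup! "


@dataclass
class Changeset:
    subject: str
    # Hypothetical stand-in for patch content; real changesets hold diffs.
    hunks: list = field(default_factory=list)


def fold_fixups(changesets):
    """Fold each 'fixup! <subject>' changeset into the earlier one with that subject.

    A fixup with no matching earlier subject is kept as a normal changeset.
    The input list and its changesets are not modified.
    """
    result = []
    by_subject = {}
    for cs in changesets:
        if cs.subject.startswith(FIXUP_PREFIX):
            target = by_subject.get(cs.subject[len(FIXUP_PREFIX):])
            if target is not None:
                target.hunks.extend(cs.hunks)
                continue
        copy = Changeset(cs.subject, list(cs.hunks))
        result.append(copy)
        by_subject[copy.subject] = copy
    return result


# Usage: a debugging print was cleaned up after an unrelated commit was made.
history = [
    Changeset("Add clone command", ["clone.py"]),
    Changeset("Document clone notes", ["README"]),
    Changeset("fixup! Add clone command", ["remove debug print"]),
]
for cs in fold_fixups(history):
    print(cs.subject, cs.hunks)
```

The reorganized series has two changesets, each with a single topic, and the debugging cleanup no longer appears as a separate step for readers to puzzle over.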

To create clean, well-ordered commits, the permanent history style demands that we think about one thing at a time and decide what the most relevant history communication path is before we start coding.

The mutable history style understands that programming doesn’t work this way. It is common and acceptable to begin work on one feature and discover a bad comment or FIXME you had forgotten and perform a psychological context switch to work on that.

One of my colleagues once informed me that “Mercurial is for people who don’t need to hide their mistakes.” This is bullshit for two reasons. First, Mercurial, like git, is perfectly capable of hiding mistakes. It’s easy to edit local unpublished changesets in Mercurial before pushing them live. There are numerous extensions — both third party and built in — that allow this kind of operation.

Second, this statement deliberately misrepresents the purpose of history rewriting. We don’t rewrite history to hide our mistakes. We do it for the benefit of future readers of our git|hg log. Reorganizing history when we write it greatly reduces the cognitive overhead for readers trying to understand what we did and why. History, like code, is meant to be read more often than it is written. Crafting it before pushing it publicly reduces the work for future readers of that history.

It is difficult to understand a series of cumulative changesets that keep undoing each other or refactoring large sections of code. It is better to order these changesets so that they make sense. I’m not saying that only the final product should be committed; intermediate changesets are worth keeping when they communicate useful information. There are legitimate reasons, regardless of history ideology, to record mistakes in the permanent record of the repository. If the mistake has already been pushed publicly, the best thing to do is admit that you did not communicate as effectively as you had intended and make a new changeset to fix the problem. This is akin to publishing errata for a book.

Another good reason is to record that some experiment you attempted was a failure. Perhaps it made the system unbearably slow or arbitrarily deleted data. These commits can live in a branch in permanent history, forever documenting that this experiment was attempted, that it failed (so people don’t waste time trying it again), and how it failed (in case someone else wants to improve upon your design). Neither philosophy advocates the hiding of this kind of mistake. However, mutable history does expect that the failed commits be well ordered with commit messages that effectively communicate what you did and what went wrong.

History is a form of documentation. Like any documentation, it should be well-crafted and report the evolution of the system effectively. For example, one of the best early pieces of advice I received when I started using git (and inadvertently learned the mutable history philosophy) is to “never use the word ‘and’ in a commit message.” The word “and” is a sign that you are trying to communicate two different ideas or changes in a single changeset.
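The “no ‘and’” rule is mechanical enough to check automatically. Below is a minimal sketch of a git commit-msg hook written in Python; git runs such a hook with the path of the file holding the proposed message as its only argument, and a non-zero exit aborts the commit. The choice to check only the summary line (so a longer explanatory body may still use “and”) is my assumption, and uses_and and summary_line are hypothetical names.

```python
import re
import sys

# Whole-word, case-insensitive match, so words like "band" or "Android" pass.
AND_WORD = re.compile(r"\band\b", re.IGNORECASE)


def summary_line(message):
    """Return the first non-empty, non-comment line of a commit message."""
    for line in message.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            return stripped
    return ""


def uses_and(message):
    """True if the summary appears to join two ideas with the word 'and'."""
    return bool(AND_WORD.search(summary_line(message)))


# Only act when invoked as a hook, i.e. with the message file path as argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], encoding="utf-8") as f:
        if uses_and(f.read()):
            print("Commit summary contains 'and'; consider splitting this changeset.")
            sys.exit(1)
```

Saved as .git/hooks/commit-msg and made executable, it would reject “Fix clone bug and update docs” while accepting “Add Android support”.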

There are other motivators for this piece of advice in addition to disseminating useful information. If you ever want to revert certain related changes, it is easier to do so if those changes exist in a single patch or consecutive set of patches. There is no need to extract the changes that you want to keep from combined snapshots. Both DVCSs provide commands to trivially reverse earlier changesets, but this is only useful if the changesets contain only the idea you wish to reverse. It also eases communication when a commit message says “reverse the changes from revision X,” compared to “reverse some of the changes that do this and this as committed in revision X, but leave the lines that perform an unrelated operation alone.”

Further, if you want to apply an individual set of changes to a different branch of the project without merging an entire branch, it is a trifling matter if those changes are part of a single coherent, cohesive, consecutive set of patches.

Finally, if you don’t have write access to a project, single idea changesets make it easy for the person who is integrating your patches to see what you did and what you intended. Mutable history integrators will generally reject your patches if they do not communicate effectively. You are unlikely to ever get write access to an upstream project (if it follows the mutable history paradigm) if you do not prove that you adhere to the single concept per changeset guideline.

Admittedly, it takes more effort to alter history than to just take a snapshot when you feel the code is in a semi-acceptable state or you want to have a backup available. This effort has a huge long-term payoff. When people say that they don’t care about maintaining clean history, I get the same sense of distaste as “I don’t bother with writing tests” or “I tend to put documentation off to the last minute”. Not maintaining clean history is a sign of a lazy developer. You may be saving yourself time by just committing, but you are adding overhead to everyone who ever has to interpret your commit history, including yourself.

Code Review

Code review is an extremely useful tool for improving the state of a source code repository. It’s a simple concept: other members of the team review each changeset and make suggestions for future improvement.

Reviewers using the permanent history philosophy can improve the quality of the future codebase, but they cannot suggest improvements to patches already under review. They do not have an opportunity to improve the communication quality of those patches if they have been pushed publicly, which is normally the case if you want to share patches for review.

The mutable history style changes the point at which modifications are not allowed from “in the local directory” to “publicly pushed to the permanent repository”. Code can be pushed to a temporary public location for review purposes. Other team members can review it and comment on the quality, not only of the code, but also of the individual changesets. Once review is completed, and any suggestions integrated into the change history, the temporary repository can be safely deleted.

Thus, the code review phase becomes more than a review of the code, it is also a patch review, a history review. Code review gives other developers the chance to say, “This patch could communicate more effectively if…”.

Incremental Merging

The most dangerous moments in version control occur when two different branches of development that touch overlapping pieces of code have to be merged. Someone has to figure out what the two original sets of changes did and then figure out what the combined code has to do to accommodate both ideas. If the branches have been divergent for a long period of time, this job is nearly as difficult as rewriting both features entirely from scratch.

This is compounded by the fact that normally, the two different branches were written by different developers. While the person doing the merge may be intimately familiar with their own work, they have to become just as well-versed in the alternate branch before they can merge it safely.

Worse, all of these changes get combined into a large “merge commit” that basically includes the entire modified history of the two feature branches squished into a single gigantic diff. This is horrible for communication. If just one line of code was inappropriately merged, it becomes a nightmare to answer the question, “Why did this work on the feature branch but fail after the merge?”

In the permanent history paradigm, it is common to attempt to alleviate this problem by merging frequently. This way, a smaller subset of changes can be covered in each merge. Unfortunately, this is a terrific way to introduce malicious or erroneous changes into the history. People tend to assume merge commits “did the right thing” and don’t review them as closely as the patches being merged.

Moreover, the history becomes much less readable as these unnecessary merge commits clutter up the intention of both feature branches. Such commits do not serve any communication purpose other than, “the developer of this branch was afraid that divergent changes would be too hard to merge at a later, more appropriate time.”

The cognitive overhead is greatly reduced in the mutable history paradigm. Mutable history encourages rebasing over unnecessary merging. Instead of merging two branches of commits, rebasing makes one branch appear linearly after another branch. When rebased, the branch contains a series of commits that make sense in a linear order with no confusing merges in the history.

When you rebase a branch, each individual changeset is applied against the upstream branch, one at a time. Because each changeset contains a small unit of change, it is less likely to conflict, and therefore more likely to apply cleanly. When there is a conflict, it is easier to tell (from both the commit message and the code) how the code needs to be written to apply that changeset against new changes. This “one change at a time” process is much easier to apply than a single large merge commit. In addition, if you have previously rebased a branch and reordered commits for optimal communication, you will find that future merges or rebases onto other branches are even easier.
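The “one change at a time” property can be shown with a toy model in Python. This is a deliberate simplification under stated assumptions: a repository snapshot is a dict of filename to whole-file content, each hypothetical Changeset records the content it expects before and after, and any mismatch counts as a conflict. Real git and Mercurial rebases work on line-level diffs with three-way merging; Changeset, Conflict, apply_changeset, and rebase are invented names.

```python
from dataclasses import dataclass, field


@dataclass
class Changeset:
    subject: str
    # Simplification: filename -> (content expected before, content after).
    edits: dict = field(default_factory=dict)


class Conflict(Exception):
    """Raised when a changeset no longer applies to the rebased state."""

    def __init__(self, changeset, filename):
        super().__init__(f"{changeset.subject!r} conflicts in {filename}")
        self.changeset = changeset
        self.filename = filename


def apply_changeset(state, changeset):
    """Apply one changeset to a {filename: content} snapshot, or raise Conflict."""
    new_state = dict(state)
    for filename, (old, new) in changeset.edits.items():
        if new_state.get(filename) != old:
            # Upstream changed this file after the changeset was written.
            raise Conflict(changeset, filename)
        new_state[filename] = new
    return new_state


def rebase(upstream_state, branch):
    """Replay the branch's changesets onto the upstream snapshot, in order."""
    state = upstream_state
    for changeset in branch:
        # Stops at the first conflicting changeset; its subject says what it meant to do.
        state = apply_changeset(state, changeset)
    return state


# Usage: upstream edited README while the feature branch was in progress.
upstream = {"clone.py": "v1", "README": "r2"}
branch = [
    Changeset("Speed up clone", {"clone.py": ("v1", "v2")}),
    Changeset("Document clone notes", {"README": ("r1", "r1 + notes")}),
]
try:
    rebase(upstream, branch)
except Conflict as conflict:
    print(conflict)  # 'Document clone notes' conflicts in README
```

The first changeset replays cleanly, and the conflict is pinned to a single small changeset whose subject explains its intent, rather than being buried in one large merge diff.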

This means that, compared to a merge, you do not need to be as intimate with upstream changes from other developers. You figure out how you would have written each of your own changesets if you had been applying them directly to the upstream branch. This is not nearly so mentally fatiguing as trying to unravel two parallel sets of changes à la “ok, they did this, and I did this, so I need to do this to get things back into a sane state”.
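The same idea at the scale of a single conflict, again as a hedged sketch in a temporary repository with placeholder names. The feature branch has one commit that edits the same line upstream changed and one commit that does not; the rebase stops only on the first, you rewrite that one change against the new upstream, and the second commit then applies cleanly on its own.

```shell
#!/bin/sh
# Sketch: a rebase conflict is resolved in the context of one small commit.
# Assumes git is installed; all names below are placeholders in a temp repo.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name regardless of init.defaultBranch
git config user.name "Example Dev"        # placeholder identity for the demo repo
git config user.email "dev@example.com"
git config commit.gpgsign false

echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"

# Feature branch: one commit that touches the shared line, one that does not.
git checkout -q -b feature
echo "hello world" > greeting.txt
git commit -q -am "Greet the world"
echo "notes" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Upstream edits the same line in a different way.
git checkout -q master
echo "hello there" > greeting.txt
git commit -q -am "Greet more warmly"

# The rebase stops on "Greet the world" only; "Add notes" has not been applied yet.
if git rebase master feature; then
    echo "expected a conflict" >&2
    exit 1
fi

# Rewrite this one change as if it had been written against the new upstream.
echo "hello there, world" > greeting.txt
git add greeting.txt
GIT_EDITOR=true git rebase --continue   # accept the original commit message

git log --oneline master..feature
cat greeting.txt
```

Setting `GIT_EDITOR=true` simply accepts the original commit message so the sketch runs unattended; interactively you would let the editor open.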

While I am avoiding a git vs Mercurial debate here, I’d like to point out that the various Mercurial utilities for rebasing and history editing are not very effective compared to git’s tools. The rebase extension, histedit, Mercurial queues, and pbranch all do the job, but they require more effort than in git. They don’t have git’s famous rerere functionality, they have the potential to lose or obliterate history altogether, and they are neither as well integrated nor as well maintained as git’s tools. I do not say this to convince you to switch to git, but to point out that if you have tried these tools and found them lacking, it is not because the concept of rebasing and mutable history is a bad thing, but because the tools require further development.

Note that while mutable history users avoid unnecessary merges whose sole purpose is to reduce merge fatigue, they are not averse to merge commits that communicate useful information. So it is perfectly sensible to create a merge commit that demonstrates that a feature branch (usually containing a linear set of related changesets) has been merged into default. However, before the merge occurs, the feature branch should have its history edited in such a way that the entire branch will apply cleanly and no merge conflicts will occur that require any diff to be committed with the merge.
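A sketch of that convention in git terms, where `master` plays the role of Mercurial’s `default`. It assumes a stock git install and uses placeholder names in a temporary repository: the branch is rebased first, then merged with `--no-ff` so that a merge commit records the event, and the final check confirms that the merge commit’s tree is identical to the rebased branch tip, meaning nothing was committed with the merge itself.

```shell
#!/bin/sh
# Sketch: rebase a feature branch first, then record a deliberate merge commit.
# Because the branch already applies cleanly, the merge carries no diff of its own.
# Assumes git is installed; names are placeholders in a temp repo.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name regardless of init.defaultBranch
git config user.name "Example Dev"        # placeholder identity for the demo repo
git config user.email "dev@example.com"
git config commit.gpgsign false

echo "base" > README
git add README
git commit -q -m "Initial commit"

git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

git checkout -q master
echo "upstream" > upstream.txt
git add upstream.txt
git commit -q -m "Upstream change"

# Step 1: edit history so the branch sits on the tip of master.
git rebase -q master feature

# Step 2: --no-ff forces a merge commit that documents "feature was merged".
git checkout -q master
git merge -q --no-ff --no-edit -m "Merge branch 'feature'" feature

# The merge commit's tree equals the branch tip: nothing extra was committed.
git diff --quiet feature master && echo "merge introduced no extra changes"
```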

Have your cake and eat it, too

Distributed version control systems allow us to have multiple copies of repos in different states. If you feel strongly that an “honest” permanent history is important, perhaps it would be a good idea to keep this honest copy of history in a separate repository or a different branch. But for the sake of effective, coherent communication, maintain the main history in a mutable style.

In git, this can be done with branches. Simply do the development work on one branch (maybe give it a name like permanent/branchname to identify it as such). Push all changes to this branch as they occur. However, keep your master branch clean for most effective communication. When a feature is ready to go live, create a new branch from the commits on the permanent branch and rebase them onto master in a well-ordered manner that communicates clearly.
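One way this could look in practice, sketched under the assumption that a local bare repository stands in for the shared remote and that `permanent/search`, `search`, and the file names are hypothetical. The unedited commits are pushed to the permanent branch as they happen; when the work is ready, a new branch is created from it and rebased onto `master`, leaving the permanent branch untouched.

```shell
#!/bin/sh
# Sketch of the two-branch approach: an unedited permanent/<name> branch that is
# pushed as work happens, plus a cleaned-up copy that is rebased onto master.
# Assumes git is installed; the remote, branch, and file names are placeholders.
set -e

tmp=$(mktemp -d)
git init -q --bare "$tmp/upstream.git"    # stands in for the shared repository
repo="$tmp/work"
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master   # pin the branch name regardless of init.defaultBranch
git config user.name "Example Dev"        # placeholder identity for the demo repo
git config user.email "dev@example.com"
git config commit.gpgsign false
git remote add origin "$tmp/upstream.git"

echo "base" > README
git add README
git commit -q -m "Initial commit"
git push -q origin master

# Day-to-day work goes on the permanent branch and is pushed as it happens.
git checkout -q -b permanent/search
echo "search v1" > search.txt
git add search.txt
git commit -q -m "WIP search"
echo "search v2" > search.txt
git commit -q -am "fix typo"
git push -q origin permanent/search

# Master moves on in the meantime.
git checkout -q master
echo "upstream" > upstream.txt
git add upstream.txt
git commit -q -m "Upstream change"
git push -q origin master

# When the feature is ready, copy the commits to a new branch and rebase it.
# The permanent branch keeps pointing at the original, unedited commits.
git checkout -q -b search permanent/search
git rebase -q master
git checkout -q master
git merge -q --ff-only search
git push -q origin master
```

Here a plain `git rebase` keeps the sketch non-interactive; `git rebase -i master` is the usual tool for reordering and squashing the copied commits before they land on `master`.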

I’m not sure how viable this would be in Mercurial, since it’s not easy to copy commits between branches. More likely, it would be suitable to have two repositories: one that contains the permanent historical record, and one that contains the edited history.

I don’t personally believe this is necessary. The mutable history paradigm communicates everything I need it to. However, if you are unsure if you are ready to make the switch, I want to make it clear that it is possible to maintain both styles for a period while you experiment with the idea. If it turns out you don’t like the mutable history paradigm, you can always delete the offending mutable branches or repository… though, of course, this would be a mutation of history in itself.

I expect people who perform this experiment to realize that well-crafted history is worth the small amount of extra up-front effort required to maintain it.

Gitifyhg is now awesome.

Gitifyhg is a git client for pushing to and pulling from Mercurial repositories. I described the first implementation, which used hg-git internally, last month. However, I found that it didn’t work as well in practice as my initial tests had indicated, and I morosely reverted to using hg directly.

While researching ways to improve the app, I stumbled across git-remote-hg by Felipe Contreras. It claimed to be a cure-all to the git-hg bridging problem that “just worked”. So I downloaded and tried it, but it didn’t live up to the author’s hype. While it could handle basic cloning, pushing to and pulling from the default branch, it failed for me when working with named upstream branches, something I have to do regularly in my day job. I submitted several issues and pull requests, most of which were ignored. The more deeply involved I became with the code, the more I felt a complete rewrite was in order.

I had some free time during the holiday season, and started my version of a git remote as an exercise, with no intent to create anything useful. To my surprise, something useful emerged. In fact, I believe gitifyhg is the most robust and functional git to hg client currently available. More importantly, it is eminently hackable code: well tested and fairly well documented. I hope this will make it easy to contribute to, and that my inbox will soon be full of pull requests.

This is the real deal. The best part is that you don’t have to learn any new commands. It’s just basic git with mercurial as a remote. The only command that has changed from normal git usage is clone:

pip install gitifyhg
git clone gitifyhg::http://selenic.com/hg
cd hg

By adding gitifyhg:: before the mercurial url, you can git clone most mercurial repositories. If you can’t, it’s a bug. Other complex repositories I have successfully cloned include py.test and pypy.

You can easily use gitifyhg in the fashion of git-svn. All named branches are available as remote branches. default maps to master. Other branches map to branches/<branchname>.

If you want to commit to the master branch, I suggest a workflow like this:

git clone gitifyhg::<any mercurial url>
cd repo_name
git checkout -b working  # make a local branch so master stays pristine
# hack and commit, hack and commit
git checkout master
git pull  # Any new commits that other people have added to upstream mercurial are now on master
git rebase master working  # rebase the working branch onto the end of master
git checkout master
git merge working  # fast-forward master to include the rebased commits
git push

Working on a named mercurial branch, for example feature1, is easy:

git checkout --track origin/branches/feature1
git checkout -b working  # make a local branch so branches/feature1 stays pristine for easy pulling and rebasing
# hack and commit, hack and commit
git checkout branches/feature1
git pull  # New commits from upstream on the feature1 branch
git rebase branches/feature1 working  # rebase the working branch onto the end of feature1
git checkout branches/feature1
git merge working  # fast-forward branches/feature1 to include the rebased commits
git push  # push your changes back to the upstream feature1 branch

It is even possible to create new named branches (assuming my_new_branch doesn’t exist yet in Mercurial):

git checkout -b "branches/my_new_branch"
# hack and commit
git push --set-upstream origin branches/my_new_branch

These basic workflows have been working flawlessly for me all week. In contrast to my previous attempts to use git-to-hg bridges, I have found it easier to use gitifyhg than to use the Mercurial commands that I have become expert with, but never comfortable with, over the past year.

Gitifyhg is not yet perfect. There are a few issues that still need to be ironed out. The gitifyhg test suite contains failing tests for most of them, if you would like to contribute some low-hanging fruit:

  • Anonymous branches are dropped when cloned. Only the tip of a named branch is kept.
  • Tags can be cloned and pulled, but not pushed.
  • Bookmarks can be cloned and pushed, but not pulled reliably. I suspect this is related to the anonymous branch issue.

So give it a shot. Gitifyhg is just one easy_install gitifyhg away.

Four ways to do local lightweight (git-style) branches in Mercurial

One of the many git features that I miss in my day-to-day work using Mercurial is local lightweight branching. That is to say, branches that I don’t push to a public repository until I know they are in a sane state, and that do not take up any room in the public branch namespace.

Until recently, I thought the only ways to create a local branch that did not get pushed to a remote repository were to use multiple local clones or Mercurial Queues. It turns out there are at least four ways to do local lightweight branches in Mercurial.

This is a bug, not a feature. I’m a Python programmer and a huge proponent of the “There should be one– and preferably only one –obvious way to do it.” rule. In git, the answer to pretty much every question is either, “create a branch” or “step 1: create a branch…”. Git branches are simple and elegant. Mercurial branches are… well, it depends what kind of branch you want. You do know what kind of branch you want, right?
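For comparison, the git side of that claim as a minimal sketch: a branch is local until you explicitly push it, and deleting it leaves no trace upstream. A bare repository in a temporary directory stands in for the public remote, and the branch and file names are placeholders.

```shell
#!/bin/sh
# Sketch: a git branch stays local until you explicitly push it.
# Assumes git is installed; paths and names are placeholders in a temp dir.
set -e

tmp=$(mktemp -d)
git init -q --bare "$tmp/upstream.git"    # stands in for the public repository
repo="$tmp/work"
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master   # pin the branch name regardless of init.defaultBranch
git config user.name "Example Dev"        # placeholder identity for the demo repo
git config user.email "dev@example.com"
git config commit.gpgsign false
git remote add origin "$tmp/upstream.git"

echo "base" > README
git add README
git commit -q -m "Initial commit"
git push -q origin master

# Make a local branch and commit to it as often as you like.
git checkout -q -b experiment
echo "idea" > idea.txt
git add idea.txt
git commit -q -m "Try an idea"

# Pushing master does not publish the experiment branch.
git checkout -q master
git push -q origin master
git ls-remote --heads origin

# Not happy with it? Delete it; nobody else ever saw it.
git branch -q -D experiment
```

`git ls-remote --heads origin` lists only `refs/heads/master`; the `experiment` branch never leaves the local clone.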

That said, I’m forced to work in Mercurial, and until gitifyhg is working well enough for daily use, I’m constantly looking for ways to work around the shackles Mercurial places on me. Ironically, I know more about Mercurial extensions than many of my pro-hg colleagues, simply because as a git user, I know what’s missing and am always studying to fill the gaps.

So, here are four methods of creating a lightweight local branch in Mercurial. I leave it to you to figure out which one is best for your workflow.

Local Clones

This is the most often suggested method of local branching in Mercurial, which is a shame because it’s ugly and reminiscent of Subversion. Essentially, you make a new clone of the repository and work in there. If you like your changes, you push them; if not, you delete the directory.

I’ve experimented with a folder structure like this:

code/myproject
    - staging
    - feature1
    - bugfix1
    ...

Each of the subfolders is a mercurial clone of the project. Staging is a clone of upstream. feature1, bugfix1, and its siblings are clones of staging. I manage my commits from staging, pulling from upstream or from the local clones as needed.

How to make a “local branch”:

cd ..
hg clone staging featurename

How to commit changes to your “local branch”:

hg commit

How to delete a “local branch” if you don’t want it to see the light of day:

cd ..
rm -rf featurename

How to merge a finished feature and push upstream:

cd ../staging
hg pull -u # pull in upstream commits
hg pull ../featurename
hg hgview # see what's going on
hg merge # good luck
hg commit -m "Merge featurename"
hg push

Queues

Most mercurial users I talk to recommend avoiding mercurial queues. I have no idea why; they are one of the most useful tools in the Mercurial toolbox. It does annoy me to be forced to learn an entirely different set of commands to manage patches before and after they are “made permanent”, but if you’re going to be working regularly in Mercurial, they are vital to maintaining the workflows you are used to from git.

Mercurial queues are simply a set of patches that have not yet been “committed”, with a collection of tools for managing them. Because the patches have not been committed, they can be reordered and rearranged for the best possible communication effect.

Steve Losh has written a great introduction to Mercurial Queues for git users. Chapters 12 and 13 of the hg book are more comprehensive guides, but can be hard to follow.

It is possible to push and pull queues (they are stored in a separate repository inside the main mercurial repository), but if you actually want to share the unfinished code you’ve written, I think it’s better to either create a proper named branch and share it, or temporarily share your local repository with hg serve.

How to make a “local branch”:

hg qqueue --create feature

How to commit changes to your “local branch”:

hg qnew patchname # having to specify a name for each patch is annoying
hg qrefresh # update the most recent patch with your latest changes

You can also use hg qrecord instead of qrefresh to add specific changes, but it’s not as robust as git add -p (it doesn’t support splitting or editing hunks).

How to delete a “local branch” if you don’t want it to see the light of day:

hg qpop --all
hg qqueue patches # switch to the default queue
hg qqueue --delete feature

How to merge a finished feature and push upstream:

hg qfinish --applied # turn the applied patches into permanent commits
hg push

Phases

Mercurial users tend to recommend bookmarks when git users ask for lightweight branches. That doesn’t really work, because all it does is give a temporary name to an anonymous head instead of a permanent one. If you push to the public repository, your branch will be pushed too, regardless of whether it was named, bookmarked or anonymous.

It turns out, Mercurial has a concept called phases, which allow you to mark a changeset as public, draft, or secret. Anything that has been pushed or pulled is considered public. All other local patches are draft by default, which means they will be pushed publicly when you call hg push. But if you set the phase of a patch to secret, that patch is not allowed to be pushed publicly.

The hgview extension is good for indicating what phase various commits are in.

How to make a “local branch”:

hg commit -m "initial commit to the feature"
hg bookmark feature
hg phase --force --secret feature

How to commit changes to your “local branch”:

hg commit # it's a normal mercurial commit, but now it's secret

How to delete a “local branch” if you don’t want it to see the light of day:

hg strip --rev "secret() and ancestors(feature)"

You’ll need the mercurial queue extension to get the strip command. You might want to double check that the revset I wrote is suitable for you; I’m not too clear on mercurial revsets yet.

How to merge a finished feature and push upstream:

hg phase --draft feature
hg push

Now, the interesting thing is that you can set secret phase on a named branch as well as a bookmark. So you can have locally modified named branches that don’t get pushed until you’re good and ready to push them, or never get pushed at all. Just replace bookmark with branch above.

I learned about phases today. I haven’t really used them in practice, yet, but I believe they will become my new favourite way to do local changes in Mercurial.

push -r

It is possible to tell mercurial to only push certain revisions to the upstream repository, thus keeping your other branches local. I wasn’t too keen on this at first because it seemed far too easy to accidentally push stuff I wasn’t ready to push. However, Mercurial does warn you before you push a new branch (anonymous or named), so as long as you have actually created a branch (rather than putting commits on top of the upstream branch), it will be quite safe.

To do this, simply pass the -r or --rev parameter when you hg push. For example, if you only want to push the default branch, pass --rev default and commits on any other branches will remain local.

How to make a “local branch”:

hg branch feature # the next commit will be on the feature branch
hg commit -m "initial commit to the feature"

How to commit changes to your “local branch”:

hg commit # it's a normal mercurial commit

How to delete a “local branch” if you don’t want it to see the light of day:

hg strip --rev "branch(feature)"

How to merge a finished feature and push upstream:

hg commit -m "close feature branch" --close-branch
hg update default
hg merge feature
hg commit -m "merge feature"
hg push --new-branch

How to push changes on default without pushing the new branch:

hg update default
hg push --rev default

You’ll notice that I used a named branch here. It’s possible to do the same thing with bookmarks or anonymous branches, but you would have to specify revision numbers manually and would probably have trouble defining revsets.

Bonus: qimport

This isn’t strictly to do with local branching, but it addresses another issue git users have with mercurial: rearranging commits on a branch that has never been published.

Mercurial users point to the rebase extension when we complain about this, but that extension is woefully inadequate compared to the usage we are used to in git. It can also be somewhat dangerous, unlike git’s version of rebase. I believe the rebase extension was written as a proof of concept in Mercurial, but because the documentation says “this is dangerous” nobody ever actually used it or added features to make it actually usable or safe.

However, there is another option: convert the commits to a mercurial queue and then use the relatively robust mqueue features to craft the changesets the way you want them to look. This is an important tool because it means you don’t have to decide in advance whether you want to use commits or patch queues for a particular feature. Just create a named branch, make your commits, and if you need to convert it to a queue, do so. This isn’t the place to explain the intricacies of Mercurial queues, but here’s a basic example of how to use them to remove patches from a named branch and append (rebase) them to default:

hg init .
hg view &  # Watch the changes in realtime as you type
echo a >> a
hg add a
hg commit -m "a"
hg branch feature
echo b > b
hg add b
hg commit -m "b"
echo c >> b
hg commit -m "c"
hg update default
echo a >>a
hg commit -m "aa"
hg qqueue --create feature
hg qimport --rev "branch(feature)"
hg qpop --all
hg update default
hg qpush --all
hg qfinish --applied

New Arch Linux Laptop Sticker Design


I’ve tweaked the design on the incredibly popular Arch Linux Laptop Stickers available through the Arch Linux Schwag store. I’ve taken the black outline off the sticker, both for a more modern look and to alleviate problems when the decal cutters don’t quite line up with the border. I hope you like the effect!

My camera isn’t top of the line, so if someone would supply me some quality photos of these new stickers in action, I’d be happy to include them on the Arch Schwag website.

Thank you for supporting Arch Linux. I am always eager for new ideas to put in the Arch Schwag store.

Addressing mistakes I made in releasing Hacking Happy

When I released Hacking Happy two weeks ago, I made a rather serious mistake. It’s the first time I’ve self-published in eBook format. I put a lot of effort into thinking through the release and marketing of the book, but one problem slipped through.

I believe that when you purchase a digital product, you are purchasing the content, not the format. I released Hacking Happy as an eBook in four different formats, each available for download at a minimum purchase price of $5. This was the easiest way to make the book available on Gumroad, and I didn’t think about it much. I thought it would look good on the home page to have links to several different formats! However, I didn’t consider that someone may want copies of the book in two different formats. There are various reasons they may want to do this, and I do not believe they should have to pay full price for each of the different formats when they are essentially getting the same content.

Therefore, I have now made Hacking Happy available as a zip file of all four formats, in addition to the other download links. It is the same minimum price as the other links. However, this didn’t help anyone who had already supported me in buying the book in a single format. Luckily, Gumroad allows me to e-mail my buyers and I was able to supply them with a private link to the zipfile if they wish to access other formats.

Of course, since you own the content you purchased, you are welcome to convert it to other formats as you see fit!

The other issue people raised had more to do with marketing than the book itself. Part of the discussion on Hacker News pointed out that the excerpt didn’t really say much about what was in the book. I have alleviated this by adding a table of contents to the excerpt link on the home page and by choosing an excerpt from a chapter other than the introduction. I believe the chosen excerpt is representative of the contents of the book, and also highlights my writing style.

The response to this book has been very humbling. Other than complaints from people who chose not to purchase it, the feedback has been entirely positive. It has received one five-star review on Amazon, and I have received e-mails of support, congratulations, and gratitude. I knew when I wrote the book that it was necessary and would fill a niche, and I knew when I published it that I had done a good job. But the feedback reinforcing that knowledge has brought me as much happiness as the process of writing the book did!

This blog is not ad supported

I’m sick of the whining about internet ad blocking and the claims that it is or should be illegal.

This blog is not ad supported. It does track you, for which I hope you will forgive me, using WordPress Stats. If that bothers you — and it should — please install the ghostery extension. But it is not ad supported.

I believe this gives my visitors a better experience. Obviously, your experience is enhanced by the lack of distracting advertisements, and the screen real estate they would occupy is used to display things that are hopefully more valuable to you. But there’s more to it than that.

When you visit my blog, you can read each article on a single page. You don’t have to click through three “next page” links because someone wants to maximize their ad revenue. Further, my articles are (I hope) concise and to the point. I have no incentive to add irrelevant details to an essay in order to increase the number of pages you view. So you can read my thoughts and get on with your day.

More subtly, when you visit my blog, you can be sure that every article contains information that I consider to be valuable. I don’t write content-free essays with juicy titles to attract ad impressions. My visitors are not cattle whom I milk for ad revenue.

I started this blog over three years ago because Judd Vinet, founder of Arch Linux, had suggested I do so. I was just getting started in freelancing, and he said I’d be amazed how many clients can come out of a well-written technical post that happens to get top rating in Google. This has turned out to be true. In this light, the blog is itself an advertisement, showcasing my skills as a programmer, and more recently, as an author. Nowadays, I write articles, not so people will hire me (I’m actively hiding from head hunters), but so they will see those links to published books, gittip, and flattr on my sidebar.

I also write articles as a contribution to open source projects, both by promoting or introducing those projects to the few thousand visitors this blog receives per month, and by providing tutorials or instructions for them.

And I write because I can’t help writing. I am keenly aware of my audience, and thus the process is rather interactive. I write about things I believe you will find interesting. I want you to keep coming back, not just to use my open source projects, not just so you’ll tell your friends about my books, but because I want to keep writing articles you will read.

One of my more popular articles, bizarrely enough, has been the CSS popups are annoying rant I wrote three years ago. I notice that there are far fewer CSS popups today than there were back then. I’d love to take credit for that, though I have trouble being that vain. In that article, I suggested boycotting all websites that use CSS popups. Today I’d like to suggest a few additional actions we can take to stop advertising from ruining our internet experience:

  • Use adblock and ghostery. Not just to protect your privacy and improve your browsing experience, but to send a signal to the entire internet that you are a human being, not a product.
  • Avoid any site that displays articles in multi-page format. I have a few worst offenders remapped to 127.0.0.1 in my hosts file.
  • Avoid sites that consistently publish content-free posts with juicy titles.
  • Start supporting non-advertising income streams for individual content creators. This can range from financial contributions via sites like gittip or flattr to purchasing or subscribing to products the author has posted for sale to simply writing a review or recommendation promoting their product or content to other people.

I’d like to close with a message for you to pass on to those people whining that ad blocking cuts into their advertising revenue:

If your sole purpose in writing a blog is to make money off of ad revenue, stop writing and find a true passion. While you are making a few dollars or maybe a few hundred dollars a month off of Adsense, Google is making billions of dollars. Yes, you are being used, and yes, you are being cheated, but not by the visitors who are blocking your ads.

Advertising, especially targeted advertising, is a huge industry right now, but I believe and hope it is going to die. People are learning that word of mouth is a much more reliable way to discover a product than advertising. Businesses are switching from advertising to discovering subtle ways to manipulate users into doing the advertising for them. Further, people are becoming more educated about how big businesses are abusing them. I’m not the only one that is sick of being treated like a product instead of a customer.