Archive for June 2010

Packaging django management commands: Not Zip Safe

I had a devil of a time sorting this out; it must be documented in other places, but on the off chance that this information is useful to anyone, I’m posting it here.

I wrote a Django app that had a custom management command in it. When I ran this app in my development environment, it ran fine. But when I deployed it from PyPI as an egg, the command mysteriously disappeared. Django simply did not see it.

Django not seeing management commands is a common problem for me; it seems like I always have to say “please” in just the right tone of voice before my custom management commands will work. This problem, however, was new to me.

I ended up exploring the Django sources and eventually discovered that the built-in imp module was unable to find anything inside a zipped egg. This struck me as odd, so I did some research using a nifty tool I discovered (over a decade ago) called Google.

I came across this message, which basically says that Django and Django apps are “flat-out not zip-safe and probably never will be”, making specific reference to custom management commands.
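Here is a small experiment that reproduces the symptom. It is a sketch under assumptions: find_commands below is a simplified stand-in modeled on the directory-listing approach Django used to discover commands at the time (os.listdir on the management/commands directory), not Django's actual code, and the app myapp with its hello command is hypothetical. The same files are found when they sit in a plain directory and are invisible once they are packed into a zip archive:

```python
import os
import tempfile
import zipfile


def find_commands(management_dir):
    """Simplified stand-in for a directory-listing command finder.

    Returns the names of the .py modules in <management_dir>/commands,
    skipping private modules such as __init__.py.
    """
    command_dir = os.path.join(management_dir, "commands")
    return sorted(
        name[:-3]
        for name in os.listdir(command_dir)
        if not name.startswith("_") and name.endswith(".py")
    )


with tempfile.TemporaryDirectory() as tmp:
    # Unzipped (development) layout for a hypothetical app "myapp".
    src = os.path.join(tmp, "src")
    commands = os.path.join(src, "myapp", "management", "commands")
    os.makedirs(commands)
    for name in ("__init__.py", "hello.py"):
        open(os.path.join(commands, name), "w").close()
    unzipped = find_commands(os.path.join(src, "myapp", "management"))

    # Zipped egg layout: the same files, packed into a single archive.
    egg = os.path.join(tmp, "myapp.egg")
    with zipfile.ZipFile(egg, "w") as zf:
        for dirpath, _, filenames in os.walk(src):
            for filename in filenames:
                full = os.path.join(dirpath, filename)
                zf.write(full, os.path.relpath(full, src))

    zipped_error = None
    try:
        find_commands(os.path.join(egg, "myapp", "management"))
    except OSError as exc:
        # The egg is a file, not a directory, so os.listdir cannot see inside it.
        zipped_error = exc

print("unzipped:", unzipped)  # unzipped: ['hello']
print("zipped:", type(zipped_error).__name__)
```

If a finder like this catches the OSError and returns an empty list instead of failing loudly, the command simply vanishes from the help output, which matches what I saw.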

Then I had to do more research to figure out how to mark my app as not zip safe. I ended up switching my app’s setup.py from distutils.core to setuptools and adding a zip_safe=False argument to the setup() call.
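For reference, a minimal setup.py along these lines might look like the following. The name, version, and package layout are placeholders for a hypothetical app; the only part that matters here is zip_safe=False, which tells setuptools that the project cannot safely be installed and run from a zip file:

```python
# setup.py for a hypothetical app "my-django-app". Only zip_safe=False is
# the setting discussed in the text; everything else is a placeholder.
from setuptools import find_packages, setup

setup(
    name="my-django-app",
    version="0.1",
    packages=find_packages(),
    # Never install this egg zipped, so Django can find
    # management/commands/*.py as ordinary files on disk.
    zip_safe=False,
)
```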

In addition, for my future and perpetual sanity, I discovered that buildout accepts an unzip = true option to ALWAYS unzip eggs. I placed this under the [buildout] section in my buildout.cfg.

Converting a Django project for zc.buildout

Jacob Kaplan-Moss has written an article on using zc.buildout to develop a django application. My goal is slightly different: I want to deploy an entire django project, with numerous dependencies, using zc.buildout. The documentation seems scarce, so I’m trying to keep track of each step as I go, in the hopes that it may be useful to someone someday.

I have an existing Django project that I’m having trouble deploying and sharing with other developers. It’s located in a private github repository. So my goal is not only to manage a Django project, but to manage an already mature project. This is, of course, harder than starting from scratch.

I do my development on Arch Linux, which currently ships Python 2.6 (and 3.1, but Django doesn’t support Python 3, so I’m using 2.6 for this project). I have git version 1.7.1, and my project uses Django 1.2.1.

Since I didn’t know what I was doing, I started by doing some exploring. I created an empty directory and ran:

wget http://svn.zope.org/*checkout*/zc.buildout/trunk/bootstrap/bootstrap.py

to download the buildout bootstrap script. I then created a very simple buildout.cfg file based on the djangorecipe example:

[buildout]
parts = django
eggs = ipython
 
[django]
recipe = djangorecipe
version = 1.2.1
eggs = ${buildout:eggs}
project = my_project

I then ran:

python bootstrap.py
./bin/buildout

Suddenly, my directory containing only two files (bootstrap.py and buildout.cfg) looked like this:

bin
buildout.cfg
downloads
my_project
bootstrap.py
develop-eggs
eggs
parts

Jacob’s article has an excellent description of all these files. The main question for me was “where does my source go?” This example shows that the project source code goes in my_project. Djangorecipe had created the following structure in that directory:

development.py
__init__.py
media
production.py
settings.py
templates
urls.py

The development.py and production.py files both have from my_project.settings import * calls, and then customize some variables. My habit has always been to keep a localsettings.py in my .gitignore and include from localsettings import * in my main settings.py. For my project, I had to decide whether to stick with my old habits or modify my setup to parallel the djangorecipe version.

I see that djangorecipe has a way to select the settings to use for a given buildout, but if buildout.cfg is under version control, wouldn’t that make selecting settings a pain? And if each developer has a different database setup, would we require a different settings module for each developer? In my experience, it is better to do things the way the examples in the documentation say it should be done, because they know what they’re doing and I don’t. But in this case, I decided to keep my layout as is. I can always change it later.
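The layout I kept looks roughly like this. It is a sketch: the database values are placeholders, and the ImportError fallback (so a fresh checkout without a localsettings.py still loads) is my own addition rather than anything djangorecipe provides:

```python
# settings.py: shared defaults, kept under version control.
DEBUG = False

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "my_project",  # placeholder database name
    }
}

# Per-developer overrides live in localsettings.py, which is listed in
# .gitignore; each developer sets their own DATABASES, DEBUG, etc. there.
try:
    from localsettings import *  # noqa: F401,F403
except ImportError:
    # No local overrides (e.g. a fresh checkout): keep the defaults above.
    pass
```

One trade-off of the bare except ImportError: a broken import inside localsettings.py is silently ignored too, so it is worth checking DEBUG or DATABASES when something looks off.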

The thing I wanted to learn from that experiment was where my source goes; apparently it goes in a folder with my project’s name at the same level as buildout.cfg and bootstrap.py. Looks like I’m going to have to move my code around in my project’s version control.

First I checked out a new branch, because that is the thing to do in git, and specifically because I want it to be easy to go back to the status quo if I decide, halfway through the process, that buildout is a pain to configure.

git checkout -b buildout

The first thing I want to do is move all my files into a new subdirectory with my project’s name, so buildout can have the top of the git tree for its own files:

shopt -s extglob
mkdir my_project
git mv -k !(my_project) my_project
mv localsettings.py my_project
rm *.pyc
git commit

The git mv command essentially says “move anything that isn’t my_project into my_project”. The !(my_project) pattern is a bash extended glob, so it only works after shopt -s extglob. The -k switch says “just ignore it if it isn’t under version control.” This left my localsettings.py and a few .pyc files in the main directory, since those files are in .gitignore, so I cleaned those up manually. Finally, I committed the changes, so the whole move happened in a single commit.

Now it’s time to create a new buildout, this time in the version-controlled directory. I ran the wget command to get bootstrap.py and copied the buildout.cfg from my exploration directory. Then I ran the bootstrap and bin/buildout commands to see what happened. It did the same thing as before, except that it printed django: Skipping creating of project: my_project since it exists. That’s what I wanted. Running git status showed several patterns that needed to be added to my .gitignore:

.installed.cfg
bin
develop-eggs
downloads
eggs
parts

I also had to change the .gitignore file to ignore my_project/static/uploads instead of just static/uploads.

At this point, I decided to commit bootstrap.py and buildout.cfg:

git add bootstrap.py buildout.cfg
git commit

Now, I know I’m missing dozens of dependencies, but I wanted to see what happens if I run bin/django. My understanding is that this is supposed to be a wrapper similar to manage.py, but using the buildout’s Django environment. It failed, telling me that the development settings module didn’t exist. I modified the buildout.cfg to add settings = settings to the django part. Then I ran bin/django again, and nothing had changed.

Whenever you change buildout.cfg, you have to also run bin/buildout to create the new environment (rant: I hate compile steps!).

I was worried that my custom management commands (in my case, for py.test testing, and running south migrations) would not show up, but there they were, listed in the help output that bin/django provided. This is especially surprising, since I have not installed south inside the buildout yet! It appears that bin/django is a drop-in replacement for manage.py.

Next, I ran bin/django shell expecting to enter dependency hell. Not yet! Instead, I got the error “no module named my_project.settings”. Looking at the bin/django script, it is prepending the project name to the settings module name. I have a habit of not including an __init__.py in my project directory, preferring to think of a Django project as a collection of apps rather than an independent project. I don’t want to write from my_project.my_app import something, because then the apps are no longer reusable. In my world, the project is not a package. Apparently, djangorecipe thinks it is. So touch my_project/__init__.py had to happen, since I definitely don’t want to start hacking the recipe at this point!

Now I have “no module named <app>” errors for each of my INSTALLED_APPS, because I list my apps as “x” instead of “my_project.x”. To fix this, I added extra-paths = my_project to the django part, which inserts the project directory into the path.
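To make the two fixes concrete, here is a sketch of the environment setup a bin/django-style wrapper performs. It is hypothetical and written for modern Python 3, not djangorecipe’s actual script: it prepends the buildout directory plus the extra-paths entries to sys.path and builds DJANGO_SETTINGS_MODULE as “<project>.<settings>”. With my_project/__init__.py present, my_project.settings imports; with the project directory on the path, an app listed as plain myapp imports too. (On Python 3, namespace packages would make my_project importable even without __init__.py; under Python 2.6, as used here, the file is required.)

```python
import importlib
import os
import sys
import tempfile


def prepare_environment(buildout_dir, project, settings, extra_paths):
    """Hypothetical sketch of what a bin/django-style wrapper sets up.

    Puts the buildout directory and each extra path at the front of
    sys.path, then points DJANGO_SETTINGS_MODULE at "<project>.<settings>".
    """
    for path in reversed([buildout_dir] + list(extra_paths)):
        sys.path.insert(0, path)
    settings_module = "%s.%s" % (project, settings)
    os.environ["DJANGO_SETTINGS_MODULE"] = settings_module
    return settings_module


with tempfile.TemporaryDirectory() as tmp:
    project_dir = os.path.join(tmp, "my_project")
    app_dir = os.path.join(project_dir, "myapp")  # hypothetical app "myapp"
    os.makedirs(app_dir)
    for path in (
        os.path.join(project_dir, "__init__.py"),  # makes my_project a package
        os.path.join(project_dir, "settings.py"),
        os.path.join(app_dir, "__init__.py"),
    ):
        open(path, "w").close()

    module = prepare_environment(tmp, "my_project", "settings", [project_dir])
    importlib.invalidate_caches()
    settings_mod = importlib.import_module(module)  # found via the buildout dir
    app_mod = importlib.import_module("myapp")      # found via extra-paths

print(module, app_mod.__name__)  # my_project.settings myapp
```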

Then I ran bin/django shell and bin/django runserver only to discover that everything was working! Apparently my buildout had not installed to a private environment, and was still accessing the default site-packages on my system. Not quite what I wanted. I thought zc.buildout created an isolated environment, much like virtualenv, only portable across systems. My mistake.

zc.buildout does not create an isolated sandboxed environment by default.

I had to do a lot of Google searching to come to this conclusion. There are many statements out there that suggest that zc.buildout can and does create an isolated environment, but none of them turned out to be true. zc.buildout is all about reproducibility, while virtualenv is about isolation. They are not competing products, and the ideal environment uses both of them.
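If you want to check which situation you are in, a best-effort heuristic is to ask the interpreter itself. This sketch assumes the conventions I know of: older virtualenv releases set sys.real_prefix, while the standard library venv module and newer virtualenv releases leave sys.base_prefix different from sys.prefix. Run it with the interpreter your buildout scripts use:

```python
import sys


def in_virtualenv():
    """Best-effort check for whether this interpreter runs in a virtualenv.

    Old virtualenv releases set sys.real_prefix; the stdlib venv module and
    newer virtualenv releases make sys.prefix differ from sys.base_prefix.
    """
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("isolated interpreter:", in_virtualenv())
    print("sys.prefix:", sys.prefix)
```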

So I removed all the temp files and directories (including the hidden .installed.cfg) that buildout had created for me and started over to install them to a virtualenv:

virtualenv -p python2.6 --no-site-packages .
source bin/activate
python bootstrap.py
bin/buildout

I temporarily removed IPython from the eggs because it was refusing to download. The server must be down. This time, when I run the bin/django shell, I get a proper dependency error for psycopg2. Looks like I’m finally on the right track. I also had to add several directories virtualenv had created to my .gitignore.
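When fixing errors like that one at a time gets tedious, a small helper can list everything that is still missing in one go. This is a sketch for modern Python 3 using importlib.util.find_spec; the import names in the example are my assumptions and do not always match the PyPI names (for instance, pil is imported as PIL and python-dateutil as dateutil):

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found here.

    importlib.util.find_spec locates a module without executing it and
    returns None when nothing on sys.path provides that name.
    """
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for some of this project's dependencies.
    wanted = ["psycopg2", "south", "PIL", "markdown", "recaptcha", "dateutil"]
    print("missing:", missing_modules(wanted))
```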

Before buildout, I had a rather complicated dependencies.sh file that installed all my dependencies using a combination of easy_install, git checkout, hg checkout, etc. I started with the easy_install stuff: packages that can be installed from PyPI. I created a new eggs part in my buildout. The entire file now looked like this:

[buildout]
parts = eggs django
 
[eggs]
recipe = zc.recipe.egg
interpreter = python
eggs =
    psycopg2
    south==0.7
    django-attachments
    pil==1.1.7
    Markdown
    recaptcha-client
    django-registration-paypal
    python-dateutil
 
[django]
settings = settings
recipe = djangorecipe
version = 1.2.1
eggs = ${eggs:eggs}
project = my_project
extra-paths = my_project

Trying to run bin/buildout now causes a “Text file busy” error. At this point, I’m seriously considering that buildout is more of a pain than a help. It’s poorly documented and broken (some might say poorly documented IS broken). And I know I have an even harder task coming up when I have to patch a git clone.

But I’m obstinate, and I persevered. Google was quick to confirm my hypothesis that virtualenv and buildout were fighting over the bin/python file: buildout was trying to overwrite virtualenv’s bin/python, the very interpreter that was running buildout. The solution was to change the interpreter = python line in my [eggs] part; I named the buildout interpreter “py”.

This time, when I ran bin/django shell I got an error pertaining to a module that needs to be installed from git. Time to look for a git recipe! Here’s how it eventually looked:

[django-mailer]
recipe = zerokspot.recipe.git
repository = git://github.com/jtauber/django-mailer.git
as_egg = True

I also had to add django-mailer to my parts in the [buildout] section, and arranged the [django] extra-paths option as follows:

extra-paths =
    ${buildout:directory}/my_project
    ${buildout:directory}/parts/django-mailer

I had a second git repository to apply, and this one was messy because the code on the project was not working and my dependencies.sh was applying a patch to it. I was considering whether I had to hack the git recipe to support applying patches when I realized a much simpler solution was to fork it on github. So I did that, applied my patch, and rejoiced at how simple it was.

Finally, I had to install an app from a mercurial repository (because we can’t all use the One True DVCS, can we?) I found MercurialRecipe, but no examples as to how to use it. It’s not terribly difficult:

[django-registration]
recipe = mercurialrecipe
repository = http://bitbucket.org/ubernostrum/django-registration

With all my dependencies set up, I was finally able to run bin/django shell without any errors.

Now I have to figure out how to make this thing work in production, but that’s another post. I hope it works flawlessly on my co-developer’s Mac. Hopefully the new pain will be less painful than the old pain. This was a huge amount of work (several hours went into it), and I won’t know for a while whether it was worth it.

Django Registration Paypal Backend

One of my clients needed integration with PayPal for his site. The django-paypal module is great for this sort of thing, but it didn’t quite suit our needs because we wanted to disallow account logins until payment had been collected. We had been using django-registration to support registration, and I decided to stick with it. The development version of django-registration has support for different registration backends. The default backend sends an e-mail to verify the user’s e-mail address before they can get in. A simpler backend included with the project allows direct registrations with no verification. We needed a backend that required not only e-mail address verification, but also confirmation that a PayPal payment had been made.

And so, django-registration-paypal was born:

http://pypi.python.org/pypi/django-registration-paypal/0.1.1

http://github.com/buchuki/django-registration-paypal

It is, like all my first releases, pretty rough around the edges, and contains at least one glaring security hole. But it’s out in the wild now, and patches, as always, are welcome.
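The activation rule the backend enforces can be sketched without Django at all. Everything below is hypothetical (the class, fields, and function names are mine, not django-registration-paypal’s), and how the real backend learns that a PayPal payment went through is not modeled; the point is only that an account becomes active once both checks have passed, in either order:

```python
from dataclasses import dataclass


@dataclass
class PendingRegistration:
    """Hypothetical record for one sign-up; not the package's real model."""
    username: str
    email_verified: bool = False
    payment_confirmed: bool = False

    @property
    def can_activate(self):
        # Both conditions must hold before the user may log in.
        return self.email_verified and self.payment_confirmed


def verify_email(reg):
    """Called when the user clicks the link in the activation e-mail."""
    reg.email_verified = True
    return reg.can_activate


def confirm_payment(reg):
    """Called when confirmation of the user's PayPal payment arrives."""
    reg.payment_confirmed = True
    return reg.can_activate


reg = PendingRegistration("alice")
print(verify_email(reg))     # False: still waiting for payment
print(confirm_payment(reg))  # True: the account may now be activated
```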

Google Wave Re-evaluated

Several months ago, I posted an evaluation of Google Wave that wasn’t too positive.

Since then, the Wave interface has had several minor, but very important improvements that make it much more pleasant to use. If there were more people using it, I would definitely view it as a viable alternative to e-mail and possibly even instant messaging. It still needs some serious usability engineering (it will never become mainstream unless they overcome the modal editing, for one thing), but I believe Google’s wave client has promise.

However, I no longer view Wave only as an alternative to e-mail and IM. Lately, I’ve been considering it as a replacement for HTTP. That’s right, the entire web. The current version of the web is highly interactive and realtime. HTTP was not designed for this. It was designed as a set of static resources intertwined with links. Then Javascript came along, and AJAX. The latest is a host of HTTP push technologies such as Comet.

These technologies allow the server to send information to the client without the client having to request it. A clever hack keeps an HTTP connection open for long periods of time so incoming data can be pushed across it without the client having to make a new request. Does this sound brilliant or what?
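Stripped of the HTTP plumbing, the long-polling trick looks something like this sketch, where a queue and a timer thread stand in for the web server and the event source (both are assumptions made to keep it self-contained). The handler does not answer right away; it holds the request open until data arrives or a timeout expires, and the client then reconnects:

```python
import queue
import threading
import time


def long_poll(inbox, timeout):
    """Server side of one long-poll request, minus the HTTP plumbing.

    Blocks until a message is available or `timeout` seconds pass.
    """
    try:
        return inbox.get(timeout=timeout)
    except queue.Empty:
        return None  # empty response; the client simply polls again


inbox = queue.Queue()

# Simulate the server learning about new data half a second from now.
threading.Timer(0.5, inbox.put, args=("new message",)).start()

started = time.monotonic()
received = long_poll(inbox, timeout=5)
waited = time.monotonic() - started
print(received, "after %.1fs" % waited)
print(long_poll(inbox, timeout=0.1))  # nothing queued, so this prints None
```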

You’re doing it wrong! If our applications are now highly interactive systems that need to both push and pull data to and from the server, HTTP is not the protocol to use. In my opinion, these push technologies are messy. No, not just messy, but dirty, filthy hacks intended to force a system to be used in a way that is the opposite of what it was designed for.

No, if we are developing web applications these days that require two-way communication pipes (and we are), we need a two-way protocol.

Enter Google Wave. The Wave federation protocol is exactly that: a set of extensions to the two-way XMPP protocol. Wave is not a terrific solution for hosting static content, but static content is a rare commodity these days. Wave provides an ideal platform for most of the common activities people interact with on a daily basis. It already gives us chat, wiki, document editing, and e-mail like functionality for free. With a bit of effort, it can give us a blogging type of functionality (with comments), using public waves.

A couple of simple robots could replace Twitter. Wave doesn’t have support for “following” a user. You have to be added to specific waves. However, it would be trivial to write a robot that a user adds to any waves they want to have subscribed followers added to. “Retweet” functionality should also be fairly easy to implement. And this would be implemented on a distributed system, so there would be no more of those extremely irritating “twitter over capacity” messages.
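A framework-free sketch of that follower robot is below. It is hypothetical and does not use the actual Wave robot API: waves are modeled as plain sets of participant names, and on_added_to_wave stands in for whatever event a real robot would receive when an author adds it to a wave:

```python
from collections import defaultdict


class FollowerRobot:
    """Hypothetical robot that invites an author's followers to a wave.

    Waves are modeled as sets of participant names; this is not the
    real Wave robot API.
    """

    def __init__(self):
        self.followers = defaultdict(set)  # author -> follower names

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def on_added_to_wave(self, participants, author):
        # The author added the robot to a wave: add all of their followers.
        participants.update(self.followers[author])
        return participants


robot = FollowerRobot()
robot.follow("bob", "alice")
robot.follow("carol", "alice")
wave = {"alice"}
print(sorted(robot.on_added_to_wave(wave, "alice")))  # ['alice', 'bob', 'carol']
```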

Wave even solves the mighty Facebook privacy issue. Only share your wave with those people you explicitly want to view it. Current practice is to post your ideas on Facebook, visible to all your friends. But a massive paradigm shift is occurring, as more and more people accidentally include their parents, employers, or spouse in conversations that were meant for other audiences. People are clamoring for Facebook to make it trivial to SELECT who we share information with. Guess what? Wave already does this.

In spite of all this, I’m still not on the Wave bandwagon. I don’t insist or even suggest that anyone should start using it. The current browser-based client is complicated and intimidating and has a steep learning curve. In today’s world, people expect things to be as easy to use as Google Search or Twitter. Wave is not. The currently available Wave client is more complicated than e-mail or instant messaging. It’s not going to take off.

I believe there is potential for simpler clients to be designed. I’m finally understanding why Google has focused on solidifying the Wave server and protocol rather than improving the public-facing client. The server and protocol are the kind of technologies we should be looking to in the future, not HTTP. Google Wave may never take over the world, but at some point, something will have to replace the static HTTP protocol. Maybe Wave, maybe an HTTP 2.0 with extensions for two-way communication, maybe an entirely new protocol yet.