Posts tagged ‘python’

Packaging django management commands: Not Zip Safe

I had a devil of a time sorting this out; it must be documented in other places, but on the off chance that this information is useful to anyone, I’m posting it here.

I wrote a django app that had a custom management command in it. When I ran this app in my development environment, it ran fine. But when I deployed it from PyPI as an egg, the command mysteriously disappeared. Django simply did not see it.

Django not seeing management commands is a common problem for me; it seems like I always have to say “please” in just the right tone of voice before my custom management commands will work. This problem, however, was a new one for me.

I ended up exploring the Django sources and discovered, eventually, that the imp module was unable to find anything inside a zipped egg. This struck me as odd, so I did some research on a nifty tool I discovered (over a decade ago) called Google.

I came across this message, which basically says that Django and Django apps are “flat-out not zip-safe and probably never will be”, making specific reference to custom management commands.
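To see why directory-based discovery breaks inside a zipped egg, here is a small sketch. It is a simplified illustration rather than Django’s actual discovery code; the package name myapp and the helper commands_visible_on_disk are made up for the example. It checks whether a management/commands directory exists as a real directory on disk, once for an unzipped layout and once for a zip archive.

```python
import os
import tempfile
import zipfile


def commands_visible_on_disk(egg_path, package="myapp"):
    """Return True if <egg>/<package>/management/commands is a real directory.

    Hypothetical helper: it mimics discovery code that walks the filesystem
    instead of asking the import system.
    """
    commands_dir = os.path.join(egg_path, package, "management", "commands")
    return os.path.isdir(commands_dir)


with tempfile.TemporaryDirectory() as tmp:
    # An unzipped egg is an ordinary directory tree.
    unzipped = os.path.join(tmp, "unzipped.egg")
    os.makedirs(os.path.join(unzipped, "myapp", "management", "commands"))

    # A zipped egg is a single file; the same layout only exists inside it.
    zipped = os.path.join(tmp, "zipped.egg")
    with zipfile.ZipFile(zipped, "w") as archive:
        archive.writestr("myapp/management/commands/hello.py", "")

    unzipped_ok = commands_visible_on_disk(unzipped)
    zipped_ok = commands_visible_on_disk(zipped)

print(unzipped_ok, zipped_ok)
```

The zipped case fails because the path passes through a regular file, so nothing that lists directories can see the commands inside it.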

Then I had to do more research to figure out how to mark my app as not zip safe. I ended up switching my setup.py for the app from distutils.core to setuptools and added a zip_safe=False argument to the setup call.

In addition, for my future and perpetual sanity, I discovered that buildout can accept an unzip=true command to ALWAYS unzip eggs. I placed this under the [buildout] section in my buildout.cfg.

Converting a Django project for zc.buildout

Jacob Kaplan-Moss has written an article on using zc.buildout to develop a django application. My goal is slightly different: I want to deploy an entire django project, with numerous dependencies, using zc.buildout. The documentation seems scarce, so I’m trying to keep track of each step as I go, in the hopes that it may be useful to someone someday.

I have an existing Django project that I’m having trouble deploying and sharing with other developers. It’s located in a private github repository. So my goal is not only to manage a Django project, but to manage an already mature project. This is, of course, harder than starting from scratch.

I do my development on Arch Linux, which is currently running Python 2.6 (and 3.1, but Django isn’t supported, so I’m using 2.6 for this project). I have git version 1.7.1, and my project is using Django version 1.2.1.

Since I didn’t know what I was doing, I started by doing some exploring. I created an empty directory and ran:

wget http://svn.zope.org/*checkout*/zc.buildout/trunk/bootstrap/bootstrap.py

to install the buildout bootstrap. I then created a very simple buildout.cfg file based on the djangorecipe example:

[buildout]
parts = django
eggs = ipython
 
[django]
recipe = djangorecipe
version = 1.2.1
eggs = ${buildout:eggs}
project = my_project

I then ran:

python bootstrap.py
./bin/buildout

Suddenly, my directory containing only two files (bootstrap.py and buildout.cfg) looked like this:

bin
buildout.cfg
downloads
my_project
bootstrap.py
develop-eggs
eggs
parts

Jacob’s article has an excellent description of all these files. The main question for me was “where does my source go?” This example shows that the project source code goes in my_project. Djangorecipe had created the following structure in that directory:

development.py
__init__.py
media
production.py
settings.py
templates
urls.py

The development.py and production.py files both have from my_project.settings import * calls, and then customize some variables. My habit has always been to have a localsettings.py in my .gitignore and include from localsettings import * in my main settings.py. For my project I had to decide whether to stick with my old habits, or modify my setup to be parallel to the djangorecipe version.

I see that djangorecipe has a way to select the settings to use for a given buildout, but if buildout.cfg is under version control, wouldn’t that make selecting settings a pain? And if each developer has a different database setup, would we require a different settings module for each developer? In my experience, it is better to do things the way the examples in the documentation say it should be done, because they know what they’re doing and I don’t. But in this case, I decided to keep my layout as is. I can always change it later.

The thing I wanted to learn from that experiment was where my source goes; apparently it goes in a folder with my project’s name at the same level as buildout.cfg and bootstrap.py. Looks like I’m going to have to move my code around in my project’s version control.

First I checked out a new branch, because that is the thing to do in git. Specifically because I want it to be easy to go back to the status quo if I decide, halfway through the process, that buildout is a pain to configure.

git checkout -b buildout

The first thing I want to do is move all my files into a new subdirectory with my project’s name, so buildout can have the top of the git tree for its own files:

mkdir my_project
git mv -k !(my_project) my_project
mv localsettings.py my_project
rm *.pyc
git commit

The git mv command essentially says “move anything that isn’t my_project into my_project“. The -k switch says “just ignore it if it isn’t under version control.” This left my localsettings.py and a few .pyc files in the main directory, since those files are in .gitignore, so I cleaned those up manually. Finally, I committed the changes, so the move happened in one place.

Now it’s time to start creating a new buildout, this time in the version controlled directory. I ran the wget command to get bootstrap.py, and I copied the buildout.cfg from my exploration directory. Then I ran the bootstrap and bin/buildout commands to see what happened. It did the same thing as before, except that it printed the message django: Skipping creating of project: my_project since it exists. That’s what I wanted. Running git status showed several patterns that needed to be added to my .gitignore:

.installed.cfg
bin
develop-eggs
downloads
eggs
parts

I also had to change the .gitignore file to ignore my_project/static/uploads instead of just static/uploads.

At this point, I decided to commit bootstrap.py and buildout.cfg:

git add bootstrap.py buildout.cfg
git commit

Now, I know I’m missing dozens of dependencies, but I wanted to see what happens if I run bin/django. My understanding is that this is supposed to be a wrapper similar to manage.py, but using the buildout’s django environment. It failed, telling me that the development settings.py file didn’t exist. I modified the buildout.cfg to add settings = settings to the django recipe. Then I ran bin/django again, and nothing had changed.

Whenever you change buildout.cfg, you have to also run bin/buildout to create the new environment (rant: I hate compile steps!).

I was worried that my custom management commands (in my case, for py.test testing, and running south migrations) would not show up, but there they were, listed in the help output that bin/django provided. This is especially surprising, since I have not installed south inside the buildout yet! It appears that bin/django is a drop-in replacement for manage.py.

Next, I ran bin/django shell expecting to enter dependency hell. Not yet! Instead, I got the error “no module named my_project.settings”. Looking at the bin/django script, it is trying to prepend the project name to the settings module. I have a habit of not including an __init__.py in my project directory, preferring to think of a django project as a collection of apps, rather than an independent project. I don’t want to write from my_project.my_app import something, because then the apps are no longer reusable. In my world, the project is not a package. Apparently, djangorecipe thinks it is. So touch my_project/__init__.py had to happen, since I definitely don’t want to start hacking the recipe at this point!

Now I have “no module named ” errors for each of my INSTALLED_APPS, because I list my apps as “x” instead of “my_project.x”. To fix this, I added extra-paths = my_project, which inserts the project directory into the path.
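The effect of putting the project directory on the path can be sketched in plain Python. This is only an illustration of the idea, not what djangorecipe generates; the temporary directory stands in for my_project, and the app name my_app_demo and its LABEL attribute are hypothetical.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as project_dir:
    # Create a tiny app package directly inside the "project" directory.
    app_dir = os.path.join(project_dir, "my_app_demo")
    os.mkdir(app_dir)
    with open(os.path.join(app_dir, "__init__.py"), "w") as init_file:
        init_file.write("LABEL = 'demo'\n")

    # With the project directory on sys.path, the app is importable by its
    # bare name, without a project-name prefix.
    sys.path.insert(0, project_dir)
    importlib.invalidate_caches()
    try:
        module = importlib.import_module("my_app_demo")
    finally:
        sys.path.remove(project_dir)

print(module.__name__, module.LABEL)
```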

Then I ran bin/django shell and bin/django runserver only to discover that everything was working! Apparently my buildout had not installed to a private environment, and was still accessing the default site-packages on my system. Not quite what I wanted. I thought zc.buildout created an isolated environment, much like virtualenv, only portable across systems. My mistake.

zc.buildout does not create an isolated sandboxed environment by default.

I had to do a lot of google searching to come to this conclusion. There are many statements out there that suggest that zc.buildout can and does create an isolated environment, but none of them turned out to be true. zc.buildout is all about reproducibility, while virtualenv is about isolation. They are not competing products, and the ideal environment uses both of them.
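One quick way to check whether an interpreter is actually running inside a virtualenv is to compare its prefixes. This is a sketch, and the helper name in_virtualenv is hypothetical; it relies on older virtualenv releases setting sys.real_prefix and on newer environments exposing sys.base_prefix.

```python
import sys


def in_virtualenv():
    """Best-effort check for an isolated environment (hypothetical helper)."""
    # Older virtualenv releases record the original prefix here.
    if hasattr(sys, "real_prefix"):
        return True
    # Newer environments leave base_prefix pointing at the system install.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print(in_virtualenv())
```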

So I removed all the temp files and directories (including the hidden .installed.cfg) that buildout had created for me and started over to install them to a virtualenv:

virtualenv -p python2.6 --no-site-packages .
source bin/activate
python bootstrap.py
bin/buildout

I temporarily removed IPython from the eggs because it was refusing to download. The server must be down. This time, when I run the bin/django shell, I get a proper dependency error for psycopg2. Looks like I’m finally on the right track. I also had to add several directories virtualenv had created to my .gitignore.

Before buildout, I had a rather complicated dependencies.sh file that installed all my dependencies using a combination of easy_install, git checkout, hg checkout, etc. I started with the easy_install stuff; stuff that can be installed from PyPI. I created a new eggs part in my buildout. The entire file now looked like this:

[buildout]
parts = eggs django
 
[eggs]
recipe = zc.recipe.egg
interpreter = python
eggs =
    psycopg2
    south==0.7
    django-attachments
    pil==1.1.7
    Markdown
    recaptcha-client
    django-registration-paypal
    python-dateutil
 
[django]
settings = settings
recipe = djangorecipe
version = 1.2.1
eggs = ${eggs:eggs}
project = my_project
extra-paths = my_project

Trying to run bin/buildout now causes a “Text file busy” error. At this point, I’m seriously considering that buildout is more of a pain than a help. It’s poorly documented and broken (some might say poorly documented IS broken). And I know I have an even harder task coming up when I have to patch a git clone.

But, I’m obstinate and I persevered. Google was quick to confirm my hypothesis that virtualenv and buildout were both trying to access the “bin/python” file. The solution was to change the interpreter = python line in my recipe; I called the buildout interpreter “py”.

This time, when I ran bin/django shell I got an error pertaining to a module that needs to be installed from git. Time to look for a git recipe! Here’s how it eventually looked:

[django-mailer]
recipe = zerokspot.recipe.git
repository = git://github.com/jtauber/django-mailer.git
as_egg = True

I also had to add django-mailer to my parts in the [buildout] section, and arranged the [django] extra-paths section as follows:

extra-paths =
    ${buildout:directory}/my_project
    ${buildout:directory}/parts/django-mailer

I had a second git repository to apply, and this one was messy because the code on the project was not working and my dependencies.sh was applying a patch to it. I was considering whether I had to hack the git recipe to support applying patches when I realized a much simpler solution was to fork it on github. So I did that, applied my patch, and rejoiced at how simple it was.

Finally, I had to install an app from a mercurial repository (because we can’t all use the One True DVCS, can we?) I found MercurialRecipe, but no examples as to how to use it. It’s not terribly difficult:

[django-registration]
recipe = mercurialrecipe
repository = http://bitbucket.org/ubernostrum/django-registration

With all my dependencies set up, I was finally able to run bin/django shell without any errors.

Now I have to figure out how to make this thing work in production, but that’s another post. I hope it works flawlessly on my co-developer’s Mac. Hopefully the pain will be less painful than the old pain. This was a huge amount of work; several hours went into it, and I won’t know for a while if it was worth it.

Django Registration Paypal Backend

One of my clients needed integration with Paypal for his site. The django-paypal module is great for this sort of thing, but it didn’t quite suit our needs because we wanted to disallow account logins until payment had been collected. We had been using django-registration to support registration, and I decided to stick with it. The development version of django-registration has support for different registration backends. The default backend sends an e-mail to verify the user’s e-mail address before they can get in. A simpler backend included with the project allows direct registrations with no verification. We needed a backend that required not only e-mail address verification, but also confirmation that a paypal payment had been made.

And so, django-registration-paypal was born:

http://pypi.python.org/pypi/django-registration-paypal/0.1.1

http://github.com/buchuki/django-registration-paypal

It is, like all my first releases, pretty rough around the edges, and contains at least one glaring security hole. But it’s out in the wild now, and patches, as always, are welcome.

Python 3 Object Oriented Programming

For the past eight months, I’ve been working hard on a project that’s a little out of the ordinary, for me. It’s the reason there’s been such a drastic reduction in number of blog posts here. It’s the reason I haven’t been earning enough money to cover my expenses each month. It’s my biggest accomplishment to date.

I’ve written a book (an entire book!) on object oriented programming, with a focus on syntax and libraries supported in the exciting new Python version 3. It’s designed for beginner to intermediate Python developers who are more familiar with Python as a scripting language than as an object oriented programming language.

As a byproduct, it also introduces Python 3 syntax, and will be a great reference for programmers wanting to upgrade their Python 2 skills. For the most part, Python 3 is a simpler, more elegant language. The learning curve is shallow, but it takes some getting used to.

It also summarizes the state of the most exciting libraries available for Python 3 at this time. If you’ve been wondering when it’s time to start migrating to the new language, it is now!

I’m currently in the rewrite phase on the book (it’s time consuming!) but it’s already available for preorder directly from my publisher:

https://www.packtpub.com/python-3-object-oriented-programming/book

I’m not great at marketing, so to put it bluntly: I hope you all buy a copy! I’ve put a great deal of effort into this project, and I’m very proud of the result. This book is a great resource and fills a void in the available references. It also fills a void in my available writings, as my blog posts tapered off over the past few months!

Image Manipulation in Python 3

Enough libraries have been ported to Python 3 to finally make it seriously possible to write real world code in this modern Python interpreter. Sure, we don’t have django or really any decent web framework yet (CherryPy runs, but it’s not a full web stack), and database support is limited (SQLAlchemy supports postgres and sqlite3), but for the most part, if you need to do something in Python 3, you can.

The major exception I’ve discovered is image manipulation. The Python Imaging Library has not been ported to Python 3, and there is no indication when it will be. The latest version of PIL, 1.1.7, was released in late 2009, with an indication that it would be “made available for Python 3,” but no estimate as to timeline. There are no mailing list posts answering an increasingly popular question “when will PIL be available for python 3?” I found no source repositories that indicate that it has been started. There is a patch floating around that can supposedly be applied to PIL 1.1.6 to make it Python 3 compatible, but it didn’t work for me.

I tried doing a port of PIL 1.1.7 myself, but was unable to find documentation for the modifications to the C extension API in Python 3. My C is pretty rusty, and my schedule is way too full for the next two months, so I gave up on the task. Because of the lack of support from the official PIL developers (I hold nothing against them; we’re all busy in the open source world, and can only contribute what we have time and interest to contribute), I’m hoping this post will motivate someone to attempt a port of PIL, making Python 3 that much more attractive.

If you came to this post looking for some kind of Python 3 image manipulation library and were disappointed, don’t be! The Pygame image module allows us to load a few image formats, and we can manipulate the resulting surfaces in a variety of ways. It’s lower level and not as comprehensive as the Python Imaging Library, but it is one useful alternative if you need to do image manipulation in Python 3.

The Utility Of Python Coroutines

Coroutines are a mysterious aspect of the Python programming language that many programmers don’t understand. When they first came out I thought, “Cool, now you can send values into generators to reset the sequence… when would I use that?” The examples in most books and tutorials are academic and unhelpful.

Last year, I attended David Beazley’s course A Curious Course On Coroutines along with a fellow Archer. We agreed that it was an exceptionally interesting course (Beazley built an OS scheduler in Python, with just a minimal amount of code: how cool is that), but that we didn’t see any practical application of it in our regular work.

Yesterday, I started working with the Tornado code to port it to Python 3. Tornado uses an async framework; I hate async because I hate working with code like this:

def somemethod(self):
    #
    self.stream.read_until("\r\n\r\n", self.callback)
 
def callback(self, content):
    # handle content read from the stream
    pass

I understand the utility of this code; while the stream is being read, the app can take care of other stuff, like accepting new connections, until the stream has been read. You receive high speed concurrency without the overhead of threads, or the confusion of the GIL. When the read is complete, it calls the callback function. It makes perfect sense, but when you read code with a lot of such callbacks, you’re constantly trying to figure out where the code went next.

In my mind, the above code is really saying:

def somemethod(self):
    #
    self.stream.read_until("\r\n\r\n")
    # give up the CPU to let other stuff happen
    # but let me know as soon as the stream has finished reading
    # handle content read from the stream

I find this paradigm much easier to read; everything I want to do surrounding content is in one place. After pondering different ways to write a language in which this was possible, it hit me that this is what coroutines are for, and it’s possible in my preferred language.

Because coroutines use generator syntax, I thought they had something to do with iterators. They don’t, really. The above code can be written like so:

def somemethod(self):
    #
    self.stream.read_until("\r\n\r\n")
    content = (yield)
    # handle the content

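The calling code would call somemethod() once to create the generator object, call next() on that object to run it up to the yield, and eventually, when content is available, call send(content) on the same object to drive it. Calling somemethod() again would create a fresh generator rather than resume the first one.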
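Here is a minimal, runnable sketch of that calling sequence in Python 3. The FakeStream and Handler classes are hypothetical stand-ins (there is no real event loop here); the point is only the create, next, send order on a single generator object.

```python
class FakeStream:
    """Hypothetical stand-in for an async stream; it only records the request."""

    def __init__(self):
        self.delimiter = None

    def read_until(self, delimiter):
        self.delimiter = delimiter


class Handler:
    def __init__(self, stream):
        self.stream = stream
        self.handled = []

    def somemethod(self):
        self.stream.read_until("\r\n\r\n")
        content = (yield)  # pause here until the caller sends the content
        self.handled.append(content)


handler = Handler(FakeStream())
coro = handler.somemethod()  # creates the generator; no code has run yet
next(coro)                   # runs up to the yield, so read_until is requested
try:
    coro.send("GET / HTTP/1.1")  # resumes the coroutine with the content
except StopIteration:
    pass  # the coroutine ran to completion after handling the content

print(handler.handled)
```

A real mainloop would keep the generator object around and call send on it from wherever the callback used to fire.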

A generator compiles to an object with an iterator interface. The coroutine above (sort of, but not really, at all) compiles to a function with a callback interface (you could say it is an iterator over callbacks). You can use yield multiple times in one method to receive more data (or to send it; put the value on the right side of yield, like in a generator).

The mainloop that called this code would still be at least as complicated to read as it is using a callback syntax, but the objects on the async loop are now much easier to read.

This paradigm has been implemented in the Diesel web framework. I’ve looked at it before and thought it was an extremely bizarre way to design a web framework. I still do, but now I understand what their goals were. If you’ve ever struggled with the, “why would I ever use this?” question when it comes to coroutines, now you understand too.

I have no immediate plans to rewrite my tornado port using coroutines, but maybe someday if I’m bored, I’ll give it a try.

SimpleHTTPServer in Python 3

If you’ve been doing any testing of client code that uses urllib or httplib, you probably know about this command:

python -m SimpleHTTPServer

This starts a very simple server in the current working directory; it serves all files from that directory, and is, quite simply, the quickest way to get something set up if you want to test some kind of web parsing or client code. (It’s also handy if you want to fire up a server to easily share files from your hard drive for a few minutes).

SimpleHTTPServer has been merged with BaseHTTPServer into the http.server module in Python 3. I couldn’t easily find documentation for the new command, and ended up writing the following simple code:

from http.server import HTTPServer, SimpleHTTPRequestHandler
 
httpd = HTTPServer(('127.0.0.1', 8000), SimpleHTTPRequestHandler)
httpd.serve_forever()

Then I did a bit more digging around and realized that this command does what the old one did:

python3 -m http.server

My code performs a little differently (it only serves on the localhost interface), but if anyone is looking for the old SimpleHTTPServer command line, there you have it.

The http.server module is normally supposed to be used as a base for creating more complicated server environments (see your favourite web framework, for example), but the fact that it can be executed directly has a great deal of utility as well.

By the way, if you didn’t know about SimpleHTTPServer, you might also be interested in the built-in smtpd server as well. I use this command frequently:

python -m smtpd -n -c DebuggingServer localhost:2525

This runs a simple smtp server on the given interface and port, and outputs all mail sent to that port to the console. It is very useful for testing and debugging web-based send-mail forms and such. You can, of course, run a standard smtpd server by not passing the -c DebuggingServer.

Managing my TODOs in 2010

Last May I wrote an article on my ideal todo list. I implemented it in offline-enabled format, but never got around to writing the server-side code and it didn’t get used. I’ve been using a paper-based day book effectively all year, but the book is filled up.

Today, starting a new year, I needed something quick to manage my todos. I’m on a bad internet connection, and don’t want a web-based app; even offline enabled apps are quirky. I decided to write something quick and dirty using the command line. Half an hour later, this is what I have:

  • All my todos are stored in text files in one directory.
  • Each textfile contains the things I want to accomplish in one day, named after that day in 2010-01-31 format so they show up in sorted order.
  • I edit the files in my favourite text editor and put a “*” beside ones I’ve completed.
  • I wrote some scripts to easily open “relative” names such as TODAY, YESTERDAY, TOMORROW, and TWODAYS side by side.
  • I named each script starting with a 1 so that they show up at the beginning of the listing. This is useful in gui file managers as I can double click those scripts to open them.
  • I don’t actually use gui file managers much, but I put a link to this one on my desktop with a fancy icon so I don’t forget my tasks.
  • When I opened the directory in nautilus, I discovered that I can zoom in on the files, and actually read their contents without opening them. I switched it to compact view so I can fit more TODOs in one screen.
  • I’ll probably have one extra text file for “things that need to be done eventually.”
  • I haven’t really tested it, but I intend to use it for the next week and revise it as necessary. I may have to whip up a web.py server to give a simple interface to it from my phone, or maybe ConnectBot will suffice. It’s not important at the moment, since I don’t take the phone anywhere due to a complete lack of coverage.

    If it seems to be working as well as the daybook did last year, I’ll keep it up. If I tend to forget to use it, like other electronic solutions I’ve tried, I’ll get a new daybook.

    What little code there is, I’ve posted to github.
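    As a rough illustration of the “relative” names in the list above (this is not the code from the repository), the date-named file for each name can be computed with the standard library. The offsets, in particular TWODAYS meaning two days from now, and the function name todo_filename are assumptions.

```python
from datetime import date, timedelta

# Offset in days for each relative name; this mapping is an assumption.
RELATIVE_OFFSETS = {"YESTERDAY": -1, "TODAY": 0, "TOMORROW": 1, "TWODAYS": 2}


def todo_filename(name, today=None):
    """Return the 2010-01-31 style filename for a relative name."""
    today = today or date.today()
    return (today + timedelta(days=RELATIVE_OFFSETS[name])).isoformat()


print(todo_filename("TODAY"))
```

    Because ISO dates sort lexicographically, a plain directory listing already shows the files in chronological order.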

Django Permissions System Issues

In a recent post I bashed Django’s url handling as being too complex. In this one I’m going to bash (with a great deal of respect) the permission system. Before I do this, I want to state one thing unequivocally:

I love Django. It’s incredibly powerful, and most of the time it makes complex stuff simple, and trivial stuff easy. I don’t believe there is a web framework out there that is more suitable for most of the tasks I need to do (I use web.py extensively for some of those exceptions). But Django has issues (usually based in over-engineering: the primary pest of Python programming), and my goal is to remind people that just because something is the “proper” way to do things in Django does not mean it is the proper way to do it in Python.

The Django permissions system works extremely well in this scenario: You have a bunch of different tasks related to models (usually add, change, and delete, which Django kindly automatically creates for every model you create) that you want to allow different people to be able to do, based on the type of model. You can classify permissions into groups, such that different groups have specific access to different types of models. Then you can add users to those groups. But if you want to fine-tune permissions on a user-by-user basis, you are free to do so.

This model is great for broadcast sites like blogs and news sites (the original problem domain for which Django was designed). But when you have interactive sites where users can log in, it fails drastically because it lacks row level permissions. I can’t use the Django permission system to ensure that a user can only modify posts they created, and not the posts of their mortal enemy on another account.

That’s a well-known beef with the Django permissions system, and there are various work-arounds available. In this case, however, Django fails to make a complex task simple.

But that’s not my main beef with Django’s permissions.

Most of the specs I receive seem to classify users into groups, similar to the Django concept of groups. Certain groups are able to do certain things. But my specs don’t generally define the permissions so well. They usually associate groups with views. Group A can access View B, Group B can access Views C and D. This is a much coarser grain of control than Django permissions.

The Django permission system supports this. All I have to do is figure out which permissions are active in any given view and ensure that those permissions are required (Django gives me a nice decorator to do this) to access the view. Then I just need to make sure each group of users has exactly the permissions it requires to access those views.

What? That’s all I have to do? In practice, that is even more complicated than it sounds. I need to define fixtures for the groups (which sometimes fail since we’re dealing with content types here, and they don’t cooperate with fixtures) so that end users don’t have to go mucking with permissions. It’s easy to overlook a permission in a group or in a view, which makes it impossible for a group of users to access a view they’re supposed to be able to access. Worse, it’s also easy to add a permission to a group that shouldn’t have it. With 3n (where n is the number of models) or more permissions, it’s really easy to get confused.

My solution to date has been to write a whole passel of tests to ensure that exactly the right people are accessing exactly the right views. That’s a good thing, of course. Lots of tests always is. The disturbing thing is how many of those tests fail when I write them. If I overlooked any tests, there are probably outstanding permission bugs. That could be a simple annoyance when a user can’t log in, or a blatant security hole when an anonymous user manages to access (or worse: modify) data meant to be private.

In this case, Django is failing to make a simple task trivial.

Over time, I’ve started “cutting corners” to save time. Instead of checking individual permissions, I set my views up to check which groups the user is in. This obviously doesn’t give me as much control, but it is working at an abstraction level suitable to the task. If I suddenly need more control over permissions, I can easily fall back on Django’s permissions system. I’ve realized that my “cutting corners” is actually “writing less complicated, easier to maintain, more Pythonic code.” Having realized this, I’ll probably write a few extra auth decorators like @group_required to simplify my code.
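A decorator along those lines might look like the sketch below. It is framework-free so that it runs on its own: FakeUser, FakeRequest, and the group_names attribute are hypothetical stand-ins, not Django’s API, and a real version would query the user’s groups and return an HTTP response instead of raising.

```python
from functools import wraps


def group_required(*group_names):
    """Allow the view only if the user belongs to at least one named group."""
    required = set(group_names)

    def decorator(view):
        @wraps(view)
        def wrapper(request, *args, **kwargs):
            # group_names on the user is an assumption made for this sketch.
            if required.isdisjoint(request.user.group_names):
                raise PermissionError(
                    "user is not in any of: %s" % ", ".join(sorted(required))
                )
            return view(request, *args, **kwargs)

        return wrapper

    return decorator


class FakeUser:
    def __init__(self, group_names):
        self.group_names = set(group_names)


class FakeRequest:
    def __init__(self, user):
        self.user = user


@group_required("editors")
def edit_post(request, post_id):
    return "editing post %d" % post_id


print(edit_post(FakeRequest(FakeUser(["editors"])), 4))
```

The check lives in one place, so each view declares which groups may reach it on the line directly above its definition.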

In summary, while the Django permissions system is best of class for its intended purpose, it does not extend adequately to more complicated tasks (row-level permissions), nor does it simplify elegantly to cover more trivial tasks (group-level permissions).

Why I don't like reverse and {% url %} in Django

Django has a feature that allows you to look up an url based on a function name and arguments. Here’s how it works:

######
# urls.py
urlpatterns = patterns('',
    (r'^projects/(?P<project_id>\d+)/$', 'views.view_project'),
)
 
######
# views.py
def view_project(request, project_id):
    # create the view
 
######
# something.py
some_url = reverse("views.view_project", args=[4])
# some_url = /projects/4/
 
######
# sometemplate.html
<a href="{% url 'views.view_project' 4 %}">My project</a>

The idea behind urls.py is to separate urls, which are not supposed to change (because someone may have bookmarked them, there might be links to the page, etc) from implementation (views.py), which can change at any time (due to refactoring, new features, and bugfixes). You can move functions around at will, and simply change urls.py to point at the right function without breaking people’s links. I love this decoupling.
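The URL pattern in a urls.py entry is an ordinary regular expression, and a named group in it is what supplies a keyword argument such as project_id to the view. The sketch below uses only Python’s re module to show that step; it assumes the pattern is matched against the path without its leading slash and that the group is named project_id to line up with the view’s parameter.

```python
import re

# Same shape as a urls.py entry: a named group captures the numeric id.
pattern = re.compile(r'^projects/(?P<project_id>\d+)/$')

match = pattern.match('projects/4/')
kwargs = match.groupdict()  # the keyword arguments a view would receive

no_match = pattern.match('projects/abc/')  # non-numeric ids do not match

print(kwargs, no_match)
```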

The idea behind reverse and url is that you should not be hard-coding urls into your code; you should use the names of the functions being called instead. I’ve tried to do this because it’s the “proper thing to do” if you’re a django coder… but I don’t like it, and I’m going to quit using it in my personal projects.

Here’s why:
1) It’s harder to maintain. As stated above, I often change the name or location of a view function. When I do that, I have to go through all the files that have a url or reverse call and change the view there. If I had hardcoded the url, I’d only have to change it in urls.py. I’ve been stung by this much more often than by changing urls.

2) It’s harder to read. When you see “/path/to/something” you know you’re looking at an url. When you see reverse(‘some.module.path’, some, arg), it takes longer for the brain to parse, even if you know what the reverse call does.

3) In the case of {% url %} it exposes implementation details to the template author. Template authors should not know or care what python functions are being called internally. But they know what an url is, and what it represents.

In short, it adds an extra layer of abstraction to url handling. I like abstraction layers to be thin and useful; in the case of reverse(), the added complexity does not, in my minimalist opinion, justify the supposed gain in simplicity.