Archive for the ‘Python’ Category.

Python 3 Web Framework

I got it in my head this weekend that it was about time someone wrote a web framework for Python 3. My head is kind of stubborn about these things, so I asked it some questions:

Does the world need another web framework?
Do I need another web framework?
Do I have time to do this?

The answers were all “no.” Still, I’m planning to go ahead with it until I get bored. Then the project can sit and collect dust with all my others.

A bit of discussion with The Cactus, led to a few ideas:

I discovered that QP is apparently the “first Python-3 enabled web framework.” I didn’t try it, so I was perhaps unfair in discarding it, but it doesn’t look… suitable.

I looked around some more, and found that CherryPy is about to release a Python 3 enabled version. I’m sure that will spawn a whole slough of Python 3 frameworks built around CherryPy. I considered such a plan (I’d call it ChokeCherryPy based on a receipe my mom devised): create some kind of templating system based on str.format, some session support, and some kind of database module wrapped around py-postgresql. Could be fun. But I’d end up with a mess of third-party technologies much like TurboGears, and that would be embarrassing, plus I’m sure the TG team already has people working on this.

Then I came back to my original plan, which was to port either Tornado or web.py to Python 3. Tornado looks like a smaller codebase (easier to port) and I’ve never used it before, so it’s also a chance to learn something new. So today I forked Tornado on github and run 2to3 on it. I’ve already got the “hello world” demo running; it wasn’t too hard once I figured out the difference between bytes and strings. At least, I think I did that part correctly.

The project is named psyclone, a little play on the ‘destructive weather patterns’ genre. I was close to p3clone, but it’s too hard to convince people it should be pronounced ‘cyclone’.

This isn’t a project I expect to go anywhere; django will be ported to Python 3 soon enough, and other frameworks will be popping up all over. But I’ve been working with Python 3 a lot lately, and I thought it was time to tackle the ‘scary’ job of porting an existing app. It’s tedious, but not difficult.

SimpleHTTPServer in Python 3

If you’ve been doing any testing of client code that uses urllib or httplib, you probably know about this command:

python -m SimpleHTTPServer

This starts a very simple server in the current working directory; it serves all files from that directory, and is, quite simply, the quickest way to get something set up if you want to test some kind of web parsing or client code. (It’s also handy if you want to fire up a server to easily share files from your hard drive for a few minutes).

SimpleHTTPServer has been merged with BaseHttpServer into the http.server package in Python 3. I couldn’t easily find documentation for the new command, and ended up writing the following simple code:

from http.server import HTTPServer, SimpleHTTPRequestHandler
 
httpd = HTTPServer(('127.0.0.1', 8000), SimpleHTTPRequestHandler)
httpd.serve_forever()

Then I did a bit more digging around and realized that this command does what the old one did.

python3 -m http.server

My code performs a little differently (it only serves on the localhost interface), but if anyone is looking for the old SimpleHTTPServer command line, there you have it.

The http.server module is normally supposed to be as a base for creating more complicated server environments (see your favourite web framework, for example), but the fact that it can be executed directly has a great deal of utility as well.

By the way, if you didn’t know about SimpleHTTPServer, you might also be interested in the built-in smtpd server as well. I use this command frequently:

python -m smtpd -n -c DebuggingServer localhost:2525

This runs a simple smtp server on the given interface and port, and outputs all mail sent to that port to the console. It is very useful for testing and debugging web-based send-mail forms and such. You can, of course, run a standard smtpd server by not passing the -c DebuggingServer.

Django Permissions System Issues

In a recent post I bashed Django’s url handling as being too complex. In this one I’m going to bash (with a great deal of respect) the permission system. Before I do this, I want to state one thing unequivocally:

I love Django. It’s incredibly powerful, and most of the time it makes complex stuff simple, and trivial stuff easy. I don’t believe there is a web framework out there that is more suitable for most of the tasks I need to do (I use web.py extensively for some of those exceptions). But Django has issues (usually based in over-engineering: the primary pest of Python programming), and my goal is to remind people not to assume that just because something is the “proper” way to do things in Django does not mean it is the proper way to do it in Python.

The Django permissions system works extremely well in this scenario: You have a bunch of different tasks related to models (usually add, edit, and delete, which Django kindly automatically creates for every model you create) that you want to allow different people to be able to do, based on the type of model. You can classify permissions into groups, such that different groups have specific access to different types of models. Then you can add users to those groups. But if you want to fine-tune permissions on a user-by-user basis, you are free to do so.

This model is great for broadcast sites like blogs and news sites (the original problem domain for which Django was designed). But when you have interactive sites where users can log in, it fails drastically because it lacks row level permissions. I can’t use the Django permission system to ensure that a user can only modify posts they created, and not the posts of their mortal enemy on another account.

That’s a well-known beef with the Django permissions system, and there are various work-arounds available. In this case, however, Django fails to make a complex task simple.

But that’s not my main beef with Django’s permissions.

Most of the specs I receive seem to classify users into groups, similar to the Django concept of groups. Certain groups are able to do certain things. But my specs don’t generally define the permissions so well. They usually associate groups with views. Group A can access View B, Group B can access Views C and D. This is a much coarser grain of control than Django permissions.

The Django permission system supports this. All I have to do is figure out which permissions are active in any given view and ensure that those permissions are required (Django gives me a nice decorator to do this) to access the view. Then I just need to make sure each group of users have exactly the permissions they require to access those views.

What? That’s all I have to do? In practice, that is even more complicated than it sounds. I need to define fixtures for the groups (which sometimes fail since we’re dealing with content types here, and they don’t cooperate with fixtures) so that end users don’t have to go mucking with permissions. It’s easy to overlook a permission in a group or in a view, which makes it impossible for a group of users to access a view they’re supposed to be able to access. Worse, it’s also easy to add a permission to a group that shouldn’t have it. With n3 (n is the number of models) or more permissions, it’s really easy to get confused.

My solution to date has been to write a whole passel of tests to ensure that exactly the right people are accessing exactly the right views. That’s a good thing, of course. Lots of tests always is. The disturbing thing is how many of those tests fail when I write them. If I overlooked any tests, there are probably outstanding permission bugs. That could be a simple annoyance when a user can’t log in, or a blatant security hole when an anonymous user manages to access (or worse: modify) data meant to be private.

In this case, Django is failing to make a simple task trivial.

Over time, I’ve started “cutting corners” to save time. Instead of checking individual permissions, I set my views up to check which groups the user is in. This obviously doesn’t give me as much control, but it is working at an abstraction level suitable to the task. If I suddenly need more control over permissions, I can easily fall back on Django’s permissions system. I’ve realized that my “cutting corners” is actually “writing less complicated, easier to maintain, more Pythonic code.” Having realized this, I’ll probably write a few extra auth decorators like @group_required to simplify my code.

In summary, while the Django permissions system is best of class for it’s intended purpose, it does not extend adequately to more complicated tasks (row-level permissions), nor does it simplify elegantly to cover more trivial tasks (group-level permissions).

Authenticated Threaded Comments in Django 1.1

I’m looking to include threaded comments in my latest project, such that only authenticated users can use them. The Django ThreadedComments project seems to have lost momentum; as the old site says that the new version will use the comment hooks provided in Django 1.1 and will be backwards incompatible, but very little fresh development has been done.

Turns out, almost all the required functionality is available for free in the standard Django comments system, and you only need one slightly messy hack.

Authenticated Comments

I want a system that doesn’t ask for name/email/url; you can only post if you’re logged in, and the only visible box is one for comments. Behind the scenes, django automatically fills in these details from the currently logged in user if they are not supplied. It also kindly attaches that user to the comment (so it can be retrieved as comment.user). All you have to do is restrict commenting to logged in users and render the template directly to ignore the fields you don’t need. Like this:

some_template.html

{% if user.is_authenticated %}
    <h4>Submit New Comment:</h4>
    {% get_comment_form for project as comment_form %}
    {% include 'comment_form.html' %}
{% endif %}

comment_form.html

    {% load comments %}
    <form action="{% comment_form_target %}" method="POST">
        {{comment_form.content_type.errors}}{{comment_form.content_type}}
        {{comment_form.object_pk.errors}}{{comment_form.object_pk}}
        {{comment_form.timestamp.errors}}{{comment_form.timestamp}}
        {{comment_form.security_hash.errors}}{{comment_form.security_hash}}
        {{comment_form.comment.errors}}{{comment_form.comment}}
        <span>{{comment_form.honeypot}}</span>
        <br />
    </form>

Most of the variables in comment_form.html are hidden fields to help keep spammers at bay. If you prefer, you can do this same customization a bit higher in the abstraction layer by using a Custom Comment Form that doesn’t contain the url, e-mail and name fields at all. The django.contrib.comments.forms.CommentSecurityForm holds everything you need except a honeypot.

Threaded Comments

Django comments can be attached to any object… including existing comments. That idea and some thoughtful template design provides threaded comments:

some_template.html

{% get_comment_list for project as comments %}
{% for comment in comments %}
    {% include 'comment.html' %}
{% endfor %}

Here I’m getting all the comments attached to a specific “project”. For non-threaded comments, comment.html would basically just contain {{comment}} to render the comment. But I want threaded comments:

comment.html

{% load comments %}
<div class="comment">
    {{comment.user}}
    {{comment.comment}}
    {% get_comment_list for comment as comments %}
    {% with 'comment.html' as comment_include %}
        {% for comment in comments %}
            {% include comment_include %}
        {% endfor %}
        {% if user.is_authenticated %}
        <br /><a href="#">Reply to This Comment</a>
            {% get_comment_form for comment as comment_form %}
            {% include 'comment_form.html' %}
        {% endif %}
    {% endwith %}
</div>

Here, for each comment, I get the list of comments attached to that comment, and render them, along with a box to reply to the comment. This box is hidden by default in my stylesheet and displayed with a bit of jquery. I recursively include the same template for each comment so you can have as many levels of comments as you like (that may be a bad thing).

The recursive include is the messy hack — by default, Django includes templates at parse time. This means the template is included regardless of whether any comments exist, and the Python stack quickly fills up with too many recursive calls. But if the template being included is a variable instead of a string, it has to know what that variable value is and can thus only be executed at render time. So I simply wrap the include with a with template tag. This forces django to use the IncludeNode instead of ConstantIncludeNode (see django/templates/loader_tags.py), which renders the template at render instead of parse time.

Some uncreative styling in my css makes the comments indent a bit and hides the reply box until the jquery link is clicked.

.comment {
    margin-left: 1em;
    border-style: solid;
    border-width: 1px;
    border-color: #aaa;
    border-top-width: 0px;
    border-right-width: 0px;
    padding: 1ex;
}
.comment_form textarea {
    height: 3em;
    width: 40em;
}
.comment .comment_form {
    display: none;
}

Obviously, it could do with a makeover from a talented web stylist, but here’s how the general idea looks:

Nearly-unstyled threaded comments.

Nearly-unstyled threaded comments.

Why I don't like reverse and {% url %} in Django

Django has a feature that allows you to look up an url based on a function name and arguments. Here’s how it works:

######
# urls.py
urlpatterns = patterns('',
    (r'^projects/(?P\d+)/$', 'views.view_project'),
)
 
######
# views.py
def view_project(request, project_id):
    # create the view
 
######
# something.py
some_url = reverse("views.view_project", 4)
# some_url = /projects/4/
 
######
# sometemplate.html
<a href="{% url 'views.view_project' 4 %}">My project</a>

The idea behind urls.py is to separate urls, which are not supposed to change (because someone may have bookmarked them, there might be links to the page, etc) from implementation (views.py), which can change at any time (due to refactoring, new features, and bugfixes). You can move functions around at will, and simply change urls.py to point at the right function without breaking people’s links. I love this decoupling.

The idea behind reverse and url is that you should not be hard-coding urls into your code, you should use the names of the functions being called instead. I’ve tried to do this because its the “proper thing to do” if you’re a django coder… but I don’t like it, and I’m going to quit using it in my personal projects.

Here’s why:
1) It’s harder to maintain. As stated above, I often change the name or location of a view function. When I do that, I have to go through all the files that have a url or reverse call and change the view there. If I had hardcoded the url, I’d only have to change it in urls.py. I’ve been stung by this much more often than by changing urls.

2) Its harder to read. When you see “/path/to/something” you know you’re looking at an url. When you see reverse(‘some.module.path’, some, arg), it takes longer for the brain to parse, even if you know what the reverse call does.

3) In the case of {% url %} it exposes implementation details to the template author. Template authors should not know or care what python functions are being called internally. But they know what an url is, and what it represents.

In short, it adds an extra layer of abstraction to url handling. I like abstraction layers to be thin and useful; in the case of reverse(), the added complexity does not, in my minimalist opinion, justify the supposed gain in simplicity.

Py.test funcargs and Django

Holger Krekel just released the 1.0 version of py.test. Py.test is a functional and unit testing tool that cuts out a lot of the annoying boilerplate found in the unittest module included in the Python standard library.

I do a lot of Django coding. Django has a built-in test engine based on unittest. Its annoying, but it does a few things (such as capturing e-mails and creating a test database) automatically so I’ve tended to use it rather than setting up py.test to take care of these things. Today I decided I’d rather use py.test for my latest project. Turns out its not that complicated:

# there are better ways to do these first three lines
import os, sys
sys.path.append('.')
os.environ['DJANGO_SETTINGS_MODULE'] = 'settings'
from django.conf import settings
from django.test.client import Client
from django.test.utils import setup_test_environment, teardown_test_environment
from django.core.management import call_command
 
def pytest_funcarg__django_client(request):
    old_name = settings.DATABASE_NAME
    def setup():
        setup_test_environment()
        settings.DEBUG = False
        from django.db import connection
        connection.creation.create_test_db(1, True)
        return Client()
    def teardown(client):
        teardown_test_environment()
        from django.db import connection
        connection.creation.destroy_test_db(old_name, 1)
    return request.cached_setup(setup, teardown, "session")
 
def pytest_funcarg__client(request):
    def setup():
        return request.getfuncargvalue('django_client')
    def teardown(client):
        call_command('flush', verbosity=0, interactive=False)
        mail.outbox = []
    return request.cached_setup(setup, teardown, "function")

Put that in your conftest.py and you can write tests like this:

def test_something(client):
    response = client.post('/some_url', {'someparam': 'somevalue'})
    assert "somestring" in response.content
    # Other assertions...

This uses the innovative py.test ‘funcarg’ mechanism to create a test database when testing starts, and refer to that database throughout the test run. The ‘django_client’ funcarg sets up a database when the session starts and deletes it when it finishes. The ‘client’ funcarg creates a similar client, but also resets the database after each test is run to ensure there are no interaction effects between tests.

I haven’t fully tested it yet, but its nice to know I can get the most useful django test functionality so cheaply in py.test.

Offline-Enabled Web Apps: The Future

I was reluctant to join the world of web development. I started in high school with a few sites and realized several things: Javascript sucks, Internet Explorer sucks; therefore web development sucks.

Fast-forward through a couple academic degrees. Job hunting with one requirement: Python. Python jobs all require Django.

So I learned Django, assuming, incorrectly, that if I was developing python backends, I wouldn’t need to work with the horrors of Javascript or Internet Explorer. I earned money. I relearned Javascript and became a first rate web developer.

In the back of my mind I still felt that web development sucks. So a few weeks back when deciding on a platform for a personal project, I thought I’d try something new. The Android platform was in my hands and I gave it a whirl.

I didn’t enjoy it much and I am now rewriting the app as an offline enabled webapp using Google Gears.

Then Chrome OS was announced and I realized that I’ll probably be doing a lot of offline enabled webapps using Google gears and/or HTML 5. Like it or not, it’s the future. Me, I like it. There are a lot of advantages to this kind of setup: I can access the apps from my phone, my laptop, my parent’s desktop, or Phrakture’s hacked computer whenever and wherever I want. I don’t have to write a different client for each one. Its true ‘write once, run anywhere’. I can upgrade each of those clients automatically as long as there’s a network connection.

On that note, you don’t need a network connection to run HTML 5 or Google Gears based apps. They both provide a ‘localserver’ that caches pages and javascripts, and give you an SQLite database for data caching. Typically offline versions of apps are not as powerful as their networked counterparts, but they do not require network access to run. Further, because they are locally cached, they can be made to run as fast as a “standard” (old fashioned) desktop app. The apps run in the browser, but the browser is just a container, a window manager, to hold the application.

In traditional webapps, you code most of the logic on the server side. In this new model, you end up coding most of the logic in the client, because the app needs to run without a guaranteed server connection. For me, this has a massive, nearly show-stopping drawback: A large portion of the app must be written in Javascript. JQuery makes Javascript suck lest, but it still sucks. I’m a Python programmer.

For years, I’ve dreamed of browsers supporting tags that allow me to write my DOM manipulation scripts in Python rather than the ubiquitous and annoying Javascript. This wasn’t possible because python can’t be adequately sandboxed such that arbitrary scripts running on the web don’t have access to, say, your entire hard drive.

This is no longer true. The PyPy project finally has a complete Python 2.5 interpreter that can be safely sandboxed. Since discovering this at Pycon 2009, I’ve been thinking about interfacing it with a web browser.

I figured “somebody must have started this already”. Google didn’t help much, but when I logged into #pypy on freenode I was told “fijal started doing that with webkit yesterday”. I’ve been following up trying to get the project to build (I was warned that the build process is a mess and was invited to wait until it is cleaned up a bit). So far, no luck, but I am optimistic that python support is finally coming to the browser. Granted, it won’t be much use for public webapps (at first) since browsers won’t want to be distributing pypy, but a lot of my projects are personal, and satisfying the general public will be far lower on my priorities list than ‘developing in my preferred language’.

I’ll have to install a pypy interpreter into Chrome Lite under Android before this is useful to me. That may be tricky.

Recent Developments

I’ve been awfully busy the past few weeks, but finally had three separate evenings to sit down and code on some of my little projects this week. I’m anticipating having more time in a couple weeks, as I gave notice on my job on Thursday. I am, however planning a move and exploring numerous job opportunities in my home province.

I managed to fix a couple bugs on WhoHasMy. As previously reported this project was originally coded in 48 hours for the Django Dash competition. We tied for fifth place and have traded the resulting bitbucket account for a github prize. I’m very happy with the placement given that my brilliant co-developers had nil django experience going into the competition and I hadn’t touched it professionally in months. I added a TOS to the page as requested in a comment on my earlier post, fixed some ordering bugs in the lists, fixed a couple broken links, and made it easier to add information about friends when you loan an item to someone not currently in the system. And here we thought it was 100% bug free when we finished our 48 hour stint and stumbled off to bed.

I have also spent a fair bit of time improving Quodroid, the Android app for controlling quod libet on my laptop from my phone. It now uses fancy icon buttons, allows you to specify the host and port you want to connect to, lists the currently playing song whenever you perform an action, allows volume control, and gives a semi-sane error message when the phone can’t connect. In short, its actually useful and usable by someone other than myself. I’ve been using it regularly the past few days. I still have to arrange it to perform the network stuff in a service instead of the main activity, which occasionally becomes unresponsive if the server is slow to respond. I’m actually becoming more comfortable with Java again as I develop this, its not as evil as I thought, but it certainly cuts into productivity.

Today, I made a few changes to opterator. I wrote my first app (a contrived example code-test for a job I’m pursuing) that actually used opterator a couple weeks back and found it was missing a few features. It now the ability to have multiple copies of a single option. Turns out this actually worked, all you had to do was use the ‘append’ action. I wrote three tests, didn’t change a line of code and poof, I had append support! I then realized that storing the action in the docstring was unnecessary as it could be introspected from the type of the keyword argument. This makes the @param docstrings a lot more readable and informative. As simple as this little module is, I feel its one of my more brilliant innovations.

I’ve also tossed around the idea of having multiple opterated main methods in a single module and allow the decorator to pick which one to call depending on the options. This seemed cool at first, but I think it may violate the ‘one best way’ policy of Python. I also realized that deriving sensible error messages and usage strings would be really painful, from the end user’s perspective, so I’m holding off on this until I’ve decided how best to do it.

Exceptions As Flow Control In Python

I’ve complain in recent months that my knowledge of Python has stagnated. I look at code I wrote six months ago and I still think its well-written code and there isn’t much I would change. It is well-written code, but in previous lives, my definition of good code would evolve.

Possibly due to my early training in Java, I’ve always been quite cautious when raising exceptions. In Java, exceptions are a curse. The overhead of maintaining them and listing which ones can be caught or thrown from a given call-stack is time-consuming and of dubious benefit.

In the past, I’ve read the word ‘Exception’ as meaning ‘an exceptional circumstance I didn’t anticipate’. I don’t raise exceptions often unless I’m writing a library and want to communicate an exceptional circumstance the calling system may not have expected.

Excluded from this definition are ‘error conditions I did anticipate’, such as validation errors or permission checks. When I expect such errors, I tend to include them in the return values. If you’re a python coder you might think I mean this:

## BAD CODE ##
result = some_function_with_error()
if type(result): == tuple:
    error, message = result
    # handle error

But I would never do this. Checking the type of an object in Python is a sure sign you’re doing something wrong, probably very wrong. Either that or you’re writing some kind of introspective framework. I’ve seen code where people return an object in the normal case and a string if its an error. They then check if type(response) == str. This will fail when you ultimately have to switch to unicode, and its going to make your Python 3 porting attempts very confusing. Don’t do it.

However, I frequently use this idiom:

isadmitted = auth.check_privilege()
    if isadmitted != True:
        return isadmitted

In this case, check_privilege is returning either True or an error string. I used to think this was good code, but its actually doing an implicit type check. if you have to write ‘x != True’ instead of ‘not x’, its not proper duck-typing. While its unlikely that boolean will ever have values other than ‘True’ and ‘False’, its highly probable that your function will one-day want to return something else.

That last bit is based on real code I decided to refactor today. I’ve expanded my definition of ‘Exception’ to ‘anything that is not the normal return value of a function’. Now I have code that looks like this:

try:
       response = self.process_post(session)
except (ValidationError, InsufficientPrivledgesError), e:
    session.rollback()
    response = e.response
else:
    session.commit()
finally:
    session.close()
return response

I’ve wondered why so much attention was paid to Python’s try…except handlers in recent releases. Now I know — its because they expect Python coders to use them. Two months ago if I’d written an exception handler that included all of except, else, and finally, I’d have thought I was doing something wrong. My new understanding (notice I don’t say ‘corrected understanding’) is that this is how such code should look.

The good news is: every time I look at code I wrote six months ago and see something that uses an if statement to check a return value for an error condition, I’m going to be itching to rewrite it. My understanding of what good Python code should be has evolved. I’m back!

Django Dash

“I am going to send you a link and I want you to think about it before you just say no.”

That’s how Jason introduced the idea of the Django Dash to me. He figured it’d be fun to try to develop an entire web app in 48 hours using a web framework and Javascript toolkit he was unfamiliar with. (Jason has odd ideas of fun). I agreed, but we’re both aesthetically challenged. We’re good web programmers. Website design and graphics, not so good.

Enter Phil, an acquaintance of Jason’s I had not yet heard of. Great guy, great designer, great team mate.

So last weekend, I spent about 35 out of 48 hours on skype with these two goofballs three timezones away, typing python code and cursing javascript. We made 100 commits more than any other team (but we wore out before we cleared 500). Sleep and exhaustion tried to throw us off course, but we pulled it off. We had a ton of fun and I even learned something (the for statement can have an else clause).

The constant skype linkup really helped in terms of motivation, its so much more productive to just ask a question and have it responded to than to dig through someone’s code trying to figure out what they were thinking, to scan google results looking for the info you need, or to send an e-mail to someone and wait for them to respond. Skype is also more productive than instant messaging. This surprised me; turns out that its much easier to talk and type in or scroll your source window than it is to be constantly switching back and fourth to your IM window.

For those interested, here is the result of our 48 hour sprint, a relatively complete and not-quite bug-free loaned item tracking application: WhoHasMy.

Phil deserves all the credit on the sleek design, he’s totally awesome and is the difference between an ugly django app with ajax calls and a professional one. He also has an entertaining habit of verbifying nouns.

Jason deserves the credit for the initial idea, most of the program design, autocomplete, and bailing us out when git-svn confused us… several times.

I will take credit for an outrageous number of commits editing the to-do file and keeping us organized. I think I may also have written some python code and some interesting ajax requests.

Our top priority was to have fun. And we did — It was a blast. I think we’ve got a decent chance at a prize, though there’s some stiff competition out there. But hey, Who would turn down a free private github subscription or Bacon?