A week without Google cookies

After my last post I decided to do a little experiment to determine how dependent I am on Google’s services and their knowledge of me. Rather than cut Google out altogether, I decided to disable all cookies and scripts from the Google domains and see what happened.

The most immediate effect was that I couldn’t log into Google services. This only really affected me at the news and reader sites. When I visit the news page, I get Swiss German (this is a mistake on Google’s part as I live in the French-speaking part of Switzerland), and have to click through the dropdown to get Canadian English. Creating a bookmark to go directly to the Canadian News site fixed this.

I only have a few feeds in Reader, mostly webcomics, so I didn’t miss it much; I just accessed those sites directly. If it were important, I could use Thunderbird for syndication instead.

I was also unable to turn off the extremely irritating ‘Google Instant’ behaviour. I hate having search results appear as I start typing and then disappear or move after I’ve realized I’ve seen what I want, but haven’t had time to tell my fingers to stop.

The biggest deficit was that Google Maps no longer had any memory. I was surprised to discover that Google Maps was my most personalized Google product. I really appreciated Maps predicting my home location, knowing that my search results should probably be close to Geneva rather than the US, and syncing up my location searches with my phone so my GPS had access to the locations I had just searched as I walked out the door.

I access my Gmail account via IMAP (the web interface is too slow compared to local caching), so not having cookies didn’t impede that. I don’t use Gmail as my primary address anyway.

I also kept my Google Talk (accessed via pidgin) account enabled. I could ask my friends to use my Jabber address instead, but I figure Google would still be logging the chats at their end.

I’ve had Google Analytics disabled via noscript for quite a while already.

Overall, I’m quite confident that I could disable my Google account altogether and not feel I was missing out on anything (just as I’m not missing anything by not having a Facebook account). However, I don’t really have a reason to do so. I don’t consider Google to be evil in practice. In theory, however, they simply may not be evil yet.

Because Google services are “free”, I am expected to give them access to my data trail as “payment” for those services, as with all free web services. Whether the product is worth this fee is a separate question. I could pay for competing services, but I have no reason to trust the competition more or less than Google. Zoho currently hosts my e-mail; the only reason I feel any safer with them than Google is that they aren’t big enough to have the intelligence gathering that Google has in place. Dave Crouse hosts this blog on archlinux.me; I trust him a lot more than any big corporation or other nonhuman entity.

I don’t agree with the “If you have nothing to hide, you shouldn’t worry” sentiment, simply because the definition of “nothing to hide” can change over time. Things that seem innocent in Google’s hands right now may take a more sinister meaning if their network ever becomes sentient!

I also realized that Google has access to all my public content (as does every service). This suggests that it would be sensible to migrate from Twitter to Google Plus, as I can still use the public stream the same way I currently use Twitter, but if I want to restrict publication of certain content to a specific circle, I have that option.

The only way it would be possible to hide completely would be to disable my Internet connection altogether. Even then, anyone can take my picture walking down the street, and every time I show my passport at an airport or hotel, someone, somewhere, knows I’ve been there. Since I can’t hide completely, and I don’t see that there’s any benefit to partial hiding (much like Dan McGee’s arguments against partial package signing), I think all I can do is accept that privacy is an old-fashioned concept in the emerging world, much like copyright.

Paranoia and Google Plus

After two weeks on Plus, I’ve decided:

* They have the best online photo upload and management system I’ve ever used.
* When deciding who to share content with, I usually discover that what I wanted to share wouldn’t be of interest to anybody and keep it to myself.
* Stuff that other people share is not interesting enough to warrant checking the stream.
* Google (+1 buttons, Analytics, Chrome), Facebook (like buttons), and Twitter (tweet to your followers buttons) all have a disturbingly accurate record of my browsing habits. About 90% of the sites I visit have at least one of these buttons on them; they all execute scripts on the page, and most of them set cookies.

It’s making me a bit paranoid. I already use https and noscript and selectively accept cookies, but it’s not enough; I don’t feel safe.

codemirror > editarea

So you’re working on some kind of webapp that needs in-browser code editing functionality of some sort. You search Google and discover that most folks are using editarea these days. You figure it must work to be so popular and decide to try it out.

Then as you start to use it, you realize it’s unmaintained, doesn’t work terribly well in Webkit browsers, doesn’t work at all in Internet Explorer 9, is poorly documented and hard to extend, expand, or fix. You’re frustrated.

But repeated web searches do not yield anything as functional, and you resign yourself to a half-working editor, or maybe even decide to write something from scratch.

I’ve been that person, and I was recently lucky enough, while searching for something completely unrelated, to stumble across CodeMirror, a standards-compliant JavaScript-powered code editor that is well documented. It’s not as featureful as editarea, but if you add CodeMirror UI to the mix, you end up with a comparable featureset that actually works.

I’m not normally one for simply reposting links, but as I had so much trouble finding CodeMirror in the first place, I hope this post will either help others to find it, or at least increase CodeMirror’s page rank in search results.

External monitors

When I first started using Linux over a decade ago, dual screen was a pain to set up. When I got my first laptop four years ago, setting up an external monitor was also painful. Then came xrandr and life was good. Now there are nifty little monitor switching GTK apps that allow you to drag screens around just like in Windows or MacOS.

But that’s a lot of fiddling around. For the longest time, my use case has always been either:
a) I am using only my laptop
b) I am using my laptop with my 1920×1080 external monitor connected via VGA (It’s an old laptop)

To accommodate these two use cases, I had connected my “Switch Display” (fn+F7 on my thinkpad) key to the following simple script:

  #!/bin/bash
  if ! xrandr | grep VGA1 | grep disconnected  >/dev/null ; then
      xrandr --output LVDS1 --mode 1024x768 --output VGA1 --mode 1920x1080 --above LVDS1
  else
      xrandr --auto
  fi

Succinctly, if the external monitor is connected, enable it as “above” my laptop; otherwise, just enable the laptop monitor. All I had to do was plug in or unplug my monitor and hit Fn+F7, and my display would automatically adjust itself.

For the record, I used xbindkeys to connect the button to the script with the following .xbindkeysrc:

 .xbindkeysrc
  "/home/dusty/bin/check_external"
      XF86Display

This served me well until I bought myself a new television that only operates at 1360×768 on the VGA port. Further, when I’m connecting my laptop to the tv, the television is usually below the laptop monitor rather than above, as my monitor is.

So now, my check_external script looks thusly:

  #!/usr/bin/python
  import subprocess
 
  positions = {
      "1920x1080": "--above", # Monitor
      "1360x768": "--below" # TV
  }
 
  output = subprocess.check_output("xrandr", shell=True).decode("utf-8")
 
  external_connected=resolution=False
  for line in output.split("\n"):
      if external_connected:
          if "+" in line: # + represents the default resolution for that monitor
              resolution = line.split()[0] # the resolution is in the first column
              break
      if "VGA1 connected" in line:
          external_connected=True
 
  if external_connected:
      subprocess.call(
              "xrandr --output LVDS1 --mode 1024x768 --output VGA1 --mode {resolution} {position} LVDS1".format(
                  resolution=resolution, position=positions.get(resolution, "--above")), shell=True)
  else:
      subprocess.call("xrandr --auto", shell=True)

This is Python 3 code, and works delightfully on my Arch Linux setup running awesome. I still have to do custom xrandr commands if I ever connect to someone else’s projector or monitor (this happens so rarely that I don’t think I’ve done it since Archcon last year), but normally I can get away with a quick “xrandr --auto” in those cases, which usually just clones the display. There are dozens of ways to set up monitors, but this works great for me, and I can normally have my display up and running the way I want it with a couple of keystrokes.
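The parsing step in the script above can also be pulled out into a function that takes the xrandr output as text, which makes it possible to try the logic without plugging anything in. This is only a sketch under some assumptions: the function names and the SAMPLE text are made up for illustration (SAMPLE imitates the general shape of xrandr output, it is not copied from a real machine), and the sketch relies on mode lines being indented while output header lines are not.

```python
# Positions copied from the script above.
POSITIONS = {
    "1920x1080": "--above",  # Monitor
    "1360x768": "--below",   # TV
}


def preferred_resolution(xrandr_output, output_name="VGA1"):
    """Return the preferred mode of a connected output, or None."""
    in_section = False
    for line in xrandr_output.splitlines():
        if line.startswith(output_name + " connected"):
            in_section = True
            continue
        if in_section:
            if not line.startswith(" "):
                break  # the next output's section has started
            if "+" in line:  # + marks the preferred mode
                return line.split()[0]
    return None


def layout_command(xrandr_output):
    """Build the xrandr argument list for the current situation."""
    resolution = preferred_resolution(xrandr_output)
    if resolution is None:
        return ["xrandr", "--auto"]
    return ["xrandr", "--output", "LVDS1", "--mode", "1024x768",
            "--output", "VGA1", "--mode", resolution,
            POSITIONS.get(resolution, "--above"), "LVDS1"]


# Hypothetical sample text, for illustration only.
SAMPLE = """Screen 0: minimum 320 x 200, current 1024 x 768, maximum 8192 x 8192
LVDS1 connected 1024x768+0+0 (normal left inverted right x axis y axis) 246mm x 185mm
   1024x768       60.0*+
   800x600        60.3
VGA1 connected (normal left inverted right x axis y axis)
   1360x768       60.0 +
   1024x768       60.0
"""

print(layout_command(SAMPLE))
```

Returning a list of arguments rather than a formatted string means the result could be handed to subprocess.call without shell=True.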

When you use a Django query keyword as a field name

I need to model a location in the Alberta Township System coordinate space. The model is extremely simple:

class Location(models.Model):
    project = models.ForeignKey(Project)
    lsd = models.PositiveIntegerField(null=True, blank=True)
    section = models.PositiveIntegerField(null=True, blank=True)
    township = models.PositiveIntegerField(null=True, blank=True)
    range = models.PositiveIntegerField(null=True, blank=True)
    meridian = models.PositiveIntegerField(null=True, blank=True)

There’s a rather subtle problem with this model that came up months after I originally defined it. When querying the foreign key model by a join on location, having a field named range causes Django to choke:

>>> Project.objects.filter(location__range=5)
------------------------------------------------------------
Traceback (most recent call last):
  File "", line 1, in
  File "/home/dusty/code/egetime/venv/lib/python2.7/site-packages/django/db/models/manager.py", line 141, in filter
    return self.get_query_set().filter(*args, **kwargs)
  File "/home/dusty/code/egetime/venv/lib/python2.7/site-packages/django/db/models/query.py", line 556, in filter
    return self._filter_or_exclude(False, *args, **kwargs)
  File "/home/dusty/code/egetime/venv/lib/python2.7/site-packages/django/db/models/query.py", line 574, in _filter_or_exclude
    clone.query.add_q(Q(*args, **kwargs))
  File "/home/dusty/code/egetime/venv/lib/python2.7/site-packages/django/db/models/sql/query.py", line 1152, in add_q
    can_reuse=used_aliases)
  File "/home/dusty/code/egetime/venv/lib/python2.7/site-packages/django/db/models/sql/query.py", line 1092, in add_filter
    connector)
  File "/home/dusty/code/egetime/venv/lib/python2.7/site-packages/django/db/models/sql/where.py", line 67, in add
    value = obj.prepare(lookup_type, value)
  File "/home/dusty/code/egetime/venv/lib/python2.7/site-packages/django/db/models/sql/where.py", line 316, in prepare
    return self.field.get_prep_lookup(lookup_type, value)
  File "/home/dusty/code/egetime/venv/lib/python2.7/site-packages/django/db/models/fields/related.py", line 136, in get_prep_lookup
    return [self._pk_trace(v, 'get_prep_lookup', lookup_type) for v in value]
TypeError: 'int' object is not iterable

That’s a pretty exotic looking error in Django’s internals, but it didn’t take long to figure out that using location__range is making Django think I want to use the range field lookup on Location.id instead of the field I defined in the model. I expect a similar problem would arise if I had a field named “in”, “gt”, or “exact”, for example.

The solution is simple enough, but it didn’t occur to me until after searching Google and the Django documentation, and ultimately scouring the Django source code, had all failed to yield any clues. If you ever encounter this problem, simply specify an exact lookup explicitly:

>>> Project.objects.filter(location__range__exact=5)
[<Project: abc>, <Project: def>]
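To see why the explicit exact helps, here is a small stand-alone sketch of the kind of ambiguity involved. It is not Django’s code: split_lookup and LOOKUP_TERMS are names I made up, and the set only lists a handful of lookup names. The assumption is simply that the last double-underscore segment is treated as a lookup type whenever it matches a known lookup name, and as a field name otherwise.

```python
# A partial, illustrative set of lookup names (an assumption, not Django's list).
LOOKUP_TERMS = {"exact", "in", "gt", "gte", "lt", "lte", "range", "isnull"}


def split_lookup(keyword):
    """Split a filter keyword into (field path, lookup type)."""
    parts = keyword.split("__")
    if len(parts) > 1 and parts[-1] in LOOKUP_TERMS:
        # The last segment looks like a lookup, so it is consumed as one.
        lookup_type = parts.pop()
    else:
        lookup_type = "exact"
    return parts, lookup_type


print(split_lookup("location__range"))         # range is taken as the lookup
print(split_lookup("location__range__exact"))  # range is now a field name
```

With the trailing exact in place, range is no longer the last segment, so it can only be read as the field.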

How I reverted several git commits in a single commit

I hate to publicly admit this, but I recently made four commits that should have been merged into one commit, including two with embarrassing commit messages like, “third commit without testing, for shame!” I’m thoroughly shocked that fellow coder Dan McGee hasn’t already attacked me for my misdemeanor.

Please forgive me. I was tired and in a hurry, and was working on something that was easier to test on the production server. I most certainly deserve to be attacked by a velociraptor.

To complicate matters, there was a fifth commit in the middle of these four commits that was pertaining to an irrelevant task, and several other users had committed changes after those commits.

Fast forward to today. Those four commits, made in a hurry, now have to be reverted. As with any task, there are several ways to do this using git, but none of them are immediately obvious. git reset --keep was out of the question because of the newer commits. I think I could have git rebased the changes out of a new branch and merged them, but the method that made the most sense to me was to revert them independently, and then squash them.

Here’s how my history looked:

A–C1–C2–Ex–C3–C4–O1–O2–O3

The four C commits are the ones I want to revert. Ex was an extraneous commit I want to keep and the O commits were made by other authors later.

This was the desired end state:

A–C1–C2–Ex–C3–C4–O1–O2–O3–R

where R is a commit reverting the changes made in the four C commits. Embarrassing as they are, I didn’t want to simply erase the C commits (which can be done easily with git rebase), because they are public history that had been pushed to other users.

My process was to run git revert several times:

git revert C4
git revert C3
git revert C2
git revert C1

Possibly there is a way to do all of this in one command; I’m not sure. This left me with:

A–C1–C2–Ex–C3–C4–O1–O2–O3–R1–R2–R3–R4

where the four R commits are reversions of the four C commits.

Then I ran:

git rebase -i HEAD~5

git rebase -i is my favourite method of rebasing. It lists the five most recent commits in vim asking me what to do with each one. You can choose several options for each commit. Here is what I chose:

pick O3
reword R1
squash R2
squash R3
squash R4

pick O3 says to include that commit and leave it unchanged. When rebasing, I usually go one commit earlier than I expect to make sure I’m modifying the correct history. The reword command simply allows me to change the commit message of R1 to “Revert the XYZ changes because I no longer need them.” The squash commands mean that those three R commits are merged into the previous commit, R1. And my end state is as desired:

A–C1–C2–Ex–C3–C4–O1–O2–O3–R

I’m pretty sure there are other ways to do this. I chose this multi-step process because it allows me to understand what is going on at each step and to double check that I haven’t accidentally removed, merged, or reverted a commit I didn’t mean to.
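One of those other ways, as far as I know, is to hand git revert several commits at once together with --no-commit, which stages the reversals without committing, and then make a single commit by hand. I haven’t run this against the history described above, so treat the following as a sketch: single_revert_commands is a made-up helper that only builds the commands, and C1 to C4 stand in for real commit hashes.

```python
def single_revert_commands(commits, message):
    """Build git commands that revert `commits` in one new commit.

    `commits` is listed oldest first; the reverts are applied newest
    first, the same order used in the manual process above.
    """
    newest_first = list(reversed(commits))
    return [
        # Stage all the reversals without creating any commits yet.
        ["git", "revert", "--no-commit"] + newest_first,
        # Record them as one commit (R in the diagrams above).
        ["git", "commit", "-m", message],
    ]


commands = single_revert_commands(
    ["C1", "C2", "C3", "C4"],
    "Revert the XYZ changes because I no longer need them",
)
for argv in commands:
    print(argv)
```

Each list could be passed to subprocess.check_call from inside the repository. The step-by-step approach still has the advantage that every intermediate state can be inspected.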

New Arch Linux Laptop Bags

My supplier for the Arch Linux Laptop Bags product line has updated her offerings. We have four new laptop bags available, and some of the older models have been dropped or reduced in price. Check out Arch Linux Schwag to review the offerings.

In addition, I’ve reduced the price on Arch Linux pens to below cost, in an effort to liquidate some stagnant inventory.

As always, thanks for supporting Arch Linux!

Prickle v0.4: CouchDB to MongoDB

I took some free time to port Prickle, my stay-out-of-your-way time tracking tool, from CouchDB to MongoDB. I originally wrote it to use Pylons and CouchDB specifically because I felt like studying a couple of technologies I hadn’t tried before. I found CouchDB to be a bit unwieldy to code with, and it didn’t take long for it to get slow on my Prickle datasets. Rather than figure out how to optimize CouchDB, I decided to port it to MongoDB.

The port was relatively straightforward. I was able to use pretty much the same model layout (Prickle has a very simple data layer). I tightened it up a bit to use actual references between models instead of implicit foreign keys by ids.

I chose to use MongoEngine for the database layer. I started with pymongo, but decided that a more abstract object-document-manager would be more useful.

I found querying in MongoEngine to be much simpler than the map-reduce queries in CouchDB. It is very similar to the Django ORM with which I am very familiar. Document creation and mapping is also familiar. Overall, I think Prickle will be a lot easier to maintain and extend using MongoDB than it was with CouchDB.

Prickle v0.4 does not contain any new features over 0.3. It does contain one bugfix to the date validation, submitted by Shelby Munsch.

Privacy vs Freedom of Speech: Wikileaks

The Pirate Party of Canada has asked its members to vote on its stance towards the Wikileaks discussion. In most cases, the PPoC requests its members to have their own opinion, and, if ever elected, to represent their constituents before representing the party. The PPoC only has a unified stance on matters of copyright law, privacy, and free speech. The Wikileaks issue definitely falls under this category, and the party therefore needs to make a collective decision. Here, I am publishing my personal stance on the issue, regardless of the decision they make.

Every human being should have a right to privacy. If we wish to keep any detail about ourselves secret, we should have the right to do so. Legal or illegal, moral or immoral, if we don’t want some piece of data to be public knowledge, the right to privacy is paramount.

We waive this right as soon as we tell anyone our secret. Whether it is a family member, a close friend, a stranger, or everyone on Facebook, the secret is no longer ours to keep. By telling the person that secret, we have given them the right to maintain the secret, or to pass it on or publish it however they deem fit. We can request that they keep the secret, but we cannot demand it. However, that person still has the same right to privacy that we originally had. If only two of us know the secret, we both have the right to protect that secret. No-one should be able to forcibly take that secret from us without our consent.

Once a sufficient number of people know the secret, the probability that their collective privacy will be greater than the right to gossip approaches zero. “Private knowledge” vs “public knowledge” is not a binary distinction. One person knowing our secret does not make the secret “public.” However, it means that we no longer have the sole ability to keep it private.

The other side of the coin is the responsibility to protect individual privacy. Many professional and government organizations have access to individual data about us that we may want to keep secret. Our doctors, nurses, and medical staff, our accountants and lawyers, our banks, tax agencies, and passport authorities, our driver’s license, health care, and motor vehicle registries all have access to data that they require, but we have the right to protect. They are responsible for protecting that individual data on our behalf. If they fail, data becomes public that should not be public.

So far, I’ve been talking about individual privacy. Privacy does not apply to corporations or governments. They should be held accountable to the individuals in the world, and they should be required to operate transparently and openly. They are responsible for maintaining the privacy of their employees, members, clients, and customers, but have no right to privacy as a single corporate entity.

Once data is made public, the right to publish that data trumps the right to privacy. This is freedom of speech. Any individual or organization who has access to data has the right to publish the data. The right to free speech does not trump the right to privacy; however, once privacy has been given up, the right to free speech is stronger.

The Wikileaks fiasco violates all of these principles. The private data of individuals was compromised. Government organizations were not operating transparently. Government organizations failed in their responsibility to protect the private data of individuals in their care. Freedom of speech was violated when both governments and corporate entities that should have been completely disinterested oppressed the publisher of the data.

I’d like to emphasize this point: Government organizations failed in their responsibility to protect the private data of individuals in their care. The failure rests squarely on the shoulders of the governments in question. Rather than attacking one (of many) publisher of the information, the governing body is obligated to fix their internal processes. Further, the corporate entities that are attacking Wikileaks should be focusing on this real culprit, not the publisher.

One less relevant note: it is true that the right to freedom of speech can be applied immorally. Consider the celebrity publications of today: the paparazzi are, by most accounts, disgustingly immoral. They violate the right to individual privacy (such violation should be illegal), but have the right to publish information once obtained. Wikileaks may (arguably) be immoral, but they are not so immoral as the paparazzi that killed and photographed Princess Diana. Why is Wikileaks being persecuted while celebrity gossip rags are running free?

Django: don’t use distinct and order_by across relations

I needed to get a list of project objects that had Time objects attached to them that had been updated by a specific user. I wanted the list to be ordered by the most recently updated Time object, and importantly, I wanted the list of project objects to be distinct (since there are multiple time objects attached to any one project).

I was trying to make the following query work in django:

Project.objects.filter(time__user=user).distinct().order_by('-time__date')

As the note here describes, this particular combination (distinct and order_by on a related field) doesn’t work so well. The related table (Time, in this case) columns are being added to the query’s SELECT clause, giving me multiple copies of projects that I wanted to be distinct.

There is a Django feature request to support named fields in the call to distinct, but it is not incorporated into trunk yet, mostly due to database backend support.

After some searching and pondering, I was able to get the same list of projects using aggregates instead:

Project.objects.filter(time__user=user).annotate(
                models.Max("time__date")).order_by('-time__date__max')

This solution to the problem doesn’t seem to be suggested often, so I thought I’d take the time to mention it.
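For anyone wondering why the aggregate version avoids the duplicates, here is the same idea in plain Python, without Django or a database. The function name and the sample entries are invented for illustration. Each project is collapsed to the latest date among the given user’s time entries, and the projects are then sorted on that single value, so each project can only appear once.

```python
from datetime import date


def projects_by_latest_time(time_entries, user):
    """Return distinct projects, most recently updated first."""
    latest = {}
    for project, entry_user, entry_date in time_entries:
        if entry_user != user:
            continue  # same role as filter(time__user=user)
        # Keep only the newest date per project, like Max("time__date").
        if project not in latest or entry_date > latest[project]:
            latest[project] = entry_date
    # Sort on the per-project maximum, like order_by('-time__date__max').
    return sorted(latest, key=latest.get, reverse=True)


# Hypothetical (project, user, date) entries.
ENTRIES = [
    ("abc", "dusty", date(2011, 1, 3)),
    ("def", "dusty", date(2011, 1, 5)),
    ("abc", "dusty", date(2011, 1, 9)),
    ("def", "other", date(2011, 1, 20)),
]

print(projects_by_latest_time(ENTRIES, "dusty"))  # ['abc', 'def']
```

Ordering directly on the related date, by contrast, needs one row per time entry, which is where the repeated projects come from.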