Archive for February 2010

Early Userspace in Arch Linux

There have been some major changes in Arch’s early userspace tools recently. So I thought I’d take the time to sit down and explain to everyone what these changes are about.

Booting Linux Systems: Why do we need Early Userspace?

Traditionally, booting a Linux system was simple: your bootloader loaded the kernel, the kernel decompressed itself and initialized your hardware, including your hard disk controller, then found your hard drive and the root file system on it, mounted it and started /sbin/init.

Nowadays, there is a shitload of different controllers out there and a huge number of file systems, and since we are a good distro, we want to support them all. So we build them all into one big monolithic kernel image that is now several megabytes big and supports everything and the kitchen sink. But then someone comes along with two SATA controllers, three IDE controllers, seven hard drives, three external USB drives and who knows what else. The Linux kernel detects all of those asynchronously – so where is the root file system now? Is it on the first drive? Or the third? What is “the first drive” anyway? And how do I mount my root file system on the LVM volume group inside the encrypted container residing on a software RAID array? You see, this is all getting a bit ugly, and the kernel likes to pretend it is stupid – or it simply doesn’t care about your pesky needs, especially now that it has become so fat after we built every imaginable driver in the world into it.

What now? Simple: We pass control to userspace, handle hardware detection there, set up all the complicated stuff that people want, mount the root file system and launch /sbin/init ourselves. You are probably asking yourself “How do I execute userspace applications when the root file system is not mounted?”. The answer is: magic!
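To make the “where is the root file system?” question concrete, here is a minimal Python sketch of what an /init might do first: read the kernel command line, turn a root= value such as UUID=… or LABEL=… into a path under the /dev/disk/by-uuid or /dev/disk/by-label symlinks that udev creates, and wait for that device to appear, since the kernel finds disks asynchronously. The function names (parse_cmdline, resolve_root, wait_for), the sample command line and the timeouts are hypothetical; this is not mkinitcpio’s code, and it ignores quoting on the command line as well as udev’s escaping of special characters in labels.

```python
import os
import time


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into a dict; bare flags like 'ro' map to ''."""
    params = {}
    for token in cmdline.split():               # simplification: no quote handling
        key, sep, value = token.partition("=")  # split on the first '=' only
        params[key] = value if sep else ""
    return params


def resolve_root(spec: str, dev_dir: str = "/dev") -> str:
    """Translate a root= value into a device path via udev's by-uuid/by-label links."""
    if spec.startswith("UUID="):
        return os.path.join(dev_dir, "disk", "by-uuid", spec[len("UUID="):])
    if spec.startswith("LABEL="):
        # Assumption: the label needs no escaping (udev encodes e.g. spaces as \x20).
        return os.path.join(dev_dir, "disk", "by-label", spec[len("LABEL="):])
    return spec                                 # already a plain path such as /dev/sda3


def wait_for(path: str, timeout: float = 10.0, interval: float = 0.1) -> bool:
    """Poll until path exists, because devices show up asynchronously."""
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


params = parse_cmdline("BOOT_IMAGE=/vmlinuz26 root=UUID=1234-abcd ro quiet")
root_device = resolve_root(params["root"])      # '/dev/disk/by-uuid/1234-abcd'
```

A real /init would then mount that device (for example on a hypothetical /new_root) and hand control to /sbin/init on it, typically with busybox’s switch_root; that part needs root privileges and is left out of the sketch.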

What is initramfs?

Okay, the answer is not magic. The answer is actually initramfs: every Linux system has a ramfs file system called rootfs that is always mounted. You will probably never see it, because your real file systems are mounted over it. However, the kernel image also contains a compressed cpio archive that the kernel extracts directly into rootfs during boot. Even better, your bootloader can load an additional compressed cpio archive alongside the kernel, which is extracted into rootfs as well.
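The archive format itself is simple. As a rough illustration, the Python sketch below writes the “newc” cpio format (magic 070701) that the kernel’s initramfs unpacker reads, then gzip-compresses it. It stores uid/gid 0 and mtime 0 for every entry and only handles directories and regular files; newc_entry and build_initramfs are hypothetical helpers, not mkinitcpio’s implementation.

```python
import gzip
import io
import stat


def _pad4(length: int) -> bytes:
    """NUL bytes needed to round length up to a multiple of 4."""
    return b"\0" * (-length % 4)


def newc_entry(name: str, mode: int, data: bytes = b"", ino: int = 0) -> bytes:
    """Encode one archive member in the 'newc' cpio format (magic 070701)."""
    encoded_name = name.encode() + b"\0"        # c_namesize includes the NUL
    fields = [
        ino, mode, 0, 0,                         # c_ino, c_mode, c_uid, c_gid
        2 if stat.S_ISDIR(mode) else 1,          # c_nlink
        0, len(data),                            # c_mtime, c_filesize
        0, 0, 0, 0,                              # c_devmajor/minor, c_rdevmajor/minor
        len(encoded_name), 0,                    # c_namesize, c_check (0 for 070701)
    ]
    # 6-byte magic + 13 fields of 8 hex digits = 110-byte ASCII header
    header = b"070701" + b"".join(b"%08X" % value for value in fields)
    entry = header + encoded_name
    entry += _pad4(len(entry))                  # file data starts 4-byte aligned
    return entry + data + _pad4(len(data))      # next header starts 4-byte aligned


def build_initramfs(files: dict) -> bytes:
    """files maps archive path -> (mode, data); returns a gzip-compressed archive."""
    archive = io.BytesIO()
    for ino, (path, (mode, data)) in enumerate(files.items(), start=1):
        archive.write(newc_entry(path, mode, data, ino))
    archive.write(newc_entry("TRAILER!!!", 0))  # end-of-archive marker
    return gzip.compress(archive.getvalue())


image = build_initramfs({
    "dev": (stat.S_IFDIR | 0o755, b""),
    "init": (stat.S_IFREG | 0o755, b"#!/bin/sh\necho 'hello from early userspace'\n"),
})
```

This image would not actually boot, since it contains no /bin/sh to interpret the script; a real generator also has to copy in the shell, its libraries, modules and device nodes.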

Before the kernel runs the old-fashioned init code, it checks whether rootfs contains a file called /init. If it does, it skips the traditional mounting/init code and executes /init instead. This program is now responsible for all the complex tasks that the kernel considered too complicated. This way, we can build a kernel without any built-in support for hard disk controllers or file systems at all; instead, we build them all as modules (this is actually what we do in the Arch Linux default kernel) and include only the needed ones in the initramfs image.
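Picking “the needed ones” can itself be automated. One common approach, sketched below in Python, is to collect the modalias strings that devices export in sysfs and match them against the glob patterns in the kernel’s modules.alias file, which is also how modprobe maps a device to its driver. This is an assumption-laden sketch, not mkinitcpio’s actual autodetection code: the function names are hypothetical, the sysfs tree and alias lines are made-up sample data, and a real tool would also pull in module dependencies.

```python
import fnmatch
import os
import tempfile


def read_modaliases(sysfs_root: str) -> set:
    """Collect every non-empty 'modalias' file under a sysfs-like directory tree."""
    aliases = set()
    for dirpath, _dirnames, filenames in os.walk(sysfs_root):  # does not follow symlinks
        if "modalias" in filenames:
            try:
                with open(os.path.join(dirpath, "modalias")) as f:
                    value = f.read().strip()
            except OSError:
                continue                        # some sysfs attributes are unreadable
            if value:
                aliases.add(value)
    return aliases


def parse_modules_alias(text: str) -> list:
    """Parse 'alias <glob pattern> <module>' lines from a modules.alias file."""
    rules = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "alias":
            rules.append((parts[1], parts[2]))
    return rules


def needed_modules(aliases: set, rules: list) -> set:
    """Return every module with an alias pattern matching at least one device."""
    return {module
            for alias in aliases
            for pattern, module in rules
            if fnmatch.fnmatchcase(alias, pattern)}


# Demo on a fake sysfs tree instead of the real /sys/devices.
with tempfile.TemporaryDirectory() as fake_sys:
    device_dir = os.path.join(fake_sys, "pci0000:00", "0000:00:1f.2")
    os.makedirs(device_dir)
    with open(os.path.join(device_dir, "modalias"), "w") as f:
        f.write("pci:v00008086d00003A22sv00001043sd000082D4bc01sc06i01\n")
    detected = read_modaliases(fake_sys)

sample_alias_file = (                           # made-up excerpt in modules.alias format
    "alias pci:v*d*sv*sd*bc01sc06i01* ahci\n"
    "alias usb:v*p*d*dc*dsc*dp*ic08isc06ip50* usb_storage\n"
)
modules = needed_modules(detected, parse_modules_alias(sample_alias_file))  # {'ahci'}
```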

klibc – The Purgatory of the Distro Initramfs Maintainer

klibc was originally created to be a small and lightweight C library for early userspace. It comes with a number of tools to support you in setting everything up. It also comes with klcc, an ugly Perl script that calls gcc and builds binaries against klibc instead of your usual C library. When mkinitcpio was originally created in 2006 by Aaron Griffin as a replacement for the old, inflexible mkinitrd and mkinitramfs scripts, it was decided to base it on klibc. From the beginning, klibc had lots of problems:

  • The set of shipped tools was limited and the tools that were included lacked vital options.
  • Most external tools could not be built against klibc or had to be heavily patched to do so.
  • There was no dynamic linker; all binaries were tied to one specific build of klibc. That build changed every time anything in the klibc source or in the kernel headers it was built against changed, requiring a rebuild of all binaries that used klibc.
  • It was not possible to create any dynamic libraries other than klibc itself.

All this meant a lot of maintenance effort just to keep udev and module-init-tools working. We also had to maintain a small klibc-extras package with our own tools to replace those missing from klibc, and we had to include any more advanced applications like lvm or cryptsetup as statically linked glibc-based binaries.

At some point, klibc stopped being compatible with the current kernel headers, and we had to introduce more and more hacks to be able to rebuild it when needed. As of Linux 2.6.30, I was unable to build a working version of klibc at all, leaving us with an old binary that could no longer be bugfixed. In mid-2009, upstream died completely: no more commits were made to the git repository, and the mailing list only received a handful of posts each month. That was when I started to ask myself: What is the point of maintaining a separate C library and tools that are only used for a fraction of a second each time you boot? All we supposedly gained from this was a smaller initramfs and thus a faster boot.

Keeping it simple

In 2009, I decided that in order to be able to create an initramfs environment with low maintenance effort, many features and much flexibility, the following changes needed to be made:

  • Do not maintain a separate C library for it, simply use the one from the normal system
  • For basic system and scripting tools, use busybox to get a good compromise between high functionality and small binary size
  • For filesystem label, UUID and type detection, use util-linux-ng’s blkid for full and bleeding-edge support of all new and old filesystems
  • For other advanced functions, use modprobe, udev, lvm, cryptsetup, mdadm/mdassemble from the normal Arch packages
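To give an idea of what blkid does, the following Python sketch recognizes one file system family, ext2/3/4, by reading its superblock 1024 bytes into the device and checking the 0xEF53 magic number, then extracting the UUID and volume label from their ext2 superblock offsets (0x68 and 0x78). blkid itself knows dozens of formats and also tells ext2, ext3 and ext4 apart by their feature flags; probe_ext is a hypothetical helper, and the demo probes synthetic image files rather than real block devices.

```python
import os
import struct
import tempfile
import uuid

SUPERBLOCK_OFFSET = 1024   # the ext2/3/4 superblock starts 1024 bytes into the device
EXT_MAGIC = 0xEF53         # s_magic: little-endian u16 at superblock offset 0x38


def probe_ext(device_path: str):
    """Return (uuid, label) for an ext2/3/4 file system, or None if not recognized."""
    with open(device_path, "rb") as dev:
        dev.seek(SUPERBLOCK_OFFSET)
        sb = dev.read(1024)
    if len(sb) < 0x88:                       # too short to hold magic, UUID and label
        return None
    (magic,) = struct.unpack_from("<H", sb, 0x38)
    if magic != EXT_MAGIC:
        return None
    fs_uuid = str(uuid.UUID(bytes=sb[0x68:0x78]))                         # s_uuid
    label = sb[0x78:0x88].split(b"\0", 1)[0].decode("utf-8", "replace")  # s_volume_name
    return fs_uuid, label


# Demo: a synthetic image containing just enough of a superblock.
example_uuid = uuid.UUID("0f1e2d3c-4b5a-6978-8796-a5b4c3d2e1f0")
disk_image = bytearray(4096)
struct.pack_into("<H", disk_image, SUPERBLOCK_OFFSET + 0x38, EXT_MAGIC)
disk_image[SUPERBLOCK_OFFSET + 0x68:SUPERBLOCK_OFFSET + 0x78] = example_uuid.bytes
disk_image[SUPERBLOCK_OFFSET + 0x78:SUPERBLOCK_OFFSET + 0x80] = b"archroot"

with tempfile.TemporaryDirectory() as tmp:
    ext_path = os.path.join(tmp, "ext.img")
    blank_path = os.path.join(tmp, "blank.img")
    with open(ext_path, "wb") as f:
        f.write(disk_image)
    with open(blank_path, "wb") as f:
        f.write(bytes(4096))
    found = probe_ext(ext_path)      # ('0f1e2d3c-4b5a-6978-8796-a5b4c3d2e1f0', 'archroot')
    missing = probe_ext(blank_path)  # None
```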

This way, I would only need to maintain the mkinitcpio scripts themselves and a properly configured busybox binary. I had used busybox for quite some time on my OpenWRT router(s) and was thus familiar with how awesome it was. It also turned out that implementing NFS root support was easier if we used the nfsmount and ipconfig utilities that were shipped with klibc.

It is February 2010 now, and in the last few weeks I finally had the time to do all the work. Just a few days ago I released mkinitcpio 0.6. This version is much more stable, more flexible and less error-prone than any klibc-based version we ever had. On average, the initramfs is now between 600KB and 1MB bigger than the klibc-based ones. I guess nobody will ever complain about that – it is still smaller than on most other distributions. And I am glad that I hopefully never have to touch klibc again.