Early Userspace in Arch Linux

There have been some major changes in Arch’s early userspace tools recently. So I thought I’d take the time to sit down and explain to everyone what these changes are about.

Booting Linux Systems: Why do we need Early Userspace?

Traditionally, booting a Linux system was simple: Your bootloader loaded the kernel. The kernel decompressed itself and initialized your hardware. It then initialized your hard disk controller, found your hard drive, found the root file system, mounted it and started /sbin/init.

Nowadays, there is a shitload of different controllers out there and a huge number of file systems, and we are a good distro and want to support them all. So we build them all into one big monolithic kernel image which is now several megabytes big and supports everything and the kitchen sink. But then someone comes along and has two SATA controllers, three IDE controllers, seven hard drives, plus three external USB drives and who knows what. The Linux kernel will now detect all those asynchronously, so where is the root file system now? Is it on the first drive? Or the third? What is “the first drive” anyway? And how do I mount my root file system on the LVM volume group inside the encrypted container residing on a software RAID array? You see, this is all getting a bit ugly, and the kernel likes to pretend it is stupid, or it simply doesn’t care about your pesky needs, especially now that it has become so fat after we built every imaginable driver in the world into it.
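To make the “first drive” problem concrete, here is a toy model in Python. It is only a sketch: the disk descriptions, the UUID strings and the naming function are made up for illustration and do not reflect how the kernel really assigns names. It shows that a name handed out in detection order can point at a different disk on each boot, while a lookup by a property of the file system itself does not.

```python
# Toy model: disks get their names in the order their detection finishes.
# All disks, UUIDs and helper names here are hypothetical.

def assign_names(probe_order):
    """Name disks sda, sdb, ... in the order they were detected."""
    return {"sd" + chr(ord("a") + i): disk for i, disk in enumerate(probe_order)}


def find_by_uuid(names, wanted_uuid):
    """Return the device name whose disk carries the wanted UUID, or None."""
    for dev, disk in names.items():
        if disk["uuid"] == wanted_uuid:
            return dev
    return None


disks = [
    {"model": "internal SATA disk", "uuid": "aaaa-1111"},
    {"model": "external USB disk", "uuid": "bbbb-2222"},
]

# Boot 1: the SATA disk is detected first. Boot 2: the USB disk wins the race.
boot1 = assign_names(disks)
boot2 = assign_names(list(reversed(disks)))

for boot in (boot1, boot2):
    dev = find_by_uuid(boot, "aaaa-1111")
    print("sda is the", boot["sda"]["model"], "/ root found on", dev)
```

In both simulated boots the UUID lookup finds the same disk, even though the name it ends up under changes.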

What now? Simple: We pass control to userspace, handle hardware detection there, set up all the complicated stuff that people want, mount the root file system and launch /sbin/init ourselves. You are probably asking yourself “How do I execute userspace applications when the root file system is not mounted?”. The answer is: magic!

What is initramfs?

Okay, the answer is not magic. The answer is actually initramfs: Each Linux system has a ramfs file system that is always mounted and called rootfs. You will probably never see it, because your real file systems are mounted over it. However, the kernel also has a compressed cpio archive attached to it that it extracts directly into rootfs during boot. Even better, you can pass a second compressed cpio archive to your kernel from the bootloader, and it is extracted into rootfs as well.
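To show how little magic is involved, the following Python sketch builds such an archive by hand. It rests on some assumptions that are not stated above: it uses the “newc” cpio header layout, which to my knowledge is the one the kernel expects, and gzip as the compression, which only works if the kernel was built with gzip support for initramfs images. The function names are made up, and a real image would of course be built with mkinitcpio rather than like this.

```python
import gzip


def cpio_newc_entry(name, data, mode, ino):
    """Encode one archive member in the cpio "newc" format."""
    name_bytes = name.encode() + b"\0"
    # 13 header fields, each written as 8 hex digits after the magic number.
    fields = [
        ino, mode, 0, 0,      # inode, mode, uid, gid
        1, 0, len(data),      # nlink, mtime, file size
        0, 0, 0, 0,           # device numbers (unused here)
        len(name_bytes), 0,   # name length including NUL, checksum
    ]
    entry = b"070701" + b"".join(b"%08X" % f for f in fields) + name_bytes
    entry += b"\0" * (-len(entry) % 4)   # pad header + name to 4 bytes
    entry += data
    entry += b"\0" * (-len(entry) % 4)   # pad file data to 4 bytes
    return entry


def build_initramfs(files):
    """Pack a {path: content} dict into a gzip-compressed cpio archive."""
    archive = b""
    for ino, (name, data) in enumerate(files.items(), start=1):
        archive += cpio_newc_entry(name, data, 0o100755, ino)
    # A member with this special name marks the end of the archive.
    archive += cpio_newc_entry("TRAILER!!!", b"", 0, 0)
    return gzip.compress(archive)


image = build_initramfs({"init": b"#!/bin/sh\necho hello from early userspace\n"})
print(len(image), "bytes")
```

Every file is written as a regular executable owned by root, which is enough for a demonstration but not for a real image with directories, device nodes and symlinks.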

Before the kernel runs the old-fashioned init code, it checks whether rootfs contains a file called /init. If it does, it skips the traditional mounting/init code and instead executes /init. This program is now responsible for doing all the complex tasks that the kernel considered too complicated. This way, we can build a kernel that has no built-in support for any hard disk controllers or file systems at all. Instead, we build them all as modules (this is actually what we do in the Arch Linux default kernel) and include the needed ones in the initramfs image.
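The decision the kernel makes at this point can be written down in a few lines. This Python sketch only models the rule described above; the function name and the step descriptions are made up, and the real kernel code handles more cases than this.

```python
def first_userspace_steps(rootfs_files, root_device):
    """Model of the kernel's choice between /init and the traditional path."""
    if "/init" in rootfs_files:
        # Early userspace takes over; finding and mounting the real root
        # file system is now the job of /init, not of the kernel.
        return ["execute /init from rootfs"]
    # Traditional path: the kernel mounts the root device itself.
    return ["mount " + root_device + " as /", "execute /sbin/init"]


print(first_userspace_steps({"/init", "/bin/busybox"}, "/dev/sda1"))
print(first_userspace_steps(set(), "/dev/sda1"))
```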

klibc – The Purgatory of the Distro Initramfs Maintainer

klibc was originally created to be a small and lightweight C library for early userspace. It comes with a number of tools to support you in setting everything up. It also comes with klcc, an ugly Perl script that calls gcc and builds binaries against klibc instead of your usual C library. When mkinitcpio was originally created in 2006 by Aaron Griffin as a replacement for the old, inflexible mkinitrd and mkinitramfs scripts, it was decided to base it on klibc. From the beginning, klibc had lots of problems:

  • The set of shipped tools was limited and the tools that were included lacked vital options.
  • Most external tools could not be built against klibc or had to be heavily patched to do so.
  • There was no dynamic linker. All binaries were hard-linked against a specific version of klibc, and this version changed every time anything in the klibc source or the kernel headers you built against changed, requiring a rebuild of all binaries that used klibc.
  • It was not possible to create any dynamic libraries other than klibc itself.

All this resulted in high maintenance effort to keep udev and module-init-tools working. We also had to maintain a small klibc-extras package with our own tools to replace those that were missing from klibc, and we had to include any more advanced application like lvm or cryptsetup as a statically linked glibc-based binary.

At some point, klibc stopped being compatible with the current kernel headers and we had to introduce more and more hacks to be able to rebuild it when needed. As of Linux 2.6.30, I was unable to build a working version of klibc at all, leaving us with an old binary in which bugs could no longer be fixed. In the middle of 2009, upstream died completely: no more commits were made to the git repository, and the mailing list only received a handful of posts each month. That was when I started to ask myself the following question: What is the point in maintaining a separate C library and tools that are only used for a fraction of a second each time you boot? What we supposedly gained from this was a smaller initramfs and thus faster boot time.

Keeping it simple

In 2009, I decided that in order to be able to create an initramfs environment with low maintenance effort, many features and much flexibility, the following changes needed to be made:

  • Do not maintain a separate C library for it, simply use the one from the normal system
  • For basic system and scripting tools, use busybox to get a good compromise between high functionality and small binary size
  • For filesystem label, UUID and type detection, use util-linux-ng’s blkid for full and bleeding-edge support of all new and old filesystems
  • For other advanced functions, use modprobe, udev, lvm, cryptsetup, mdadm/mdassemble from the normal Arch packages

This way, I would only need to maintain the mkinitcpio scripts themselves and a properly configured busybox binary. I had used busybox for quite some time on my OpenWRT router(s) and was thus familiar with how awesome it was. It also turned out that implementing NFS root support was easier if we used the nfsmount and ipconfig utilities that were shipped with klibc.
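To give an idea of what label, UUID and type detection means in practice, here is a Python sketch that recognizes a single file system family, ext2/3/4, by looking at its superblock. This is not blkid’s code, and the function name is made up. The offsets are the ones from the ext superblock layout as I know it: the superblock starts 1024 bytes into the device, with the magic number 0xEF53 at offset 0x38, the UUID at 0x68 and the volume label at 0x78. blkid does the same kind of thing for a very long list of formats, which is exactly why I do not want to maintain my own version of it.

```python
import struct
import uuid

SUPERBLOCK_OFFSET = 1024   # the ext superblock starts here on the device
MAGIC_OFFSET = 0x38        # 16-bit little-endian magic number
EXT_MAGIC = 0xEF53
UUID_OFFSET = 0x68         # 16 raw bytes
LABEL_OFFSET = 0x78        # up to 16 bytes, NUL-padded


def probe_ext(image):
    """Return UUID and label if image looks like ext2/3/4, else None."""
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    if len(sb) < LABEL_OFFSET + 16:
        return None
    (magic,) = struct.unpack_from("<H", sb, MAGIC_OFFSET)
    if magic != EXT_MAGIC:
        return None
    fs_uuid = uuid.UUID(bytes=bytes(sb[UUID_OFFSET:UUID_OFFSET + 16]))
    raw_label = bytes(sb[LABEL_OFFSET:LABEL_OFFSET + 16])
    label = raw_label.split(b"\0", 1)[0].decode("utf-8", "replace")
    return {"uuid": str(fs_uuid), "label": label}


# Build a fake, mostly empty "device" that only carries the fields we read.
fake = bytearray(4096)
struct.pack_into("<H", fake, SUPERBLOCK_OFFSET + MAGIC_OFFSET, EXT_MAGIC)
fake_uuid = uuid.UUID("12345678-1234-5678-1234-567812345678")
fake[SUPERBLOCK_OFFSET + UUID_OFFSET:SUPERBLOCK_OFFSET + UUID_OFFSET + 16] = fake_uuid.bytes
fake[SUPERBLOCK_OFFSET + LABEL_OFFSET:SUPERBLOCK_OFFSET + LABEL_OFFSET + 4] = b"root"

print(probe_ext(bytes(fake)))
```

On a real system you would read the first few kilobytes of a block device instead of a fake buffer, which requires the appropriate permissions.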

It is February 2010 now, and in the last few weeks I finally had the time to do all the work. Just a few days ago I released mkinitcpio 0.6. This version is much more stable, more flexible and less error-prone than any klibc-based version we ever had in the past. On average, the initramfs is now between 600KB and 1MB bigger than the klibc-based ones. I guess nobody will ever complain about that, since it is still smaller than on most other distributions. And I am glad that I hopefully never have to touch klibc again.

29 Comments

  1. Dieter@be says:

    Thanks for explaining Thomas and the work you’re doing.
    What’s the deal with the compression stuff? At fosdem you said you were using lzma but I see no compression option in my (pre-0.6) mkinitcpio.conf. Will this be user configurable with 0.6 ?

  2. brain0 says:

    Dieter, these options have been there for a while, see http://projects.archlinux.org/mkinitcpio.git/tree/mkinitcpio.conf#n59

  3. Dieter@be says:

    Oops. I forgot I removed those entries myself :)

  5. Alexander says:

    Great Article!

    Can you give your permission to translate it in Greek?

  6. brain0 says:

    Sure, go ahead.

  8. Andrew says:

    Great article, thanks for the interesting read.

  9. pointone says:

    Fantastic overview of initramfs! Do you mind if I copy certain sections to improve/update ?

  10. pointone says:

    Ugh… screwy formatting. To improve/update:

    http://wiki.archlinux.org/index.php/Mkinitcpio

  11. brain0 says:

    @pointone: Go ahead, just keep a short reference to my blog in it. I should really put a license on this stuff to avoid all the questions :)

  12. smakked says:

    Very informative, thanks for the explanation.

  13. agh says:

    Thanks! I’m an Arch Linux newbie. Thanks to you I can use Arch, learn and, in the future, help develop it!

  14. KimTjik says:

    A very good and vivid explanation. I’m not able to fully understand all what’s going on here, but it certainly looks like a far better solution. Thanks not just for the coding, but also the willingness to explain it to mortals like me!

  16. Samuelion says:

    Awesome article, very easy to understand even for newbies.
    Bookmarked. Thanks !!

  17. John says:

    I appreciate the explanation and for making it simple. I do not know enough about how all this works not being familiar with all of this but I appreciate the simple explanation as it makes it nicer for new people to understand how things work.

  18. Ville says:

    Good read. It’s really nice to hear that some of the workload has been lifted! Sounds sane.

  19. matyas says:

    Excellent article.

    Greetings from Argentina

  20. Imam Krismanto says:

    Please help me understand why I am unable to boot from a network NFS root. I get the error “ipconfig no such device”.

  21. brain0 says:

    This is certainly not the right place for this question … maybe this is more helpful: http://bugs.archlinux.org/task/18370

  22. jelly12gen says:

    Nicely written article. Even for somebody who isn’t into userspace, it’s clear and interesting :)

  24. Yaro Kasear says:

    Correct me if I am wrong, but Linux never made a rule of assuming the first partition of the first disk was always the boot partition. In fact, last I checked, where /boot and / are is SPECIFICALLY configured into every bootloader’s configuration. One as a GRUB configuration value and one as a kernel argument. Leaving either out results in an unbootable Linux.

    It has nothing to do with early userspace, all it does is make sure the kernel has the base amount of filesystem drivers to boot.

  25. brain0 says:

    Yaro, those were only examples. In the past we made the assumption that you could find each filesystem by knowing it’s on the M’th drive, N’th partition. These assumptions are not true: Depending on your setup, it is likely that the “first hard drive” sda and the “second hard drive” sdb swap names randomly on each boot. People have run into this problem and “root=/dev/sda1” only worked every second time they booted.

  27. richs-lxh says:

    This is a very informative article. I posted an extract and linked back as well as Tweeting it.

    thanks.