Morten Linderud

F/OSS Developer, Arch Linux Developer and security team.

Reproducible Arch Linux Packages
Nov 11, 2019

Arch Linux has been involved with the reproducible builds efforts since 2016. The goal is to achieve deterministic building of software packages to enhance the security of the distribution.

After almost 3 years of continued effort, along with the release of pacman 5.2 and contributions from a lot of people, we are finally able to reproduce packages distributed by Arch Linux!

This enables users to build packages and compare them with the ones distributed by the Arch Linux team. Users can independently verify the work done by our packagers and detect whether malicious code was introduced somewhere between the pristine source and the distributed package, which in turn enhances the overall supply chain security. We are one of the first binary distributions to have achieved this and to provide the tooling to users.

That was the TL;DR! The rest of the blog post will explain the reproducible builds efforts, and the technical work that has gone into achieving this.

Reproducible Builds

The reproducible builds effort was started by Debian in 2013 and currently encompasses several projects across the spectrum, including coreboot, Bitcoin, NixOS and F-Droid, to mention a few.

The goal of the effort is to figure out what makes software builds non-deterministic, and come up with standards to solve these issues.

Two of these standards are SOURCE_DATE_EPOCH to ensure timestamps can be overridden during the build, and BUILD_PATH_PREFIX_MAP to ensure we can consistently trim build paths across toolchains. SOURCE_DATE_EPOCH has been integrated into gcc and the toolchain for several projects. BUILD_PATH_PREFIX_MAP however has not seen a lot of adoption yet.
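As a rough illustration of what honoring SOURCE_DATE_EPOCH looks like in a build script, here is a minimal sketch. The build_date helper is hypothetical and not taken from any real toolchain, and it assumes GNU date for the "@seconds" syntax.

```shell
# Print the build date a tool should embed: SOURCE_DATE_EPOCH when it is set,
# the current time otherwise. Relies on GNU date's "@seconds" syntax.
build_date() {
    date -u -d "@${SOURCE_DATE_EPOCH:-$(date +%s)}" +%Y-%m-%d
}

# With the variable pinned, every run embeds the same date.
export SOURCE_DATE_EPOCH=1573430400
build_date   # prints 2019-11-11
```

Without the variable the helper falls back to the current time, which is exactly the kind of input that makes two otherwise identical builds differ.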

Another project they work on is a Continuous Integration (CI) framework to test reproducible packages. The Debian one is probably more interesting to browse, but Arch Linux has had a CI environment for the past two years which has been consistently rebuilding packages.

However, the CI does not address the real issue. It is a very nice tool for debugging and uncovering issues that cause non-deterministic builds. As Holger explained on the debian-devel mailing list earlier this year, “these tests are done without looking at the actual .deb files distributed from ftp.debian.org (and we always knew that and pointed it out: ‘93% reproducible in our current test framework’)”. We are essentially building the same set of files twice in different environments.

Ensuring software projects can be deterministically built is all fine and dandy, but we also need to grant users the ability to verify the work package maintainers are doing.

Reproducing packages

There are a few components needed to ensure packages can be rebuilt. We need to record the environment of the package build. This includes build paths, timestamps, all dependencies installed on the system and a bunch of other information. This is standardized in different formats across projects, but the format used for pacman can be found in BUILDINFO(5).

One can explore a BUILDINFO file from any package by extracting the top-level file.

bsdtar -xOf archlinux-keyring-20191018-1-any.pkg.tar.xz .BUILDINFO
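The file is a list of "key = value" lines, so it is easy to query with standard tools. The sketch below uses a made-up excerpt (the example package and its values are invented for illustration; only the key names follow BUILDINFO(5)), and buildinfo_get is a hypothetical helper, not part of pacman.

```shell
# Illustrative .BUILDINFO excerpt; the values are made up for this example.
buildinfo=$(mktemp)
cat > "$buildinfo" <<'EOF'
format = 1
pkgname = example
pkgver = 1.0-1
builddate = 1573430400
builddir = /build
installed = acl-2.2.53-1-x86_64
installed = archlinux-keyring-20191018-1-any
EOF

# Print every value recorded for a key (keys such as "installed" repeat).
buildinfo_get() {
    sed -n "s/^$2 = //p" "$1"
}

buildinfo_get "$buildinfo" builddate
buildinfo_get "$buildinfo" installed
```

The repeated "installed" lines are the interesting part: they pin the exact version of every package that was present when the package was built.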

Pacman also keeps track of some metadata inside .PKGINFO. On the surface this seems trivial: record some information and put it into a file. Like “How much space does this package take?”. But counting file sizes is a problem. Counting file sizes is hard. Like, very hard. Someone even wanted --skipbtrfshack in makepkg at some point. We have also had some problems sorting files, and bsdtar fflags embedding information unique to the system.

But these quirks should have been solved, and we haven’t found any other issues, yet. This means we have packages which can be consistently created, and we have recorded the build information. Now we need to recreate the build environment.

Arch Linux is a rolling release distribution. We probably build and release between 20 and 100 packages every day. With no set release schedule, packages can get rotated in, and out, of a repository on the same day. The set of installed packages recorded in the BUILDINFO files could therefore, in theory, change several times a day. How do we reproduce a package released 2 weeks ago?
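To give a feel for the archive-level side of this (file ordering, timestamps, ownership), here is a small sketch using GNU tar. This is not how makepkg builds packages, which go through bsdtar; the repro_tar helper and the chosen flags are only an illustration of the same class of problem.

```shell
# Create a normalized tarball of directory $1 at path $2 (GNU tar assumed).
# Entries are sorted by name, and timestamps and ownership are pinned.
repro_tar() {
    tar --format=gnu --sort=name \
        --mtime="@${SOURCE_DATE_EPOCH:-0}" \
        --owner=0 --group=0 --numeric-owner \
        -cf "$2" -C "$1" .
}

work=$(mktemp -d)
mkdir "$work/tree"
echo hello > "$work/tree/b.txt"
echo world > "$work/tree/a.txt"

repro_tar "$work/tree" "$work/first.tar"
# Change only the modification times, not the contents.
touch -t 200101010000 "$work/tree/a.txt" "$work/tree/b.txt"
repro_tar "$work/tree" "$work/second.tar"

cmp "$work/first.tar" "$work/second.tar" && echo "archives are identical"
```

Dropping any of the normalizing flags lets system-specific details leak into the archive, and the two tarballs stop being byte-identical.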

The main solution to this is to record all released packages, and that is what we do on https://archive.archlinux.org. A lot of the older packages are uploaded to archive.org and linked from the archive. This enables us to rebuild packages back in time, as long as the installed pacman was >=5.2.
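As a sketch of how an “installed” entry from a BUILDINFO file can be turned into a download location, consider the helper below. The archive_url function is hypothetical, and both the /packages/<first letter>/<name>/ layout and the .pkg.tar.xz suffix are assumptions about the archive rather than something this post specifies.

```shell
# Map an "installed" entry (name-version-release-arch) to an archive URL.
# The directory layout and the .pkg.tar.xz suffix are assumptions.
archive_url() {
    entry=$1
    name=${entry%-*-*-*}              # drop version, release and architecture
    first=$(printf '%.1s' "$name")    # first character of the package name
    printf 'https://archive.archlinux.org/packages/%s/%s/%s.pkg.tar.xz\n' \
        "$first" "$name" "$entry"
}

archive_url archlinux-keyring-20191018-1-any
```

Stripping the last three dash-separated fields works because package names may contain dashes, while the version, release and architecture fields may not.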

Recreating the build environment for the package is the next mission. Arguably this step came before other parts of this blog post, but technically this probably makes sense ¯\(ツ)/¯.

We need to create a chroot and apply the needed environment changes. In devtools we utilize systemd-nspawn to create these chroot containers, and this in turn was the starting point for archlinux-repro. It provides two tools: repro, which takes a package file and attempts to reproduce it, and buildinfo, which is a helper tool to read the BUILDINFO file and download packages from the archive.

The goal of archlinux-repro is to be distribution agnostic. You should be able to reproduce Arch packages on any distribution and verify they are the same as the distributed package. There is still some work left on this as we utilize systemd-nspawn for chrooting, and we also need to initialize a keyring. This means you can’t rebuild Arch packages on your preferred distribution just yet. We are also working on devtools additions which enable us to provide tooling more tightly integrated into our existing tools.

The input of repro is a package, however this alone is not enough to rebuild a package. We need the PKGBUILD file. Internally all Arch packages are stored in an SVN repository. This is a bit tedious to work with, but we provide a tool called asp which pulls a given package down from a Git mirror of the SVN repository.

And for the fun of it, the first package we have to try to reproduce is obviously pacman:

$ repro pacman-5.2.1-1-x86_64.pkg.tar.xz 
:: Synchronizing package databases...
 core is up to date
 extra is up to date
 community is up to date
 multilib is up to date
:: Starting full system upgrade...
 there is nothing to do
==> Starting build...
  -> Create snapshot for build...
[...]
:: Running post-transaction hooks...
(1/2) Reloading system manager configuration...
  Skipped: Current root is not booted.
(2/2) Arming ConditionNeedsUpdate...
  -> Preparing packages
Hit cache for acl-2.2.53-1-x86_64.pkg.tar.xz
Hit cache for archlinux-keyring-20191018-1-any.pkg.tar.xz
Hit cache for argon2-20190702-1-x86_64.pkg.tar.xz
[...]
  -> Finished preparing packages
==> Installing packages
loading packages...
warning: acl-2.2.53-1 is up to date -- reinstalling
warning: archlinux-keyring-20191018-1 is up to date -- reinstalling
warning: argon2-20190702-1 is up to date -- reinstalling
[...]
==> Making package: pacman 5.2.1-1 (Fri Nov  8 15:35:29 2019)
[...]
==> Finished making: pacman 5.2.1-1 (Fri Nov  8 15:36:35 2019)
==> Cleaning up...
  -> Delete snapshot for build...
==> Comparing hashes...
==> Package is reproducible!

$ sha256sum pacman-*.pkg.tar.xz build/pacman-*.pkg.tar.xz 
3a13aff27db6d671e9a816e5ed4c05cb76fe703d998e78121f43645a3f8f7bd3  pacman-5.2.1-1-x86_64.pkg.tar.xz
3a13aff27db6d671e9a816e5ed4c05cb76fe703d998e78121f43645a3f8f7bd3  build/pacman-5.2.1-1-x86_64.pkg.tar.xz
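The final comparison boils down to checking that both files have the same digest. A minimal sketch of that check follows; same_sha256 is a hypothetical helper, the two files are stand-ins for real packages, and this is not necessarily how repro implements the step.

```shell
# Succeed only if both files have the same SHA-256 digest.
same_sha256() {
    a=$(sha256sum < "$1") && b=$(sha256sum < "$2") && [ "$a" = "$b" ]
}

# Stand-in files; a real check would compare the distributed package
# with the rebuilt one.
tmp=$(mktemp -d)
printf 'same bytes\n' > "$tmp/distributed.pkg"
printf 'same bytes\n' > "$tmp/rebuilt.pkg"

if same_sha256 "$tmp/distributed.pkg" "$tmp/rebuilt.pkg"; then
    echo "Package is reproducible!"
else
    echo "Package is not reproducible"
fi
```

Because the comparison covers the whole file, a single differing byte anywhere in the package is enough to fail it.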

Had the package turned out to be unreproducible, we could have reached for “diffoscope”, a tool created by the reproducible builds project to compare many different types of files and discover the differences between them. It is used in the CI environment, so we can take a look at the diffoscope output from glib2.

The future?

The end goal should be to make any artifacts produced by Arch Linux reproducible. We currently have reproducible initramfs in mkinitcpio, and having this with the archiso releases would be beneficial as well. Having independent rebuilders verifying the distributed packages as they are published is also something we really want to accomplish. However we always need more hands to help!

If you are interested in helping out with any of these efforts, please visit #archlinux-reproducible on Freenode, or #reproducible-builds on OFTC!

Thanks to Eli Schwartz, Santiago Torres-Arias, Holger Levsen and Viva for reviewing and nitpicking the draft!

