Arch Linux has been involved with the reproducible builds effort since 2016. The goal is to build software packages deterministically in order to enhance the security of the distribution.
After almost 3 years of continued effort, along with the release of pacman 5.2 and contributions from a lot of people, we are finally able to reproduce packages distributed by Arch Linux!
This enables users to build packages and compare them with the ones distributed by the Arch Linux team. Users can independently verify the work done by our packagers and detect whether malicious code was added to the pristine source during the build, which in turn enhances overall supply chain security. We are one of the first binary distributions to have achieved this and to provide the tooling to users.
That was the TL;DR! The rest of the blog post will explain the reproducible builds efforts, and the technical work that has gone into achieving this.
Reproducible Builds
The reproducible builds effort was started by Debian in 2013 and currently encompasses projects across the spectrum, including coreboot, Bitcoin, NixOS and F-Droid, to mention a few.
The goal of the effort is to figure out what makes software builds non-deterministic, and to come up with standards that solve these issues.
Two of these standards are SOURCE_DATE_EPOCH, to ensure timestamps can be overridden during the build, and BUILD_PATH_PREFIX_MAP, to ensure we can consistently trim build paths across toolchains. SOURCE_DATE_EPOCH has been integrated into gcc and the toolchain for several projects. BUILD_PATH_PREFIX_MAP, however, has not seen a lot of adoption yet.
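To make the timestamp standard concrete, here is a minimal Python sketch of how a build tool might honor SOURCE_DATE_EPOCH. The function names are hypothetical and not taken from any Arch tool; the only part that comes from the standard is the environment variable itself, which holds a Unix timestamp in seconds.

```python
import os
import time


def build_timestamp(environ=None):
    """Return the timestamp a build tool should embed in its output.

    If SOURCE_DATE_EPOCH is set, use it instead of the current time so
    that two builds of the same source embed the same value.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("SOURCE_DATE_EPOCH")
    if value is not None:
        return int(value)
    # Fallback: the build is not reproducible with respect to time.
    return int(time.time())


def build_date_string(environ=None):
    """Format the timestamp in UTC so the local timezone cannot leak in."""
    stamp = time.gmtime(build_timestamp(environ))
    return time.strftime("%Y-%m-%d %H:%M:%S", stamp)


# Hypothetical environment, passed explicitly for the demonstration.
print(build_date_string({"SOURCE_DATE_EPOCH": "0"}))
```

A tool that embeds the current time anywhere else (for example through a __DATE__ style macro) would still be non-deterministic, which is why the variable has to be supported throughout the toolchain.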
Another project they work on is a Continuous Integration (CI) framework to test reproducible packages. The Debian one is probably more interesting to browse, but Arch Linux has had a CI environment for the past two years which has been consistently rebuilding packages.
However, the CI does not address the real issue. It is a very nice tool for debugging and uncovering issues that cause non-deterministic builds. As Holger explained on the debian-devel mailing list earlier this year, “these tests are done without looking at the actual .deb files distributed from ftp.debian.org (and we always knew that and pointed it out: “93% reproducible in our current test framework”)”. In other words, the CI builds the same set of files twice in different environments and compares the two results with each other, not with the packages users actually download.
Ensuring software projects can be deterministically built is all fine and dandy, but we also need to give users the ability to verify the work package maintainers are doing.
Reproducing packages
There are a few components needed to ensure packages can be rebuilt. We need to record the environment of the package build. This needs to include build paths, timestamps, all dependencies installed on the system and a bunch of other information. This is standardized in different formats across projects, but the format used for pacman can be found in BUILDINFO(5).
One can explore a BUILDINFO file from any package by extracting the top-level file.
bsdtar -xOf archlinux-keyring-20191018-1-any.pkg.tar.xz .BUILDINFO
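The extracted file is a list of key = value lines, where a key such as installed may appear many times. A minimal Python sketch of reading it could look like the following. The parser and the sample values are illustrative assumptions; only the two installed entries are taken from the repro output shown later in this post, and BUILDINFO(5) remains the authoritative description of the fields.

```python
def parse_buildinfo(text):
    """Parse 'key = value' lines; repeated keys are collected into lists."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition(" = ")
        if not sep:
            # Not a 'key = value' line; ignore it in this sketch.
            continue
        info.setdefault(key, []).append(value)
    return info


# Made-up sample content for illustration only.
SAMPLE = """\
format = 1
pkgname = example
pkgver = 1.0-1
builddate = 1573223729
builddir = /build
installed = acl-2.2.53-1-x86_64
installed = argon2-20190702-1-x86_64
"""

info = parse_buildinfo(SAMPLE)
print(info["builddir"][0])
print(info["installed"])
```

Keeping every key as a list avoids special-casing the fields that repeat, at the cost of indexing with [0] for the single-valued ones.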
Pacman also keeps track of some metadata inside .PKGINFO. On the surface this seems trivial: record some information and put it into a file. Like “How much space does this package take?”. But counting file sizes is a problem. Counting file sizes is hard. Like, very hard. Someone even wanted --skipbtrfshack in makepkg at some point. We have also had some problems sorting files, and bsdtar fflags embedding information unique to the system.
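To illustrate why file ordering and system-specific metadata matter, here is a small Python sketch that archives a directory deterministically with the standard tarfile module. This is not how makepkg or bsdtar do it; it is a hypothetical stand-in that shows the same idea: sort the entries, and normalize every field that depends on the build machine. It only handles regular files.

```python
import io
import os
import tarfile
import tempfile


def deterministic_tar(root, mtime):
    """Archive the regular files under `root` into bytes, reproducibly."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    # Filesystem listing order is not stable, so sort by archive name.
    paths.sort(key=lambda p: os.path.relpath(p, root))

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for path in paths:
            info = tar.gettarinfo(path, arcname=os.path.relpath(path, root))
            # Normalize everything that is unique to the build system.
            info.mtime = mtime
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            info.mode = 0o644
            with open(path, "rb") as fh:
                tar.addfile(info, fh)
    return buf.getvalue()


def make_tree(names):
    """Create a temporary directory with one small file per name."""
    root = tempfile.mkdtemp()
    for name in names:
        with open(os.path.join(root, name), "w") as fh:
            fh.write("contents of " + name + "\n")
    return root


# Same files, created in a different order and at different times.
first = deterministic_tar(make_tree(["b.txt", "a.txt"]), mtime=1573223729)
second = deterministic_tar(make_tree(["a.txt", "b.txt"]), mtime=1573223729)
print(first == second)
```

Dropping either the sort or the metadata normalization is enough to make the two archives differ, even though the file contents are identical.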
But these quirks should have been solved, and we haven’t found any other issues yet. This means we now have packages which can be created consistently, and we have recorded the build information. Now we need to recreate the build environment.
Arch Linux is a rolling release distribution. We probably build and release between 20 and 100 packages every day. With no set release schedule, packages can get rotated in, and out, of a repository on the same day. The set of installed packages recorded in the BUILDINFO files could therefore, in theory, change several times a day. How do we reproduce a package released 2 weeks ago?
The main solution to this is to record all released packages, and that is what we do on https://archive.archlinux.org. A lot of the older packages are uploaded to archive.org and linked from the archive. This enables us to rebuild packages back in time, as long as the installed pacman was >=5.2.
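As a rough sketch of how the installed entries from a BUILDINFO file could be mapped onto the archive, the Python snippet below splits an entry into its parts and builds the directory URL where that package's files would be expected. The directory layout (packages grouped by first letter, then by name) is an assumption about how the archive is organized, and the function names are hypothetical. The file extension is deliberately left out, since it is not recorded in the entry.

```python
ARCHIVE = "https://archive.archlinux.org/packages"


def split_installed(entry):
    """Split 'name-version-release-arch' from the right.

    Package names may contain dashes themselves, so only the last
    three dashes are treated as separators.
    """
    name, version, release, arch = entry.rsplit("-", 3)
    return name, version, release, arch


def archive_directory(entry):
    """Return the assumed archive directory for an installed entry."""
    name, _version, _release, _arch = split_installed(entry)
    return f"{ARCHIVE}/{name[0]}/{name}/"


print(split_installed("archlinux-keyring-20191018-1-any"))
print(archive_directory("acl-2.2.53-1-x86_64"))
```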
Recreating the build environment for the package is the next mission. Arguably this step came before other parts of this blog post, but technically this probably makes sense ¯\_(ツ)_/¯.
We need to create a chroot and apply the needed environment changes. In devtools we utilize systemd-nspawn to create these chroot containers, and this in turn was the starting point for archlinux-repro. It provides two tools: repro, which takes a package file and attempts to reproduce it, and buildinfo, which is a helper tool to read the BUILDINFO file and download packages from the archive.
The goal of archlinux-repro is to be distribution agnostic. You should be able to reproduce Arch packages on any distribution and verify they are the same as the distributed package. There is still some work left on this, as we utilize systemd-nspawn for chrooting and we also need to initialize a keyring. This means you can’t rebuild Arch packages on your preferred distribution just yet.
We are also working on devtools additions which enable us to provide tooling more tightly integrated into our existing tools.
The input of repro is a package; however, this alone is not enough to rebuild a package. We need the PKGBUILD file. Internally all Arch packages are stored in an SVN repository. This is a bit tedious to work with, but we provide a tool called asp which pulls a given package down from a Git mirror of the SVN repository.
And for the fun of it, the first package we have to try to reproduce is obviously pacman:
$ repro pacman-5.2.1-1-x86_64.pkg.tar.xz
:: Synchronizing package databases...
core is up to date
extra is up to date
community is up to date
multilib is up to date
:: Starting full system upgrade...
there is nothing to do
==> Starting build...
-> Create snapshot for build...
[...]
:: Running post-transaction hooks...
(1/2) Reloading system manager configuration...
Skipped: Current root is not booted.
(2/2) Arming ConditionNeedsUpdate...
-> Preparing packages
Hit cache for acl-2.2.53-1-x86_64.pkg.tar.xz
Hit cache for archlinux-keyring-20191018-1-any.pkg.tar.xz
Hit cache for argon2-20190702-1-x86_64.pkg.tar.xz
[...]
-> Finished preparing packages
==> Installing packages
loading packages...
warning: acl-2.2.53-1 is up to date -- reinstalling
warning: archlinux-keyring-20191018-1 is up to date -- reinstalling
warning: argon2-20190702-1 is up to date -- reinstalling
[...]
==> Making package: pacman 5.2.1-1 (Fri Nov 8 15:35:29 2019)
[...]
==> Finished making: pacman 5.2.1-1 (Fri Nov 8 15:36:35 2019)
==> Cleaning up...
-> Delete snapshot for build...
==> Comparing hashes...
==> Package is reproducible!
$ sha256sum pacman-*.pkg.tar.xz build/pacman-*.pkg.tar.xz
3a13aff27db6d671e9a816e5ed4c05cb76fe703d998e78121f43645a3f8f7bd3  pacman-5.2.1-1-x86_64.pkg.tar.xz
3a13aff27db6d671e9a816e5ed4c05cb76fe703d998e78121f43645a3f8f7bd3  build/pacman-5.2.1-1-x86_64.pkg.tar.xz
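The final comparison boils down to checking that both files hash to the same value. A minimal Python sketch of that check follows; the function names are hypothetical and this is not the code repro uses, and the demo files are stand-ins created in a temporary directory rather than real packages.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large packages need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproducible(distributed, rebuilt):
    """A rebuild counts as reproducible only if it is bit-for-bit identical."""
    return sha256_of(distributed) == sha256_of(rebuilt)


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.pkg")
    rebuilt = os.path.join(tmp, "rebuilt.pkg")
    tampered = os.path.join(tmp, "tampered.pkg")
    for path, data in ((original, b"payload"), (rebuilt, b"payload"),
                       (tampered, b"payload!")):
        with open(path, "wb") as fh:
            fh.write(data)
    same = is_reproducible(original, rebuilt)
    different = is_reproducible(original, tampered)

print(same, different)
```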
If the package had turned out to be unreproducible, we could inspect it with “diffoscope”, a tool created by the reproducible builds project to compare many different types of files and report the differences between them. It is used in the CI environment, so we can take a look at the diffoscope output from glib2.
The future?
The end goal should be to make any artifacts produced by Arch Linux reproducible. We currently have reproducible initramfs in mkinitcpio, and having this with the archiso releases would be beneficial as well. Having independent rebuilders verifying the distributed packages as they are published is also something we really want to accomplish. However, we always need more hands to help!
If you are interested in helping out with any of these efforts, please visit #archlinux-reproducible on Freenode, or #reproducible-builds on OFTC!
Thanks to Eli Schwartz, Santiago Torres-Arias, Holger Levsen and Viva for reviewing and nitpicking the draft!