I have spent a fair amount of time hacking on debug packages over the past two years. This work resulted in Arch Linux announcing the public debuginfod server, which allows users to download symbols and source code to debug software running on their system.
With this service users no longer need to figure out what the debug packages are called, install them and maybe remove them afterwards. It also saves a fair amount of data you would otherwise need to download. Generally just a great service, with a good list of supported clients.
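The clients find the server through the DEBUGINFOD_URLS environment variable. On Arch Linux this is set system-wide when the debuginfod package is installed, so the snippet below is only an illustrative sketch of what that amounts to.

# Illustrative only: on Arch this is normally handled by
# /etc/profile.d/debuginfod.sh, shipped with the debuginfod package.
$ export DEBUGINFOD_URLS="https://debuginfod.archlinux.org"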
coredumpctl is a tool from systemd that effectively installs crash handlers on your system. If a process dumps its core, systemd keeps track of it, stores it and makes it easier for you to debug it through debuggers like gdb and lldb. The core dump contains a memory dump with the variables that were defined, along with a stack trace so you can see what was executed.
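If I recall the mechanics correctly, systemd achieves this by registering systemd-coredump as the kernel's core dump handler via the kernel.core_pattern sysctl, which is what lets coredumpctl find the dumps afterwards. A quick way to check (the exact arguments may differ between systems):

# Sketch: verify that core dumps are piped to systemd-coredump.
$ cat /proc/sys/kernel/core_pattern
|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h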
As a maintainer of the Go compiler on Arch I also wanted to make sure the Go-specific debugger delve could make use of this tooling. Thus in this example we are going to be using the debugger delve to debug uh… delve!
The binary we are going to crash is from the delve package in Arch Linux. It is stripped of all debug symbols, and there is no source code available for the binary on this system. We want to use coredumpctl to simplify dealing with the core dump, and we want to debug this with delve itself.
All the source and symbols come from debuginfod.archlinux.org.
$ GOTRACEBACK=crash dlv dap &
[1] 226038
$ kill -SEGV $!
SIGSEGV: segmentation violation
PC=0x563722bbb981 m=0 sigcode=0
[...]
[1] + IOT instruction (core dumped) GOTRACEBACK=crash dlv dap
Here we just launch the dap server of delve in the background. We use the $! shorthand, which references the process ID, and simply kill it. GOTRACEBACK=crash instructs the Go runtime to raise SIGABRT when it crashes, which produces a core dump. As we have systemd installed we can then use coredumpctl to interact with the core dump.
$ coredumpctl list dlv
TIME PID UID GID SIG COREFILE EXE SIZE
Thu 2022-11-17 21:27:53 CET 226038 1000 1000 SIGABRT present /usr/bin/dlv 2.1M
Here we see when the core dump happened, which process it was for, the PID, and the user/group IDs.
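If you want more detail before attaching a debugger, coredumpctl can also print the full metadata of a specific dump, matched by PID or binary name:

# Optional: inspect this particular dump by PID before debugging it.
$ coredumpctl info 226038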
Before we use coredumpctl to inspect this core dump we need to install debuginfod so delve can use the debuginfod-find binary to download sources for us.
# Note you need to re-exec the shell or source /etc/profile.d/debuginfod.sh
# after installing debuginfod.
$ pacman -S debuginfod
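As a quick sanity check, the setup can also be exercised directly from the shell. This is not required for the rest of the walkthrough; the invocation below simply asks the server for the debug symbols of the stripped binary and relies on the standard debuginfod client cache.

# Optional sanity check: fetch the debug symbols for /usr/bin/dlv directly.
# The result is cached (by default under ~/.cache/debuginfod_client/) and
# reused by every debuginfod-aware client.
$ debuginfod-find debuginfo /usr/bin/dlv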
Earlier this year I wrote up the patches needed for delve to understand source listings from debuginfod, along with a tiny bit of refactoring so code de-duplication was possible.
https://github.com/go-delve/delve/pull/2885
This is the feature we are going to be using to actually inspect the symbols and the source listing in the debugging session.
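My rough understanding of what this does under the hood: delve resolves the build-id of the binary and asks debuginfod for the individual source files referenced by the DWARF information, roughly equivalent to the manual query below (the source path is one of the files we will see later in the backtrace):

# Sketch of what the patched delve does behind the scenes:
# ask debuginfod for a specific source file belonging to this binary.
$ debuginfod-find source /usr/bin/dlv /usr/src/debug/delve/delve-1.9.1/cmd/dlv/main.go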
Please note I wrote a small patch so coredumpctl can use delve, as it expects core dumps to be passed through a -c/-core switch. It should be part of the next delve release.
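For context, my understanding is that -A/--debugger-arguments makes coredumpctl pass extra arguments to the debugger before the executable and the extracted core file, so the command we are about to run roughly expands to delve's core subcommand; a sketch, with an illustrative path for the extracted dump:

# Roughly what coredumpctl ends up running on our behalf (paths illustrative):
$ ./dlv core /usr/bin/dlv /path/to/extracted/coredump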
$ coredumpctl debug --debugger=./dlv -A core dlv
PID: 226038 (dlv)
UID: 1000 (fox)
GID: 1000 (fox)
Signal: 6 (ABRT)
Timestamp: Thu 2022-11-17 21:27:53 CET (16min ago)
Command Line: dlv dap
Executable: /usr/bin/dlv
Size on Disk: 2.1M
Message: Process 226038 (dlv) of user 1000 dumped core.
Stack trace of thread 226048:
#0 0x0000563722bbb401 n/a (dlv + 0x1a8401)
#1 0x0000563722b9f805 n/a (dlv + 0x18c805)
# ...
#17 0x0000563722bb77a5 n/a (dlv + 0x1a47a5)
ELF object binary architecture: AMD x86-64
Type 'help' for list of commands.
(dlv) bt
0 0x000055799c40a401 in runtime.raise
at /usr/lib/go/src/runtime/sys_linux_amd64.s:159
1 0x000055799c3ede65 in runtime.dieFromSignal
at /usr/lib/go/src/runtime/signal_unix.go:870
2 0x000055799c3ee805 in runtime.sigfwdgo
at /usr/lib/go/src/runtime/signal_unix.go:1086
3 0x000055799c3ecb47 in runtime.sigtrampgo
at /usr/lib/go/src/runtime/signal_unix.go:432
4 0x000055799c40a6e9 in runtime.sigtramp
at /usr/lib/go/src/runtime/sys_linux_amd64.s:359
5 0x00007f16ee0d0a00 in ???
at ?:-1
[...snip....]
19 0x000055799c92502e in ???
at ?:-1
error: error while reading spliced memory at 0x7f15ae7fd260: EOF
(truncated)
(dlv)
We have now asked coredumpctl to debug the last core dump of the dlv binary. Here we can see the backtrace of the goroutine that failed. We can see runtime.raise was used and we hit the function dieFromSignal. This makes sense considering we killed the process.
The above stack trace only references source code that is installed locally. Our interest is to look at the delve source code though! Using grs we can list the goroutines from the crash, and as Goroutine 1 contains symbols from the delve source we will take a peek at that one.
(dlv) grs
Goroutine 1 - User: /usr/src/debug/delve/delve-1.9.1/cmd/dlv/cmds/commands.go:831 github.com/go-delve/delve/cmd/dlv/cmds.waitForDisconnectSignal (0x5637230d382c) [select]
Goroutine 2 - User: /usr/lib/go/src/runtime/proc.go:364 runtime.gopark (0x563722b8baf6) [force gc (idle)]
Goroutine 3 - User: /usr/lib/go/src/runtime/proc.go:364 runtime.gopark (0x563722b8baf6) [GC sweep wait]
Goroutine 4 - User: /usr/lib/go/src/runtime/proc.go:364 runtime.gopark (0x563722b8baf6) [GC scavenge wait]
Goroutine 5 - User: /usr/lib/go/src/runtime/proc.go:364 runtime.gopark (0x563722b8baf6) [finalizer wait]
Goroutine 18 - User: /usr/lib/go/src/net/fd_unix.go:172 net.(*netFD).accept (0x563722c89a95) [IO wait]
Goroutine 19 - User: /usr/lib/go/src/runtime/proc.go:364 runtime.gopark (0x563722b8baf6) [select]
Goroutine 20 - User: /usr/lib/go/src/runtime/sigqueue.go:152 os/signal.signal_recv (0x563722bb62af) (thread 226044)
[8 goroutines]
(dlv) gr 1
Switched from 0 to 1 (thread 226048)
(dlv) bt
0 0x0000563722b8baf6 in runtime.gopark
at /usr/lib/go/src/runtime/proc.go:364
1 0x0000563722b9af7c in runtime.selectgo
at /usr/lib/go/src/runtime/select.go:328
2 0x00005637230d382c in github.com/go-delve/delve/cmd/dlv/cmds.waitForDisconnectSignal
at /usr/src/debug/delve/delve-1.9.1/cmd/dlv/cmds/commands.go:831
3 0x00005637230d066f in github.com/go-delve/delve/cmd/dlv/cmds.dapCmd.func1
at /usr/src/debug/delve/delve-1.9.1/cmd/dlv/cmds/commands.go:511
4 0x00005637230cfede in github.com/go-delve/delve/cmd/dlv/cmds.dapCmd
at /usr/src/debug/delve/delve-1.9.1/cmd/dlv/cmds/commands.go:513
5 0x00005637230c50e3 in github.com/spf13/cobra.(*Command).execute
at /usr/src/debug/delve/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:856
6 0x00005637230c56dd in github.com/spf13/cobra.(*Command).ExecuteC
at /usr/src/debug/delve/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:960
7 0x00005637230d54aa in github.com/spf13/cobra.(*Command).Execute
at /usr/src/debug/delve/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:897
8 0x00005637230d54aa in main.main
at /usr/src/debug/delve/delve-1.9.1/cmd/dlv/main.go:24
9 0x0000563722b8b733 in runtime.main
at /usr/lib/go/src/runtime/proc.go:250
10 0x0000563722bb9ae1 in runtime.goexit
at /usr/lib/go/src/runtime/asm_amd64.s:1594
This looks more interesting! We have references to /usr/src/debug/delve, which is the source code from the debug packages. We can take a peek at frame 4.
(dlv) frame 4
> runtime.gopark() /usr/lib/go/src/runtime/proc.go:364 (PC: 0x563722b8baf6)
Frame 4: /usr/src/debug/delve/delve-1.9.1/cmd/dlv/cmds/commands.go:513 (PC: 5637230cfede)
508: } else { // work with a predetermined client.
509: server.RunWithClient(conn)
510: }
511: waitForDisconnectSignal(disconnectChan)
512: return 0
=> 513: }()
514: os.Exit(status)
515: }
516:
517: func buildBinary(cmd *cobra.Command, args []string, isTest bool) (string, bool) {
518: debugname, err := filepath.Abs(cmd.Flag("output").Value.String())
(dlv)
Behind the scenes we have now fetched the file /usr/src/debug/delve/delve-1.9.1/cmd/dlv/cmds/commands.go from debuginfod. This works transparently and it's as if the source was always present on the system. We can also look at module code! Let's check out frame 5.
(dlv) frame 5
> runtime.gopark() /usr/lib/go/src/runtime/proc.go:364 (PC: 0x563722b8baf6)
Frame 5: /usr/src/debug/delve/pkg/mod/github.com/spf13/cobra@v1.1.3/command.go:856 (PC: 5637230c50e3)
851: if c.RunE != nil {
852: if err := c.RunE(c, argWoFlags); err != nil {
853: return err
854: }
855: } else {
=> 856: c.Run(c, argWoFlags)
857: }
858: if c.PostRunE != nil {
859: if err := c.PostRunE(c, argWoFlags); err != nil {
860: return err
861: }
This shows us some code from the cobra library, which is used for the flag and command handling inside delve. This means we have full insight into the code we ran through, including code that came from other modules.
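As a small aside, everything fetched during this session lands in the regular debuginfod client cache, so later debugging sessions reuse the files without hitting the network. Assuming the default cache location (it differs if XDG_CACHE_HOME is set):

# Sources and symbols fetched during the session are cached locally:
$ ls ~/.cache/debuginfod_client/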
And it just works!
I’m quite pleased with how this tooling actually works, and hopefully people find it useful. A lot of this seems largely unexplored in the case of minimalistic containers. Debug symbols are often stripped to save space, both in containers and in Linux distributions.
Having a Kubernetes cluster with crash handlers and a debuginfod server with delve support seems like a cool thing that should exist, I guess.
I plan on writing a longer blog post about how the infrastructure in Arch Linux was implemented, along with a longer post describing what I learned implementing better debug package support in pacman.