"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."

— Brian W. Kernighan

Kernel debugging

A great way to get your hands dirty with kernel code (or any codebase in general) is by fixing bugs. Debugging itself is an art, and the kernel is a particularly tricky target to master, but we'll go over some useful tools to help you along the journey.

Most of your time debugging kernel crashes will be spent navigating the source code, i.e. jumping from function to function in order to understand the general flow of the codepath that led to a crash. This is especially true when you are debugging subsystems you're not very familiar with.

As such, regardless of which text editor you use, you will need to find the workflows/tools that work best for you. You need basically three essential features:

Jumping to function/macro/variable definitions
Opening header files quickly
Searching for strings in the kernel source

You can find below a minimal setup guide for vim and VS Code (please feel free to send a PR adding instructions for your favorite editor!)

vim

You can navigate code in vim using ctags, a code indexing tool (very useful for C code navigation).

Debian/Ubuntu and derivatives

sudo apt install universal-ctags

Fedora and derivatives

sudo dnf install ctags

Arch Linux and derivatives

sudo pacman -S --needed ctags

Now you need to index your source tree (i.e. use ctags to generate the index file that vim reads in order to allow you to jump to definitions). From the root of the source tree, simply run

ctags -R .

This will recursively walk through the source tree and generate the index file. It might take a little while (you can look into installing ptags, which runs ctags on several parallel threads in order to speed up the process). When it finishes, you should have a (huge) file called tags in the root of your source tree.

Now open any .c file, place the cursor on a function/variable/macro and press Ctrl + ], and you'll jump to the definition of that symbol. Press Ctrl + O to jump back to where you were before.

VS Code

In VS Code we will make use of clangd (which is what we call a language server) to index the source code and give us nice features (like autocompletion, function definitions on hover, among others). If you followed the First Contribution tutorial you should have everything configured already. There is a caveat, though:

Generating compile_commands.json

clangd requires a config file called compile_commands.json to be present in the root of the source tree. This file is not present in the source tree by default (primarily because it depends on your current .config), so you need to do the following every time you change your kernel config:

Generate your config (make defconfig or your custom config)
Compile the kernel (or at least run the beginning of the compilation so that it generates some arch-specific files that clangd needs to index)
Run ./scripts/clang-tools/gen_compile_commands.py in the root of the source tree to generate the compile_commands.json file

Now your clangd should be able to start indexing the kernel source.

Another useful tool to find your way around the source tree is ripgrep. It lets you do a recursive grep through the source tree (i.e., search for a string in all the files from the source tree). For example, in the root of the kernel source, the following will show you all occurrences of module_init in the kernel code:

rg -F "module_init"

Anatomy of a kernel crash

Bullying the kernel for the sake of science

Let's artificially inject a panic into our kernel, while also trying to emulate a somewhat realistic scenario.

Find the function fat_fill_super inside fs/fat/inode.c. Add the following BUG_ON call on the first line of the function:

int fat_fill_super(struct super_block *sb, struct fs_context *fc,
           void (*setup)(struct super_block *))
{
    BUG_ON(true);
    struct fat_mount_options *opts = fc->fs_private;
    ...
}

You can read more about the BUG_ON assertion here, but basically what it does is abort the current thread of execution while dumping a stack trace to the kernel log (we'll take a closer look at this soon).

What this means in practice is that we're creating a crash in the kernel whenever we try to mount a FAT32 filesystem.

Let's create a file containing a FAT32 filesystem inside our shared folder and test this theory:

Dependencies on Arch Linux

sudo pacman -S --needed dosfstools

truncate -s 64M ../shared_folder/fat32_fs.raw
mkfs.fat -F 32 ../shared_folder/fat32_fs.raw

Finally, enable CONFIG_DEBUG_INFO=y in your .config. Remember: do it through make menuconfig, and avoid editing .config manually if at all possible.

CONFIG_DEBUG_INFO=y

This config option enables debug symbols in the final ELF kernel image (vmlinux). This allows you to easily map portions of the final kernel executable (e.g. where a crash happened) to a file/line number in the C code that the kernel was compiled from. Most distros enable this by default since it allows for much easier debugging.

Let's boot our VM:

Booting with QEMU

qemu-system-x86_64 \
    -drive file=../my_disk.raw,format=raw,index=0,media=disk \
    -m 2G -nographic \
    -kernel ./arch/x86_64/boot/bzImage \
    -append "root=/dev/sda rw console=ttyS0 loglevel=6" \
    -fsdev local,id=fs1,path=../shared_folder,security_model=none \
    -device virtio-9p-pci,fsdev=fs1,mount_tag=shared_folder \
    --enable-kvm

Then let's go ahead and mount the file we created:

cd host_folder
mount fat32_fs.raw /mnt

You should get a message like the following:

Stack trace

root@kugane:~/host_folder# mount fat32_fs.raw /mnt
[   20.521136] ------------[ cut here ]------------
[   20.524322] kernel BUG at fs/fat/inode.c:1536!
[   20.527254] Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
[   20.529876] CPU: 0 UID: 0 PID: 179 Comm: mount Not tainted 6.11.0-rc5-00015-g3e9bff3bbe13-dirty #4
[   20.532587] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   20.536227] RIP: 0010:fat_fill_super+0x5/0x10
[   20.537943] Code: e7 5d 41 5c e9 9c 63 b6 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0
[   20.543116] RSP: 0018:ffffb20540437e30 EFLAGS: 00010246
[   20.544452] RAX: 0000000000000000 RBX: ffff99bd83d5d780 RCX: 0000000000000fff
[   20.546011] RDX: ffffffffb61f04e0 RSI: ffff99bd83d5d780 RDI: ffff99bd83d94800
[   20.547397] RBP: 0000000000000000 R08: 0000000000000005 R09: ffff99be83d94b97
[   20.548793] R10: ffffffffffffffff R11: fefefefefefefeff R12: ffff99bd83d94800
[   20.550148] R13: ffffffffb61f05a0 R14: 0000000000000000 R15: 00000000ffffffff
[   20.551676] FS:  00007f4d6a97f840(0000) GS:ffff99bdfdc00000(0000) knlGS:0000000000000000
[   20.553408] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   20.554466] CR2: 00007f4d6a854000 CR3: 000000000277c000 CR4: 00000000000006f0
[   20.555756] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   20.557044] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   20.558340] Call Trace:
[   20.558811]  <TASK>
[   20.559193]  ? die+0x31/0x80
[   20.559676]  ? do_trap+0xd8/0x100
[   20.560134]  ? fat_fill_super+0x5/0x10
[   20.560623]  ? do_error_trap+0x60/0x80
[   20.561120]  ? fat_fill_super+0x5/0x10
[   20.561604]  ? exc_invalid_op+0x51/0x70
[   20.562106]  ? fat_fill_super+0x5/0x10
[   20.562591]  ? asm_exc_invalid_op+0x1a/0x20
[   20.563133]  ? __pfx_vfat_fill_super+0x10/0x10
[   20.563709]  ? __pfx_setup+0x10/0x10
[   20.564138]  ? fat_fill_super+0x5/0x10
[   20.564772]  get_tree_bdev+0x124/0x1c0
[   20.565510]  vfs_get_tree+0x24/0xe0
[   20.566203]  path_mount+0x2e1/0xab0
[   20.566906]  __x64_sys_mount+0x112/0x150
[   20.567514]  do_syscall_64+0x9e/0x1a0
[   20.567967]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   20.568600] RIP: 0033:0x7f4d6ab7ed3a
[   20.569042] Code: 48 8b 0d c9 80 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 8
[   20.571205] RSP: 002b:00007ffe0bd321e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[   20.572097] RAX: ffffffffffffffda RBX: 0000562e14edbe10 RCX: 00007f4d6ab7ed3a
[   20.572937] RDX: 0000562e14ee3d90 RSI: 0000562e14edc060 RDI: 0000562e14edfe70
[   20.573779] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007ffe0bd32250
[   20.574613] R10: 0000000000000000 R11: 0000000000000246 R12: 0000562e14edfe70
[   20.575454] R13: 0000562e14ee3d90 R14: 00007f4d6ace6264 R15: 0000562e14edbf28
[   20.576298]  </TASK>
[   20.576597] Modules linked in:
[   20.577168] ---[ end trace 0000000000000000 ]---
[   20.577972] RIP: 0010:fat_fill_super+0x5/0x10
[   20.578763] Code: e7 5d 41 5c e9 9c 63 b6 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0
[   20.582768] RSP: 0018:ffffb20540437e30 EFLAGS: 00010246
[   20.583464] RAX: 0000000000000000 RBX: ffff99bd83d5d780 RCX: 0000000000000fff
[   20.584341] RDX: ffffffffb61f04e0 RSI: ffff99bd83d5d780 RDI: ffff99bd83d94800
[   20.585266] RBP: 0000000000000000 R08: 0000000000000005 R09: ffff99be83d94b97
[   20.586182] R10: ffffffffffffffff R11: fefefefefefefeff R12: ffff99bd83d94800
[   20.587054] R13: ffffffffb61f05a0 R14: 0000000000000000 R15: 00000000ffffffff
[   20.587912] FS:  00007f4d6a97f840(0000) GS:ffff99bdfdc00000(0000) knlGS:0000000000000000
[   20.588873] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   20.589558] CR2: 00007f4d6a854000 CR3: 000000000277c000 CR4: 00000000000006f0
[   20.590550] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   20.591730] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Segmentation fault

Making sense of the mess

Although that message can look ~~horrifying~~ a little intimidating, we can break it down bit by bit. The following:

kernel BUG at fs/fat/inode.c:1536!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI

tells us that the kernel ran into a BUG assertion (the one we inserted; a similar message shows up when a NULL pointer dereference or other similarly evil C mishap takes place in kernel space). On x86, BUG() is implemented with the ud2 instruction, which the CPU rejects on purpose, hence the "invalid opcode" in the Oops line. You'll notice that the message gives you the exact file/line where the crash happened, but this alone isn't very useful: what you want is to follow the trail that led to the crash.
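Since the first line of the report already names a file and a line, a small sketch like the one below can turn it into an editor command. The sed pattern and the variable names are illustrative assumptions, not part of any kernel tooling; the sample line is copied from the log above.

```shell
#!/usr/bin/env bash
# Sample line copied from the crash log above.
bug_line='[   20.524322] kernel BUG at fs/fat/inode.c:1536!'

# Capture "<file>:<line>" from the "kernel BUG at ..." message.
location=$(printf '%s\n' "$bug_line" |
    sed -nE 's/.*kernel BUG at ([^:]+):([0-9]+)!.*/\1:\2/p')

file=${location%%:*}    # part before the colon
lineno=${location##*:}  # part after the colon

# Print the command instead of running it, so the sketch has no side effects.
echo "vim +$lineno $file"
```

Run from the root of the kernel tree, the printed command opens the file with the cursor on the offending line.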

RIP: 0010:fat_fill_super+0x5/0x10

RIP is the instruction pointer register in x86_64. This tells us exactly at which address the crash happened. In this case, because the kernel carries its own symbol table (CONFIG_KALLSYMS, enabled in most configurations), it was kind enough to replace the raw 64-bit memory address contained in the register with a more convenient expression that tells us we crashed inside the fat_fill_super function.
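The number before the colon in a RIP line is the code segment selector, and it tells you whether the address belongs to the kernel or to userspace. The sketch below assumes x86_64 Linux, where 0x10 selects the kernel code segment and 0x33 the 64-bit userspace one; rip_origin is a hypothetical helper name. Both sample lines come from the log above.

```shell
#!/usr/bin/env bash
# Classify a "RIP: <cs>:<address>" line by its code segment selector.
# Assumes x86_64 Linux: 0x10 is the kernel code segment, 0x33 is the
# 64-bit userspace code segment.
rip_origin() {
    local cs="${1#RIP: }"   # drop the "RIP: " prefix
    cs="${cs%%:*}"          # keep only the selector before the first ':'
    case $cs in
        0010) echo kernel ;;
        0033) echo userspace ;;
        *)    echo unknown ;;
    esac
}

rip_origin 'RIP: 0010:fat_fill_super+0x5/0x10'   # prints "kernel"
rip_origin 'RIP: 0033:0x7f4d6ab7ed3a'            # prints "userspace"
```

This is why the log contains a second register dump after the call trace: it is the userspace state of the mount process at the moment it entered the kernel. Its ORIG_RAX value of 0xa5 (165) is the x86_64 syscall number of mount.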

Notation for instruction addresses in stack traces

The kernel is an ELF executable, just like most userspace programs (if you haven't read about how the ELF format works before, I highly recommend doing it).

When we compile an executable, the toolchain generates a set of symbols, which are basically strings that identify where exactly (i.e. at what position, in bytes) inside the executable a certain piece of code (for example, a function) was placed. This allows us to translate memory addresses into a more convenient format: instead of "this crash happened at address 0xffffffffa03e1000", we get "this crash happened at an offset of 0x12 bytes inside the function kernel_do_foo". Debug info goes one step further and maps those addresses to files and line numbers in the source code.

The following notation:

<symbol_name>+<offset>/<length>

Means "<offset> bytes after the start of <symbol_name>; <symbol_name> has a total length of <length> bytes".
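As a quick worked example, the entry from the RIP line can be split into its three parts with plain shell parameter expansion. This is only a sketch; the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Split a "<symbol_name>+<offset>/<length>" entry into its parts.
entry='fat_fill_super+0x5/0x10'   # copied from the RIP line above

symbol=${entry%%+*}               # everything before the first '+'
rest=${entry#*+}                  # "0x5/0x10"
offset=$(( ${rest%%/*} ))         # shell arithmetic understands the 0x prefix
length=$(( ${rest##*/} ))

echo "$symbol: offset $offset of $length bytes"
# prints "fat_fill_super: offset 5 of 16 bytes"
```

In other words, the crash happened 5 bytes into a function that is only 16 bytes long, which matches a function that dies on its very first statement.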

Next comes the (actual) good stuff, i.e., the stack trace:

Call Trace:
 <TASK>
 ? die+0x31/0x80
 ? do_trap+0xd8/0x100
 ? fat_fill_super+0x5/0x10
 ? do_error_trap+0x60/0x80
 ? fat_fill_super+0x5/0x10
 ? exc_invalid_op+0x51/0x70
 ? fat_fill_super+0x5/0x10
 ? asm_exc_invalid_op+0x1a/0x20
 ? __pfx_vfat_fill_super+0x10/0x10
 ? __pfx_setup+0x10/0x10
 ? fat_fill_super+0x5/0x10
 get_tree_bdev+0x124/0x1c0
 vfs_get_tree+0x24/0xe0
 path_mount+0x2e1/0xab0
 __x64_sys_mount+0x112/0x150
 do_syscall_64+0x9e/0x1a0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

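Here we can see what happened. Entries prefixed with ? are addresses the unwinder found on the stack but could not confirm as part of the current call chain, so treat them as hints only. The trace lists the most recent call first, so reading the remaining entries from the bottom up gives us the chronological order: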

    ------- userspace -------
    // userspace executes the 'mount' syscall asking the kernel to
    // mount the file:
    mount("/root/host_folder/fat32_fs.raw", "/mnt", "vfat", ...)

    ------- kernel space -------
    entry_SYSCALL_64_after_hwframe
      do_syscall_64
        __x64_sys_mount             // kernel services the "mount" syscall
          path_mount                // verifies that what we're trying to mount is a file on disk
            vfs_get_tree            // hands control over to VFS
              get_tree_bdev
                fat_fill_super      // runs into our BUG_ON and dies
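The same reordering can be done mechanically. The sketch below drops the entries marked with ?, strips the offsets, and reverses the list with tac (from GNU coreutils). The trace is an excerpt of the one above. Note that fat_fill_super only appears with a ? in this trace, so it is filtered out here; the crash site itself comes from the RIP line.

```shell
#!/usr/bin/env bash
# Excerpt of the call trace above (two '?' entries kept for illustration).
trace=$(cat <<'EOF'
 ? die+0x31/0x80
 ? fat_fill_super+0x5/0x10
 get_tree_bdev+0x124/0x1c0
 vfs_get_tree+0x24/0xe0
 path_mount+0x2e1/0xab0
 __x64_sys_mount+0x112/0x150
 do_syscall_64+0x9e/0x1a0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
EOF
)

# Drop unverified frames, keep only the symbol name, oldest call first.
chronological=$(printf '%s\n' "$trace" |
    grep -v '?' |
    sed -E 's/^ *//; s/\+0x.*//' |
    tac)

echo "$chronological"
```

The first line printed is entry_SYSCALL_64_after_hwframe and the last one is get_tree_bdev, the caller of the function that crashed.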

Although this helps a lot, what we really want is to find this path inside the actual kernel code. In order to do that, one option is to use scripts/decode_stacktrace.sh to translate from the symbol + offsets notation to actual files + lines from the source code. Save the crash log to a file called log.txt in the root of the kernel tree and run the following:

./scripts/decode_stacktrace.sh vmlinux . < log.txt

decode_stacktrace.sh usage

./scripts/decode_stacktrace.sh <path-to-vmlinux> <path-to-linux-source> < <path-to-stacktrace-log>

This will turn the stack trace into a much more readable format:

Decoded stack trace example

[   20.521136] ------------[ cut here ]------------
[   20.524322] kernel BUG at fs/fat/inode.c:1536!
[   20.527254] Oops: invalid opcode: 0000 [#1] PREEMPT SMP PTI
[   20.529876] CPU: 0 UID: 0 PID: 179 Comm: mount Not tainted 6.11.0-rc5-00015-g3e9bff3bbe13-dirty #4
[   20.532587] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   20.536227] RIP: 0010:fat_fill_super (/home/nuke/kernel/linux/fs/fat/inode.c:1536 (discriminator 1)) 
[ 20.537943] Code: e7 5d 41 5c e9 9c 63 b6 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0

Code starting with the faulting instruction
===========================================
0:  e7 5d                   out    %eax,$0x5d
2:  41 5c                   pop    %r12
4:  e9 9c 63 b6 00          jmp    0xb663a5
9:  66 66 2e 0f 1f 84 00    data16 cs nopw 0x0(%rax,%rax,1)
10: 00 00 00 00 
14: 90                      nop
15: 90                      nop
16: 90                      nop
17: 90                      nop
18: 90                      nop
19: 90                      nop
1a: 90                      nop
1b: 90                      nop
1c: 90                      nop
1d: 90                      nop
1e: 90                      nop
1f: 90                      nop
20: 90                      nop
21: 90                      nop
22: 90                      nop
23: 90                      nop
24: 90                      nop
25: f3 0f 1e fa             endbr64
    ...
[   20.543116] RSP: 0018:ffffb20540437e30 EFLAGS: 00010246
[   20.544452] RAX: 0000000000000000 RBX: ffff99bd83d5d780 RCX: 0000000000000fff
[   20.546011] RDX: ffffffffb61f04e0 RSI: ffff99bd83d5d780 RDI: ffff99bd83d94800
[   20.547397] RBP: 0000000000000000 R08: 0000000000000005 R09: ffff99be83d94b97
[   20.548793] R10: ffffffffffffffff R11: fefefefefefefeff R12: ffff99bd83d94800
[   20.550148] R13: ffffffffb61f05a0 R14: 0000000000000000 R15: 00000000ffffffff
[   20.551676] FS:  00007f4d6a97f840(0000) GS:ffff99bdfdc00000(0000) knlGS:0000000000000000
[   20.553408] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   20.554466] CR2: 00007f4d6a854000 CR3: 000000000277c000 CR4: 00000000000006f0
[   20.555756] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   20.557044] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   20.558340] Call Trace:
[   20.558811]  <TASK>
[   20.559193] ? die (/home/nuke/kernel/linux/arch/x86/kernel/dumpstack.c:421 /home/nuke/kernel/linux/arch/x86/kernel/dumpstack.c:434 /home/nuke/kernel/linux/arch/x86/kernel/dumpstack.c:447) 
[   20.559676] ? do_trap (/home/nuke/kernel/linux/arch/x86/kernel/traps.c:114 /home/nuke/kernel/linux/arch/x86/kernel/traps.c:155) 
[   20.560134] ? fat_fill_super (/home/nuke/kernel/linux/fs/fat/inode.c:1536 (discriminator 1)) 
[   20.560623] ? do_error_trap (/home/nuke/kernel/linux/./arch/x86/include/asm/traps.h:58 /home/nuke/kernel/linux/arch/x86/kernel/traps.c:176) 
[   20.561120] ? fat_fill_super (/home/nuke/kernel/linux/fs/fat/inode.c:1536 (discriminator 1)) 
[   20.561604] ? exc_invalid_op (/home/nuke/kernel/linux/arch/x86/kernel/traps.c:266) 
[   20.562106] ? fat_fill_super (/home/nuke/kernel/linux/fs/fat/inode.c:1536 (discriminator 1)) 
[   20.562591] ? asm_exc_invalid_op (/home/nuke/kernel/linux/./arch/x86/include/asm/idtentry.h:621) 
[   20.563133] ? __pfx_vfat_fill_super (/home/nuke/kernel/linux/fs/fat/namei_vfat.c:1199) 
[   20.563709] ? __pfx_setup (/home/nuke/kernel/linux/fs/fat/namei_vfat.c:1190) 
[   20.564138] ? fat_fill_super (/home/nuke/kernel/linux/fs/fat/inode.c:1536 (discriminator 1)) 
[   20.564772] get_tree_bdev (/home/nuke/kernel/linux/fs/super.c:1635) 
[   20.565510] vfs_get_tree (/home/nuke/kernel/linux/fs/super.c:1801) 
[   20.566203] path_mount (/home/nuke/kernel/linux/fs/namespace.c:3472 /home/nuke/kernel/linux/fs/namespace.c:3799) 
[   20.566906] __x64_sys_mount (/home/nuke/kernel/linux/fs/namespace.c:3813 /home/nuke/kernel/linux/fs/namespace.c:4020 /home/nuke/kernel/linux/fs/namespace.c:3997 /home/nuke/kernel/linux/fs/namespace.c:3997) 
[   20.567514] do_syscall_64 (/home/nuke/kernel/linux/arch/x86/entry/common.c:52 (discriminator 1) /home/nuke/kernel/linux/arch/x86/entry/common.c:83 (discriminator 1)) 
[   20.567967] entry_SYSCALL_64_after_hwframe (/home/nuke/kernel/linux/arch/x86/entry/entry_64.S:130) 
[   20.568600] RIP: 0033:0x7f4d6ab7ed3a
[ 20.569042] Code: 48 8b 0d c9 80 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 8

Code starting with the faulting instruction
===========================================
0:  48 8b 0d c9 80 0c 00    mov    0xc80c9(%rip),%rcx        # 0xc80d0
7:  f7 d8                   neg    %eax
9:  64 89 01                mov    %eax,%fs:(%rcx)
c:  48 83 c8 ff             or     $0xffffffffffffffff,%rax
10: c3                      ret
11: 66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
18: 00 00 00 
1b: 0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
20: 49 89 ca                mov    %rcx,%r10
23: b8 a5 00 00 00          mov    $0xa5,%eax
28: 0f 08                   invd
[   20.571205] RSP: 002b:00007ffe0bd321e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[   20.572097] RAX: ffffffffffffffda RBX: 0000562e14edbe10 RCX: 00007f4d6ab7ed3a
[   20.572937] RDX: 0000562e14ee3d90 RSI: 0000562e14edc060 RDI: 0000562e14edfe70
[   20.573779] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007ffe0bd32250
[   20.574613] R10: 0000000000000000 R11: 0000000000000246 R12: 0000562e14edfe70
[   20.575454] R13: 0000562e14ee3d90 R14: 00007f4d6ace6264 R15: 0000562e14edbf28
[   20.576298]  </TASK>
[   20.576597] Modules linked in:
[   20.577168] ---[ end trace 0000000000000000 ]---
[   20.577972] RIP: 0010:fat_fill_super (/home/nuke/kernel/linux/fs/fat/inode.c:1536 (discriminator 1)) 
[ 20.578763] Code: e7 5d 41 5c e9 9c 63 b6 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0

Code starting with the faulting instruction
===========================================
0:  e7 5d                   out    %eax,$0x5d
2:  41 5c                   pop    %r12
4:  e9 9c 63 b6 00          jmp    0xb663a5
9:  66 66 2e 0f 1f 84 00    data16 cs nopw 0x0(%rax,%rax,1)
10: 00 00 00 00 
14: 90                      nop
15: 90                      nop
16: 90                      nop
17: 90                      nop
18: 90                      nop
19: 90                      nop
1a: 90                      nop
1b: 90                      nop
1c: 90                      nop
1d: 90                      nop
1e: 90                      nop
1f: 90                      nop
20: 90                      nop
21: 90                      nop
22: 90                      nop
23: 90                      nop
24: 90                      nop
25: f3 0f 1e fa             endbr64
    ...
[   20.582768] RSP: 0018:ffffb20540437e30 EFLAGS: 00010246
[   20.583464] RAX: 0000000000000000 RBX: ffff99bd83d5d780 RCX: 0000000000000fff
[   20.584341] RDX: ffffffffb61f04e0 RSI: ffff99bd83d5d780 RDI: ffff99bd83d94800
[   20.585266] RBP: 0000000000000000 R08: 0000000000000005 R09: ffff99be83d94b97
[   20.586182] R10: ffffffffffffffff R11: fefefefefefefeff R12: ffff99bd83d94800
[   20.587054] R13: ffffffffb61f05a0 R14: 0000000000000000 R15: 00000000ffffffff
[   20.587912] FS:  00007f4d6a97f840(0000) GS:ffff99bdfdc00000(0000) knlGS:0000000000000000
[   20.588873] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   20.589558] CR2: 00007f4d6a854000 CR3: 000000000277c000 CR4: 00000000000006f0
[   20.590550] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   20.591730] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Segmentation fault

Now that we have actual files + line numbers, we're in a much better position to go back and understand the problematic codepath that led to the crash.

Sometimes this doesn't work, though (for example, when the vmlinux you pass doesn't match the kernel that produced the log). Another popular option is to use gdb (the GNU Debugger).

Installing gdb

Debian/Ubuntu and derivatives

sudo apt install gdb

Fedora and derivatives

sudo dnf install gdb

Arch and derivatives

sudo pacman -S --needed gdb

If you're familiar with using gdb to debug userspace C programs, it's the exact same process for the kernel (mostly). For example, from the following stack trace line:

[   20.564138]  ? fat_fill_super+0x5/0x10

We can spin up gdb and show the corresponding piece of kernel code:

$ gdb vmlinux
    ...
(gdb) list *(fat_fill_super+0x5)

Output

0xffffffff813eda95 is in fat_fill_super (fs/fat/inode.c:1536).
1531     * Read the super block of an MS-DOS FS.
1532     */
1533    int fat_fill_super(struct super_block *sb, struct fs_context *fc,
1534               void (*setup)(struct super_block *))
1535    {
1536        BUG_ON(true);
1537        struct fat_mount_options *opts = fc->fs_private;
1538        int silent = fc->sb_flags & SB_SILENT;
1539        struct inode *root_inode = NULL, *fat_inode = NULL;
1540        struct inode *fsinfo_inode = NULL;
(gdb)

This works for any of the functions shown in the stack trace.
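To avoid typing one list command per frame, you can generate a small gdb command file and run it in batch mode. The sketch below only builds the file; frames.gdb is a hypothetical file name, and the frames are copied from the call trace above.

```shell
#!/usr/bin/env bash
# Frames copied from the call trace above (entries without a leading '?').
frames='get_tree_bdev+0x124/0x1c0 vfs_get_tree+0x24/0xe0 path_mount+0x2e1/0xab0'

# Write one "list" command per frame, dropping the "/<length>" suffix.
script=frames.gdb
: > "$script"
for frame in $frames; do
    printf 'list *(%s)\n' "${frame%%/*}" >> "$script"
done

cat "$script"
# Then, from the root of the kernel tree:
#   gdb -batch -x frames.gdb vmlinux
```

With -batch, gdb runs the commands from the file and exits, so the output can be redirected to a file and read next to the crash log.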

Going even further beyond

Now that we have at least some idea of how to make sense of a kernel stack trace, we can look into some more advanced debugging methods.

Live debugging with gdb

Ever used gdb to debug a C program you wrote for your Data Structures and Algorithms class? Enjoyed being able to step through it while setting breakpoints and inspecting variables in order to track down bugs? What if I told you that you can do the same with the kernel? It is, after all, just a big C program.

Caveats

Well, for the most part, at least. The setup described here only works inside a VM such as QEMU, and it can be unreliable, especially if you're debugging sensitive kernel code (generally anything inside mm/, or anything that involves locks, or anything that involves asynchronous hardware operations, or... the list goes on).

Nevertheless, it's often a useful tool, although you'll probably notice you'll spend a lot more time staring at the code and trying to make sense of it instead of using the fancy features gdb offers.

Using the GDB live debug feature

You need three things in order to do live debugging with GDB:

CONFIG_DEBUG_INFO=y
CONFIG_GDB_SCRIPTS=y
Pass nokaslr to the kernel command line
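Before booting, it can save time to check that the two config options actually made it into your .config. Below is a minimal sketch: check_debug_config is a hypothetical helper, and the sample file only exists so the example runs on its own. In practice you would pass the real .config from the root of the kernel tree.

```shell
#!/usr/bin/env bash
# Report whether the options needed for live debugging are set in a config file.
check_debug_config() {
    local config=$1 opt
    for opt in CONFIG_DEBUG_INFO CONFIG_GDB_SCRIPTS; do
        if grep -q "^${opt}=y" "$config"; then
            echo "$opt=y"
        else
            echo "$opt is not set"
        fi
    done
}

# Demonstration on a throwaway sample; use "check_debug_config .config" for real.
sample=$(mktemp)
printf 'CONFIG_DEBUG_INFO=y\n# CONFIG_GDB_SCRIPTS is not set\n' > "$sample"
check_debug_config "$sample"
rm -f "$sample"
```

The third requirement can be checked from inside the running VM: grep nokaslr /proc/cmdline should print the kernel command line if the option was passed.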

In order to use this feature, pass the -s flag to QEMU when launching your VM (notice the extra nokaslr option):

qemu-system-x86_64 \
    -drive file=../my_disk.raw,format=raw,index=0,media=disk \
    -m 2G -nographic \
    -kernel ./arch/x86_64/boot/bzImage \
    -append "root=/dev/sda rw console=ttyS0 loglevel=6 nokaslr" \
    -fsdev local,id=fs1,path=../shared_folder,security_model=none \
    -device virtio-9p-pci,fsdev=fs1,mount_tag=shared_folder \
    --enable-kvm \
    -s

Your kernel will start booting, and QEMU will open what we call a gdbserver on port 1234, which we can connect to in order to debug the kernel interactively. To do that, run the following (in another terminal window, while still inside the kernel source tree root):

gdb -tui -ex "target remote :1234" vmlinux

Once your debug session starts, you'll notice that your VM will freeze. This is because gdb halted the execution of the kernel, and we have to resume it. Let's add a breakpoint on the path_mount function and then resume execution:

(gdb) break path_mount
(gdb) continue

Now log in to your VM and redo the steps we took before to crash the kernel:

$ cd host_folder
$ mount fat32_fs.raw /mnt

You will notice that your VM will hang; that's because we hit the breakpoint on path_mount. From then on you can execute the code line by line:

typing n (next) will jump to the next line (without entering functions)
typing s (step) will jump to the next line (but will enter functions)

Try stepping your way through until you reach the BUG_ON call you added earlier.
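If you find yourself repeating this session often, the gdb commands can live in a small script file. This is just a sketch: mount-debug.gdb is a hypothetical file name, and the commands are the ones used in this section.

```shell
#!/usr/bin/env bash
# Write the gdb commands used in this section to a reusable script file.
cat > mount-debug.gdb <<'EOF'
target remote :1234
break path_mount
continue
EOF

cat mount-debug.gdb
# Then, with the VM already running under "qemu-system-x86_64 ... -s":
#   gdb -tui -x mount-debug.gdb vmlinux
```

Once the breakpoint is hit, bt prints the current backtrace, info locals shows the local variables of the current frame, and finish runs until the current function returns.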

Moving to a real kernel bug

Syzbot is a bot that tries to find vulnerabilities in the kernel via the fuzzing of syscalls. Whenever it manages to produce a C program that is able to consistently crash the kernel, it sends automated bug reports to the mailing lists.

The good thing is that a lot of those reports look similar to what we did here: the bot generates a program that will expose bugs in the kernel, such as NULL pointer dereferences, use-after-frees, BUG_ON triggers, or warnings about the usage of uninitialized values. A lot of them are relatively straightforward to debug, and you have already learned all the necessary tools to tackle and solve those bugs.

Go read this truly superb article by Javier Carrasco Cruz (a known mentor from the kernel community) about debugging + coming up with fixes for syzbot reports.

You have already learned everything you need to go fix a real kernel bug. Really! Now go and do it :-)