Talk: How the kernel boots
(1) Minimal rootfs
The kernel does little to initialize your system. After booting, it looks for an executable referred to as 'init' on your disk and simply executes it. This is the first process to run in userspace, so its process ID is 1 (PID=1), and it is responsible for initializing the rest of your system, up to the point where you have your graphical interface, a working network and all the other services.
Let's build a minimal rootfs and boot it in a virtual machine so we can understand this better.
First we create a disk for our virtual machine:
# Create a raw file named disk.raw of 100MB
$ dd if=/dev/zero of=disk.raw bs=1M count=100
# Format it to ext4
$ sudo mkfs.ext4 disk.raw
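As a quick sanity check of the dd arithmetic (bs multiplied by count gives the image size), here is a scaled-down sketch that writes 4 blocks of 1 MiB to a temporary file and measures the result. The temporary file and variable names are illustrative only, and GNU or BusyBox dd is assumed.

```shell
#!/bin/sh
# Scaled-down version of the dd step: 4 blocks of 1 MiB instead of 100.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1M count=4 2>/dev/null

# Measure the file and convert bytes to MiB.
size_bytes=$(wc -c < "$img")
size_mib=$((size_bytes / 1024 / 1024))
echo "image size: ${size_mib} MiB"

rm -f "$img"
```

With bs=1M count=100, the same calculation gives the 100 MiB disk.raw used in this talk.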
Now let's put some content inside this virtual disk.
To get the basic command-line tools, we are going to use BusyBox.
BusyBox is a single binary that bundles the basic command-line tools. So instead of executing ls or echo, we can execute busybox ls or busybox echo.
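BusyBox also selects the applet from the name it was invoked under, which is why a symlink named ls pointing to the busybox binary behaves like ls. The toy script below imitates that dispatch in plain shell. It is only a sketch of the idea, not how BusyBox is implemented, and the names multicall, hello and bye are made up for the illustration.

```shell
#!/bin/sh
# Toy "multi-call" program: one file, several names.
work=$(mktemp -d)

cat > "$work/multicall" <<'EOF'
#!/bin/sh
# Pick the "applet" from the name we were invoked as.
case "$(basename "$0")" in
    hello) echo "hello applet" ;;
    bye)   echo "bye applet" ;;
    *)     echo "run me through a symlink named hello or bye" ;;
esac
EOF
chmod +x "$work/multicall"

# Two symlinks to the same file, like /bin/ls -> /bin/busybox.
ln -s multicall "$work/hello"
ln -s multicall "$work/bye"

hello_out=$("$work/hello")
bye_out=$("$work/bye")
echo "$hello_out"
echo "$bye_out"

rm -rf "$work"
```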
# Let's mount the disk so we can write files in the image
$ mkdir mnt && sudo mount disk.raw mnt
# Create a /bin folder inside disk.raw
$ sudo mkdir mnt/bin
# Download busybox binary
$ sudo wget https://busybox.net/downloads/binaries/1.21.1/busybox-x86_64 -O mnt/bin/busybox
Now let's create a simple script that will be the first program executed by the kernel:
$ sudo vim mnt/bin/init.sh
#!/bin/busybox sh
busybox echo "Hello from userspace!"
busybox poweroff -f
Make it executable and unmount:
$ sudo chmod +x mnt/bin/*
$ sudo umount mnt
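Two common reasons for the kernel failing to start an init script are a missing executable bit and a shebang that points to an interpreter that is not in the image. The sketch below checks both against a directory tree. The helper name check_init and the temporary tree are assumptions made for the demo; on the real image you would run it against the mounted mnt directory before unmounting.

```shell
#!/bin/sh
# check_init ROOT INIT: verify INIT (a path inside ROOT) can be executed.
check_init() {
    root=$1
    init=$2
    [ -x "$root$init" ] || { echo "not executable"; return 1; }
    first=$(head -n 1 "$root$init")
    case "$first" in
        '#!'*) ;;
        *) echo "no shebang"; return 1 ;;
    esac
    interp=${first#??}      # drop the leading "#!"
    interp=${interp%% *}    # drop interpreter arguments such as "sh"
    [ -x "$root$interp" ] || { echo "interpreter missing"; return 1; }
    echo "ok"
}

# Demo on a throwaway tree that mimics the layout used in this talk.
root=$(mktemp -d)
mkdir "$root/bin"
printf '#!/bin/sh\n' > "$root/bin/busybox"   # stand-in for the real binary
printf '#!/bin/busybox sh\nbusybox echo "Hello from userspace!"\n' > "$root/bin/init.sh"
chmod +x "$root/bin/busybox"

before=$(check_init "$root" /bin/init.sh)    # init.sh is not executable yet
chmod +x "$root/bin/init.sh"
after=$(check_init "$root" /bin/init.sh)
echo "before chmod: $before, after chmod: $after"

rm -rf "$root"
```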
That's it. Now let's compile a minimal kernel and boot from this rootfs.
From your kernel tree:
# Compile the kernel
$ make x86_64_defconfig
$ make -j4
# Run a virtual machine with disk.raw as its disk and init.sh as the program to execute on boot
$ qemu-system-x86_64 -hda ${PATH_TO_DISK}/disk.raw -m 2G --kernel arch/x86/boot/bzImage --append "console=ttyS0 root=/dev/sda rw init=/bin/init.sh" -nographic
You should see the "Hello from userspace!" message in your console.
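The QEMU command above packs several decisions into one line. The sketch below assembles the same command from named pieces and only prints it (a dry run), so each parameter can carry a comment. The variable names are illustrative, and the disk path assumes disk.raw is in the current directory.

```shell
#!/bin/sh
disk=disk.raw                    # raw ext4 image created earlier
kernel=arch/x86/boot/bzImage     # kernel image produced by make
init=/bin/init.sh                # program the kernel runs as PID 1

# Kernel command line:
#   console=ttyS0  send kernel messages to the first serial port,
#                  which -nographic connects to this terminal
#   root=/dev/sda  use the disk attached with -hda as the root filesystem
#   rw             mount the root filesystem read-write
#   init=...       run this program instead of the default init
append="console=ttyS0 root=/dev/sda rw init=$init"

cmd="qemu-system-x86_64 -hda $disk -m 2G --kernel $kernel --append \"$append\" -nographic"
echo "$cmd"
```

Printing the command first makes it easy to compare against the one-liner before actually launching the VM.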
Now, it would be much more useful to have access to a shell and to have /dev/, /proc/ and /sys/ populated in your system, so let's make init.sh a bit more elaborate. Mount your disk again and edit the init.sh file so it has the following content:
#!/bin/busybox sh
# Let's create symbolic links, so we can use the tools
# "ls" or "echo" directly instead of "busybox ls" or
# "busybox echo"
for i in $(busybox --list); do
    busybox ln -sf /bin/busybox /bin/$i
done
# Create mount points if they don't exist yet
mkdir -p /proc
mkdir -p /sys
mkdir -p /pts
mkdir -p /dev
mkdir -p /mnt
mkdir -p /nfs
# Mount these special "disks" to the folders we just created
mount -t proc none /proc
mount -t sysfs none /sys
mount -t devpts none /pts
mount -t debugfs none /sys/kernel/debug
ln -sf /sys/kernel/debug /debug
# Let's populate /dev/
echo /bin/mdev > /proc/sys/kernel/hotplug
mdev -s
# Initial message
echo 'Hello from Userspace!'
# Open shell
sh
# Turn off when done
poweroff -f
Relaunch your virtual machine with qemu and you should get a basic shell.
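If you re-run parts of init.sh by hand from that shell, mount will complain about filesystems that are already mounted. One possible guard, sketched below, is to look the mount point up in /proc/mounts first. The helper name is_mounted is an assumption for this example, it relies on an awk being available (BusyBox usually provides one), and the demo runs against a sample file rather than the real /proc/mounts.

```shell
#!/bin/sh
# is_mounted DIR [MOUNTS_FILE]: succeed if DIR is listed as a mount point.
is_mounted() {
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Sample data in /proc/mounts format, for the demo only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
none /proc proc rw,relatime 0 0
none /sys sysfs rw,relatime 0 0
EOF

if is_mounted /proc "$sample"; then proc_state=mounted; else proc_state=missing; fi
if is_mounted /pts "$sample"; then pts_state=mounted; else pts_state=missing; fi
echo "/proc: $proc_state, /pts: $pts_state"

rm -f "$sample"
```

Inside the VM the guard would read, for example, is_mounted /proc || mount -t proc none /proc.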
(2) Using GDB to find the buggy line
Let's provoke an error by dereferencing a NULL pointer.
Open the file init/main.c, find the function called kernel_init and add the following in the middle of the function:
int *ops = NULL;
printk("%d", *ops);
Now run make menuconfig, search for DEBUG_INFO, enable it, then compile and run your kernel. You should see a kernel panic similar to:
[ 1.375012] RIP: 0010:kernel_init+0x77/0x106
[ 1.375548] Code: 5c e9 00 48 85 ff 74 22 e8 c8 28 61 ff 85 c0 0f 84 9e 00 00 00 48 8b 35 a5 5c e9 00 89 c2 48 c7 c7 28 f1 dd a4 e8 c8 04 6c ff <8b> 34 25 00 00 00 00 48 c7 c7 91 51 e7 a4 e8 b5 04 6c ff 48 8b 3d
[ 1.377970] RSP: 0000:ffffa711c031ff50 EFLAGS: 00010246
[ 1.378633] RAX: 0000000000000000 RBX: ffffffffa47ee31a RCX: 0000000000000000
[ 1.379434] RDX: 0000000000000005 RSI: ffff994e7d43d018 RDI: 0000000000000000
[ 1.380231] RBP: 0000000000000000 R08: 0000000000024de0 R09: ffffffffa3fa0a90
[ 1.381029] R10: ffffde1e81f50f40 R11: 0000000000048209 R12: 0000000000000000
[ 1.381826] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 1.382631] FS: 0000000000000000(0000) GS:ffff994e7da00000(0000) knlGS:0000000000000000
[ 1.383534] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.384186] CR2: 0000000000000000 CR3: 000000004880a000 CR4: 00000000000006f0
[ 1.384996] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1.385801] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1.386627] Call Trace:
[ 1.386930] ret_from_fork+0x35/0x40
[ 1.387371] Modules linked in:
[ 1.387758] CR2: 0000000000000000
[ 1.388227] ---[ end trace 1630ed4b71d91315 ]---
[ 1.388854] RIP: 0010:kernel_init+0x77/0x106
[ 1.389435] Code: 5c e9 00 48 85 ff 74 22 e8 c8 28 61 ff 85 c0 0f 84 9e 00 00 00 48 8b 35 a5 5c e9 00 89 c2 48 c7 c7 28 f1 dd a4 e8 c8 04 6c ff <8b> 34 25 00 00 00 00 48 c7 c7 91 51 e7 a4 e8 b5 04 6c ff 48 8b 3d
[ 1.391688] RSP: 0000:ffffa711c031ff50 EFLAGS: 00010246
[ 1.392361] RAX: 0000000000000000 RBX: ffffffffa47ee31a RCX: 0000000000000000
[ 1.393242] RDX: 0000000000000005 RSI: ffff994e7d43d018 RDI: 0000000000000000
[ 1.394043] RBP: 0000000000000000 R08: 0000000000024de0 R09: ffffffffa3fa0a90
[ 1.394903] R10: ffffde1e81f50f40 R11: 0000000000048209 R12: 0000000000000000
[ 1.395710] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 1.396563] FS: 0000000000000000(0000) GS:ffff994e7da00000(0000) knlGS:0000000000000000
[ 1.397470] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.398125] CR2: 0000000000000000 CR3: 000000004880a000 CR4: 00000000000006f0
[ 1.398914] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1.399706] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1.400541] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[ 1.401512] Kernel Offset: 0x22e00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1.402589] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]---
At the beginning of this error there is a location, kernel_init+0x77. Let's use gdb to find out which source line that is, using the command l *(kernel_init+0x77):
$ gdb vmlinux
(gdb) l *(kernel_init+0x77)
0xffffffff819ee391 is in kernel_init (init/main.c:1100).
1095 ramdisk_execute_command, ret);
1096 }
1097
1098 int *ops = NULL;
1099
1100 printk("%d", *ops);
1101
1102 /*
1103 * We try each of these until one succeeds.
1104 *
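The RIP value in the panic has the form symbol+offset/size: the faulting instruction is offset bytes into the function, and the function is size bytes long. As a small sketch, the snippet below splits that string with shell parameter expansion and builds the gdb command used above. The variable names are illustrative.

```shell
#!/bin/sh
# Value copied from the "RIP:" line of the panic.
rip='kernel_init+0x77/0x106'

symbol=${rip%%+*}         # text before the '+'
rest=${rip#*+}            # "0x77/0x106"
offset_hex=${rest%%/*}    # offset of the faulting instruction
size_hex=${rest#*/}       # total size of the function

# Convert the hex values to decimal for readability.
offset=$(($offset_hex))
size=$(($size_hex))

echo "$symbol: fault at byte $offset of $size"
echo "gdb command: l *($symbol+$offset_hex)"
```

The kernel tree also ships scripts/faddr2line, which takes the symbol+offset/size form together with vmlinux and reports the source line without starting gdb.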
For more details, read http://helenfornazier.blogspot.com.br/2015/07/linux-kernel-memory-corruption-debug.html
Done?
Now, instead of memorizing all these commands and parameters, write a script to speed up your development. See an example here:
https://github.com/helen-fornazier/mug-scripts/blob/master/mug-kern-vm.sh
Recommended Reading: Linux Device Drivers, Chapter 4: Debugging Techniques
Don’t forget to update the spreadsheet for tracking our progress.