
Kernel livepatching on Fedora part 2

Introduction

This is part of a series of experiments to get familiar with kernel livepatching. The blog post aims to demonstrate how kernel livepatching can help with problems where rebooting into a new kernel is not an option and no other workaround is available.

The title is based on the Phoronix article [1] about the fix in [2].

Let's see if the fix can be applied via livepatching to an example system.

Transparent HugePages (THP) are a mixed bag

PostgreSQL for example will ask on startup to disable THP support [4]. This is usually achievable by doing [3]:

echo never > /sys/kernel/mm/transparent_hugepage/enabled

To persist the change across reboots, add the following kernel parameter to GRUB_CMDLINE_LINUX in /etc/default/grub:

transparent_hugepage=never

Then regenerate the GRUB configuration with the tool your distribution provides (update-grub on Debian/Ubuntu, grub2-mkconfig on Fedora).
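To check which THP mode is actually active before and after the change, the sysfs file can be read back: the active mode is the word in square brackets. A minimal sketch, where thp_mode is a hypothetical helper name and the optional file argument is only there so the function can be pointed at a copy of the file:

```shell
# Print the active THP mode, i.e. the bracketed word in the sysfs file.
# thp_mode is a hypothetical helper, not an existing tool.
thp_mode() {
    file=${1:-/sys/kernel/mm/transparent_hugepage/enabled}
    # The file looks like "always madvise [never]"; keep what is in brackets.
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "$file"
}
```

Running thp_mode with no arguments after the echo above should then print never.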

THP is enabled by default, so a simple first step (if running on an affected kernel) is to disable THP in a non-production environment and see whether that helps.

Figuring out if it's possible to livepatch a function

Inlined functions can't be patched, since they have no standalone symbol to redirect. One easy way to figure out if a function is livepatchable is to look it up in /proc/kallsyms:

cat /proc/kallsyms | grep "__get_unmapped_area"
0000000000000000 T __pfx___get_unmapped_area
0000000000000000 T __get_unmapped_area

As we can see, __get_unmapped_area is listed so it's a good bet it's going to be possible to replace it.
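The grep above matches substrings, so it also picks up the __pfx_ entry. A slightly stricter sketch matches the exact symbol name and checks that it is a text (code) symbol. is_text_symbol is a hypothetical helper name, and being listed is only a first hint: livepatching relies on ftrace, so I would assume the function also has to show up in /sys/kernel/tracing/available_filter_functions.

```shell
# Return success if the symbol appears in kallsyms as a text (code) symbol.
# is_text_symbol is a hypothetical helper; the second argument defaults to
# /proc/kallsyms and exists so a saved copy can be checked instead.
is_text_symbol() {
    sym=$1
    file=${2:-/proc/kallsyms}
    # Columns are: address, type, name (and optionally [module]).
    # Type T or t marks a global or local symbol in the text section.
    awk -v s="$sym" '$3 == s && ($2 == "T" || $2 == "t") { found = 1 }
                     END { exit !found }' "$file"
}
```

For example, is_text_symbol __get_unmapped_area && echo "candidate" prints candidate on the system shown above.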

Non-exhaustive list of affected kernels

To demonstrate livepatching it is necessary to find a non-patched version.

The issue seems to affect unpatched kernels from 6.7 through 6.11, plus the 6.12 release candidates up to 6.12-rc5. A lot of history lies between the bad commit and the fix; the log for that range alone is well over a million lines long:

git log efa7df3e3bb5..d4148ae | wc -l
1673937

For the sake of brevity I am going to be fuzzy about exactly which kernels are affected. The bad commit is from 2023-12-14 [5], so 6.7 and later seem affected (which matches the bug reports [7] from the commit that fixes the issue [2]). The following command is useful [6] for figuring out the release dates of specific tags:

git log --simplify-by-decoration --tags --pretty='tformat:%C(auto)%h %as%d %s'
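As an aside, piping git log into wc -l counts lines of log output, not commits. A minimal sketch for counting the commits in a range directly, assuming it runs inside a clone that contains both commits (commits_between is just an illustrative helper name):

```shell
# Count commits reachable from the second revision but not from the first.
# commits_between is a hypothetical helper; run it inside a kernel clone.
commits_between() {
    git rev-list --count "$1..$2"
}
```

With the two hashes used earlier this would be commits_between efa7df3e3bb5 d4148ae.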

To figure out which of the minor releases fixes the issue one can clone a specific branch of the stable kernel:

git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git --depth=2000 -b linux-6.11.y  linux-stable-6.11

Then rerun the git log --simplify-by-decoration --tags --pretty='tformat:%C(auto)%h %as%d %s' command. From the tag list, find the tags released after the date the patch was submitted to the kernel mailing list. In this case I would use git log v6.11.5..v6.11.8 --oneline > log.txt and then check log.txt for the string "limit THP alignment". Based on that we can see that v6.11.7 contains the fix.
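The manual search can be turned into a small loop. Backports in the stable tree get their own commit hashes, so this sketch searches commit messages instead of hashes. It walks the tags matching a glob in version order and reports the first one whose range from the previous tag contains a matching commit. first_tag_with_fix is a hypothetical helper name, and the first tag in the list is only used as a baseline.

```shell
# Print the first tag whose commits since the previous tag include a
# message containing the given fixed string.
# first_tag_with_fix is a hypothetical helper; run it inside the clone.
first_tag_with_fix() {
    needle=$1       # e.g. "limit THP alignment"
    pattern=$2      # tag glob, e.g. "v6.11.*"
    prev=""
    for tag in $(git tag --list --sort=version:refname "$pattern"); do
        if [ -n "$prev" ] &&
           [ -n "$(git log -1 --oneline -F --grep="$needle" "$prev..$tag")" ]; then
            echo "$tag"
            return 0
        fi
        prev=$tag
    done
    return 1        # no tag in the list introduced a matching commit
}
```

For the case above the call would be first_tag_with_fix "limit THP alignment" "v6.11.*".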

Which kernel to use to demonstrate how to fix the issue with livepatching?

One option would be to downgrade the kernel on Fedora 41 to v6.11.6 and use livepatching to see the difference. Another option is to use Ubuntu, since one of the affected distributions is Ubuntu 24.04, which uses a 6.8 kernel. Note that Ubuntu includes livepatching as part of its Ubuntu Pro offering [7].

Note that 6.8 is end of life upstream, so it takes a bit more work to figure out whether the fix has been backported:

git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ -b linux-rolling-stable linux-rolling-stable
cd linux-rolling-stable
git tag | grep "v6.8"
git checkout v6.8.12

It's possible to see that the original commit (git show efa7df3e3bb5) is there but the fix isn't (which can be confirmed by looking at mm/mmap.c). Let's check on Ubuntu 24.04 what the linux-source package says:

apt-get -y install linux-source
cd /usr/src/linux-source-6.8.0/
tar -xvf linux-source-6.8.0.tar.bz2
cd /usr/src/linux-source-6.8.0/linux-source-6.8.0
cat mm/mmap.c
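Rather than reading the whole file, a quick check can grep for the condition the fix introduces. The search string below is my assumption of how the check is written in the upstream commit [2]; a backport could format it differently, so a miss is a hint rather than proof. has_thp_alignment_fix is a hypothetical helper name.

```shell
# Return success if the file contains the alignment check added by the fix.
# has_thp_alignment_fix is a hypothetical helper; the search string is an
# assumption based on the upstream commit and may differ in a backport.
has_thp_alignment_fix() {
    grep -qF 'IS_ALIGNED(len, PMD_SIZE)' "$1"
}
```

Used as has_thp_alignment_fix mm/mmap.c && echo "fixed" || echo "not fixed" from the top of the source tree.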

Sure enough, the fix isn't there. The kernel version I am using is:

uname -a
Linux ubuntu-s-1vcpu-1gb-ams3-01 6.8.0-48-generic #48-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 27 14:04:52 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

And the linux-source I am looking at is:

dpkg -l | grep linux-source
ii  linux-source                    6.8.0-48.48                             all          Linux kernel source with Ubuntu patches
ii  linux-source-6.8.0              6.8.0-48.48                             all          Linux kernel source for version 6.8.0 with Ubuntu patches

This gives a pretty good indication that the issue isn't patched on Ubuntu 24.04.
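To make that comparison less eyeball-driven, both version strings can be reduced to their common part. This sketch assumes the Ubuntu naming seen above, where uname -r ends in a flavour such as -generic and the package version ends in an upload number. kernel_abi and same_kernel_abi are hypothetical helper names.

```shell
# Reduce an Ubuntu kernel version string to "<upstream>-<abi>",
# e.g. both 6.8.0-48-generic and 6.8.0-48.48 become 6.8.0-48.
# Assumption: the string starts with digits and dots, a dash, then digits.
kernel_abi() {
    printf '%s\n' "$1" | sed -E 's/^([0-9.]+-[0-9]+).*/\1/'
}

# Succeed when two version strings reduce to the same "<upstream>-<abi>".
same_kernel_abi() {
    [ "$(kernel_abi "$1")" = "$(kernel_abi "$2")" ]
}
```

On the system above this could be called as same_kernel_abi "$(uname -r)" "$(dpkg-query -W -f='${Version}' linux-source)".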

Using gdb to confirm what the function looks like on the running kernel

The fix adds a condition based on the IS_ALIGNED macro defined in include/linux/align.h. It is possible to work out from first principles what the assembly should look like in the fixed version. Personally, I find it easier to look at the disassembled function in gdb on a fixed kernel and compare it to an unfixed version.
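For the theory side, IS_ALIGNED(x, a) boils down to checking that the low bits of x below a power-of-two a are all zero. The sketch below mirrors that in shell arithmetic. The 2 MiB value for PMD_SIZE is an assumption that holds for x86_64 with 4 KiB pages, and is_aligned is a hypothetical helper name.

```shell
# Shell-arithmetic equivalent of the kernel's IS_ALIGNED(x, a) for a
# power-of-two a: true when x has no bits set below a.
is_aligned() {
    [ $(( $1 & ($2 - 1) )) -eq 0 ]
}

# Assumption: x86_64 with 4 KiB base pages, where PMD_SIZE is 2 MiB.
PMD_SIZE=$((1 << 21))
```

So a 4 MiB mapping length passes the check, while 2 MiB plus one 4 KiB page does not. In the disassembly of a fixed kernel one would therefore expect some test of the length against the mask 0x1fffff, though the exact instructions depend on the compiler.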

References

[1] https://www.phoronix.com/news/Intel-Linux-3888.9-Performance
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d4148aeab412432bf928f311eca8a2ba52bb05df
[3] https://docs.kernel.org/admin-guide/mm/transhuge.html
[4] https://www.postgresql.org/docs/current/runtime-config-resource.html
[5] https://xkcd.com/1179/
[6] https://stackoverflow.com/a/64173940
[7] https://ubuntu.com/security/livepatch

© Bruno Henc. Built using Pelican. Theme by Giulio Fidente on github.