Nov 26, 2025
I have a small board farm, with one of every device I have deployed, connected to many of the same sensors I have deployed, which I use to test new builds before deploying them. I've had too many cases where an update causes a device to fail (some configuration doesn't align with some new piece of software) or to become unreachable (a network device's name changes and the interface doesn't come up), so I want to test updates on devices within easy reach before deploying to locations that are harder to reach physically.
The devices are mostly Rockchip-based devices: rock-pi-e, rock-pi-s, and radxa zero-3e. Each one is connected to a different mix of sensors, mostly from Adafruit, that measure properties such as temperature, humidity, light, and barometric pressure: the bh1750, sht30 (indoor), sht30 (outdoor), shtc3, aht20, lps25, mcp3002, ds18b20 (reg temp), and ds18b20 (high temp), plus some experiments with others.
A "server" device (rock 5b) polls each unit for its data, and this data (along with the poll latency) is recorded on the server. Recently I started adding more values to what each device reports, such as CPU temperature, uptime, loadavg, and various pieces of memory (RAM) information. Luckily I started collecting RAM information just in time to catch one of my devices hitting a kernel panic after running out of memory (shown below).
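Each of these extra values comes straight from procfs/sysfs on the device; a rough sketch of such a report (not my exact script, and the thermal-zone path varies per board) looks like:

#!/bin/sh
# Rough sketch of a per-device health report (illustrative only);
# everything comes straight from procfs/sysfs.
echo "uptime:  $(cut -d' ' -f1 /proc/uptime)"        # seconds since boot
echo "loadavg: $(cut -d' ' -f1-3 /proc/loadavg)"     # 1/5/15 minute load
echo "cputemp: $(cat /sys/class/thermal/thermal_zone0/temp)"  # millidegrees C; zone varies per board
grep -E '^(MemTotal|MemFree|MemAvailable):' /proc/meminfo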
Most of the boards can be purchased with various RAM amounts. For example the rock-pi-s can either have 256MB or 512MB of RAM. The boards I bought have 256MB. The rock-pi-e and zero-3e devices I use come with 1GB of RAM.
I had noticed memory issues with the latest kernel (6.17) on my rock-pi-s devices and decided to keep them on the 6.12 kernel. systemd has a tendency to use a lot more RAM than sysvinit, so I had made some tweaks to the amount of memory it is allowed to use. To see whether these changes had a positive effect, I updated the rock-pi-s in my board farm to the 6.17 kernel again. Unfortunately it still throws an out-of-memory kernel panic running the 6.17 kernel despite my systemd RAM-usage tweaks.
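For the curious, the tweaks are of this general flavour (the values and the service name below are illustrative, not my exact configuration):

# Illustration only: the kind of systemd memory knobs involved.

# Cap how much RAM journald may use for its volatile (/run) journal:
mkdir -p /etc/systemd/journald.conf.d
cat > /etc/systemd/journald.conf.d/ram.conf <<'EOF'
[Journal]
RuntimeMaxUse=8M
EOF

# Cap the memory of an individual service with a drop-in:
mkdir -p /etc/systemd/system/example-sensor.service.d
cat > /etc/systemd/system/example-sensor.service.d/memory.conf <<'EOF'
[Service]
MemoryHigh=32M
MemoryMax=48M
EOF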
It is worth pointing out that in all of my tests below, the only difference between the updates that encountered issues and those that did not is the version of the kernel. All of the user-space software remained the same and is run in the same way. The only difference between one update and the previous is that the kernel went from 6.12.55 to 6.17.6.
Here is a graph of the memory use of my test-rockpis boardfarm system (with 256MB of RAM) running the 6.12.55 kernel pretty much with the in-kernel defconfig. In this graph the system ran for almost 16 days. It didn't crash; after 16 days I simply uploaded a new RAUC bundle. The purple line is the total memory (230428 kB), the blue line is the amount of available memory, and the green line is the amount of free memory:
Here's a graph of the same device on the next update, also with the 6.12.55 kernel; this time it ran for almost 22 days:
For the next update the only change I made was to switch from the 6.12.55 kernel to the 6.17.6 kernel. The total memory decreased from 230428 kB to 229636 kB:
It only ran for about 4.5 days before issuing the following kernel panic:
[508644.063311] Kernel panic - not syncing: System is deadlocked on memory
[508644.063928] CPU: 2 UID: 0 PID: 1 Comm: systemd Not tainted 6.17.6-yocto-standard-00102-g5a817ec7a796-dirty #1 PREEMPT
[508644.064898] Hardware name: Radxa ROCK Pi S (DT)
[508644.065319] Call trace:
[508644.065557]  show_stack+0x18/0x30 (C)
[508644.065921]  dump_stack_lvl+0x60/0x80
[508644.066277]  dump_stack+0x18/0x24
[508644.066602]  vpanic+0x124/0x2e8
[508644.066909]  abort+0x0/0x4
[508644.067176]  out_of_memory+0x560/0x580
[508644.067540]  __alloc_frozen_pages_noprof+0xc24/0xcf4
[508644.068007]  alloc_pages_mpol+0xb4/0x1a4
[508644.068384]  alloc_frozen_pages_noprof+0x44/0xc0
[508644.068822]  new_slab+0x328/0x3b0
[508644.069142]  ___slab_alloc+0x5dc/0x9c0
[508644.069503]  __slab_alloc.isra.0+0x34/0x68
[508644.069893]  __kmalloc_cache_noprof+0x168/0x2c0
[508644.070322]  copy_verifier_state+0x1bc/0x1f8
[508644.070732]  push_stack+0x7c/0x100
[508644.071062]  check_cond_jmp_op+0x3d8/0x13cc
[508644.071463]  do_check_common+0x28ac/0x2cec
[508644.071856]  bpf_check+0x247c/0x3220
[508644.072204]  bpf_prog_load+0x620/0xbe0
[508644.072564]  __sys_bpf+0x7b8/0x205c
[508644.072902]  __arm64_sys_bpf+0x20/0x30
[508644.073264]  invoke_syscall.constprop.0+0x40/0xf0
[508644.073712]  el0_svc_common.constprop.0+0x38/0xd8
[508644.074160]  do_el0_svc+0x1c/0x28
[508644.074485]  el0_svc+0x34/0xe8
[508644.074789]  el0t_64_sync_handler+0xa0/0xe4
[508644.075189]  el0t_64_sync+0x198/0x19c
[508644.075545] SMP: stopping secondary CPUs
[508644.076001] Kernel Offset: 0x391170c00000 from 0xffff800080000000
[508644.076560] PHYS_OFFSET: 0xfff1000000000000
[508644.076952] CPU features: 0x000000,00010000,20002000,0400421b
[508644.077484] Memory Limit: none
[508644.077787] ---[ end Kernel panic - not syncing: System is deadlocked on memory ]---
For comparison, here is a plot of the memory usage of my rock-pi-e boardfarm device (1GB RAM) using a 6.17 kernel running for over 21 days:
[purple=total blue=available green=free]
Here's a plot of a zero-3e device (1GB RAM) on a 6.17 kernel running for the same 21-22 days:
Nov 27, 2025
Today I noticed that my "server" device (4GB RAM, rock-5b, on the 6.17 kernel) did not throw a kernel panic, but it briefly went haywire (very high loadavg) after running for almost 9 days (it is on a different update schedule than the devices in the boardfarm), and its log shows that it invoked the OOM killer aggressively:
Nov 27 05:28:38 server kernel: Out of memory: Killed process 2244 (systemd) total-vm:17296kB, anon-rss:1792kB, file-rss:608kB, shmem-rss:0kB, UID:0 pgtables:72kB oom_score_adj:100
Nov 27 05:28:37 server (sd-pam)[2247]: pam_systemd(systemd-user:session): Failed to release session: No session '1' known
Nov 27 05:41:55 server kernel: Out of memory: Killed process 2985663 (png_shelly_humi) total-vm:36624kB, anon-rss:15360kB, file-rss:1636kB, shmem-rss:0kB, UID:0 pgtables:112kB oom_score_a>
Nov 27 05:55:45 server kernel: Out of memory: Killed process 3047554 (png_temp_1w_out) total-vm:32000kB, anon-rss:10752kB, file-rss:444kB, shmem-rss:0kB, UID:0 pgtables:104kB oom_score_ad>
Nov 27 05:56:35 server kernel: Out of memory: Killed process 3050620 (png_light_separ) total-vm:31824kB, anon-rss:10112kB, file-rss:852kB, shmem-rss:0kB, UID:0 pgtables:100kB oom_score_ad>
Nov 27 05:59:19 server kernel: Out of memory: Killed process 3053441 (convert) total-vm:20760kB, anon-rss:3328kB, file-rss:812kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
Nov 27 05:59:20 server kernel: Out of memory: Killed process 3053435 (convert) total-vm:20760kB, anon-rss:3328kB, file-rss:636kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
Nov 27 05:59:22 server kernel: Out of memory: Killed process 3053443 (convert) total-vm:20760kB, anon-rss:3328kB, file-rss:656kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
Nov 27 05:59:22 server kernel: Out of memory: Killed process 3053436 (convert) total-vm:20760kB, anon-rss:3456kB, file-rss:632kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
Nov 27 05:59:22 server kernel: Out of memory: Killed process 3053439 (convert) total-vm:20900kB, anon-rss:3500kB, file-rss:516kB, shmem-rss:0kB, UID:0 pgtables:80kB oom_score_adj:0
Nov 27 05:59:22 server kernel: Out of memory: Killed process 3053438 (convert) total-vm:31900kB, anon-rss:14716kB, file-rss:416kB, shmem-rss:0kB, UID:0 pgtables:100kB oom_score_adj:0
Nov 27 12:33:05 server kernel: Out of memory: Killed process 646649 (convert) total-vm:23604kB, anon-rss:6144kB, file-rss:132kB, shmem-rss:0kB, UID:0 pgtables:84kB oom_score_adj:0
Nov 27 12:33:05 server kernel: Out of memory: Killed process 646648 (convert) total-vm:22028kB, anon-rss:4648kB, file-rss:156kB, shmem-rss:0kB, UID:0 pgtables:88kB oom_score_adj:0
Nov 27 12:33:05 server kernel: Out of memory: Killed process 646646 (convert) total-vm:31900kB, anon-rss:9968kB, file-rss:404kB, shmem-rss:0kB, UID:0 pgtables:92kB oom_score_adj:0
Nov 27 12:33:05 server kernel: Out of memory: Killed process 646643 (convert) total-vm:31900kB, anon-rss:14252kB, file-rss:116kB, shmem-rss:0kB, UID:0 pgtables:100kB oom_score_adj:0
Nov 27 12:33:05 server kernel: Out of memory: Killed process 646642 (convert) total-vm:31900kB, anon-rss:14360kB, file-rss:40kB, shmem-rss:0kB, UID:0 pgtables:100kB oom_score_adj:0
Nov 27 12:33:05 server kernel: Out of memory: Killed process 646645 (convert) total-vm:31900kB, anon-rss:14288kB, file-rss:84kB, shmem-rss:0kB, UID:0 pgtables:108kB oom_score_adj:0
This system did not panic and halt, but clearly was demonstrating memory issues.
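For anyone wanting to check their own devices, OOM-killer activity is easy to spot after the fact with a quick grep of the kernel log:

# Illustrative check for OOM-killer activity since the last boot:
journalctl -k -b | grep -i 'out of memory'
# or, on devices without persistent journald:
dmesg | grep -i 'out of memory'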
This issue is not unique to the rock-pi-s board; it's just that, with only 256MB of RAM, that is where the issue showed up first. The graphs above (from Nov 26) of my 1GB devices running the 6.17.6 kernel are, in retrospect, showing the same problem I saw on the rock-pi-s; they're just developing it a lot more slowly, to the point where those graphs look better than the graph for the 256MB device.
Arnd, on #armlinux, suggested I start tracking slabinfo, slabtop -s c, and lsof. I did. He also suggested I flush the cache before recording any information:
if you run 'echo 2 > /proc/sys/vm/drop_caches' to free all reclaimable slab caches before looking at /proc/slabinfo, you can rule out the ones that would be freed before oom
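So the capture sequence looks roughly like this (a sketch; the output file names are arbitrary):

# Sketch of the capture sequence (output file names are arbitrary):
sync
echo 2 > /proc/sys/vm/drop_caches          # free reclaimable slab caches first
ts=$(date +%Y%m%d-%H%M%S)
cp /proc/slabinfo       "/var/tmp/slabinfo.$ts"
slabtop -o -s c       > "/var/tmp/slabtop.$ts"
lsof -n | wc -l       > "/var/tmp/lsof-count.$ts"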
Dec 01, 2025
After letting my system run for a couple of days, I started by running the cache-flush line as recommended by Arnd, then:
# slabtop -s c
 Active / Total Objects (% used)    : 483354 / 495272 (97.6%)
 Active / Total Slabs (% used)      : 19450 / 19450 (100.0%)
 Active / Total Caches (% used)     : 108 / 161 (67.1%)
 Active / Total Size (% used)       : 89564 / 92540 (96.8%)
 Minimum / Average / Maximum Object : 0.01K / 0.19K / 8.00K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
218820 218820 100%    0.19K  10420       21      41680 filp
 60564  60555  99%    0.19K   2884       21      11536 kmalloc-cg-192
 17136  16958  98%    0.25K   1071       16       4284 maple_node
 20958  20841  99%    0.19K    998       21       3992 kmalloc-192
  5575   5575 100%    0.62K    223       25       3568 debugfs_inode_cache
...
This indicates that the kernel's filp slab cache (which holds struct file objects, i.e. open files) thinks there are 218,820 active objects in use by the system. However:
# lsof -n | wc -l
1260
lsof is only aware of 1,260 open files on the system; a bit of a discrepancy. Thanks to Arnd for helping me interpret this information.
Dec 02, 2025
On #kernelnewbies, silurian_invader suggested I enable CONFIG_DEBUG_KMEMLEAK. I did, updated my system, and rebooted (Yocto makes this cycle so easy!).
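In Yocto terms that's just a kernel config fragment added to the linux-yocto recipe, roughly like this (file names and layer paths here are illustrative):

# Rough shape of enabling kmemleak via a linux-yocto config fragment:
cat > recipes-kernel/linux/files/kmemleak.cfg <<'EOF'
CONFIG_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE=16000
EOF
# then, in the linux-yocto bbappend:
#   FILESEXTRAPATHS:prepend := "${THISDIR}/files:"
#   SRC_URI += "file://kmemleak.cfg"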
After about 2 hours I did the drop_caches thing, then:
# cat /sys/kernel/debug/kmemleak | grep "^unreferenced object" | wc -l
6324
After only 2 hours the kernel is already aware of 6,324 unreferenced kernel objects! One of which looks like:
unreferenced object 0xffff00000246b6c0 (size 192):
  comm "systemd", pid 1, jiffies 4294893229
  hex dump (first 32 bytes):
    02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace (crc 133687fb):
    kmemleak_alloc+0x38/0x44
    kmem_cache_alloc_noprof+0x214/0x300
    prepare_creds+0x24/0x338
    copy_creds+0x2c/0x1c0
    copy_process+0x354/0x14b4
    kernel_clone+0x68/0x36c
    __do_sys_clone3+0xe0/0x140
    __arm64_sys_clone3+0x14/0x20
    invoke_syscall.constprop.0+0x40/0xf0
    el0_svc_common.constprop.0+0x38/0xd8
    do_el0_svc+0x1c/0x28
    el0_svc+0x34/0xe8
    el0t_64_sync_handler+0xa0/0xe4
    el0t_64_sync+0x198/0x19c
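As an aside, kmemleak scans on its own roughly every ten minutes, but it can also be poked by hand, which is handy when iterating:

echo scan  > /sys/kernel/debug/kmemleak    # force a scan right now
echo clear > /sys/kernel/debug/kmemleak    # forget current leaks; only new ones will be reported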
I'm guessing the answer lies in examining the backtraces. Most of them look quite similar; I'll try to summarize them to see where the similarities and differences show up.
On #mm, dhansen and heat had a look. One thing they suggested was:
# cat /sys/kernel/debug/kmemleak | sort | uniq -c | sort -rn | head -n50
  22356 kmemleak_alloc+0x38/0x44
  22356 invoke_syscall.constprop.0+0x40/0xf0
  22356 el0t_64_sync_handler+0xa0/0xe4
  22356 el0_svc+0x34/0xe8
  22356 do_el0_svc+0x1c/0x28
  22277 el0_svc_common.constprop.0+0xb8/0xd8
  21193 hex dump (first 32 bytes):
  21193 kmem_cache_alloc_noprof+0x214/0x300
  19464 el0t_64_sync+0x198/0x19c
  16704 path_openat+0x48/0xfd0
  16704 do_filp_open+0xa8/0x170
  16704 alloc_empty_file+0x54/0x11c
  16704 00 00 00 00 1d 80 4a 0c 40 3e b4 44 47 a0 ff ff  ......J.@>.DG...
  12550 do_sys_openat2+0x90/0xf8
  12550 __arm64_sys_openat+0x68/0xc0
   6179 __arm64_sys_execve+0x40/0x5c
   4917 do_execveat_common.isra.0+0x1a0/0x1e0
   4489 prepare_creds+0x24/0x338
   4489 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
   4154 do_open_execat+0x64/0x16c
   3675 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
   2892 open_exec+0x30/0x70
   2892 bprm_execve+0x280/0x4a4
   2077 load_elf_binary+0x1d8/0x1578
   2077 98 29 96 01 00 00 ff ff 00 00 00 00 00 00 00 00  .)..............
   2076 a8 43 96 01 00 00 ff ff 00 00 00 00 00 00 00 00  .C..............
   2025 prepare_exec_creds+0x14/0x58
   2025 bprm_execve+0x44/0x4a4
   1262 do_execveat_common.isra.0+0x6c/0x1e0
   1262 alloc_bprm+0x28/0x240
   1230 set_current_groups+0x1c/0x90
   1230 __arm64_sys_setgroups+0x16c/0x23c
   1228 18 6f a2 02 00 00 ff ff 00 00 00 00 00 00 00 00  .o..............
   1227 00 48 5e 05 00 00 ff ff 00 00 00 00 00 00 00 00  .H^.............
   1163 __kvmalloc_node_noprof+0x3ac/0x4fc
   1163 __arm64_sys_setgroups+0x8c/0x23c
    846 18 6f 96 01 00 00 ff ff 00 00 00 00 00 00 00 00  .o..............
    834 e0 13 ba 02 00 00 ff ff 00 00 00 00 00 00 00 00  ................
    822 kernel_clone+0x68/0x36c
    822 copy_process+0x354/0x14b4
    822 copy_creds+0x2c/0x1c0
    815 load_script+0x1fc/0x2e0
    807 __do_sys_clone+0x70/0xb4
    807 __arm64_sys_clone+0x1c/0x28
    791 hex dump (first 8 bytes):
    432 c8 77 96 01 00 00 ff ff 00 00 00 00 00 00 00 00  .w..............
    432 10 62 96 01 00 00 ff ff 00 00 00 00 00 00 00 00  .b..............
    430 c0 6a 96 01 00 00 ff ff 00 00 00 00 00 00 00 00  .j..............
    429 b0 50 96 01 00 00 ff ff 00 00 00 00 00 00 00 00  .P..............
    428 60 59 96 01 00 00 ff ff 00 00 00 00 00 00 00 00  `Y..............
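Counting individual frames like this lumps unrelated records together, though. To see which complete call chains dominate, something along these lines groups whole kmemleak records by their full backtrace (a rough sketch, assuming the report format shown above):

# Rough sketch: count kmemleak records by their complete backtrace.
awk '
  /^unreferenced object/ { if (bt != "") count[bt]++; bt = ""; inbt = 0; next }
  /backtrace \(/         { inbt = 1; next }
  inbt                   { sub(/^[[:space:]]+/, ""); bt = bt $0 " | " }
  END {
    if (bt != "") count[bt]++
    for (b in count) printf "%7d  %s\n", count[b], b
  }' /sys/kernel/debug/kmemleak | sort -rn | head -n 5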
Dec 04, 2025
At this point I am quite certain that the problem is in the linux-yocto kernel specifically, and not in upstream linux-stable (on which linux-yocto is based). I created a linux-stable recipe so I could build kernels directly from upstream linux-stable (without any Yocto patches applied), and I was unable to get any of them to demonstrate the memory leak.
If you are running a 6.17-based linux-yocto kernel on your device, try the slabtop -s c command demonstrated above. If the top item is filp, your kernel is leaking memory: specifically, it is allocating open-file (struct file) objects that are never closed and cleaned up properly.
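A slightly more hands-off check (a hypothetical helper; /proc/slabinfo needs root): if the filp object count keeps climbing while the lsof count stays flat, you are most likely seeing the same leak.

# Hypothetical watcher: log the filp slab count next to the lsof count.
while true; do
    printf '%s filp=%s lsof=%s\n' "$(date +%FT%T)" \
        "$(awk '$1 == "filp" {print $2; exit}' /proc/slabinfo)" \
        "$(lsof -n 2>/dev/null | wc -l)"
    sleep 600
done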
Now the bisection begins...
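The rough shape of such a bisection (the refs shown here are illustrative, not my exact endpoints):

# Rough shape of the bisection (refs are illustrative):
git clone https://git.yoctoproject.org/linux-yocto && cd linux-yocto
git checkout v6.17/standard/base
git bisect start
git bisect bad  HEAD       # linux-yocto tip: leaks filp objects
git bisect good v6.17      # upstream base: does not leak
# at each step: build with the same config, boot the board, watch the
# filp count for a while, then mark the revision:
#   git bisect good    # or: git bisect bad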
Dec 06, 2025
Bisecting linux-yocto, in my spare time, I landed on https://git.yoctoproject.org/linux-yocto/commit/?h=v6.17/standard/base&id=ca4826d81209e0cd0a5521dbdb194de3a40ec650
This patch is part of a series that adds aufs support to the linux-yocto kernel. Aufs is one of several separate solutions for providing union mount capabilities on Linux, the others being UnionFS and OverlayFS. UnionFS was the original; then aufs and OverlayFS came along. Aufs, however, was never merged into the mainline kernel. Several distributions carry the out-of-kernel aufs patches, and, seemingly, the linux-yocto kernel does as well, but these patches are not maintained with the rest of the kernel and appear to have developed a leak.
Filesystem union mounts allow you to take two directories (for example), from two completely different filesystems, and mount them on top of each other so that they are presented to the user as one filesystem containing the union of the individual files on which it is built.

Union mounts are useful in several situations. For example, if you have a read-only filesystem (e.g. SquashFS) but need to write to it from time to time, you can mount a second, writable filesystem on top of it. You can configure it so that the writes end up in the second filesystem, leaving the underlying read-only filesystem unchanged.

Union mounts are also useful for A/B update mechanisms. In a full-disk A/B update system (such as RAUC can be configured to provide) the entire filesystem is replaced on each update, and you boot from one partition or the other. So, for example, if you add a user and set their password (which is stored in /etc), then update and boot into the new bundle, those changes to /etc exist only on the old partition and are lost. Configuring a union overlay so that changes like these are stored on a non-updated data partition, then overlaid onto whichever partition's filesystem is currently running, means any configuration changes performed in one bundle are available in the next.
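For instance, with OverlayFS (the union filesystem that is in mainline), overlaying /etc with a persistent upper directory looks roughly like this (the paths are made up):

# Illustrative only: keep /etc changes on a persistent data partition,
# overlaid on whichever A/B rootfs is currently running.
mkdir -p /data/etc-upper /data/etc-work
mount -t overlay overlay \
      -o lowerdir=/etc,upperdir=/data/etc-upper,workdir=/data/etc-work \
      /etc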
In any case this out-of-kernel patch set will need some attention, or it will need to be dropped from linux-yocto.
WARNING: Speculation, more investigation needed
I did not notice any kernel memory leaks when running the previous linux-yocto kernel, 6.12. Maybe the leak was there but just slower? I don't know and will need to investigate.
Also, note that I am not (currently) using any union/overlay filesystem mechanism in my devices, including aufs. In fact, I don't even think it is enabled in my 6.17 kernel. So it appears as though simply having this patch in the kernel source tree (even without enabling or using it in any way) is enough to trigger this leak.
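For reference, a couple of quick ways to check whether aufs is even present in a running kernel:

# Quick checks that aufs is neither built in nor registered
# (the first needs CONFIG_IKCONFIG_PROC):
zcat /proc/config.gz | grep -i aufs
grep -i aufs /proc/filesystems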
























