I have two virtual servers that are configured very similarly: Debian bullseye (testing), 5.6.0 kernel, 512 MB RAM. They’re both running a similar workload: MySQL, PowerDNS, WireGuard, dnstools.ws worker, and Netdata for monitoring.
On just one of the servers, the unreclaimable slab memory is growing linearly over time, until it hits a maximum (when the server’s memory is 100% allocated):
(The drop in the graph is when I rebooted the VPS.)
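For reference, the metric in that graph can also be watched directly from the shell; a minimal sketch, assuming the standard `/proc/meminfo` field names:

```sh
# Watch slab usage; SUnreclaim is the unreclaimable portion that keeps growing
# on the bad server, while SReclaimable covers caches the kernel can drop.
watch -n 60 'grep -E "^(Slab|SReclaimable|SUnreclaim):" /proc/meminfo'
```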
`slabtop` on the bad server:

```
 Active / Total Objects (% used)    : 1350709 / 1363259 (99.1%)
 Active / Total Slabs (% used)      : 25358 / 25358 (100.0%)
 Active / Total Caches (% used)     : 96 / 124 (77.4%)
 Active / Total Size (% used)       : 113513.48K / 117444.72K (96.7%)
 Minimum / Average / Maximum Object : 0.01K / 0.09K / 8.00K

   OBJS  ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
1173504 1173504 100%    0.06K  18336       64     73344K kmalloc-64
  22512   18020  80%    0.19K   1072       21      4288K dentry
  16640   16640 100%    0.12K    520       32      2080K kernfs_node_cache
  15872   15872 100%    0.01K     31      512       124K kmalloc-8
  15616   15616 100%    0.03K    122      128       488K kmalloc-32
  11904   11283  94%    0.06K    186       64       744K anon_vma_chain
  10218    9953  97%    0.59K    786       13      6288K inode_cache
   9867    8354  84%    0.10K    253       39      1012K buffer_head
   9272    9111  98%    0.20K    488       19      1952K vm_area_struct
   6808    6808 100%    0.09K    148       46       592K anon_vma
   5632    5632 100%    0.02K     22      256        88K kmalloc-16
   5145    5145 100%    0.19K    245       21       980K kmalloc-192
   4845    3758  77%    1.05K    651       15     10416K ext4_inode_cache
   4830    3795  78%    0.57K    347       14      2776K radix_tree_node
   4144    3380  81%    0.25K    259       16      1036K filp
   3825    3825 100%    0.05K     45       85       180K ftrace_event_field
   3584    3072  85%    0.03K     28      128       112K ext4_pending_reservation
   2448    2448 100%    0.04K     24      102        96K ext4_extent_status
   2368    1058  44%    0.06K     37       64       148K vmap_area
```
`slabtop` on the good server:

```
 Active / Total Objects (% used)    : 236705 / 264128 (89.6%)
 Active / Total Slabs (% used)      : 7119 / 7119 (100.0%)
 Active / Total Caches (% used)     : 96 / 124 (77.4%)
 Active / Total Size (% used)       : 31422.39K / 36341.06K (86.5%)
 Minimum / Average / Maximum Object : 0.01K / 0.14K / 8.00K

   OBJS  ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
  35551   27123  76%    0.05K    487       73      1948K buffer_head
  30368   19821  65%    0.12K    949       32      3796K dentry
  19872   19872 100%    0.12K    621       32      2484K kmalloc-128
  18666   18666 100%    0.08K    366       51      1464K kernfs_node_cache
  17664   17664 100%    0.02K     69      256       276K kmalloc-16
  16128   16128 100%    0.03K    126      128       504K anon_vma_chain
  15872   15872 100%    0.01K     31      512       124K kmalloc-8
  11136   11136 100%    0.03K     87      128       348K kmalloc-32
  10330   10112  97%    0.38K   1033       10      4132K inode_cache
   9945    9597  96%    0.10K    255       39      1020K vm_area_struct
   7798    4753  60%    0.71K    710       11      5680K ext4_inode_cache
   7008    7008 100%    0.05K     96       73       384K anon_vma
   5859    5668  96%    0.19K    279       21      1116K filp
   5120    2894  56%    0.03K     40      128       160K jbd2_revoke_record_s
   4680    3510  75%    0.30K    360       13      1440K radix_tree_node
   4224    4224 100%    0.03K     33      128       132K vmap_area
   3392    3392 100%    0.06K     53       64       212K kmalloc-64
   3264    3264 100%    0.04K     32      102       128K jbd2_inode
   3145    3145 100%    0.05K     37       85       148K trace_event_file
```
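The per-cache totals above can also be pulled straight from `/proc/slabinfo` and sorted by size, which makes the two machines easier to diff; a rough sketch, assuming the slabinfo 2.1 column layout and root access:

```sh
# Top 20 slab caches by approximate total size (num_objs * objsize bytes).
# slabinfo 2.1 columns: name active_objs num_objs objsize objperslab pagesperslab ...
sudo awk 'NR > 2 { printf "%10.0f KiB  %s\n", $3 * $4 / 1024, $1 }' /proc/slabinfo \
    | sort -rn | head -20
```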
Noticeably, there's something allocating a lot of `kmalloc-64` slabs.

How do I debug what's causing these allocations?
Answer
Read through some of the kernel docs and figured out I could add `slub_debug=U` to the kernel command line in GRUB to track slab allocations.
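On Debian that means editing the GRUB defaults and regenerating the config; a quick sketch of what this looked like (the existing `quiet` flag is just an assumed placeholder for whatever is already there):

```sh
# In /etc/default/grub, append slub_debug=U so SLUB records the call site,
# age and PID for every object it allocates (adds some memory/CPU overhead):
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet slub_debug=U"
sudo update-grub   # regenerate /boot/grub/grub.cfg
sudo reboot        # the option only takes effect on the next boot
```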
After doing this on both servers and rebooting them, something stood out in `/sys/kernel/slab/kmalloc-64/alloc_calls` on the 'bad' server that wasn't present on the 'good' server at all:

```
13212 kvm_async_pf_task_wake+0x6e/0x100 age=3329/480991/1729176 pid=0-13255
```
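Each line of `alloc_calls` starts with the number of live objects allocated from that call site, so sorting it numerically puts the biggest offender on top; a minimal sketch:

```sh
# Only populated when the kernel was booted with slub_debug=U.
sudo sort -rn /sys/kernel/slab/kmalloc-64/alloc_calls | head
```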
Searched around and found a post where someone encountered exactly the same problem: https://darkimmortal.com/debian-10-kernel-slab-memory-leak/. It documents a workaround of adding `no-kvmapf` to the kernel command line, which disables KVM asynchronous page fault handling. I'm not sure if this will have any other side effects.
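For completeness, the workaround goes in the same place as `slub_debug=U` above; a sketch, again with an assumed placeholder for the existing flags, plus a check that it took effect after the reboot:

```sh
# In /etc/default/grub:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet no-kvmapf"   # disable KVM async page faults
sudo update-grub
sudo reboot
# After the reboot, confirm the running kernel saw the parameter:
grep -o no-kvmapf /proc/cmdline
```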
Attribution
Source: Link, Question Author: Daniel Lo Nigro, Answer Author: Daniel Lo Nigro