Debugging kmalloc-64 slab allocations / memory leak

I have two virtual servers with near-identical configurations: Debian bullseye (testing), a 5.6.0 kernel, and 512 MB RAM each. Both run the same workload: MySQL, PowerDNS, WireGuard, a dnstools.ws worker, and Netdata for monitoring.

On just one of the servers, unreclaimable slab memory grows linearly over time until it hits a maximum, at which point the server’s memory is 100% allocated:

[Graph of slab memory usage over time; the drop is when I rebooted the VPS.]
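To track this kind of growth without a monitoring agent, the kernel exposes the same counters in /proc/meminfo; sampling them periodically (via cron or a shell loop) makes a linear leak like the one above easy to spot. A minimal sketch, assuming a Linux system:

```shell
# Snapshot slab usage; SUnreclaim is the "unreclaimable slab" figure
# plotted above, SReclaimable is the cache-like portion the kernel can free.
grep -E '^(Slab|SReclaimable|SUnreclaim):' /proc/meminfo
```

Logging one line of this per hour is enough to see whether SUnreclaim climbs steadily or plateaus.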

slabtop on the bad server:

 Active / Total Objects (% used)    : 1350709 / 1363259 (99.1%)
 Active / Total Slabs (% used)      : 25358 / 25358 (100.0%)
 Active / Total Caches (% used)     : 96 / 124 (77.4%)
 Active / Total Size (% used)       : 113513.48K / 117444.72K (96.7%)
 Minimum / Average / Maximum Object : 0.01K / 0.09K / 8.00K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
1173504 1173504 100%    0.06K  18336       64     73344K kmalloc-64
 22512  18020  80%    0.19K   1072       21      4288K dentry
 16640  16640 100%    0.12K    520       32      2080K kernfs_node_cache
 15872  15872 100%    0.01K     31      512       124K kmalloc-8
 15616  15616 100%    0.03K    122      128       488K kmalloc-32
 11904  11283  94%    0.06K    186       64       744K anon_vma_chain
 10218   9953  97%    0.59K    786       13      6288K inode_cache
  9867   8354  84%    0.10K    253       39      1012K buffer_head
  9272   9111  98%    0.20K    488       19      1952K vm_area_struct
  6808   6808 100%    0.09K    148       46       592K anon_vma
  5632   5632 100%    0.02K     22      256        88K kmalloc-16
  5145   5145 100%    0.19K    245       21       980K kmalloc-192
  4845   3758  77%    1.05K    651       15     10416K ext4_inode_cache
  4830   3795  78%    0.57K    347       14      2776K radix_tree_node
  4144   3380  81%    0.25K    259       16      1036K filp
  3825   3825 100%    0.05K     45       85       180K ftrace_event_field
  3584   3072  85%    0.03K     28      128       112K ext4_pending_reservation
  2448   2448 100%    0.04K     24      102        96K ext4_extent_status
  2368   1058  44%    0.06K     37       64       148K vmap_area

slabtop on the good server:

 Active / Total Objects (% used)    : 236705 / 264128 (89.6%)
 Active / Total Slabs (% used)      : 7119 / 7119 (100.0%)
 Active / Total Caches (% used)     : 96 / 124 (77.4%)
 Active / Total Size (% used)       : 31422.39K / 36341.06K (86.5%)
 Minimum / Average / Maximum Object : 0.01K / 0.14K / 8.00K

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
 35551  27123  76%    0.05K    487       73      1948K buffer_head
 30368  19821  65%    0.12K    949       32      3796K dentry
 19872  19872 100%    0.12K    621       32      2484K kmalloc-128
 18666  18666 100%    0.08K    366       51      1464K kernfs_node_cache
 17664  17664 100%    0.02K     69      256       276K kmalloc-16
 16128  16128 100%    0.03K    126      128       504K anon_vma_chain
 15872  15872 100%    0.01K     31      512       124K kmalloc-8
 11136  11136 100%    0.03K     87      128       348K kmalloc-32
 10330  10112  97%    0.38K   1033       10      4132K inode_cache
  9945   9597  96%    0.10K    255       39      1020K vm_area_struct
  7798   4753  60%    0.71K    710       11      5680K ext4_inode_cache
  7008   7008 100%    0.05K     96       73       384K anon_vma
  5859   5668  96%    0.19K    279       21      1116K filp
  5120   2894  56%    0.03K     40      128       160K jbd2_revoke_record_s
  4680   3510  75%    0.30K    360       13      1440K radix_tree_node
  4224   4224 100%    0.03K     33      128       132K vmap_area
  3392   3392 100%    0.06K     53       64       212K kmalloc-64
  3264   3264 100%    0.04K     32      102       128K jbd2_inode
  3145   3145 100%    0.05K     37       85       148K trace_event_file

Noticeably, something is allocating a huge number of kmalloc-64 objects: over 1.1 million (~73 MB) on the bad server versus about 3,400 (212 K) on the good one.

How do I debug what’s causing these allocations?

Answer

I read through some of the kernel’s SLUB documentation and figured out I could add slub_debug=U to the kernel command line (via GRUB) to track the callers of slab allocations.
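On Debian the parameter goes into GRUB_CMDLINE_LINUX in /etc/default/grub, followed by `update-grub` and a reboot. A sketch that edits a throwaway copy so it is safe to run anywhere (the file path /tmp/grub.sample and the `quiet` value are illustrative):

```shell
# Illustrative copy of /etc/default/grub; edit the real file on a live system.
echo 'GRUB_CMDLINE_LINUX="quiet"' > /tmp/grub.sample

# Append slub_debug=U to the existing kernel command line.
sed -i 's/^GRUB_CMDLINE_LINUX="\(.*\)"/GRUB_CMDLINE_LINUX="\1 slub_debug=U"/' /tmp/grub.sample
cat /tmp/grub.sample

# On the real system, apply and reboot, then verify:
#   sudo update-grub && sudo reboot
#   grep slub_debug /proc/cmdline
```

Note that slub_debug adds per-object tracking overhead, so it is a debugging setting rather than something to leave on permanently.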

After enabling this on both servers and rebooting them, one entry in /sys/kernel/slab/kmalloc-64/alloc_calls stood out on the ‘bad’ server and was entirely absent on the ‘good’ one:

  13212 kvm_async_pf_task_wake+0x6e/0x100 age=3329/480991/1729176 pid=0-13255
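Each alloc_calls line starts with an allocation count, so sorting numerically in reverse surfaces the biggest offenders first. A sketch using a hypothetical sample file (on a live system with slub_debug=U, read /sys/kernel/slab/kmalloc-64/alloc_calls directly; `some_fn` and its numbers are made up):

```shell
# Hypothetical alloc_calls-style data; the kvm line is the one quoted above.
cat <<'EOF' > /tmp/alloc_calls.sample
     42 some_fn+0x10/0x80 age=1/2/3 pid=1-100
  13212 kvm_async_pf_task_wake+0x6e/0x100 age=3329/480991/1729176 pid=0-13255
EOF

# Sort by allocation count, largest first.
sort -rn /tmp/alloc_calls.sample | head -n 5
```

The same approach works for any cache, e.g. /sys/kernel/slab/dentry/alloc_calls.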

Searching around, I found a post where someone encountered exactly the same problem: https://darkimmortal.com/debian-10-kernel-slab-memory-leak/. It documents a workaround: adding no-kvmapf to the kernel command line, which disables KVM asynchronous page fault handling in the guest. I’m not sure whether this has any other side effects.
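After applying the workaround and rebooting, two quick checks confirm it took effect and let you watch whether the leak stops; a sketch (reading /proc/slabinfo generally requires root):

```shell
# Is no-kvmapf active on the running kernel?
grep -o 'no-kvmapf' /proc/cmdline || echo 'no-kvmapf not set'

# Current kmalloc-64 object counts; on the patched server this should
# plateau instead of growing linearly.
grep '^kmalloc-64 ' /proc/slabinfo 2>/dev/null || echo '(reading slabinfo needs root)'
```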

Attribution
Source: Link, Question Author: Daniel Lo Nigro, Answer Author: Daniel Lo Nigro
