
Memory Error Detection

KASAN: Kernel Address Sanitizer

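KASAN (Kernel Address SANitizer) is the most thorough of the kernel's dynamic memory error detectors. It can detect:

  • heap out-of-bounds accesses (reported as slab-out-of-bounds)
  • stack out-of-bounds accesses
  • use-after-free
  • global out-of-bounds accesses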

Enabling KASAN

CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y   # generic mode (e.g. x86_64, arm64)
# or
CONFIG_KASAN_SW_TAGS=y   # software tag-based mode (arm64 only, lower memory overhead)

Reading a KASAN Report

==================================================================
BUG: KASAN: slab-out-of-bounds in my_driver_write+0x45/0x120
Write of size 4 at addr ffff8880123456b8 by task my_app/1234

CPU: 2 PID: 1234 Comm: my_app
Call Trace:
 dump_stack+0x6b/0x8b
 print_address_description+0x1f/0x1f0
 kasan_report+0x138/0x160
 my_driver_write+0x45/0x120
 vfs_write+0xb5/0x1f0

Allocated by task 1234:
 kmalloc+0x1f/0x30
 my_driver_probe+0x89/0x200

The buggy address belongs to the object at ffff888012345670
 which belongs to the cache kmalloc-64 of size 64
The buggy address is located 8 bytes to the right of
 64-byte region [ffff888012345670, ffff8880123456b0)
==================================================================

Analysis: my_driver_write wrote 4 bytes starting at offset 72 of a 64-byte buffer, that is, 8 bytes past its end.
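The arithmetic behind this reading can be sketched in plain user-space C. The helper `locate_oob` and its struct are hypothetical names, not part of KASAN; the function only recombines numbers printed in the report (object start, object size, distance to the right of the region, access size).

```c
#include <assert.h>
#include <stdint.h>

/* Where a bad access falls relative to the object named in the report. */
struct oob_location {
    uint64_t access_addr;       /* first byte touched by the bad access */
    uint64_t offset_from_start; /* offset of that byte from the object start */
    uint64_t last_bad_offset;   /* offset of the last byte touched */
};

/*
 * obj_start, obj_size: from "belongs to the object at ..." and the cache size.
 * bytes_right:         from "located N bytes to the right of".
 * access_size:         from "Write of size N".
 */
static struct oob_location locate_oob(uint64_t obj_start, uint64_t obj_size,
                                      uint64_t bytes_right, uint64_t access_size)
{
    struct oob_location loc;

    assert(access_size > 0);
    loc.offset_from_start = obj_size + bytes_right;
    loc.access_addr = obj_start + loc.offset_from_start;
    loc.last_bad_offset = loc.offset_from_start + access_size - 1;
    return loc;
}
```

For the report above, `locate_oob(0xffff888012345670ULL, 64, 8, 4)` gives an offset of 72 from the object start, and the write covers offsets 72 through 75.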

Common KASAN Bug Types

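slab-out-of-bounds    - out-of-bounds access to heap (slab) memory
stack-out-of-bounds   - out-of-bounds access to stack memory
use-after-free        - access to memory after it has been freed
global-out-of-bounds  - out-of-bounds access to a global variable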

KFENCE: Lightweight Memory Error Detection

KFENCE (Kernel Electric Fence) is a low-overhead, sampling-based alternative to KASAN that is suitable for production systems:

CONFIG_KFENCE=y
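CONFIG_KFENCE_SAMPLE_INTERVAL=100   # sample interval in ms: one guarded allocation every 100 ms

# Show KFENCE statistics
cat /sys/kernel/debug/kfence/stats

# List objects allocated via KFENCE (error reports themselves go to the kernel log)
cat /sys/kernel/debug/kfence/objects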

kmemleak: Memory Leak Detection

CONFIG_DEBUG_KMEMLEAK=y

# Trigger a scan
echo scan > /sys/kernel/debug/kmemleak

# Show the leak report
cat /sys/kernel/debug/kmemleak

# Clear the current list of suspected leaks (start tracking afresh)
echo clear > /sys/kernel/debug/kmemleak

Example report:

unreferenced object 0xffff888012345678 (size 64):
  comm "my_app", pid 1234, jiffies 4294967295
  backtrace:
    kmalloc+0x1f/0x30
    my_driver_open+0x45/0x80    ← allocation site
    chrdev_open+0x89/0x200
    do_open+0x1f/0x30

Analysis: the 64-byte object allocated in my_driver_open was never freed (my_driver_release is missing the kfree).
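The fix amounts to pairing every allocation in the open path with a free in the release path. The following is a user-space sketch only: `calloc`/`free` stand in for `kzalloc`/`kfree`, and `my_open`, `my_release`, `struct my_ctx`, and the `live_allocations` counter are hypothetical names chosen for illustration, not the driver's real code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical 64-byte per-open state, matching the size in the report. */
struct my_ctx {
    char buf[64];
};

static int live_allocations; /* objects allocated but not yet freed */

/* Analogue of my_driver_open: allocate the per-open context. */
static struct my_ctx *my_open(void)
{
    struct my_ctx *ctx = calloc(1, sizeof(*ctx));

    if (ctx)
        live_allocations++;
    return ctx;
}

/* Analogue of my_driver_release: the free that the leaky version omitted. */
static void my_release(struct my_ctx *ctx)
{
    if (!ctx)
        return;
    assert(live_allocations > 0);
    free(ctx);
    live_allocations--;
}
```

In a real driver the context pointer would typically be stored in `file->private_data` during open and freed from there in release.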

UBSAN: Undefined Behavior Detection

CONFIG_UBSAN=y
CONFIG_UBSAN_SANITIZE_ALL=y

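Detects integer overflow, out-of-bounds array indexing, null pointer dereferences, misaligned accesses, and similar undefined behavior, depending on which CONFIG_UBSAN_* checks are enabled.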

UBSAN: Undefined behaviour in drivers/mydriver/my_driver.c:42:15
signed integer overflow:
2147483647 + 1 cannot be represented in type 'int'

lockdep: Deadlock Detection

CONFIG_LOCKDEP=y
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_LOCKDEP=y

At run time, lockdep records the order in which locks are acquired and reports orderings that could lead to a deadlock:

WARNING: possible circular locking dependency detected
my_driver/1234 is trying to acquire lock:
 (&priv->lock){+.+.}, at: my_driver_write+0x45

but task is already holding lock:
 (&dev->mutex){+.+.}, at: my_driver_ioctl+0x23

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:
-> #1 (&dev->mutex){+.+.}:
       my_driver_ioctl+0x23
-> #0 (&priv->lock){+.+.}:
       my_driver_write+0x45

Possible unsafe locking scenario:
  CPU0                    CPU1
  ----                    ----
  lock(&dev->mutex);
                          lock(&priv->lock);
                          lock(&dev->mutex);  ← waits for CPU0
  lock(&priv->lock);      ← waits for CPU1
                          DEADLOCK
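The idea behind this kind of report can be illustrated with a toy, single-threaded model in user-space C. This is not lockdep's implementation: it only records "B was taken while A was held" edges between lock classes and refuses an acquisition that would close a cycle. All names here (`track_acquire`, `track_release`, `PRIV_LOCK`, `DEV_MUTEX`) are hypothetical, and releases are assumed to happen in reverse acquisition order.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LOCKS 8

enum { PRIV_LOCK, DEV_MUTEX };           /* hypothetical lock classes */

static bool order[MAX_LOCKS][MAX_LOCKS]; /* order[a][b]: b was taken while a was held */
static int held[MAX_LOCKS];              /* locks currently held, in acquisition order */
static int nheld;

/* Depth-first search: is there a recorded chain from -> ... -> to? */
static bool reachable(int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < MAX_LOCKS; next++)
        if (order[from][next] && !seen[next] && reachable(next, to, seen))
            return true;
    return false;
}

/*
 * Record acquisition of lock class id.
 * Returns false if some earlier path already took a currently held lock
 * after id, i.e. this acquisition would create a circular dependency.
 */
static bool track_acquire(int id)
{
    bool ok = true;

    assert(id >= 0 && id < MAX_LOCKS && nheld < MAX_LOCKS);
    for (int i = 0; i < nheld; i++) {
        bool seen[MAX_LOCKS] = { false };

        if (reachable(id, held[i], seen))
            ok = false;                 /* inversion: id -> ... -> held[i] is known */
        else
            order[held[i]][id] = true;  /* remember: id taken while held[i] is held */
    }
    held[nheld++] = id;
    return ok;
}

/* Record release of the most recently acquired lock. */
static void track_release(void)
{
    assert(nheld > 0);
    nheld--;
}
```

Replaying the scenario from the report: one path takes `PRIV_LOCK` then `DEV_MUTEX`, which records the edge. A later path that holds `DEV_MUTEX` and calls `track_acquire(PRIV_LOCK)` gets false, even though no deadlock actually happened in this run. The usual fix is to take the two locks in the same order on every path.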

Combined Debugging Configuration

Recommended kernel configuration for the development phase:

# Memory debugging
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
CONFIG_DEBUG_KMEMLEAK=y
CONFIG_SLUB_DEBUG=y
CONFIG_DEBUG_PAGEALLOC=y

# Lock debugging
CONFIG_LOCKDEP=y
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y

# General debugging
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_INFO=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_DYNAMIC_DEBUG=y
CONFIG_FRAME_POINTER=y

# Note: these options significantly reduce performance; use them only for development and debugging

Notes by 褚成志