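Hands-on demo: analyzing the cause of a kernel crash with kdump
This article introduces the kdump service and the crash utility, then walks through a simple example that shows how to analyze the cause of a kernel crash. It is based on Linux kernel 4.19 on the aarch64 architecture.
kdump overview
- kdump
kdump is a kexec-based kernel crash dump mechanism that captures the crash dump produced when the kernel crashes. When the kernel hits a fatal error, kdump exports memory as a vmcore file and saves it to disk.
- kdump workflow
When the system crashes, kdump uses kexec to boot into a second kernel. This second kernel, usually called the capture kernel, boots with a small amount of memory and captures the dump image. The first kernel reserves a region of memory for kdump when it boots.
- Configuring kdump
- Reserve memory for the crash kernel at boot
Add the following parameter to the kernel command line: crashkernel=size[@offset]. Whether the memory was reserved successfully can be checked with cat /proc/meminfo; on kernels that do not report it there, the reserved region is also listed as "Crash kernel" in /proc/iomem.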
cat /proc/meminfo | grep Crash
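As a quick cross-check of the reservation, the sketch below extracts the crashkernel= value from a kernel command line. The function name crashkernel_value and the sample command line are made up for illustration; on a running system the input would be /proc/cmdline, and the reserved size in bytes can also be read from /sys/kernel/kexec_crash_size.

```shell
# Print the value of the crashkernel= parameter found in a kernel
# command line string, or nothing if the parameter is absent.
crashkernel_value() {
    for word in $1; do
        case "$word" in
            crashkernel=*) printf '%s\n' "${word#crashkernel=}" ;;
        esac
    done
}

# Hypothetical command line, for illustration only.
crashkernel_value "rdinit=/linuxrc console=ttyAMA0 crashkernel=512M@0x60000000"
# prints: 512M@0x60000000

# On a live system:
#   crashkernel_value "$(cat /proc/cmdline)"
#   cat /sys/kernel/kexec_crash_size   # reserved size in bytes, 0 if none
```

An empty result means the parameter was never passed, which is worth ruling out before debugging the kdump service itself.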
- Install kexec-tools
yum install kexec-tools
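Installing kexec-tools from an rpm package is recommended; the version used must match the kernel version.
- Start the kdump service
systemctl start kdump.service   # start the kdump service
service kdump status            # check the kdump status
- Test that kdump can actually produce a dump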
echo c > /proc/sysrq-trigger
If everything works, the system reboots automatically, and after the reboot the dump file (vmcore) appears under /var/crash/.
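kdump typically writes each dump into its own timestamped subdirectory of /var/crash, so after several test crashes it helps to locate the newest one. The helper below is a minimal sketch; the name latest_vmcore is hypothetical, and it assumes the subdirectory names sort chronologically, which holds for the usual host-date-time naming but is not guaranteed on every distribution.

```shell
# Print the path of the most recent vmcore below a crash directory.
# Assumes per-crash subdirectories whose names sort by time.
latest_vmcore() {
    dir=${1:-/var/crash}
    find "$dir" -type f -name vmcore 2>/dev/null | sort | tail -n 1
}

# Prints nothing if no dump has been written yet.
latest_vmcore /var/crash
```

The printed path can be passed straight to crash together with the matching vmlinux.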
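Using kdump with qemu
Virtual machines are often started with qemu. When a kernel started by qemu crashes, a vmcore file can be generated for it as well, in this case through the qemu monitor.
- First, disable reboot-on-panic in the qemu guest so that it does not reboot while the coredump is being taken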
echo 0 > /proc/sys/kernel/panic
- Trigger a kernel panic
echo c > /proc/sysrq-trigger
- After the kernel panic, switch qemu to monitor mode
Press Ctrl+A, then c; qemu enters monitor mode
- In monitor mode, take the coredump
dump-guest-memory -z xxx-vmcore
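As the figure below shows, the coredump file was obtained successfully after the kernel panic in qemu.
Analyzing the kernel crash dump with crash
After a kernel crash, if kdump is deployed, the vmcore dump file can be found under /var/crash. The vmcore file can be analyzed with the crash tool.
The crash tool must match the kernel being analyzed. For example, the coredump captured above from a qemu arm64 guest needs a crash binary built for arm64; otherwise crash reports a compatibility error.
Building crash for arm64:
Download: https://github.com/crash-utility/crash/releases
Build and install: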
$ tar -xf crash-7.2.8.tar.gz
$ cd crash-7.2.8/
$ make target=arm64
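Once it is installed, use crash to analyze the vmcore file. vmlinux is generated in the top-level directory of the kernel source tree when the kernel is built.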
crash vmcore vmlinux
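crash needs a vmlinux from the same build as the kernel that produced the vmcore. One informal way to check this before starting crash is to compare the "Linux version" banners of the two files. The sketch below only does the string extraction; the function name kernel_release is hypothetical, and the strings-based lookup in the comments is an assumption that works for uncompressed dumps only.

```shell
# Extract the release string that follows "Linux version" in a banner line.
kernel_release() {
    printf '%s\n' "$1" | sed -n 's/.*Linux version \([^ ]*\).*/\1/p'
}

# Banner as it appears in the dmesg output of this article's vmcore.
kernel_release "Linux version 4.20.0-rc4-00007-gef78e5e (root@localhost.localdomain)"
# prints: 4.20.0-rc4-00007-gef78e5e

# Possible use against real files (slow on large dumps, and it does not
# work on compressed vmcores):
#   kernel_release "$(strings vmlinux | grep -m1 'Linux version')"
#   kernel_release "$(strings vmcore  | grep -m1 'Linux version')"
```

If the two release strings differ, rebuild or locate the right vmlinux before going further.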
Common crash commands
- bt: show the function call stack
crash> bt
PID: 1452 TASK: ffff80007b0f1a80 CPU: 1 COMMAND: "sh"
#0 [ffff00000aeb3900] __delay at ffff000008af2528
#1 [ffff00000aeb3930] __const_udelay at ffff000008af2488
#2 [ffff00000aeb3940] panic at ffff0000080d7f04
#3 [ffff00000aeb3a20] die at ffff00000808cb18
#4 [ffff00000aeb3a60] die_kernel_fault at ffff00000809f7e8
#5 [ffff00000aeb3a90] __do_kernel_fault at ffff00000809f07c
#6 [ffff00000aeb3ac0] do_page_fault at ffff00000809f12c
#7 [ffff00000aeb3b30] do_translation_fault at ffff00000809f574
#8 [ffff00000aeb3b40] do_mem_abort at ffff000008081448
#9 [ffff00000aeb3ca0] el1_ia at ffff00000808318c
PC: ffff0000085dc0d0 [sysrq_handle_crash+32]
LR: ffff0000085dc0bc [sysrq_handle_crash+12]
SP: ffff00000aeb3cb0 PSTATE: 40000005
X29: ffff00000aeb3cb0 X28: ffff80007b0f1a80 X27: 0000000000000000
X26: 0000000000000000 X25: 0000000056000000 X24: 0000000000000000
X23: 0000000000000007 X22: ffff000009289000 X21: ffff000009289400
X20: 0000000000000063 X19: ffff0000091a1000 X18: ffffffffffffffff
X17: 0000000000000000 X16: 0000000000000000 X15: ffff0000091896c8
X14: ffff0000892ed70f X13: ffff0000092ed71d X12: ffff0000091a1000
X11: 0000000005f5e0ff X10: ffff000009189940 X9: 00000000ffffffd0
X8: ffff000008602b08 X7: 54203a2071527379 X6: 00000000000000d2
X5: 0000000000000000 X4: 0000000000000000 X3: ffffffffffffffff
X2: 2c501196acfc7700 X1: 0000000000000000 X0: 0000000000000001
#10 [ffff00000aeb3cb0] sysrq_handle_crash at ffff0000085dc0cc
#11 [ffff00000aeb3cc0] __handle_sysrq at ffff0000085dc6cc
#12 [ffff00000aeb3d00] write_sysrq_trigger at ffff0000085dcc60
#13 [ffff00000aeb3d20] proc_reg_write at ffff0000082ac7e4
#14 [ffff00000aeb3d40] __vfs_write at ffff00000823a9cc
#15 [ffff00000aeb3de0] vfs_write at ffff00000823ace0
#16 [ffff00000aeb3e20] ksys_write at ffff00000823afd4
#17 [ffff00000aeb3e70] __arm64_sys_write at ffff00000823b064
#18 [ffff00000aeb3e80] el0_svc_common at ffff000008094ef4
#19 [ffff00000aeb3eb0] el0_svc_handler at ffff000008094fa8
#20 [ffff00000aeb3ff0] el0_svc at ffff000008084044
PC: 0000000000401a58 LR: 00000000004b2be4 SP: 0000ffffe68f8e10
X29: 0000ffffe68f9500 X28: 0000ffffe68f9fba X27: 000000000056f9c0
X26: 00000000005ed000 X25: 0000000000000000 X24: 0000000000000020
X23: 0000000011710110 X22: 00000000005ed000 X21: 0000000000000002
X20: 0000000011710110 X19: 0000000000000001 X18: 0000000000000001
X17: 0000000000000000 X16: 0000000000000000 X15: 0000000000000008
X14: 0000000000000012 X13: 726567676972742d X12: 0101010101010101
X11: 0000005000564818 X10: 0101010101010101 X9: fffffffffffffff0
X8: 0000000000000040 X7: 0000000011710120 X6: 0080808080808080
X5: 0000000000000000 X4: 0000000000000063 X3: 0000000011710111
X2: 0000000000000002 X1: 0000000011710110 X0: 0000000000000001
ORIG_X0: 0000000000000001 SYSCALLNO: 40 PSTATE: 80000000
- log: show the kernel dmesg log
crash> log
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x411fd070]
[ 0.000000] Linux version 4.20.0-rc4-00007-gef78e5e (root@localhost.localdomain) (gcc version 7.3.1 20180425 [linaro-7.3-2018.05 revision d29120a424ecfbc167ef90065c0eeb7f91977701] (Linaro GCC 7.3-2018.05)) #3 SMP PREEMPT Wed Jan 15 07:52:10 PST 2020
[ 0.000000] Machine model: linux,dummy-virt
[ 0.000000] efi: Getting EFI parameters from FDT:
[ 0.000000] efi: UEFI not found.
[ 0.000000] cma: Reserved 32 MiB at 0x00000000be000000
[ 0.000000] NUMA: No NUMA configuration found
[ 0.000000] NUMA: Faking a node at [mem 0x0000000040000000-0x00000000bfffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0xbdfea840-0xbdfebfff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA32 [mem 0x0000000040000000-0x00000000bfffffff]
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bfffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000bfffffff]
[ 0.000000] On node 0 totalpages: 524288
[ 0.000000] DMA32 zone: 8192 pages used for memmap
[ 0.000000] DMA32 zone: 0 pages reserved
[ 0.000000] DMA32 zone: 524288 pages, LIFO batch:63
[ 0.000000] psci: probing for conduit method from DT.
[ 0.000000] psci: PSCIv0.2 detected in firmware.
[ 0.000000] psci: Using standard PSCI v0.2 function IDs
[ 0.000000] psci: Trusted OS migration not required
[ 0.000000] random: get_random_bytes called from start_kernel+0xa8/0x418 with crng_init=0
[ 0.000000] percpu: Embedded 23 pages/cpu @(____ptrval____) s55704 r8192 d30312 u94208
[ 0.000000] pcpu-alloc: s55704 r8192 d30312 u94208 alloc=23*4096
[ 0.000000] pcpu-alloc: [0] 0 [0] 1
[ 0.000000] Detected PIPT I-cache on CPU0
[ 0.000000] CPU features: enabling workaround for ARM erratum 832075
[ 0.000000] CPU features: enabling workaround for ARM erratum 834220
[ 0.000000] CPU features: enabling workaround for EL2 vector hardening
[ 0.000000] CPU features: detected: Kernel page table isolation (KPTI)
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 516096
[ 0.000000] Policy zone: DMA32
[ 0.000000] Kernel command line: rdinit=/linuxrc console=ttyAMA0
[ 0.000000] Memory: 2009884K/2097152K available (10876K kernel code, 1414K rwdata, 5100K rodata, 1344K init, 380K bss, 54500K reserved, 32768K cma-reserved)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[ 0.000000] rcu: Preemptible hierarchical RCU implementation.
[ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=2.
[ 0.000000] Tasks RCU enabled.
[ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[ 0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
[ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[ 0.000000] GICv2m: range[mem 0x08020000-0x08020fff], SPI[80:143]
[ 0.000000] arch_timer: cp15 timer(s) running at 62.50MHz (virt).
[ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x1cd42e208c, max_idle_ns: 881590405314 ns
- struct: inspect a data structure
crash> struct task_struct ffff0000085dc0d0 -x
struct task_struct {
thread_info = {
flags = 0xa8c17bfd39000020,
addr_limit = 0xd503201fd65f03c0,
preempt_count = 0xa9bf7bfd
},
state = 0x97ec827fd50342ff,
stack = 0xd65f03c0a8c17bfd,
usage = {
counter = 0xa9bd7bfd
},
flags = 0x910003fd,
ptrace = 0xa90153f3,
wake_entry = {
next = 0xaa0103f4911b2262
},
on_cpu = 0xf9400041,
cpu = 0xf90017a1,
wakee_flips = 0xd2800001,
wakee_flip_decay_ts = 0x37f8018097f909a2,
last_wakee = 0xf10bfc7ff94013a3,
recent_used_cpu = 0x54000228,
wake_cpu = 0xb0006561,
on_rq = 0x91018021,
prio = 0xf9401284,
static_prio = 0x52800000,
normal_prio = 0xb9404022,
rt_priority = 0x79000083,
sched_class = 0x911b2273b9004022,
se = {
load = {
weight = 0x940b05adf9400013,
inv_weight = 0x91012260
},
runnable_weight = 0x97edbddd91052260,
run_node = {
__rb_parent_color = 0x940b05c5aa1403e0,
rb_right = 0x97f0ec85aa1303e0,
rb_left = 0xa8c27bfda94153f3
},
group_node = {
next = 0xd503201fd65f03c0,
prev = 0x52800021a9bf7bfd
},
on_rq = 0x910003fd,
exec_start = 0xd280000097f251d0,
sum_exec_runtime = 0xa8c17bfd97ec8318,
vruntime = 0xd503201fd65f03c0,
prev_sum_exec_runtime = 0x910003fda9be7bfd,
nr_migrations = 0xd1012013f9000bf3,
statistics = {<No data fields>},
depth = 0x39434660,
parent = 0x52800020f9000fb4,
cfs_rq = 0xb940ce7439034a60,
my_q = 0x52800023d5033f9f,
avg = {
last_update_time = 0x940b0cf552800001,
load_sum = 0x52800003aa1303e0,
runnable_load_sum = 0x5280002152800c62,
util_sum = 0x940b0cf0,
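Note that the example above passes a text address (the faulting PC, ffff0000085dc0d0), so the fields simply show instruction bytes; pass a TASK address from bt or ps (for example ffff80007b0f1a80) to see real task data.
struct -o [struct]: show the offset of each member in the structure
struct [struct] [address]: show the values of the structure at the given address
[struct] [address]: shorthand form of the command above
[struct] [address] -xo: print the structure layout with member offsets and total size, in hexadecimal
[struct].member [address]: show the value of a single member
- rd: read memory contents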
crash> rd ffff0000085dc0d0 32
ffff0000085dc0d0: a8c17bfd39000020 d503201fd65f03c0 ..9.{...._.. ..
ffff0000085dc0e0: 910003fda9bf7bfd 97ec827fd50342ff .{.......B......
ffff0000085dc0f0: d65f03c0a8c17bfd 910003fda9bd7bfd .{...._..{......
ffff0000085dc100: b0005d73a90153f3 aa0103f4911b2262 .S..s]..b"......
ffff0000085dc110: f90017a1f9400041 910083a2d2800001 A.@.............
ffff0000085dc120: 37f8018097f909a2 f10bfc7ff94013a3 .......7..@.....
ffff0000085dc130: b000656154000228 f940128491018021 (..Tae..!.....@.
ffff0000085dc140: b940402252800000 1100044279000083 ...R"@@....yB...
ffff0000085dc150: 911b2273b9004022 f9400261f94017a2 "@..s"....@.a.@.
ffff0000085dc160: b50000c1ca010041 a8c37bfda94153f3 A........SA..{..
ffff0000085dc170: 128002a0d65f03c0 97ebee0b17fffff7 .._.............
ffff0000085dc180: 910003fda9be7bfd aa0003f4a90153f3 .{.......S......
ffff0000085dc190: 940b05adf9400013 97ec5fb991012260 ..@.....`"..._..
ffff0000085dc1a0: 97edbddd91052260 940b05c5aa1403e0 `"..............
ffff0000085dc1b0: 97f0ec85aa1303e0 a8c27bfda94153f3 .........SA..{..
ffff0000085dc1c0: d503201fd65f03c0 52800021a9bf7bfd .._.. ...{..!..R
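rd [addr] [len]: read len memory units (64-bit words in the example above) starting at addr
rd -S [addr] [len]: try to translate the values read into symbols
rd [addr] -e [addr]: show the contents of the given memory range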
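On arm64 every instruction is 32 bits wide, and on a little-endian kernel each 64-bit word printed by rd packs two consecutive instructions, with the lower-addressed one in the low half. The sketch below splits one rd word accordingly; the function name split_insns is made up for illustration. Applied to the first word of the rd output above, it yields the encodings of the strb and ldp instructions that dis shows at sysrq_handle_crash+32 and +36.

```shell
# Split a 16-digit hex word from "rd" into its two 32-bit A64 instructions.
# Output order: instruction at the lower address first.
split_insns() {
    word=$1
    high=${word%????????}   # first 8 hex digits (higher address)
    low=${word#????????}    # last 8 hex digits (lower address)
    printf '%s %s\n' "$low" "$high"
}

split_insns a8c17bfd39000020
# prints: 39000020 a8c17bfd
```

Here 39000020 is the strb w0, [x1] at ffff0000085dc0d0 and a8c17bfd is the ldp x29, x30, [sp],#16 that follows it.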
- dis: disassemble, to inspect the code at a given address
crash> dis -r ffff0000085dc0d0
0xffff0000085dc0b0 <sysrq_handle_crash>: stp x29, x30, [sp,#-16]!
0xffff0000085dc0b4 <sysrq_handle_crash+4>: mov x29, sp
0xffff0000085dc0b8 <sysrq_handle_crash+8>: bl 0xffff000008141a48 <__rcu_read_unlock>
0xffff0000085dc0bc <sysrq_handle_crash+12>: adrp x1, 0xffff0000092e9000 <xen_dummy_shared_info+984>
0xffff0000085dc0c0 <sysrq_handle_crash+16>: mov w0, #0x1 // #1
0xffff0000085dc0c4 <sysrq_handle_crash+20>: str w0, [x1,#1448]
0xffff0000085dc0c8 <sysrq_handle_crash+24>: dsb st
0xffff0000085dc0cc <sysrq_handle_crash+28>: mov x1, #0x0 // #0
0xffff0000085dc0d0 <sysrq_handle_crash+32>: strb w0, [x1]
crash> dis -f ffff0000085dc0d0
0xffff0000085dc0d0 <sysrq_handle_crash+32>: strb w0, [x1]
0xffff0000085dc0d4 <sysrq_handle_crash+36>: ldp x29, x30, [sp],#16
0xffff0000085dc0d8 <sysrq_handle_crash+40>: ret
- ps: show thread states
crash> ps
PID PPID CPU TASK ST %MEM VSZ RSS COMM
> 0 0 0 ffff000009192580 RU 0.0 0 0 [swapper/0]
0 0 1 ffff80007bbc1a80 RU 0.0 0 0 [swapper/1]
1 0 0 ffff80007bb68000 IN 0.0 2196 60 linuxrc
2 0 0 ffff80007bb68d40 IN 0.0 0 0 [kthreadd]
3 2 0 ffff80007bb69a80 ID 0.0 0 0 [rcu_gp]
4 2 0 ffff80007bb6a7c0 ID 0.0 0 0 [rcu_par_gp]
5 2 0 ffff80007bb6b500 ID 0.0 0 0 [kworker/0:0]
6 2 0 ffff80007bb6c240 ID 0.0 0 0 [kworker/0:0H]
7 2 0 ffff80007bb6cf80 ID 0.0 0 0 [kworker/u4:0]
8 2 0 ffff80007bb6dcc0 ID 0.0 0 0 [mm_percpu_wq]
9 2 0 ffff80007bb6ea00 IN 0.0 0 0 [ksoftirqd/0]
10 2 0 ffff80007bbc0000 ID 0.0 0 0 [rcu_preempt]
11 2 0 ffff80007bbc0d40 IN 0.0 0 0 [migration/0]
12 2 0 ffff80007bbc27c0 IN 0.0 0 0 [cpuhp/0]
13 2 1 ffff80007bbc3500 IN 0.0 0 0 [cpuhp/1]
14 2 1 ffff80007bbc4240 IN 0.0 0 0 [migration/1]
15 2 1 ffff80007bbc4f80 IN 0.0 0 0 [ksoftirqd/1]
16 2 1 ffff80007bbc5cc0 ID 0.0 0 0 [kworker/1:0]
17 2 1 ffff80007bbc6a00 ID 0.0 0 0 [kworker/1:0H]
18 2 0 ffff80007bbd0000 IN 0.0 0 0 [kdevtmpfs]
19 2 0 ffff80007bbd0d40 ID 0.0 0 0 [netns]
20 2 0 ffff80007b040000 ID 0.0 0 0 [kworker/u4:1]
21 2 1 ffff80007b040d40 IN 0.0 0 0 [rcu_tasks_kthre]
42 2 1 ffff80007b0f3500 ID 0.0 0 0 [kworker/1:1]
43 2 0 ffff80007b0f4240 ID 0.0 0 0 [kworker/0:1]
49 2 1 ffff80007b0f4f80 ID 0.0 0 0 [kworker/u4:2]
56 2 1 ffff80007b140000 IN 0.0 0 0 [kauditd]
212 2 0 ffff80007b26ea00 ID 0.0 0 0 [kworker/u4:3]
256 2 0 ffff80007b336a00 ID 0.0 0 0 [kworker/u4:4]
471 2 1 ffff80007b2d6a00 IN 0.0 0 0 [oom_reaper]
472 2 1 ffff80007b2d5cc0 ID 0.0 0 0 [writeback]
474 2 0 ffff80007b330d40 IN 0.0 0 0 [kcompactd0]
475 2 0 ffff80007b3327c0 IN 0.0 0 0 [ksmd]
476 2 0 ffff80007b2d1a80 IN 0.0 0 0 [khugepaged]
477 2 0 ffff80007b2d0000 ID 0.0 0 0 [crypto]
478 2 1 ffff80007b2d0d40 ID 0.0 0 0 [kintegrityd]
480 2 1 ffff80007b2d27c0 ID 0.0 0 0 [kblockd]
501 2 1 ffff80007b2d3500 ID 0.0 0 0 [tpm_dev_wq]
508 2 1 ffff80007b2d4240 ID 0.0 0 0 [ata_sff]
541 2 0 ffff80007ac98000 ID 0.0 0 0 [edac-poller]
551 2 1 ffff80007b044240 ID 0.0 0 0 [devfreq_wq]
561 2 1 ffff80007b268000 IN 0.0 0 0 [watchdogd]
647 2 0 ffff80007b268d40 ID 0.0 0 0 [rpciod]
648 2 1 ffff80007b26c240 ID 0.0 0 0 [kworker/u5:0]
649 2 0 ffff80007ad04f80 ID 0.0 0 0 [xprtiod]
718 2 1 ffff80007bbd3500 IN 0.0 0 0 [kswapd0]
815 2 1 ffff80007ad00000 ID 0.0 0 0 [nfsiod]
1250 2 0 ffff80007b26dcc0 ID 0.0 0 0 [vfio-irqfd-clea]
> 1452 1 1 ffff80007b0f1a80 RU 0.0 2196 76 sh
ps -p [pid]: show the parent/child hierarchy of a process
ps -t [pid]: show the run time of a process
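In the ps listing, rows that start with > mark the task that was running on each CPU when the dump was taken. With many tasks it can be handy to filter a saved copy of the output down to those rows. The sketch below does that with awk; the function name active_tasks is hypothetical, and it assumes the column layout shown above, with COMM as the last field.

```shell
# Read "ps" output from crash on stdin and print "PID COMM" for the
# tasks that were active on a CPU (rows marked with a leading ">").
active_tasks() {
    awk '/^>/ { sub(/^> */, ""); print $1, $NF }'
}

# Two rows taken from the listing above.
printf '%s\n' \
    '> 0 0 0 ffff000009192580 RU 0.0 0 0 [swapper/0]' \
    '> 1452 1 1 ffff80007b0f1a80 RU 0.0 2196 76 sh' | active_tasks
# prints:
# 0 [swapper/0]
# 1452 sh
```

PID 1452 (sh) is the task that wrote to /proc/sysrq-trigger, matching the bt output.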
- kmem: show kernel memory usage
crash> kmem -i
PAGES TOTAL PERCENTAGE
TOTAL MEM 511276 2 GB ----
FREE 506631 1.9 GB 99% of TOTAL MEM
USED 4645 18.1 MB 0% of TOTAL MEM
SHARED 353 1.4 MB 0% of TOTAL MEM
BUFFERS 0 0 0% of TOTAL MEM
CACHED 480 1.9 MB 0% of TOTAL MEM
SLAB 1930 7.5 MB 0% of TOTAL MEM
TOTAL HUGE 0 0 ----
HUGE FREE 0 0 0% of TOTAL HUGE
TOTAL SWAP 0 0 ----
SWAP USED 0 0 0% of TOTAL SWAP
SWAP FREE 0 0 0% of TOTAL SWAP
COMMIT LIMIT 255638 998.6 MB ----
COMMITTED 479 1.9 MB 0% of TOTAL LIMIT
crash>
kmem -i: show overall memory usage
kmem -s: show slab usage
kmem [addr]: find the memory structure that an address belongs to
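The PAGES and TOTAL columns of kmem -i are related by the page size. As a small worked check, the sketch below converts page counts to KiB; the function name pages_to_kib is hypothetical, and the 4096-byte default is an assumption that matches this dump (the boot log reports 524288 pages for 2097152K of memory).

```shell
# Convert a page count from "kmem -i" to KiB.
# The page size defaults to 4096 bytes, which matches this article's kernel
# but is not universal on arm64.
pages_to_kib() {
    pages=$1
    page_size=${2:-4096}
    echo $((pages * page_size / 1024))
}

pages_to_kib 4645     # USED row: prints 18580 (about 18.1 MB)
pages_to_kib 255638   # COMMIT LIMIT row: prints 1022552 (about 998.6 MB)
```

Both results agree with the TOTAL column printed by crash.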
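- Use help to see the remaining commands
Kernel panic example
The kernel dereferences a null pointer and panics.
- Building the driver
Write a driver whose kernel module dereferences a null pointer, to demonstrate how crash is used to find the cause of a kernel crash.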
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/atomic.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct my_struct {
	unsigned long head;
	spinlock_t lock;
};

int *addr = 0; //null pointer

void panic_foo(struct my_struct *ms)
{
	int *p = addr;

	spin_lock(&ms->lock);
	if (ms->head == 10) {
		/* Deliberate write through a NULL pointer: this is the panic. */
		*p = 0xFFFF;
	} else if (ms->head == 0) {
		// do sth
	} else {
		// do sth
	}
	spin_unlock(&ms->lock);
}

int panic_kernel_init(void)
{
	struct my_struct *ms = kzalloc(sizeof(struct my_struct), GFP_KERNEL);

	if (!ms)
		return -ENOMEM;
	spin_lock_init(&ms->lock);
	ms->head = 10;	/* selects the faulting branch in panic_foo() */
	panic_foo(ms);
	return 0;
}

void panic_kernel_exit(void)
{
}

module_init(panic_kernel_init);
module_exit(panic_kernel_exit);
MODULE_LICENSE("GPL");
The Makefile for the module:
obj-m := panic-kernel.o
KERNEL_DIR := /home/linux
PWD := $(shell pwd)

all:
	$(MAKE) -C $(KERNEL_DIR) M=$(PWD) modules

clean:
	rm -f *.o *.ko *.mod.c

.PHONY: all clean
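Pack the built driver into the root filesystem and insert the kernel module after boot.
- Analyzing the panic
The kernel call trace is shown in the figure above. Disassemble the corresponding object file to find the code where the problem occurs.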
aarch64-linux-gnu-objdump -S panic-kernel.o > test.txt
An excerpt of the disassembly:
Disassembly of section .text:
0000000000000000 <panic_foo>:
int *addr = 0; //null pointer
void panic_foo(struct my_struct *ms)
{
0: a9bd7bfd stp x29, x30, [sp, #-48]!
4: 910003fd mov x29, sp
8: a90153f3 stp x19, x20, [sp, #16]
c: aa0003f3 mov x19, x0
int *p = addr;
10: 90000000 adrp x0, 0 <panic_foo>
raw_spin_lock_init(&(_lock)->rlock); \
} while (0)
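The assembly shows that the argument of panic_foo (passed in x0) ends up saved in register x19. What we want to know now is which branch the code took when the problem occurred.
To analyze this with crash, first load the module's symbols: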
crash> mod -S my_module
MODULE NAME SIZE OBJECT FILE
ffff000000ae2000 panic_kernel 16384 my_module/panic-kernel.o
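Use crash to print the structure as it was when the problem occurred and confirm which branch the function took. The function argument, a pointer to struct my_struct, is held in x19: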
crash> struct my_struct ffff8000fa4d9780
struct my_struct {
head = 10,
lock = {
{
rlock = {
raw_lock = {
{
val = {
counter = 1
},
{
locked = 1 '\001',
pending = 0 '\000'
},
{
locked_pending = 1,
tail = 0
}
}
}
}
}
}
}
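The printed values show that head is 10, so the code took the ms->head == 10 branch.
Combined with the disassembly, the fault is at pc: panic_foo+0x54. pc holds the address of the instruction that faulted (not the stack pointer), and lr (x30) holds the function's return address.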
static __always_inline void spin_unlock(spinlock_t *lock)
{
raw_spin_unlock(&lock->rlock);
38: aa1403e0 mov x0, x20
3c: 94000000 bl 0 <_raw_spin_unlock>
} else {
// do sth
}
spin_unlock(&ms->lock);
}
40: f94013f5 ldr x21, [sp, #32]
44: a94153f3 ldp x19, x20, [sp, #16]
48: a8c37bfd ldp x29, x30, [sp], #48
4c: d65f03c0 ret
*p = 0xFFFF;
50: 529fffe0 mov w0, #0xffff // #65535
54: b90002a0 str w0, [x21]
58: aa1403e0 mov x0, x20
5c: 94000000 bl 0 <_raw_spin_unlock>
}
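The instruction at offset 0x54 stores the value of w0 to the address held in x21, and x21 holds 0. The value in w0 was loaded directly by mov w0, #0xffff. So the fault is caused by writing 0xffff to address 0.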
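The reading of the instruction at offset 0x54 can be double-checked by decoding its encoding, b90002a0, by hand. The sketch below handles only the 32-bit form of the A64 STR (immediate, unsigned offset) instruction; the function name decode_str_w is made up for illustration, and register number 31 (sp or wzr) is not treated specially.

```shell
# Decode a 32-bit A64 "STR Wt, [Xn, #imm]" (unsigned-offset form).
# Layout: bits 31-22 = 0x2e4, bits 21-10 = imm12, bits 9-5 = Rn, bits 4-0 = Rt.
decode_str_w() {
    insn=$((0x$1))
    if [ $((insn >> 22)) -ne $((0x2e4)) ]; then
        echo "not a 32-bit STR (unsigned offset): $1" >&2
        return 1
    fi
    rt=$((insn & 0x1f))
    rn=$(((insn >> 5) & 0x1f))
    imm12=$(((insn >> 10) & 0xfff))
    # The byte offset is imm12 scaled by the 4-byte access size.
    printf 'str w%d, [x%d, #%d]\n' "$rt" "$rn" "$((imm12 * 4))"
}

decode_str_w b90002a0
# prints: str w0, [x21, #0]
```

The base register is x21 with a zero offset, so the store goes exactly to the address in x21, which is the NULL pointer p.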
Putting this information together with the actual source code pinpoints the cause of the problem.
Author: Peter Liu of 人人都是極客, senior systems engineer at a major chip maker, technical partner at a technology company, and outstanding Google instructor.
Source: https://mp.weixin.qq.com/s/_ZKix3ZJ8NqwkcvgR_6Zgg