DDD | Saturday, October 31, 2009, 21:46
[Figure: the "module_event" function descriptor. The first doubleword of PPC64 ABI function descriptors contains the address of the entry point of the function: c0 00 00 00 00 09 5c 80 -> the entry point of "module_event".]
Bug-hunting diary: setting a kgdb breakpoint on ppc64
A: Steps to reproduce the bug
Host:
1: Connect gdb to kgdb (GDB was configured as "--host=i686-pc-linux-gnu --target=powerpc-linux-gnu"):
(gdb) target remote udp:10.0.0.15:6443
2: Set a breakpoint at "module_event":
(gdb) b module_event
Target:
3: Insert a module. When the "module_event" breakpoint should be hit, we get the following error instead:
root@atca6101:/root> insmod /tmp/dummy.ko
Unable to handle kernel paging request for instruction fetch
Faulting instruction address: 0x7d82100800095c80
Oops: Kernel access of bad area, sig: 11 [#1]
PREEMPT NUMA LTT NESTING LEVEL : 0
Maple
Modules linked in: dummy(+) kgdboe
NIP: 7d82100800095c80 LR: c00000000007a14c CTR: 7d82100800095c80
REGS: c00000017817b9b0 TRAP: 0400 Not tainted (2.6.27.37-WR3.0.2as_standard-00080-gb14bbdf-dirty)
MSR: 9000000040009032 <EE,ME,IR,DR> CR: 24002088 XER: 00000000
TASK = c00000017a183180[2223] 'insmod' THREAD: c000000178178000
GPR00: 7d82100800095c80 c00000017817bc30 c0000000005fa2e8 c000000000547c88
GPR04: 0000000000000001 d00000000002d100 0000000024002022 c000000000011770
GPR08: c00000017817b660 c0000000005b0360 c00000000060c300 0000000000000000
GPR12: 0000000044002088 c00000000060c300 0000000000000000 000000001008a334
GPR16: 00000000100b142c 00000000100ad5c0 00000000100eb278 00000000100ad72c
GPR20: 00000000100eb2c0 00000000100e5030 0000000000000000 00000000100ad5c4
GPR24: c000000000491940 0000000000000000 0000000000000001 d00000000002d100
GPR28: 0000000000000000 fffffffffffffffc c000000000593ae0 0000000000000000
NIP [7d82100800095c80] 0x7d82100800095c80
LR [c00000000007a14c] .notifier_call_chain+0xcc/0x120
Call Trace:
[c00000017817bc30] [c00000000007a15c] .notifier_call_chain+0xdc/0x120 (unreliable)
[c00000017817bce0] [c00000000007a520] .__blocking_notifier_call_chain+0x70/0xb0
[c00000017817bd90] [c000000000089ae0] .SyS_init_module+0x100/0x260
[c00000017817be30] [c00000000000852c] syscall_exit+0x0/0x40
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
---[ end trace ff196d014336a31d ]---
Segmentation fault
A concise description from my colleague:
I disassembled the vmlinux file and saw that module_event starts as follows:
(gdb) i line *0xc0000000000a1840
Line 1622 of "/workspace/6101/build/linux/kernel/kgdb.c"
starts at address 0xc0000000000a1840 <module_event>
and ends at 0xc0000000000a1850 <kgdb_tasklet_bpt>.
And yet gdb insists on putting a breakpoint at a different location.
(gdb) i line *0xc0000000005b7b98
No line number information available for address
0xc0000000005b7b98 <module_event>
(gdb) i line module_event
Line 1622 of "/workspace/6101/build/linux/kernel/kgdb.c"
starts at address 0xc0000000000a1840 <module_event>
and ends at 0xc0000000000a1850 <kgdb_tasklet_bpt>.
[jl@prt-server5 linux-emer_atca6101-standard-build]$ grep module_event System.map
c0000000000a1840 t .module_event
c0000000005b7b98 d module_event
So gdb is picking the "d" one... I don't know what the "d" means in the System.map file though.
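For reference, the type letters in System.map are the ones nm prints. "t" marks a symbol in the text (code) section and "d" marks a symbol in an initialized data section. A lowercase letter usually means a local (static) symbol. So ".module_event" is code, while "module_event" is data, which turns out below to be the function descriptor. Below is a minimal C sketch that parses such a line and names its type letter. The struct and function names are hypothetical, and only the common letters are covered:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* One parsed System.map line: "ADDR TYPE NAME". */
struct map_entry {
    unsigned long long addr;
    char type;
    char name[128];
};

/* Parse a System.map line; returns 1 on success, 0 otherwise. */
static int parse_map_line(const char *line, struct map_entry *e)
{
    return sscanf(line, "%llx %c %127s", &e->addr, &e->type, e->name) == 3;
}

/* Describe an nm type letter (case-insensitive). */
static const char *nm_type_desc(char t)
{
    switch (tolower((unsigned char)t)) {
    case 't': return "text (code)";
    case 'd': return "initialized data";
    case 'b': return "uninitialized data (bss)";
    case 'r': return "read-only data";
    default:  return "other";
    }
}

/* Lowercase letters usually denote local (file-scope static) symbols. */
static int nm_is_local(char t)
{
    return islower((unsigned char)t) != 0;
}
```

Fed the two lines above, the sketch reports ".module_event" as local text and "module_event" as local initialized data.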
B: Analysis of the crash
"Unable to handle kernel paging request for instruction fetch
Faulting instruction address: 0x7d82100800095c80"
If we look closely at the address "7d82100800095c80", we can see that its leading "7d821008" is the instruction the PPC platform uses to trigger a breakpoint.
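The trap word can be decoded by hand. Below is a minimal C sketch, assuming the standard PowerPC X-form field layout, where IBM bit 0 is the most significant bit. The word 0x7d821008 decodes to primary opcode 31, extended opcode 4 (tw), TO = 12 and RA = RB = r2, i.e. "twge r2,r2". Because r2 always equals itself, the trap always fires. That is presumably why it serves as the breakpoint instruction; as far as I know, this is the value the powerpc kgdb code defines as BREAK_INSTR. The helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a PowerPC X-form instruction such as "tw TO,RA,RB".
 * IBM numbering: bit 0 is the most significant bit of the word. */
struct xform {
    unsigned opcode; /* bits 0-5:   primary opcode (31 for tw) */
    unsigned to;     /* bits 6-10:  TO trap-condition mask */
    unsigned ra;     /* bits 11-15: RA */
    unsigned rb;     /* bits 16-20: RB */
    unsigned xo;     /* bits 21-30: extended opcode (4 for tw) */
};

static struct xform decode_xform(uint32_t insn)
{
    struct xform f;
    f.opcode = insn >> 26;
    f.to = (insn >> 21) & 0x1f;
    f.ra = (insn >> 16) & 0x1f;
    f.rb = (insn >> 11) & 0x1f;
    f.xo = (insn >> 1) & 0x3ff;
    return f;
}

/* TO mask bits: 16 = lt, 8 = gt, 4 = eq, 2 = unsigned lt, 1 = unsigned gt. */
#define TO_EQ 4u

/* Nonzero if insn is a tw that traps whatever the register values are. */
static int tw_always_traps(uint32_t insn)
{
    struct xform f = decode_xform(insn);
    if (f.opcode != 31 || f.xo != 4)
        return 0;
    /* TO = 31 traps on every comparison outcome; comparing a register
     * with itself always yields "equal", so the eq bit suffices there. */
    return f.to == 31 || (f.ra == f.rb && (f.to & TO_EQ));
}
```

As a sanity check, the first instruction of .module_event (mflr r0, 0x7c0802a6) is not a trap at all.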
The most likely cause of this bug is that gdb/kgdb meant to modify the instruction a pointer points to, but for some reason modified the pointer itself.
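For example:
void *ptr;
&ptr = 0x005b0360                  (ptr itself is stored at 0x005b0360)
ptr = *(0x005b0360) = 0x00095c80   (ptr holds the address of the code)
The intent was to modify the content that ptr points to, i.e. to replace the instruction at 0x00095c80 with 0x7d821008.
But through some mistake, ptr itself was modified: the value stored at 0x005b0360 was overwritten with 0x7d821008.
So when the system fetches instructions through ptr, it fetches from 0x7d821008 instead of 0x00095c80 and faults. (On the real 64-bit system only the upper 4 bytes of the pointer were overwritten, which is why the faulting address is 0x7d82100800095c80.)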
C: What triggers the bug
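After a lot of painstaking print debugging inside kgdb, I found nothing wrong with kgdb.
With kgdb a dead end, I turned to gdb.
In general, gdb decides which address to patch and with what value; kgdb merely carries out the request. Since kgdb was working correctly, gdb had probably got the address wrong: it resolved the wrong address for the module_event function, which triggered the problem.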
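The division of labour works like this in the GDB Remote Serial Protocol. gdb names the breakpoint address itself, for instance with a "Z0,addr,kind" (insert software breakpoint) packet, or with plain memory writes if the stub lacks Z0 support. The stub only patches what it is told. Below is a minimal C sketch of building such a packet, framed as "$data#checksum" with a modulo-256 checksum. The helper names are hypothetical, and kind = 4 (a 4-byte PowerPC trap) is an assumption about what gdb sends here:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* RSP checksum: modulo-256 sum of the packet-data bytes. */
static unsigned rsp_checksum(const char *data)
{
    unsigned sum = 0;
    for (const char *p = data; *p; p++)
        sum += (unsigned char)*p;
    return sum & 0xff;
}

/* Frame packet data as "$data#cs"; returns what snprintf returns. */
static int rsp_frame(char *out, size_t n, const char *data)
{
    return snprintf(out, n, "$%s#%02x", data, rsp_checksum(data));
}

/* Build an "insert software breakpoint" request: Z0,addr,kind.
 * The address comes entirely from the debugger side. */
static int rsp_insert_sw_break(char *out, size_t n, uint64_t addr, unsigned kind)
{
    char data[64];
    snprintf(data, sizeof data, "Z0,%llx,%x", (unsigned long long)addr, kind);
    return rsp_frame(out, n, data);
}
```

Given the descriptor address from this bug, rsp_insert_sw_break(buf, sizeof buf, 0xc0000000005b0360ULL, 4) produces a packet starting with "$Z0,c0000000005b0360,4#". A stub receiving a request like that would dutifully patch the descriptor.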
So I used objdump on vmlinux to dump the symbol addresses, grepped for the module_event symbol, and found the following:
******************************************************************************************************
...
c0000000005b0360 <module_event>:
c0000000005b0360: c0 00 00 00 lfs f0,0(0)
c0000000005b0364: 00 09 5c 80 .long 0x95c80
c0000000005b0368: c0 00 00 00 lfs f0,0(0)
c0000000005b036c: 00 5f a2 e8 .long 0x5fa2e8
...
c000000000095c80 <.module_event>:
c000000000095c80: 7c 08 02 a6 mflr r0
c000000000095c84: fb c1 ff f0 std r30,-16(r1)
...
******************************************************************************************************
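So there are two module_event symbols, and their relationship is clear:
The upper module_event looks like some kind of function symbol table entry, and its content points to the real function address: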
(* 0xc0000000005b0360 <module_event>) -> c000000000095c80 <.module_event>
<.module_event> is the real entry point of the function.
I looked up the PPC64 ABI document and found an explanation for this.
Here are the key passages:
******************************************************************************************************
The PPC64 ABI defines a function descriptor structure.
PPC64 ABI Function Descriptors
A function descriptor is a three doubleword data structure that contains the following values:
* The first doubleword contains the address of the entry point of the function.
* The second doubleword contains the TOC base address for the function.
* The third doubleword contains the environment pointer for languages such as Pascal and PL/1.
For an externally visible function, the value of the symbol with the same name as the function is the address of the function descriptor. Symbol names with a dot (.) prefix are reserved for holding entry point addresses. The value of a symbol named ".FN" is the entry point of the function "FN".
The value of a function pointer in a language like C is the address of the function descriptor.
******************************************************************************************************
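The descriptor layout maps naturally onto a C struct. Below is a minimal sketch, assuming the big-endian byte order shown in the objdump listing above; the names are hypothetical. Decoding the first two doublewords at c0000000005b0360 gives entry = 0xc000000000095c80 (.module_event) and TOC = 0xc0000000005fa2e8. The TOC value matches r2 (GPR02) in the oops. The third doubleword is elided in the listing:

```c
#include <assert.h>
#include <stdint.h>

/* PPC64 ELF ABI function descriptor: three doublewords. */
struct func_desc {
    uint64_t entry; /* address of the function's entry point (".FN") */
    uint64_t toc;   /* TOC base address for the function */
    uint64_t env;   /* environment pointer (languages such as Pascal, PL/1) */
};

/* Read a big-endian doubleword. */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Decode a descriptor from its 24 raw bytes in memory. */
static struct func_desc decode_func_desc(const uint8_t raw[24])
{
    struct func_desc d;
    d.entry = load_be64(raw);
    d.toc = load_be64(raw + 8);
    d.env = load_be64(raw + 16);
    return d;
}
```

Calling through a C function pointer means reading entry and toc from this struct. Code that treats the struct's own address as code is wrong.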
For more information on the ppc64 ABI, see
http://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi-1.9.html#FUNC-DES
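So "c0000000005b0360 <module_event>" is the function descriptor, and the address it points to, "c000000000095c80 <.module_event>", is the real function address.
At this point everything became clear: that silly gdb had taken 0xc0000000005b0360 as the address of the module_event function and written the breakpoint instruction there.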
******************************************************************************************************
c0000000005b0360 <module_event>:
c0000000005b0360: 7d 82 10 08 ******----> modified here to "7d 82 10 08"
c0000000005b0364: 00 09 5c 80 .long 0x95c80
c0000000005b0368: c0 00 00 00 lfs f0,0(0)
c0000000005b036c: 00 5f a2 e8 .long 0x5fa2e8
...
c000000000095c80 <.module_event>:
c000000000095c80: 7c 08 02 a6 mflr r0
c000000000095c84: fb c1 ff f0 std r30,-16(r1)
...
******************************************************************************************************
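As a result, when the system loads the entry address from the function descriptor and fetches instructions from it, it accesses an invalid address and crashes. The first doubleword of the descriptor now reads 0x7d82100800095c80, which is exactly the NIP (and CTR) value in the oops.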
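The arithmetic can be checked with a small C sketch. The helper names are hypothetical, and the byte order is big-endian as in the listing. Writing the 4-byte trap word over the first doubleword of the descriptor turns c000000000095c80 into 7d82100800095c80. An indirect call through module_event's descriptor loads that value as its branch target, which matches the CTR and NIP values in the oops:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Trap word seen in the oops (twge r2,r2). */
#define PPC_TRAP_INSN 0x7d821008u

/* Read a big-endian doubleword, the byte order of the objdump listing. */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Write a 4-byte big-endian instruction word, as a breakpoint insert does. */
static void store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Entry address an indirect call would load after the trap word was
 * written at the descriptor address instead of the code address. */
static uint64_t entry_after_bad_patch(const uint8_t desc[8])
{
    uint8_t copy[8];
    memcpy(copy, desc, sizeof copy);
    store_be32(copy, PPC_TRAP_INSN); /* the misplaced breakpoint write */
    return load_be64(copy);
}
```

Only the high word changes, so the low half 00095c80 of the original entry point survives in the faulting address.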
The correct action for gdb would be:
******************************************************************************************************
...
c0000000005b0360 <module_event>:
c0000000005b0360: c0 00 00 00 lfs f0,0(0)
c0000000005b0364: 00 09 5c 80 .long 0x95c80
c0000000005b0368: c0 00 00 00 lfs f0,0(0)
c0000000005b036c: 00 5f a2 e8 .long 0x5fa2e8
...
c000000000095c80 <.module_event>:
c000000000095c80: 7d 82 10 08 ********modify here to "7d 82 10 08"*******
c000000000095c84: fb c1 ff f0 std r30,-16(r1)
...
******************************************************************************************************
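D: The fix
Change gdb's function-symbol resolution rules for the ppc64 architecture so that it obtains the real function entry address instead of the function descriptor address.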
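Below is a minimal C sketch of the resolution rule. It assumes a System.map-style symbol table and a hypothetical memory-reader callback, and it illustrates the idea rather than reproducing gdb's actual code. Following the ABI text quoted above, it prefers ".FN" as the entry point of FN. If only a data symbol FN exists, it treats that symbol as a descriptor and reads its first doubleword:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One System.map-style entry: address, nm type letter, symbol name. */
struct sym {
    uint64_t addr;
    char type;
    const char *name;
};

/* Hypothetical callback: read the big-endian doubleword at addr. */
typedef uint64_t (*read_dword_fn)(uint64_t addr);

/* Linear lookup by exact name; returns NULL if absent. */
static const struct sym *find_sym(const struct sym *tab, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}

/* Resolve the code address a breakpoint on function fn should patch.
 * Returns 0 and sets *entry on success, -1 on failure. */
static int resolve_entry(const struct sym *tab, size_t n, const char *fn,
                         read_dword_fn read_dword, uint64_t *entry)
{
    char dotted[128];
    const struct sym *s;

    assert(entry != NULL);

    /* ABI rule: the symbol ".FN" holds the entry point of FN. */
    snprintf(dotted, sizeof dotted, ".%s", fn);
    s = find_sym(tab, n, dotted);
    if (s != NULL) {
        *entry = s->addr;
        return 0;
    }

    s = find_sym(tab, n, fn);
    if (s == NULL)
        return -1;

    /* A data symbol named FN is a function descriptor: its first
     * doubleword is the entry point, so dereference it. */
    if (s->type == 'd' || s->type == 'D') {
        if (read_dword == NULL)
            return -1;
        *entry = read_dword(s->addr);
        return 0;
    }

    /* Otherwise assume FN already names code. */
    *entry = s->addr;
    return 0;
}
```

The colleague's System.map has two entries for this function. Given those, resolving "module_event" yields 0xc0000000000a1840 (.module_event) rather than the descriptor at 0xc0000000005b7b98.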