DDD's Blog

Bug-Hunting Diary: kgdb set breakpoint at mips

DDD, Saturday, October 31, 2009, 21:45

A: Steps to reproduce the bug

A.1 Write an empty function module_event and register it on the module notifier chain:

static int module_event(struct notifier_block *self, unsigned long val,
                        void *data)
{
        return 0;
}

static struct notifier_block module_load_nb = {
        .notifier_call = module_event,
};

/* called from the init code */
register_module_notifier(&module_load_nb);

A.2 Host:

A.2.1: Connect gdb to kgdb (GDB was configured as "--host=i686-pc-linux-gnu --target=mips-linux-gnu").

(gdb) target remote udp:10.0.0.15:6443

A.2.2: Set a breakpoint at "module_event":

(gdb) b module_event

A.3 Target:

Insert a module; the "module_event" breakpoint will be hit.

A.4 Host:

Send a "c" command to resume the system:

(gdb) c

After the "c", the system stops responding.

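B: Analyzing the bug

The system has not actually crashed. It is stuck in an endless loop in which kgdb keeps hitting a breakpoint and handling the resulting breakpoint event.

After tracing and debugging, kgdb's abnormal behavior can be summarized as follows.

Starting from a breakpoint trap entering do_trap_or_bp: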

void do_trap_or_bp() -> notify_die() -> notifier_call_chain()

int notifier_call_chain()
{
        struct notifier_block *nb, *next_nb;
        ...
        ret = nb->notifier_call(nb, val, v);    /* ---> kgdb_handle_exception() */
        ...
}

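The sequence above repeats forever. It looks as if a breakpoint has been set inside notifier_call_chain().
Because kgdb itself runs through that code, kgdb keeps hitting that breakpoint on its own and falls into an infinite loop.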

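C: What triggers the bug

First of all, we never set a breakpoint anywhere in notifier_call_chain().

But the symptom is clear: kgdb falls into the loop right after the module_event breakpoint is hit, so that is where to start.

Action: after hitting that breakpoint, we send a "continue" command from gdb to let the system keep running.

Note how gdb handles the "continue" command:

Because the breakpoint has to be put back, gdb handles "continue" as follows:

1: first perform a single step to get past the breakpoint address

2: once past that address, set the breakpoint again

3: only then issue the real "c" command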

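From the analysis above, gdb will first do a single step at "module_event".

Let's look at how single-stepping "module_event" is implemented, and at the context it is called from:

1: On MIPS, single-stepping is implemented as a software single step: gdb computes the next pc value and sets a breakpoint at that address.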
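The step-over sequence described above can be modelled in user space. The following is a minimal sketch, not gdb's or kgdb's actual code: the text segment is modelled as an array of instruction words, and every name in it (sw_breakpoint, bp_insert, bp_remove, begin_step_over, finish_step_over) is hypothetical. It also assumes the breakpoint instruction is a MIPS "break" with a zero code field (0x0000000d); real debuggers may use a different code field.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: a MIPS "break" instruction with a zero code field. */
#define BREAK_INSN 0x0000000du

/* Hypothetical record for one software breakpoint. */
struct sw_breakpoint {
    size_t   index;     /* word index into the modelled text segment */
    uint32_t saved;     /* original instruction while the breakpoint is inserted */
    int      inserted;  /* nonzero while BREAK_INSN is in place */
};

/* Overwrite the instruction with a break, remembering the original word. */
void bp_insert(uint32_t *text, struct sw_breakpoint *bp)
{
    if (bp->inserted)
        return;
    bp->saved = text[bp->index];
    text[bp->index] = BREAK_INSN;
    bp->inserted = 1;
}

/* Restore the original instruction. */
void bp_remove(uint32_t *text, struct sw_breakpoint *bp)
{
    if (!bp->inserted)
        return;
    text[bp->index] = bp->saved;
    bp->inserted = 0;
}

/* Step 1: lift the user breakpoint and plant a temporary one at the
 * computed next pc (step_bp must not be inserted yet). */
void begin_step_over(uint32_t *text, struct sw_breakpoint *user_bp,
                     struct sw_breakpoint *step_bp, size_t next_index)
{
    bp_remove(text, user_bp);
    step_bp->index = next_index;
    bp_insert(text, step_bp);
}

/* Step 2: after the temporary breakpoint has been hit, drop it and put
 * the user breakpoint back. Step 3 (the real "c") is left to the caller. */
void finish_step_over(uint32_t *text, struct sw_breakpoint *user_bp,
                      struct sw_breakpoint *step_bp)
{
    bp_remove(text, step_bp);
    bp_insert(text, user_bp);
}
```

The point to notice is that between begin_step_over() and finish_step_over() a break instruction sits at whatever address the next pc turned out to be, and that address is not necessarily inside the function being stepped.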

**********************************************************************

int module_event(struct notifier_block *self, unsigned long val,
                 void *data)
{
        return 0;
}

1  notifier_call_chain()        kernel/notifier.c: 578
2  {
3          while (nb && nr_to_call) {
4                  next_nb = rcu_dereference(nb->next);
5                  ret = nb->notifier_call(nb, val, v); -----> call module_event()
7                  if (nr_calls)
8                          (*nr_calls)++;
9                  nb = next_nb;
10                 nr_to_call--;
11         }
12 }

**********************************************************************

However, because of how the compiler generates code, the module_event() function is compiled as follows.

"module_event" on MIPS:

00000000 <module_event>:
   0:   03e00008        jr      ra
   4:   00001021        move    v0,zero

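Readers familiar with the MIPS architecture will know that
on MIPS, the instruction immediately following any jump instruction (the one in the delay slot) is executed by the CPU even though the jump is taken (MIPS pipeline, branch delay).

So for stepping purposes we can treat module_event as having only one instruction: the "jr ra" together with its delay slot. A single step there must therefore put its breakpoint on the next code that will run after the function returns,

and from the analysis above, this software single-step breakpoint ends up at:

"7 if (nr_calls)" of notifier_call_chain().

At last, the cause is found.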
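To make the next-pc calculation concrete, here is a minimal sketch of what a software single-step has to compute for the two instruction words shown in the disassembly. It is an illustration only, not gdb's implementation: next_pc_sketch and the macros are hypothetical names, and only the "jr" case is handled, whereas a real debugger must also handle branches, jal, jalr and so on.

```c
#include <stdint.h>

/* Field extraction for a classic 32-bit MIPS instruction word. */
#define MIPS_OPCODE(insn) (((insn) >> 26) & 0x3fu)
#define MIPS_RS(insn)     (((insn) >> 21) & 0x1fu)
#define MIPS_FUNCT(insn)  ((insn) & 0x3fu)

#define MIPS_OP_SPECIAL 0x00u  /* opcode shared by register-type instructions */
#define MIPS_FUNCT_JR   0x08u  /* jr rs */

/*
 * Hypothetical helper: return the address where a software single-step
 * would place its temporary breakpoint.
 *   pc   - address of the instruction being stepped
 *   insn - the instruction word at pc
 *   regs - the 32 general-purpose registers (regs[31] is ra)
 */
uint32_t next_pc_sketch(uint32_t pc, uint32_t insn, const uint32_t regs[32])
{
    if (MIPS_OPCODE(insn) == MIPS_OP_SPECIAL &&
        MIPS_FUNCT(insn) == MIPS_FUNCT_JR) {
        /* jr rs: after the delay slot, execution continues at regs[rs]. */
        return regs[MIPS_RS(insn)];
    }
    /* Anything else is treated as straight-line code in this sketch. */
    return pc + 4;
}
```

For the word 0x03e00008 the rs field is 31, so the result is whatever ra holds, which is the return address inside the caller. When module_event is called from notifier_call_chain(), that address is in notifier_call_chain(), which is exactly where the temporary breakpoint lands.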

D: Fixing the bug

The cause has been found, but fixing it is rather more troublesome.

I started with a workaround patch that inserts a no-op instruction into module_event:

**********************************************************************

#include <asm/system.h>

#ifndef nop
#define nop() __asm__ __volatile__ ("nop")
#endif

int module_event(struct notifier_block *self, unsigned long val,
                 void *data)
{
        /*
         * Add a "nop" instruction so that kgdb does not get trapped in a
         * dead loop when gdb does a software single step to skip the
         * "module_event" breakpoint.
         */
        nop();
        return 0;
}

**********************************************************************

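But this does not solve the underlying problem. The key point is that kgdb depends on notifier_call_chain(): as soon as a breakpoint is placed in notifier_call_chain(), the problem is triggered. So the fundamental fix is A: "remove kgdb's dependency on notifier_call_chain()".

In the longer term, the right thing is to implement B: "forbid setting breakpoints in kgdb's own call chain" as a feature.

For A, we need a hook in do_trap_or_bp() so that kgdb no longer responds to the event through notify_die().

I am currently implementing this (under JasonW's guidance. :) ).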
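The idea behind A can be illustrated with a small user-space model. This is only a sketch of the control flow, not the actual patch: trap_regs, debugger_trap_hook, walk_die_chain and handle_trap_sketch are hypothetical names standing in for the kernel's trap context, a direct debugger hook, the notify_die() path and do_trap_or_bp() respectively.

```c
#include <stddef.h>

/* Hypothetical stand-in for the trap context; not a kernel type. */
struct trap_regs {
    unsigned long epc;
};

/* Hypothetical hook type: returns nonzero if the debugger consumed the trap. */
typedef int (*debugger_trap_hook_t)(struct trap_regs *regs);

/* NULL while no debugger is attached. */
debugger_trap_hook_t debugger_trap_hook = NULL;

/* Counts how often the generic notifier path is used in this model. */
int chain_walks = 0;

/* Stand-in for the notify_die() -> notifier_call_chain() path. */
int walk_die_chain(struct trap_regs *regs)
{
    (void)regs;
    chain_walks++;
    return 0;
}

/* Stand-in for do_trap_or_bp(): give the debugger first refusal through a
 * direct call, and only fall back to the notifier chain otherwise. */
int handle_trap_sketch(struct trap_regs *regs)
{
    if (debugger_trap_hook != NULL && debugger_trap_hook(regs))
        return 1;   /* handled without executing any chain-walking code */
    return walk_die_chain(regs);
}

/* Example hook that claims every trap. */
int always_handles(struct trap_regs *regs)
{
    (void)regs;
    return 1;
}
```

With the hook installed, a breakpoint trap reaches the debugger without running the chain-walking code, so a temporary breakpoint that happens to land in that code no longer sits on the debugger's own entry path.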

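As for B, the "forbid setting breakpoints in kgdb's own call chain" feature:

this is my own idea, and I have some thoughts on how to implement it. I expect to try implementing it within the next month or so.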
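One possible shape for B, offered purely as an assumption about how such a check might look, is an address-range test applied before a breakpoint is accepted. The names text_range and breakpoint_allowed are hypothetical, and how the protected ranges would be collected (for example, by grouping the relevant functions into a dedicated section) is left open here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open address range [start, end) of protected code. */
struct text_range {
    uintptr_t start;
    uintptr_t end;
};

/*
 * Return 1 if a breakpoint at addr may be inserted, 0 if addr falls inside
 * any protected range (code that the debugger itself runs through).
 */
int breakpoint_allowed(const struct text_range *protected_ranges,
                       size_t count, uintptr_t addr)
{
    for (size_t i = 0; i < count; i++) {
        if (addr >= protected_ranges[i].start &&
            addr < protected_ranges[i].end)
            return 0;
    }
    return 1;
}
```

A check like this would have to cover the temporary breakpoints used for software single-stepping as well as user breakpoints, since the bug described here is caused by the former.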
