Skip to content

Latest commit

 

History

History
680 lines (531 loc) · 23 KB

201603011322.txt.md

File metadata and controls

680 lines (531 loc) · 23 KB

标题: 自编译GDB执行出错后的排查

创建: 2016-03-01 13:22 更新: 链接: http://scz.617.cn/unix/201603011322.txt

在x64/Solaris 10上出现一个原始需求,调试64-bits样本。

此OS自带两大神器DTrace+MDB,但由于某些不便明说的理由,现在需要64-bits GDB。 网上可以下载gdb-6.8-sol10-x86-local.gz,这是预编译好的32-bits GDB。未能找 到现成的64-bits GDB for Solaris 10。

Google for "OpenCSW Solaris GDB",找到:

https://www.opencsw.org/package/gdb/ https://en.wikipedia.org/wiki/OpenCSW

据说这样可以安装GDB:

pkgadd -d http://get.opencsw.org/now /opt/csw/bin/pkgutil -U /opt/csw/bin/pkgutil -y -i gdb /usr/sbin/pkgchk -L CSWgdb

有兴趣者可以试试。

GDB当前最新版本是7.11,同样由于不便明说的理由,我只关注6.8版。执行自编译好 的64-bits GDB 6.8时,提示错误:

Interpreter `console' unrecognized

针对此问题,做了大量排查工作,最后将问题简化成本文所描述的这样。由于问题本 质与64-bits无关,下面以32-bits为例进行说明。

wget http://ftp.gnu.org/gnu/gdb/gdb-6.8a.tar.bz2 bzip2 -dc gdb-6.8a.tar.bz2 | tar xf - cd gdb-6.8/ rm -rf build mkdir build cd build CC=/usr/sfw/bin/gcc ../configure --program-transform-name='s,(.*),x86-\1-6.8,' -v make

会碰上错误:

../../gdb/remote.c: In function extended_remote_attach_1': ../../gdb/remote.c:2859: warning: unsigned int format, pid_t arg (arg 3) *** Error code 1 make: Fatal error: Command failed for target remote.o'

参看:

https://sourceware.org/bugzilla/show_bug.cgi?id=12003

这个BUG在GDB 7.0得到修复,临时解决方案是:

vi ../gdb/remote.c

将2859行的

sprintf (rs->buf, "vAttach;%x", pid);

修改成

sprintf (rs->buf, "vAttach;%x", (int)pid);

继续

make ls -l gdb/gdb file gdb/gdb ldd gdb/gdb

看似一切正常,结果执行时碰上错误:

[root@ /export/home/scz/src/gdb-6.8/build]> gdb/gdb Interpreter `console' unrecognized

Google for "GDB Interpreter console unrecognized"

http://gdb.sourceware.narkive.com/Z06KUw8E/error-message-interpreter-console-unrecognized-with-gdb-6-8-pcsolaris https://www.sourceware.org/ml/gdb/2009-11/msg00180.html

2009年有人碰上过同样的问题,他最后在configure阶段去掉--disable-tui,解决了。 但我configure时本来就没有--disable-tui,所以他所谓的解决只是碰巧而已。放狗 搜了很久,没有更多有价值的信息。我又试了7.0/7.6的源码,问题依旧。如果这是 一个很明显的BUG,应该很多人碰上,并为新版本所修复。现在看来,这个BUG的触发 条件很隐蔽。我确实需要自编译GDB(主要是为了64-bits GDB),决定调试解决。

查看GDB帮助

https://sourceware.org/gdb/onlinedocs/gdb/ https://sourceware.org/gdb/onlinedocs/gdb/Concept-Index.html https://sourceware.org/gdb/onlinedocs/gdb/Mode-Options.html gdb -h

有个参数可以指定interpreter:

--interpreter console -i console

在源码中应该有:

Interpreter `%s' unrecognized

用Source Insight找到它:


struct interp *interp = interp_lookup( interpreter_p ); if ( interp == NULL ) { error( _( "Interpreter `%s' unrecognized" ), interpreter_p ); }

这段代码位于main.c的captured_main()中。interpreter_p指向"console"。检查 interp_lookup():


struct interp * interp_lookup ( char *name ) { struct interp *interp;

if ( name == NULL || strlen( name ) == 0 )
{
    return( NULL );
}
for ( interp = interp_list; interp != NULL; interp = interp->next )
{
    if (strcmp (interp->name, name) == 0)
        return interp;
}
return( NULL );

}

interp_list是条全局链表,如果"console"不在其中,interp_lookup()返回NULL。 检查interp_list如何初始化的:


void interp_add ( struct interp *interp ) { if ( !interpreter_initialized ) { initialize_interps(); } gdb_assert( interp_lookup( interp->name ) == NULL ); interp->next = interp_list; interp_list = interp; }

interp_add()是唯一一个直接写interp_list的函数。谁在调用它?至少有三个:

_initialize_cli_interp() _initialize_mi_interp() _initialize_tui_interp()

第一个对应"console":


/*

  • Standard gdb initialization hook. */ extern initialize_file_ftype _initialize_cli_interp;

void _initialize_cli_interp ( void ) { static struct interp_procs procs = { cli_interpreter_init, /* init_proc / cli_interpreter_resume, / resume_proc / cli_interpreter_suspend, / suspend_proc / cli_interpreter_exec, / exec_proc / cli_interpreter_display_prompt_p / prompt_proc_p */ }; struct interp *cli_interp;

/*
 * Create a default uiout builder for the CLI.
 */
cli_uiout   = cli_out_new( gdb_stdout );
cli_interp  = interp_new( INTERP_CONSOLE, NULL, cli_uiout, &procs );
interp_add( cli_interp );

}

注释中的"Standard gdb initialization hook."引起我的注意,一般碰上这类说法, 表明在调用_initialize_cli_interp()时有奇技淫巧。果然,找不到它的主调函数!

手头有/usr/local/bin/gdb,源自gdb-6.8-sol10-x86-local.gz,这个版本可以正常 执行,没有前述BUG。用DTrace看一下,是否包含对_initialize_cli_interp的调用:

dtrace -ln 'pid$target:a.out::entry' -c /usr/local/bin/gdb > /tmp/usr_local_bin_gdb_probe.txt

ID PROVIDER MODULE FUNCTION NAME ... 48293 pid114 gdb _initialize_cli_interp entry ...

dtrace -qwn 'pid$target:a.out:_initialize_cli_interp:entry {jstack();stop();}' -c /usr/local/bin/gdb

          gdb`_initialize_cli_interp
          gdb`initialize_all_files+0x204
          gdb`gdb_init+0x48
          gdb`captured_main+0x2ef
          gdb`catch_errors+0x47
          gdb`gdb_main+0x23
          gdb`main+0x3d
          gdb`_start+0x60

再看gdb/gdb

dtrace -qwn 'pid$target:a.out:_initialize_cli_interp:entry {jstack();stop();}' -c gdb/gdb

dtrace: invalid probe specifier ...: failed to lookup '_initialize_cli_interp' in module 'gdb'

错误提示指出gdb/gdb中没有_initialize_cli_interp()。

dtrace -qwn 'pid$target:a.out:initialize_all_files:entry {jstack();stop();}' -c gdb/gdb

          gdb`initialize_all_files
          gdb`gdb_init+0x48
          gdb`captured_main+0x30d
          gdb`catch_errors+0x47
          gdb`gdb_main+0x23
          gdb`main+0x3d
          gdb`_start+0x80

(Ctrl-C)

gdb/gdb中有对initialize_all_files()的调用。查看源码,这个函数只在call-cmds.h 中有一个函数原型:

extern void initialize_all_files ( void );

找不到它的函数主体。用调试器找找:

ps -ef | grep gdb

root   126   125   0 10:01:46 pts/3       0:00 gdb/gdb

/usr/local/bin/gdb -p 126

(gdb) symbol-file gdb/gdb Reading symbols from /export/home/scz/src/gdb-6.8/build/gdb/gdb...done. (gdb) bt #0 0x08092539 in initialize_all_files () at init.c:106 #1 0x0808af8c in gdb_init (argv0=0x8047d4c "gdb/gdb") at ../../gdb/top.c:1643 #2 0x0808466d in captured_main (data=0x0) at ../../gdb/main.c:604 #3 0x0810c973 in catch_errors (func=0x8084360 <captured_main>, func_args=0x8047c54, errstring=0x8271935 "", mask=6) at ../../gdb/exceptions.c:513 #4 0x080851f3 in gdb_main (args=0x8307d00) at ../../gdb/main.c:891 #5 0x0808430d in main (argc=137395344, argv=0x8307c90) at ../../gdb/gdb.c:33 (gdb) list init.c:106 101 extern initialize_file_ftype _initialize_xml_support; 102 extern initialize_file_ftype _initialize_target_descriptions; 103 extern initialize_file_ftype _initialize_inflow; 104 void 105 initialize_all_files (void) 106 { 107 _initialize_gdbtypes (); 108 _initialize_i386_tdep (); 109 _initialize_amd64_sol2_tdep (); 110 _initialize_i386_sol2_tdep ();

initialize_all_files()函数主体在init.c中。注意到init.c不在源码目录下,而是 位于二进制gdb所在目录。

more gdb/init.c

/* Do not modify this file. / / It is created automatically by the Makefile. / #include "defs.h" / For initialize_file_ftype. / #include "call-cmds.h" / For initialize_all_files. */ extern initialize_file_ftype _initialize_gdbtypes; ... void initialize_all_files (void) { _initialize_gdbtypes (); _initialize_i386_tdep (); _initialize_amd64_sol2_tdep (); ... _initialize_xml_support (); _initialize_target_descriptions (); _initialize_inflow (); }

注释里写了,这个init.c是Makefile自动生成的,不是原始源代码的一部分,在 Source Insight中找不到,其中果然没有调用_initialize_cli_interp()。这会导致 "Interpreter `console' unrecognized",但这还不是根源。为什么Makefile生成的 init.c没有调用_initialize_cli_interp()?理论上应该调用。

原始源代码的某个文件中应该包含动态生成init.c的操作,全文内容搜索找出:

gdb-6.8/gdb/Makefile.in

其中有这样一段:


We do this by grepping through sources. If that turns out to be too slow,

maybe we could just require every .o file to have an initialization routine

of a given name (top.o -> _initialize_top, etc.).

Formatting conventions: The name of the initialize* routines must start

in column zero, and must not be inside #if.

Note that the set of files with init functions might change, or the names

of the functions might change, so this files needs to depend on all the

object files that will be linked into gdb.

FIXME: There is a problem with this approach - init.c may force

unnecessary files to be linked in.

FIXME: cagney/2002-06-09: gdb/564: gdb/563: Force the order so that

the first call is to _initialize_gdbtypes (implemented by explicitly

putting that function's name first in the init.l-tmp file). This is

a hack to ensure that all the architecture dependant global

builtin_type_* variables are initialized before anything else

(per-architecture code is called in the same order that it is

registered). The ``correct fix'' is to have all the builtin types

made part of the architecture and initialize them on-demand (using

gdbarch_data) just like everything else. The catch is that other

modules still take the address of these builtin types forcing them

to be variables, sigh!

NOTE: cagney/2003-03-18: The sed pattern ``s|^([^ /]...'' is

anchored on the first column and excludes the ``/'' character so

that it doesn't add the $(srcdir) prefix to any file that already

has an absolute path. It turns out that $(DEC)'s True64 make

automatically adds the $(srcdir) prefixes when it encounters files

in sub-directories such as cli/ and mi/.

NOTE: cagney/2004-02-08: The ``case "$$fs" in'' eliminates

duplicates. Files in the gdb/ directory can end up appearing in

COMMON_OBS (as a .o file) and CONFIG_SRCS (as a .c file).

INIT_FILES = $(COMMON_OBS) $(TSOBS) $(CONFIG_SRCS) init.c: $(INIT_FILES) @echo Making init.c @rm -f init.c-tmp init.l-tmp @touch init.c-tmp @echo gdbtypes > init.l-tmp @-LANG=c ; export LANG ;
LC_ALL=c ; export LC_ALL ;
echo $(INIT_FILES) |
tr ' ' '\012' |
sed
-e '/^gdbtypes.[co]$$/d'
-e '/^init.[co]$$/d'
-e '/xdr_ld.[co]$$/d'
-e '/xdr_ptrace.[co]$$/d'
-e '/xdr_rdb.[co]$$/d'
-e '/udr.[co]$$/d'
-e '/udip2soc.[co]$$/d'
-e '/udi2go32.[co]$$/d'
-e '/version.[co]$$/d'
-e '/^[a-z0-9A-Z_][SU].[co]$$/d'
-e '/[a-z0-9A-Z
]
-exp.tab.[co]$$/d'
-e 's/.[co]$$/.c/'
-e 's,signals.c,signals/signals.c,'
-e 's|^([^ /][^ ])|$(srcdir)/\1|g' |
while read f; do
sed -n -e 's/^initialize([a-z_0-9A-Z]
)./\1/p' $$f 2>/dev/null;
done |
while read f; do
case " $$fs " in
" $$f " ) ;;
* ) echo $$f ; fs="$$fs $$f";;
esac;
done >> init.l-tmp @echo '/
Do not modify this file. /' >>init.c-tmp @echo '/ It is created automatically by the Makefile. /'>>init.c-tmp @echo '#include "defs.h" / For initialize_file_ftype. /' >>init.c-tmp @echo '#include "call-cmds.h" / For initialize_all_files. /' >>init.c-tmp @sed -e 's/(.)/extern initialize_file_ftype initialize\1;/' <init.l-tmp >>init.c-tmp @echo 'void' >>init.c-tmp @echo 'initialize_all_files (void)' >>init.c-tmp @echo '{' >>init.c-tmp @sed -e 's/(.*)/ initialize\1 ();/' <init.l-tmp >>init.c-tmp @echo '}' >>init.c-tmp @rm init.l-tmp @mv init.c-tmp init.c

这段内容最终进入:

gdb-6.8/build/gdb/Makefile

调试Makefile,尝试理解上述操作:

[root@ /export/home/scz/src/gdb-6.8/build/gdb]> cp Makefile sczdbg.mk [root@ /export/home/scz/src/gdb-6.8/build/gdb]> vi sczdbg.mk

比如


init.c: $(INIT_FILES) @echo $(INIT_FILES) |
tr ' ' '\012' |
sed
-e '/^gdbtypes.[co]$$/d'
-e '/^init.[co]$$/d'
-e '/xdr_ld.[co]$$/d'
-e '/xdr_ptrace.[co]$$/d'
-e '/xdr_rdb.[co]$$/d'
-e '/udr.[co]$$/d'
-e '/udip2soc.[co]$$/d'
-e '/udi2go32.[co]$$/d'
-e '/version.[co]$$/d'
-e '/^[a-z0-9A-Z_][SU].[co]$$/d'
-e '/[a-z0-9A-Z
]
-exp.tab.[co]$$/d'

[root@ /export/home/scz/src/gdb-6.8/build/gdb]> rm init.c [root@ /export/home/scz/src/gdb-6.8/build/gdb]> make -f sczdbg.mk init.c i386-tdep.o i387-tdep.o amd64-tdep.o ... cli-interp.o ... xml-tdesc.o xml-builtin.o inflow.o ../../gdb/cli/cli-dump.c ../../gdb/cli/cli-decode.c ../../gdb/cli/cli-script.c ... ../../gdb/cli/cli-interp.c ... ../../gdb/tui/tui-wingeneral.c ../../gdb/tui/tui-winsource.c ../../gdb/tui/tui.c

"echo | tr | sed"先生成一份文件列表,所有的*.o都会转成*.c,非绝对路径的还 会在前面追加$(srcdir)。接下来的第一个while从这些文件中析取_initialize_*():

[root@ /export/home/scz/src/gdb-6.8/gdb]> sed -n -e 's/^initialize([a-z_0-9A-Z])./\1/p' cli/cli-interp.c cli_interp

比如原来有个:

_initialize_cli_interp (void)

析出:

cli_interp

第二个while的作用是消重。之后init.l-tmp内容形如:


gdbtypes i386_tdep amd64_sol2_tdep ... xml_support target_descriptions inflow

在init.l-tmp中没有发现:

cli_interp

回头再说这个问题。接下来有2个sed:


sed -e 's/(.)/extern initialize_file_ftype initialize\1;/' <init.l-tmp >>init.c-tmp sed -e 's/(.)/ initialize\1 ();/' <init.l-tmp >>init.c-tmp

负责生成这种代码:


extern initialize_file_ftype _initialize_inflow;

_initialize_inflow ();

由于init.l-tmp中没有cli_interp,所以没有生成:


extern initialize_file_ftype _initialize_cli_interp;

_initialize_cli_interp ();

那么,为什么init.l-tmp中没有cli_interp?前面看到过:

../../gdb/cli/cli-interp.c

调试sczdbg.mk发现$(srcdir)是:

../../gdb

由于

-e 's|^([^ /][^ ]*)|$(srcdir)/\1|g'

导致这样的文件列表


../../gdb/i386-tdep.c ../../gdb/i387-tdep.c ../../gdb/amd64-tdep.c ... ../../gdb/cli-interp.c ... ../../gdb/target-memory.c ../../gdb/xml-tdesc.c ../../gdb/xml-builtin.c ../../gdb/inflow.c ../../gdb/../../gdb/cli/cli-dump.c ../../gdb/../../gdb/cli/cli-decode.c ../../gdb/../../gdb/cli/cli-script.c ... ../../gdb/../../gdb/cli/cli-interp.c ... ../../gdb/../../gdb/tui/tui-wingeneral.c ../../gdb/../../gdb/tui/tui-winsource.c ../../gdb/../../gdb/tui/tui.c

[root@ /export/home/scz/src/gdb-6.8/build/gdb]> ls -l ../../gdb/cli-interp.c ../../gdb/cli-interp.c: No such file or directory [root@ /export/home/scz/src/gdb-6.8/build/gdb]> ls -l ../../gdb/../../gdb/cli/cli-interp.c ../../gdb/../../gdb/cli/cli-interp.c: No such file or directory

通过上述文件列表根本无法正确定位cli-interp.c,从而无法析取cli_interp。

参看:

gdb-6.8/gdb/configure

或者

[root@ /export/home/scz/src/gdb-6.8/build]> ../configure --help

可以指定--srcdir:

[root@ /export/home/scz/src/gdb-6.8/build]> CC=/usr/sfw/bin/gcc ../configure --srcdir="/export/home/scz/src/gdb-6.8" --program-transform-name='s,(.*),x86-\1-6.8,' -v

这里居然不能指定/export/home/scz/src/gdb-6.8/gdb。

[root@ /export/home/scz/src/gdb-6.8/build]> make [root@ /export/home/scz/src/gdb-6.8/build]> gdb/gdb

可以正常执行。检查init.c:

[root@ /export/home/scz/src/gdb-6.8/build]> grep cli_interp gdb/init.c extern initialize_file_ftype _initialize_cli_interp; _initialize_cli_interp ();

之前在Linux上编译4.18至7.6版本GDB,同样创建build子目录,从未指定--srcdir, 不知Solaris上为何有这种古怪。configure、Makefile系统太琐碎复杂,但价值不高, 不再深究。想到一种可能,在Solaris上不创建build子目录,是否不会碰上该BUG?

[root@ /export/home/scz/src/gdb-6.8-test]> vi gdb/remote.c [root@ /export/home/scz/src/gdb-6.8-test]> CC=/usr/sfw/bin/gcc ./configure --program-transform-name='s,(.*),x86-\1-6.8,' -v [root@ /export/home/scz/src/gdb-6.8-test]> make [root@ /export/home/scz/src/gdb-6.8-test]> gdb/gdb

可以正常执行。检查init.c:

[root@ /export/home/scz/src/gdb-6.8-test]> grep cli_interp gdb/init.c extern initialize_file_ftype _initialize_cli_interp; _initialize_cli_interp ();

难怪碰上该BUG的人廖廖无几,绝大多数人不会创建build子目录再编译,只有我们这 种有洁癖的人才会坚持不懈地创建build子目录。而且Linux不存在该BUG。

此时的$(srcdir)是:

.././gdb

拼接$(srcdir)之前的文件列表形如:

i386-tdep.c i387-tdep.c amd64-tdep.c ... cli-interp.c ... xml-tdesc.c xml-builtin.c inflow.c cli/cli-dump.c cli/cli-decode.c cli/cli-script.c ... cli/cli-interp.c ... tui/tui-wingeneral.c tui/tui-winsource.c tui/tui.c

拼接后形如:

.././gdb/cli-interp.c .././gdb/cli/cli-interp.c

[root@ /export/home/scz/src/gdb-6.8-test/gdb]> ls -l .././gdb/cli-interp.c .././gdb/cli-interp.c: No such file or directory [root@ /export/home/scz/src/gdb-6.8-test/gdb]> ls -l .././gdb/cli/cli-interp.c -rw-rw-rw- 1 9176 65490 4486 Jan 2 2008 .././gdb/cli/cli-interp.c

至少有一个可以正确定位cli-interp.c。

比较错误的init.c与正确的init.c,前者缺失:


extern initialize_file_ftype _initialize_cli_dump; extern initialize_file_ftype _initialize_cli_logging; extern initialize_file_ftype _initialize_cli_interp; extern initialize_file_ftype _initialize_mi_out; extern initialize_file_ftype _initialize_mi_cmds; extern initialize_file_ftype _initialize_mi_cmd_env; extern initialize_file_ftype _initialize_mi_interp; extern initialize_file_ftype _initialize_gdb_mi_common; extern initialize_file_ftype _initialize_tui_hooks; extern initialize_file_ftype _initialize_tui_interp; extern initialize_file_ftype _initialize_tui_layout; extern initialize_file_ftype _initialize_tui_out; extern initialize_file_ftype _initialize_tui_regs; extern initialize_file_ftype _initialize_tui_stack; extern initialize_file_ftype _initialize_tui_win;

_initialize_cli_dump (); _initialize_cli_logging (); _initialize_cli_interp (); _initialize_mi_out (); _initialize_mi_cmds (); _initialize_mi_cmd_env (); _initialize_mi_interp (); _initialize_gdb_mi_common (); _initialize_tui_hooks (); _initialize_tui_interp (); _initialize_tui_layout (); _initialize_tui_out (); _initialize_tui_regs (); _initialize_tui_stack (); _initialize_tui_win ();

强调一点,这个BUG不是GDB 6.8特有的,只要在Solaris上,GDB 7.6同样存在该BUG。

不知GDB为什么非要动态生成如此重要的init.c,内里的逻辑是?这又不像yacc/lex 之类的,完全可以静态写好的吧。如果对GDB源码熟悉,应该很容易找到BUG源头。我 不熟,只好一路调试过来。

问题:

自编译GDB执行时提示"Interpreter `console' unrecognized"

临时解决方案:

configure时指定--srcdir参数,或者不建build子目录

回顾一下整个过程:

  1. 发现BUG,稳定重现
  2. 根据错误提示放狗搜索,有人碰上过,但没有真正解决
  3. 研读源码,寻找特征字符串,定位相关函数
  4. DTrace+GDB调试GDB,其实就是查了个调用栈回溯,以及在GDB中查看源码
  5. 找到动态生成的init.c,调试相关Makefile,搞清楚BUG根源
  6. 针对BUG根源configure时指定--srcdir参数,或者不建build子目录,问题解决

经验:

*nix上有些开源软件会动态生成部分源代码,在下载的源码包中较难直接发现这种情 况。如果Source Insight发现缺失某些函数主体,或者某些函数没有主调者,可以考 虑通过动态调试寻找缺失的代码,进而搞清楚来龙去脉。

前面对DTrace的使用不必要,只用GDB就可以达成目标,只不过DTrace很方便,不用 白不用。要知道,不只是Solaris上有DTrace哦。

文中内容属于"Old News",不具有现实意义,但解决过程具有普遍意义,记之备忘。

编译64-bits GDB for Solaris 10与本BUG无本质关联,不在此记录。


说个题外话,现在这些非漏洞、非攻防类的基础CS技术交流,找不到合适的地方了。 过去我们有个APUE小站,还是很不错的。作为CS专业毕业的程序员,我一直很喜欢进 行这类技术交流,没有太多是是非非,只有发现问题、解决问题的简单乐趣。我自己 分享的技术文章当然可以放在此间,但这里终归是个只读区域,无法形成交流。想起 从前的APUE小站,有些淡淡的遗憾。