Debugging

From Apache OpenOffice Wiki
Revision as of 12:36, 7 December 2006 by Ericb

This section assumes you are using gdb from the console. There are also specific notes on Windows Debugging and Mac OS X (see MacOSX_Debug_OpenOffice.org_using_XCode), and hints on Build Problems Debugging.

Building with debugging symbols

OO.o can add debugging code on a per-module basis: run build debug=true in the module. Besides debug symbols, this also enables lots of runtime assertions, warnings and so on, which can be useful. For a plain build with just debug symbols, use build debug=true dbg_build_only=true; in later versions, use build debug=true dbglevel=2 for maximum output, or dbglevel=1 or 0 for less.

You can also configure OO.o with --enable-symbols to build with debugging symbols.

gdb invocation

If you debug with gdb, you may find that execution stops on signals at inappropriate places, especially when running against libgcj, whose garbage collection you will want to ignore while debugging. A good invocation is:

gdb ./soffice.bin
(gdb) handle SIGPWR nostop noprint
(gdb) handle SIGXCPU nostop noprint
(gdb) handle SIG33 nostop noprint
(gdb) run -norestore -writer

Replace -writer with -draw, -impress, -calc and so on as appropriate. The -norestore option prevents display of the crash reporter (one frequently kills the office during debugging).
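The interactive session above can also be kept in a gdb command file, so the handle lines need not be retyped each time. The following is a minimal sketch, assuming gdb's standard -x (read commands from a file) and --args options; the file name soffice.gdb is an arbitrary choice for this example, not something OO.o ships.

```shell
#!/bin/sh
# Write the signal-handling commands from the session above into a gdb
# command file. The file name is an assumption made for this sketch.
write_gdb_commands() {
    cat > "$1" <<'EOF'
handle SIGPWR nostop noprint
handle SIGXCPU nostop noprint
handle SIG33 nostop noprint
EOF
}

write_gdb_commands soffice.gdb

# Print (rather than run) the resulting invocation; run it from the
# program directory of your install.
echo "gdb -x soffice.gdb --args ./soffice.bin -norestore -writer"
```

With this in place, typing run at the (gdb) prompt starts the office with the arguments given after --args.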

Starting at the beginning

We start in 'main' with a sal wrapper that calls SVMain (vcl/source/app/svmain.cxx). SVMain invokes Main on pSVData->mpApp, but pSVData is an inline local, so to inspect it from gdb use the pImplSVData global variable instead, e.g.:

     p pImplSVData->maAppData

This 'Main' method is typically: desktop/source/app/app.cxx (Main).

Examining strings

We have already seen that OO.o has its own set of string classes, none of which gdb understands. When debugging C++ code, use (gdb) print dbg_dump(sWhatEver) to print the contents of a UniString, ByteString, rtl::OUString or rtl::OString, regardless of which type it is. See Caolan's write-up for details.

Getting the build order right

The build dependencies between modules are crucial to getting a clean build. When you type 'build' in a module, build first examines prj/build.lst, e.g. neon/prj/build.lst:

       xh      neon  :  soltools external expat NULL

This specifies that 'soltools', 'external' and 'expat' have to be built and delivered successfully before neon can be built. Occasionally these rules get broken, and people don't notice for a while.
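To see at a glance what a module waits for, the dependency line can be picked apart with standard tools. This is only a sketch that handles the single-line form quoted above (words after the first colon, ending in NULL); the function name build_lst_deps is made up for the example and is not part of the build tooling.

```shell
#!/bin/sh
# Print the prerequisites named on the first line of a build.lst file:
# the words after the first colon, without the terminating NULL.
# Only the single-line form shown above is handled.
build_lst_deps() {
    awk 'NR == 1 {
        sub(/^[^:]*:/, "")      # drop the part up to the colon; fields are re-split
        out = ""
        sep = ""
        for (i = 1; i <= NF; i++) {
            if ($i != "NULL") {
                out = out sep $i
                sep = " "
            }
        }
        print out
    }' "$1"
}

# Demonstration on a throwaway copy of the neon line quoted above.
sample=$(mktemp)
printf '       xh      neon  :  soltools external expat NULL\n' > "$sample"
build_lst_deps "$sample"    # prints: soltools external expat
rm -f "$sample"
```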

It crashes, but only in gdb

What fun: you symlinked desktop/unxlngi4.pro/bin/soffice to soffice.bin in your install tree, didn't you? That works fine if you just run it, but it seems gdb resolves the symlink and passes a fully qualified path as argv[0]. This defeats the search for the binary in the path, so the program base path is set to /opt/OpenOffice/OOO_STABLE_1/desktop/unxlngi4.pro/bin and OO.o starts looking for files (e.g. applicat.rdb) in there. Of course, when it fails to find any setup information, it silently crashes somewhere else, yards away from the original problem.
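A quick way to check whether you have fallen into this trap is to ask the shell whether the binary you are debugging is a symlink. A small sketch, using only test -L and readlink; the function name describe_binary is invented for the example.

```shell
#!/bin/sh
# Report whether a path is a symlink and, if so, where it points.
describe_binary() {
    if [ -L "$1" ]; then
        echo "symlink -> $(readlink "$1")"
    elif [ -e "$1" ]; then
        echo "not a symlink"
    else
        echo "missing"
    fi
}

# Run from the program directory of your install.
describe_binary ./soffice.bin
```

If this reports a symlink into a build tree, the crash under gdb is probably the path confusion described above rather than the bug you were hunting.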

It crashes, but doesn't crash

For various reasons OO.o installs its own signal handlers, and life can get rather confusing as a result; it is best for builders to apply something like the following patch, which stops those handlers from being set:

--- sal/osl/unx/signal.c
+++ sal/osl/unx/signal.c
@@ -188,6 +188,8 @@ static sal_Bool InitSignal()
             bSetILLHandler = sal_True;
        }
 
+       bSetSEGVHandler = bSetWINCHHandler = bSetILLHandler = bDoHardKill = sal_False;
+
        SignalListMutex = osl_createMutex();
 
        act.sa_handler = SignalHandlerFunction;

I can't find the code from the trace

Some methods are declared with a special linkage so that they can be used as callbacks; these typically have the prefix 'LinkStub', so do a free-text search for the latter part of the identifier. E.g.

      IMPL_LINK( Window, ImplHandlePaintHdl, void*, EMPTYARG )

builds the 'LinkStubImplHandlePaintHdl' method.
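The search term can be derived mechanically from a frame in the trace. The sketch below strips everything up to and including 'LinkStub' and any argument list; the frame text used in the demonstration is an illustrative guess at how gdb would print it, and the function name linkstub_handler is invented for the example.

```shell
#!/bin/sh
# Reduce a symbol from a trace to the handler name to search for:
# drop everything up to and including "LinkStub", and any argument list.
linkstub_handler() {
    printf '%s\n' "$1" | sed -e 's/^.*LinkStub//' -e 's/(.*$//'
}

handler=$(linkstub_handler 'Window::LinkStubImplHandlePaintHdl(void*, void*)')
echo "$handler"    # prints: ImplHandlePaintHdl

# Then search the sources for it, from the top of the relevant module:
#   grep -rn "$handler" .
```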

How can I re-build just the files I see in the trace

Often when you run gdb on a build without debugging symbols you get an unhelpful trace, yet you can't afford the time or space to recompile all of OO.o with debugging symbols. For this we have a small perl helper, ootouch, which hunts for and touches the files containing the symbols from your trace. This subset can then be rebuilt with debugging enabled to give a better trace next time around:

    gdb ./soffice.bin
    ...
    bt
#0  0x40b4e0a1 in kill () from /lib/libc.so.6
#1  0x409acfe6 in raise () from /lib/libpthread.so.0
#2  0x447bcdbd in SfxMedium::DownLoad(Link const&) () from ./libsfx641li.so
#3  0x447be151 in SfxMedium::SfxMedium(String const&, unsigned short, unsigned char, SfxFilter const*, SfxItemSet*) ()
   from ./libsfx641li.so
#4  0x448339d3 in getCppuType(com::sun::star::uno::Reference<com::sun::star::document::XImporter> const*) () from ./libsfx641li.so
...
    quit
    cd base/OOO_STABLE_1/sfx2
    ootouch SfxMedium
    build debug=true
    

Thus, all files referencing / implementing anything with SfxMedium will be touched, and hence rebuilt with debugging symbols.
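If the helper is not to hand, the same idea can be approximated with find, grep and touch. This is an illustrative stand-in and not the real ootouch script: the function name touch_mentions and the list of file extensions are assumptions. Only sources and headers are touched, so that object files do not end up newer than the code they were built from.

```shell
#!/bin/sh
# Touch every C/C++ source or header under a directory that mentions a
# symbol, so the next build recompiles it. Illustrative stand-in only.
touch_mentions() {
    symbol=$1
    dir=${2:-.}
    find "$dir" -type f \( -name '*.cxx' -o -name '*.hxx' \
                           -o -name '*.c' -o -name '*.h' \) \
        -exec grep -l -- "$symbol" {} + |
    while IFS= read -r file; do
        touch "$file"
        echo "touched $file"
    done
}

# Usage, mirroring the session above:
#   cd base/OOO_STABLE_1/sfx2 && touch_mentions SfxMedium && build debug=true
```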

How can I re-build all the files in one source directory

If you want to recompile the code in just your current directory, you can use the killobj dmake target to remove the object files:

    dmake killobj
    dmake
    

It always crashes in sal_XErrorHdl

You are a victim of asynchronous X error reporting. Running export SAL_SYNCHRONIZE=1 before starting OO.o makes all X traffic synchronous, so the error is reported in the method that caused it. It also makes OO.o far slower and changes the timing.

It silently fails to load my word file

Caolan suggests: put breakpoints at the top and tail of SwWW8ImplReader::LoadDoc in ww8par.cxx, and confirm that the document gets as far as the import filter.

A handy, human-readable place to put a breakpoint is SwWW8ImplReader::ReadPlainChars, where you can see chunks of text as they are read in. Alternatively, use SwWW8ImplReader::AppendTxtNode, which is hit as each paragraph is inserted.

How do I use the debug console ?

OO.o contains some hefty debugging infrastructure.

Enabling it is pretty easy - what you need is a so-called Non-Product Build.

By default, an OpenOffice.org build is a Product Build, i.e. ready for release after completion. If you specify the --enable-dbgutil switch during configure, your environment will be prepared for a Non-Product Build, with lots of additional diagnostic tools.

Note that libraries from product and non-product builds are usually incompatible, so don't mix them in the same installation.

For available tools in non-product builds, have a look at the various DBG_foo macros in tools/debug.hxx, or, if you already are knowledgeable about this, let others participate by writing your knowledge down here.

To actually fire up the debug settings dialog, press <ctrl><alt><shift>-D.

Excel Interop debugging

This is fairly easy; edit sc/source/filter/inc/biffdump.hxx, define EXC_INCL_DUMPER to 1, and re-build 'sc'. Also, copy sc/source/filter/excel/biffrecdumper.ini to ~. Then run soffice.bin foo.xls and you should get a foo.txt with the debug data in it.

The trace shows a crash in 'poll'

OO.o is a fairly threaded program, and you're probably just looking at the wrong thread: there are not likely to be bugs in poll. Use thread apply all backtrace to get a backtrace of all threads. This will most likely fail; when it does, use thread 1 and then bt, since most crashers occur in the 'main' thread.
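The same commands can be run without an interactive session, which is handy for attaching a trace to a bug report. The sketch below relies on gdb's standard -batch, -ex and -p options. GDB is a hypothetical override variable used only here so the debugger command can be swapped out, and the function name thread_dump is invented. Each -ex command is run in turn; as far as I know a failing one does not stop the later ones, but treat that as an assumption.

```shell
#!/bin/sh
# Attach to a running process and print backtraces without an interactive
# session. GDB is a hypothetical override for the debugger command, used
# only by this sketch; it defaults to plain gdb.
thread_dump() {
    "${GDB:-gdb}" -batch \
        -ex 'thread apply all backtrace' \
        -ex 'thread 1' \
        -ex 'backtrace' \
        -p "$1"
}

# Usage (the process id is whatever your soffice.bin is running as):
#   thread_dump 12345 > trace.txt
```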

What does this trace mean ?

There are several typical stack-traces that come up again and again, one would be:

#15 0x4164a501 in raise () from /lib/tls/libc.so.6
#16 0x4164bcd9 in abort () from /lib/tls/libc.so.6
#17 0x415fb5a5 in std::set_unexpected ()
   from /home/mnagashree/m72install/program/libstdc++.so.5
#18 0x415fb5e2 in std::terminate ()
   from /home/mnagashree/m72install/program/libstdc++.so.5
#19 0x415fb69c in __cxa_rethrow ()
    

This section of trace means (essentially) that an exception was thrown - but there was no-one trying to catch it. Often this means there was a missing 'try {} catch()' clause in one of the calling frames.

A great way to debug exceptions is to break where they are thrown or caught: use catch throw or catch catch in gdb.

STLport and checking iterators

The STL is a powerful tool, but it also makes it easy, in the grand old C/C++ tradition, to shoot oneself in the foot, as we all know. STL containers and algorithms are now pervasive in OO.o, so there is a need to validate the use of STL constructs in OO.o to find hidden problems.

Fortunately the STLport library, the default STL implementation for OO.o, has a powerful debug mode, and it's easy to use. Since SRC680 m128 the environment variable USE_STLP_DEBUG switches on the STLport debug mode; since SRC680 m150 it works on Windows, too.

The most useful part of the STLport debug mode is iterator checking. Running the OO.o smoke test plus a little additional random testing has already turned up a number of questionable STL constructs.

Only code paths which are exercised will be tested by the STLport debug mode, though. If STLport finds a questionable STL usage it will throw an assertion and terminate. It is usually quite easy to extract a precise stack trace.

Some notes:

  • STLport debug mode iterators are not pointers! We've cleaned up all occurrences of the lazy (and wrong) use of iterators as pointers in SRC680 m128/m150, but maybe something new has already crept in. This clean-up also helps with other STL implementations, like the one which comes with gcc-4.x.
  • A complete recompile is necessary: the debug mode renders all objects with STL constructs binary incompatible.
  • The STLport debug mode breaks the complexity guarantees of the STL. Theoretically some operations should be much slower in debug mode than in product mode. In practice I didn't notice a real slowdown of OO.o.