User:ErAck/WorkFlow

Here I note down the workflow and environment I use when working on code, hopefully giving some useful hints to others.

System

Preferably Debian. Etch did it; Lenny does it better, and that is what I use at home. At work I use Kubuntu, Solaris/SPARC, Solaris/x86, MacOSX/x86, and, with a ten-foot pole and only when necessary, Windows XP.

Shell environment

With the migration to SVN the content of the .svn/* subdirectories got in the way when using grep -r ..., so in ~/.bashrc I have

export GREP_OPTIONS='--exclude-dir=.svn'

or, for csh/tcsh, in ~/.cshrc

setenv GREP_OPTIONS '--exclude-dir=.svn'

Note: On some systems there's an old version of grep that doesn't know that option and bails out if it is set. This will break the build later, so check by setting the variable and invoking grep to see whether it complains, and if so do not set the variable.
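The check described in the note can be scripted so that the variable is exported only when the local grep accepts the option. This is a minimal sketch; the helper name grep_accepts is hypothetical and not part of any OOo tooling.

```shell
# Succeeds if the local grep accepts the given option.
# grep exits with a status greater than 1 on an unknown option
# and with 0 when it finds a match.
grep_accepts() {
    echo probe | grep "$1" probe >/dev/null 2>&1
}

# Export GREP_OPTIONS only if grep understands --exclude-dir.
if grep_accepts --exclude-dir=.svn; then
    export GREP_OPTIONS='--exclude-dir=.svn'
fi
```

Placed in ~/.bashrc instead of the unconditional export, this leaves the variable unset on systems with an old grep.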

With the migration to Mercurial this is obsolete unless you plan to work on old branches of the SVN repository, or have other sources managed by SVN.

Vim editor

I've been a Vim addict since the mid 90s, so I use the setup outlined in Editor Vim.

A useful link for gvim

I usually use plain vim in a build environment shell, but to be able to compile sources from within a detached gvim as well, I set up a symbolic link in $SRC_ROOT after having configured and sourced the build environment:

ln -s LinuxX86Env.Set.sh ENV.$INPATH

for bash, or

ln -s LinuxX86Env.Set ENV.$INPATH

for tcsh.

Building

Out of habits introduced at Sun Hamburg labs, I usually follow the same naming scheme at home when setting up CWSs, which is to check out source code into a subdirectory of $CWS_WORK_STAMP/$WORK_STAMP/ooo, for example cwsname/DEV300/ooo

configure

When I don't need a full-blown tree because I don't plan to work on globally used stuff that would also affect the much disregarded binfilter binary filter module, I of course exclude that from the build, and I exclude many others as well. This boils build time down to 3.5 hours for the entire tree [hey, you Windows guys are getting envious, aren't you? ;-)]. My configure call currently (2009-03-14) is

./configure --enable-dbgutil --disable-strip-solver \
    --with-use-shell=bash --disable-binfilter --without-fonts --without-ppds \
    --disable-build-mozilla --with-system-stdlibs --disable-systray \
    --with-build-version="Built by erAck" --with-vendor="erAck" \
    --disable-odk --disable-qadevooo --disable-pdfimport --disable-mediawiki \
    --disable-reportdesign --disable-neon \
    --with-system-zlib --with-system-openssl --with-system-jpeg

Especially note

--enable-dbgutil: This builds a non-product version with assertions and various checks during runtime enabled. The output directories are without the .pro extension, for example unxlngi6 instead of unxlngi6.pro
--disable-strip-solver: Symbols are not stripped from the libraries, so we'll have useful information in the debugger for backtraces.
--with-use-shell=bash: use bash shell instead of the default tcsh.
Note: if you do not specify this and source the resulting LinuxX86Env.Set.sh to work in bash, the SHELL environment variable will still contain /bin/tcsh. This may be needed for the build process [does it really? I'd consider it a bug], but it will interfere with other tools that attempt to invoke a shell assumed to match the current shell, e.g. if you source ENV.$INPATH from within gvim as mentioned above in A useful link for gvim.

More shell variables

I add these to the end of LinuxX86Env.Set.sh; replace cwsname, m42 and the content of my_OOO_TREE as appropriate. For tcsh, replace export var="..." with setenv var "..." and add the lines to LinuxX86Env.Set instead.

export CWS_WORK_STAMP="cwsname"
export my_UPDMINOR="m42"
export WORKSPACE_STAMP="$CWS_WORK_STAMP"
export my_OOO_TREE="$HOME/ooo/src/$WORKSPACE_STAMP/$WORK_STAMP"
export TMP="/tmp"
export CCACHE_DIR="$my_OOO_TREE/.ccache_${INPATH}"
ccache -M 2G -F 100000
export LOCALINSTALLDIR="$my_OOO_TREE/inst.${my_UPDMINOR}"
export PKGFORMAT="installed"
export BUILD_COMMAND="perl $SRC_ROOT/solenv/bin/build.pl"

TMP: For some reason (is there any?) configure does not inherit that, so set again.
CCACHE_DIR and ccache: Speeds things up significantly when rebuilding source. Note that the cache directory is set up such that different milestones and product and non-product versions don't interfere.
LOCALINSTALLDIR and PKGFORMAT: Building the installation set in module instsetoo_native creates a directly usable installation instead of packages. The location is, for example, .../cwsname/DEV300/inst.m42/...
See also GullFOSS entry.
BUILD_COMMAND: build is actually an alias; setting up a variable enables its use in inherited shells, from within the editor, or simply in a time $BUILD_COMMAND --all invocation.
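The bash-to-tcsh rewrite of these lines mentioned above (export var="..." becoming setenv var "...") can be done mechanically. The following is only a sketch; the function name export_to_setenv is made up for illustration, and it handles only simple single-line export statements like the ones shown here.

```shell
# Convert bash "export VAR=value" lines on stdin to tcsh "setenv VAR value".
# Lines that are not export statements pass through unchanged.
export_to_setenv() {
    sed -E 's/^export[[:space:]]+([A-Za-z_][A-Za-z0-9_]*)=/setenv \1 /'
}
```

For example, printf '%s\n' 'export TMP="/tmp"' | export_to_setenv prints setenv TMP "/tmp". Lines such as the ccache -M call need no conversion and are passed through as they are.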

Using LOCALINSTALLDIR with a build in instsetoo_native is not deterministic if not building with PKGFORMAT installed or building for several languages or building language packs.

In these cases it is necessary to not set LOCALINSTALLDIR and let the build create deb or rpm packages, and after a complete build use

export LOCALINSTALLDIR=YourLocation
cd $SRC_ROOT/instsetoo_native/util
dmake openoffice_en-US PKGFORMAT=installed

If you also set FORCE2ARCHIVE=TRUE you'd get a .tar.gz archive you could extract to any place.

YMMV. See this mail for details about LOCALINSTALLDIR.

ccache

On my system I have set up a symbolic link /usr/local/bin/gcc -> /usr/bin/ccache so every source I build uses ccache; I have never encountered problems with that. If you want to use ccache selectively for OOo, add the following variables:

export CC="ccache gcc"
export CXX="ccache g++"

build

In the OOo tree's root, effectively being .../cwsname/DEV300/ooo in this example, execute

source LinuxX86Env.Set.sh
./bootstrap
cd instsetoo_native
build --all -- -P2

Note that I don't invoke the dmake command in the tree's root. Reasons are:

Fine-grained call of the build command, specifying the number of processes to use. Here I create 2 dmake processes per source directory entered by the build script. As a rule of thumb, use 2 processes per CPU core, so if one is waiting for disk I/O the other can do useful things. Yes, this is a much simplified view. For a CPU with 2 cores, this makes 4 processes. However, instead of simply specifying -P4 for dmake I prefer 2 build processes and 2 dmake processes per build process, which would be
- build -P2 --all -- -P2
In case the build breaks, for example in module svx, after having fixed things the build can be easily continued in that module by retrieving and editing the command line, while still in module instsetoo_native:
- build --all:svx -- -P2
In case I'm short of disk space, which may happen if I build both product and non-product, I add the --dlv_switch -link option, which creates hard links to the files delivered to the solver instead of copying them. This may save 500MB or so per build.
- build --dlv_switch -link --all -- -P2
Build can create an HTML status page to load in the browser and watch progress. Add the --html option, and to still see the console output add --dontgraboutput as well. The HTML page is created as $SRC_ROOT/$INPATH.build.html
- build --html --dontgraboutput --dlv_switch -link --all -- -P2

Ah, and of course I use

time $BUILD_COMMAND --html --dontgraboutput --dlv_switch -link --all -- -P2

instead and hope the build doesn't break so I can see 1:55:03 or some such ;-)
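The rule of thumb of 2 processes per CPU core can be turned into a small helper that prints a matching command line. This is a sketch under an assumption of mine: it generalizes the 2-core example to one build process per core with 2 dmake processes each. The function names are hypothetical, and the command is only printed, not executed.

```shell
# Number of online CPU cores; falls back to 1 if getconf cannot tell.
cpu_cores() {
    getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}

# Print a build command line: one build process per core and
# 2 dmake processes per build process (assumed generalization).
# $1 = number of cores, defaults to the detected value.
suggest_build_cmd() {
    cores=${1:-$(cpu_cores)}
    echo "build -P${cores} --all -- -P2"
}
```

For 2 cores, suggest_build_cmd 2 prints build -P2 --all -- -P2, the invocation used above.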

product and non-product

To build a product version in addition to the non-product version it would be enough to configure again but omit the --enable-dbgutil option. However, that would overwrite the already existing environment files, so

mv LinuxX86Env.Set LinuxX86.non-pro.Set
mv LinuxX86Env.Set.sh LinuxX86.non-pro.Set.sh
rm ENV.$INPATH
ln -s LinuxX86.non-pro.Set.sh ENV.$INPATH
./configure ... with all other options except --enable-dbgutil ...
# Again, edit the environment files as mentioned above to add variables, but
# this time set LOCALINSTALLDIR to "$my_OOO_TREE/inst_pro.${my_UPDMINOR}"
# to not overwrite the non-pro installation!
mv LinuxX86Env.Set LinuxX86.pro.Set
mv LinuxX86Env.Set.sh LinuxX86.pro.Set.sh
source LinuxX86.pro.Set.sh
ln -s LinuxX86.pro.Set.sh ENV.$INPATH

Note that there is no need to execute ./bootstrap again, as the resulting dmake executable is copied to and executed from solenv/$OUTPATH/bin for both pro and non-pro. Having renamed LinuxX86Env.Set.sh, it wouldn't even work, because bootstrap sources that file.

Note also that $INPATH contains either unxlngi6 for non-product or unxlngi6.pro for product, whereas $OUTPATH contains always unxlngi6 without any extension. This may be confusing, as one would assume $OUTPATH would be used for the output directories. Just remember that $INPATH is used in the solver path to pull in header files and libraries. In fact, $OUTPATH is a base name that gets extended in makefiles with .pro for product output directories, or may be extended with other extensions, for legacy reasons, see solenv/inc/settings.mk
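The relation between the two names can be written down as a tiny helper. It is purely illustrative: inpath_for is a made-up name, and the real extension logic lives in the makefiles mentioned above.

```shell
# Derive the INPATH-style directory name from an OUTPATH-style base name.
# $1 = base name (for example unxlngi6), $2 = "pro" for a product build.
inpath_for() {
    case "${2:-}" in
        pro) echo "$1.pro" ;;
        *)   echo "$1" ;;
    esac
}
```

With the names used on this page, inpath_for unxlngi6 pro prints unxlngi6.pro and inpath_for unxlngi6 prints unxlngi6.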

Running

Run the office from the installation. Do not attempt to run it from the solver/bin directory; it won't work. Also do not run it from within the build environment's shell, as libraries from the wrong path would be pulled in and subsequent libraries would not be found. Use a clean shell instead. In the example used here, the executable to run would be .../cwsname/DEV300/inst.m42/openoffice.org3/program/soffice. Execute it once to verify successful installation and step through the initial wizard.
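If opening a fresh terminal is inconvenient, one possible way to get a clean environment is env -i, which starts a command with only the variables passed explicitly. This is a sketch of that alternative, not what I normally do. The function name clean_run and the selection of variables are assumptions, and a graphical session may need more variables (XAUTHORITY, for example) than listed here.

```shell
# Run a command with a minimal environment so that variables set by the
# build environment (such as LD_LIBRARY_PATH, if set) are not inherited.
clean_run() {
    env -i HOME="${HOME:-/}" PATH="/usr/local/bin:/usr/bin:/bin" \
        DISPLAY="${DISPLAY:-}" LANG="${LANG:-C}" "$@"
}
```

It would be called as clean_run .../cwsname/DEV300/inst.m42/openoffice.org3/program/soffice.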

To load documents attached to issues it is a good idea to set macro security to the highest possible value, i.e. execute macros only if the document resides in a specific location. Go to Tools → Options → OpenOffice.org → Security → Macro Security and choose Security Level Very high. If you need to execute macros to reproduce a bug you may add a directory to Trusted Sources in which you put such documents after having inspected the macros coming with the document.

Other options I modify:

OpenOffice.org → Paths
- Change My Documents to the desired location; I have a bugdocs folder.
Load/Save → General
- Uncheck Save AutoRecovery information every ... minutes.
  It usually gets in the way at the most inconvenient point when debugging.
- Uncheck Size optimization for ODF format.
  It's much easier to view the XML streams with each element on its own line when necessary, instead of everything on just one line.
  Hint: use the XML pretty printer (xmlpp) to reformat streams of documents that were saved with this option enabled.
Language Settings → Languages
- I check Enabled for Asian languages and Enabled for complex text layout (CTL) because I also work on i18n, not needed otherwise.

Little Helpers

To tame the source base I heavily use exuberant ctags and GNU id-utils and sometimes cscope. For scripts generating databases suitable for OOo see the Little Helpers.

Setting up the ctags and ID databases

In the build environment shell I create the ctags and GNU ID-utils databases using the scripts mentioned:

cd $SRC_ROOT
mkid-script '*'
ctags-script --global
cd sc
tagsID '{.,../formula}'

Note that '*' and '{.,../formula}' are enclosed in single quotes to prevent pathname and brace expansion on the command line. This creates

ID: Id-utils database for the entire OOo tree; generating this takes some time, 20 minutes or so.
tags: Tag file for $SRC_ROOT/solver/inc, containing symbols of all delivered header files.
sc/tags: Tag file for modules sc and formula.
sc/ID: Id-utils database for modules sc and formula.
sc/cscope.*: Cscope database for modules sc and formula.

Because I work within the sc module and the compiler and tokens are now derived from classes declared in module formula, I set up the combined databases so that Vim sees them as one entity. The global tags file is pulled in when an identifier is not found in the local tags file, and the global ID file comes in handy when working on changes that affect the entire office, for example to look up where a certain method is used.

Debugging

Let's assume we want to debug a call to an interpreter function. Let's further assume we don't know the implementation name and don't want to bother looking it up through the chain of resource file, resource header file and opcode file and then guessing the corresponding interpreter function's method (for details see implementation of spreadsheet functions).

Per debug session, I prefer having a terminal with 3 tabs open:

Shell with build environment, let's call it Build.
Shell from which I run the office, let's call it Run.
Shell for debugger, let's call it Debug.

This way they don't interfere with each other, preserving all stdout/stderr from the office run and having a clean screen for the debugger, which should not be started from within the build environment.

Build selected files with debug

Of course we could just build the entire sc module with debug, but I suppose you're too impatient to wait for it, as I am. Instead, let's build just some objects with debug in the Build shell:

cd $SRC_ROOT/sc/source/core/tool
dmake killobj
dmake debug=t
cd ../../../util
dmake debug=t

dmake killobj: Removes object files corresponding to all source files in the current directory.
dmake debug=t: Builds with debug all object files that need to be rebuilt.

The final dmake debug=t in util links the shared libraries.

The sc/unxlngi6/lib/libscli.so shared library now has debug information for those objects.

mkd script

Now wait: instead of killing all objects in the tool directory, selectively building only the interpreter-relevant files would be sufficient. A script mkd, called as mkd interpr*.cxx, is a reusable solution. The second dmake call in the script builds the object archive or other targets for the directory if all sources were successfully compiled. After that we still need to cd into the util directory to link the library using dmake debug=t. This can be accomplished by passing the --link or -l option to the script, so mkd -l interpr*.cxx does it all.
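As the script itself is not reproduced here, the following is a rough sketch of how such a helper could look, not the actual mkd. It takes a different route: it touches the selected sources so that only their objects are out of date and therefore needs a single dmake call per directory. The DMAKE override and the detection of the module root via prj/build.lst are assumptions of this sketch.

```shell
# mkd: rebuild selected sources with debug information (illustrative sketch).
# Usage: mkd [-l|--link] file.cxx ...
mkd() (
    dmake_cmd=${DMAKE:-dmake}   # DMAKE override is an addition of this sketch
    link=no
    case "${1:-}" in
        -l|--link) link=yes; shift ;;
    esac
    if [ "$#" -eq 0 ]; then
        echo "usage: mkd [-l|--link] file.cxx ..." >&2
        exit 2
    fi
    for f in "$@"; do
        [ -f "$f" ] || { echo "mkd: no such source: $f" >&2; exit 1; }
    done

    # Mark the selected sources as modified so that only their objects
    # are out of date, then rebuild them with debug.
    touch "$@" || exit 1
    "$dmake_cmd" debug=t || exit 1

    [ "$link" = yes ] || exit 0

    # Walk up to the module root, assumed to be the directory holding
    # prj/build.lst, and link the library from its util directory.
    while [ ! -f prj/build.lst ]; do
        [ "$(pwd)" != / ] || { echo "mkd: module root not found" >&2; exit 1; }
        cd .. || exit 1
    done
    cd util && "$dmake_cmd" debug=t
)
```

The function body runs in a subshell, so the calling shell stays in the source directory after linking.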

Formula compiler, tokens, interpreter and document access

To have not only the interpreter with debug but also the compiler and the methods that access the document, retrieve values from cells or interpret recursively, this comes in handy:

# To get down to the compiler roots add module formula core
cd $SRC_ROOT/formula/source/core/api
mkd -l *.cxx
# Calc related
cd $SRC_ROOT/sc/source/core/tool
mkd compiler*.cxx token*.cxx interpr*.cxx
cd ../data
mkd -l doc*.cxx tab*.cxx col*.cxx cell*.cxx

Run the office

In Run shell

# cd into the 3-layer office library directory
cd .../cwsname/DEV300/inst.m42/openoffice.org/basis3.0/program

# Create a backup of the original libraries and symbolic links to the debug
# libraries, only needed the first time of course:
mkdir bkp
cp -p libsc* bkp
cp -p libvba* bkp
cp -p libfor* bkp
ln -sf ../../../../ooo/sc/unxlngi6/lib/libsc* .
ln -sf ../../../../ooo/sc/unxlngi6/lib/libvba* .
ln -sf ../../../../ooo/formula/unxlngi6/lib/libfor* .

Note: This is explicitly for debugging Calc; for other modules use other libraries, of course. You do not need this to run the office. Creating symbolic links doesn't necessarily work with all libraries, as some of them then won't find the appropriate run path to dlopen other libraries. Some need to be copied instead. This may need experimenting. However, it works for the Calc libraries.
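To get back to the original libraries later, the backups in bkp can be copied over the symbolic links. One pitfall: copying onto an existing symbolic link writes through it into the link target, which here would be the debug library in the build tree, so the link has to be removed first. A minimal sketch, with restore_libs being a hypothetical helper name:

```shell
# Restore the libraries saved in bkp/ of the given program directory.
# $1 = program directory containing bkp/, defaults to the current directory.
restore_libs() (
    cd "${1:-.}" || exit 1
    [ -d bkp ] || { echo "restore_libs: no bkp directory" >&2; exit 1; }
    for f in bkp/*; do
        [ -f "$f" ] || continue
        name=${f#bkp/}
        # Remove the symbolic link first so cp does not write through it.
        rm -f "$name" || exit 1
        cp -p "$f" "$name" || exit 1
    done
)
```

Called as restore_libs . in the basis3.0/program directory, it leaves the libraries in the build tree untouched.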

Then execute the office:

../../../openoffice.org3/program/scalc & sleep 8 ; ps

This starts Calc in the background, sleeps for 8 seconds during startup (you may have to adapt the duration on a slower machine), and then displays the process list, one of them being soffice.bin, for example

 1234 pts/6    00:00:01 soffice.bin

Remember the PID or copy it to the clipboard or selection.
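Instead of reading the PID off the screen, it can be extracted from the ps output. The following sketch assumes the default ps layout shown above, with the PID in the first column and the command name in the last; pid_of is a made-up helper name.

```shell
# Print the PID of the first line on stdin whose last field equals $1.
# Expects ps-style input: PID first, command name last.
pid_of() {
    awk -v name="$1" '$NF == name { print $1; exit }'
}
```

Used as ps | pid_of soffice.bin, this prints 1234 for the example line above, and the result can be passed straight to the debugger's --pid option.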

Debug the beast

In the Debug shell it is best to cd into some source code subdirectory, for example sc/source/core/tool, otherwise the debugger sometimes may not find included header files when stepping through inline methods. You may of course also use the --cd option instead, or put a cd command in a gdb command file and execute it on startup for frequent use. Consult the documentation for details. Then invoke the gdb debugger with the Text User Interface, gdbtui, and attach it to the office executable already running:

gdbtui --pid=1234

or, if that doesn't work

gdbtui .../cwsname/DEV300/inst.m42/openoffice.org3/program/soffice.bin 1234

where 1234 is of course the actual PID of the running process. If your system doesn't have gdbtui, try gdb -tui or gdb --tui or gdb --interpreter=tui instead. If that still doesn't come up with an extra text frame, you're out of luck or on MacOSX or both ;-)

When it loads the libraries you may have to press the enter key a few times during the listing, and then it waits for command input, with the executable being interrupted. To debug some interpreter function as mentioned above, the central entry point would be to set a breakpoint at ScInterpreter::Interpret(), so enter

b 'ScInterpreter::Interpret()'
c

where b sets a breakpoint, here at the entry of the desired function, and c continues running the program. Note that you don't have to type the full classname::method for the breakpoint; it is sufficient to type an unambiguous portion of the name and press the Tab key for completion. This might be

b 'ScInterpreter::Int<TAB>

Note that the leading single quote is needed for this functionality.
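For frequent use, the cd command and the breakpoint can be put into a gdb command file, as hinted at above. This sketch writes such a file from the shell; the function name write_gdb_cmds and the file name used in the usage note are made up for illustration.

```shell
# Write a gdb command file that changes into a source directory, sets the
# breakpoint used in this example and continues the attached process.
# $1 = source directory, $2 = command file to create
write_gdb_cmds() {
    printf '%s\n' \
        "cd $1" \
        "break 'ScInterpreter::Interpret()'" \
        "continue" > "$2"
}
```

After write_gdb_cmds .../cwsname/DEV300/ooo/sc/source/core/tool interp.gdb the debugger could be started with gdb -tui --pid=1234 -x interp.gdb, where -x tells gdb to execute the commands from the file.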

If you now enter a formula in a cell, the debugger breaks as soon as ScInterpreter::Interpret() is reached and you may start stepping through. In command line mode this would be pressing n Enter for next program line, stepping over function calls, or s Enter stepping into function calls. The previous command could be repeated by just pressing Enter.

However, we started with the TUI, so we will take advantage of it. Press Ctrl-x s to switch to SingleKey mode; from now on you can use just n or s. A single c continues execution. Other single keys do different things; experiment or read the fine manual. You may leave SingleKey mode at any time by pressing q.

As detailed documentation for gdb and the TUI is sometimes not installed (check info gdb), and the info program is cumbersome to use if you are not used to it, here's the online documentation: Debugging with GDB and the GDB Text User Interface

Profiling

I use the great Callgrind and KCachegrind tools to create call graphs and see where performance bottlenecks are.

Performance profiling of course needs to be done in a product version, compiled with optimizations and without assertions and test code. So cd into .../cwsname/DEV300/ooo and source ENV.unxlngi6.pro. Everything that needs to be done is hopefully explained in profiling OOo with callgrind.