GOAL: fetch all crash dumps, send them to a central server, decode the backtraces automatically

Links

http://nick-black.com/dankwiki/index.php/Core -- gcore
http://stackoverflow.com/questions/2065912/core-dumped-but-core-file-is-not-in-current-directory
https://fedorahosted.org/abrt/wiki
https://code.google.com/p/google-breakpad/
http://www.freedesktop.org/software/systemd/man/systemd-coredumpctl.html

Basics

Verify coredumps work on your platform

ulimit -c unlimited
sleep 100 &
kill -SEGV %1
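
To confirm that the test process really died from the signal before looking for a core file, its exit status can be inspected. A minimal sketch, assuming a POSIX shell that reports death by signal as 128 plus the signal number; the function name crash_test is made up for illustration:

```shell
# Crash a throwaway process and report how it ended.
crash_test() {
    sleep 100 &
    pid=$!
    kill -SEGV "$pid"
    wait "$pid"
    status=$?
    # The shell encodes "killed by signal N" as exit status 128 + N.
    if [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "exited with status $status"
    fi
}
```

On Linux SIGSEGV is signal 11, so the function is expected to report signal 11. Whether a core file is written still depends on ulimit -c and the core_pattern.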

Check that a coredump has been created:

$ tail -3 /var/log/messages
Mar  5 08:48:59 dSS-kit user.notice root: Received coredump /var/log/coredumps/sleep_1425545339_signal11.core, compressing...
Mar  5 08:49:00 dSS-kit user.notice root: Compressed coredump /var/log/coredumps/sleep_1425545339_signal11.core.gz
Mar  5 08:49:00 dSS-kit user.notice root: COUNT: 3, STORE COUNT: 10

If the corewatch package is not installed, check for a core file in your current directory. If there is none, check the core_pattern:

$ cat /proc/sys/kernel/core_pattern
null
$ echo core > /proc/sys/kernel/core_pattern
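
How the kernel treats a core_pattern depends on its first character: a leading | pipes the dump to a program, a leading / writes to an absolute path, anything else is relative to the working directory of the crashing process. A small sketch that classifies a pattern; describe_core_pattern is a hypothetical helper, not part of corewatch:

```shell
# Classify a core_pattern string by its first character.
# Usage: describe_core_pattern "$(cat /proc/sys/kernel/core_pattern)"
describe_core_pattern() {
    pattern=$1
    case "$pattern" in
        "|"*)
            # Strip the leading pipe character to show the handler command.
            echo "piped to handler: ${pattern#?}" ;;
        /*)
            echo "written to absolute path: $pattern" ;;
        *)
            echo "written relative to the working directory: $pattern" ;;
    esac
}
```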

Enable coredumps globally

https://sigquit.wordpress.com/2009/03/13/the-core-pattern/

# /etc/sysctl.conf
kernel.core_pattern=/var/log/coredumps/%e.core

http://www.akadia.com/services/ora_enable_core.html

# /etc/login.defs
ULIMIT         2097152    

TODO: the latter is not tested. How to enable it in busybox?

Corewatch package

The package should be installed by default on all devel/testing installations

opkg install corewatch
pkill -SEGV ds485p

Check /var/log/coredumps for a new coredump and see /var/log/messages for what went wrong. If it doesn't work, check the content of the core_pattern:

cat /proc/sys/kernel/core_pattern
|/usr/sbin/coredump.sh /var/log/coredumps/%e_%t_signal%s.core %e %t %s
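
This pattern yields file names of the form <program>_<timestamp>_signal<signo>.core, with .gz appended once compressed, as seen in the /var/log/messages output earlier. A sketch that splits such a name back into its fields, assuming exactly this layout; parse_core_name is a hypothetical helper:

```shell
# Split a corewatch file name into program, timestamp and signal.
# Parsing starts from the right, so program names may contain underscores.
parse_core_name() {
    name=$(basename "$1")
    name=${name%.gz}                # drop optional compression suffix
    name=${name%.core}              # drop .core suffix
    signal=${name##*_signal}        # text after the last "_signal"
    rest=${name%_signal*}           # everything before it
    timestamp=${rest##*_}           # last underscore-separated field (%t)
    program=${rest%_*}              # remainder is the executable name (%e)
    echo "program=$program timestamp=$timestamp signal=$signal"
}
```

The timestamp is the %t value, i.e. seconds since the epoch at the time of the dump.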

Generate coredump from gdb

https://www.sourceware.org/gdb/onlinedocs/gdb/Core-File-Generation.html

(gdb) generate-core-file

Generate coredump from cmdline

http://man7.org/linux/man-pages/man1/gcore.1.html

gcore [-o filename] pid

Decoding

https://git.digitalstrom.org/dss-misc/coredump-decoder

Light -- coredump was generated from up-to-date system

Install a cross-compiled gdb, since the core files are not x86 binaries.

Fetch the repository https://git.digitalstrom.org/dss-misc/coredump-decoder

Then decode the coredump:

./bin/coredecode-devel.sh <corefile> <program-name>

There will be plenty of warnings about missing debug symbols; ignore them.
The output can be picked up in the folder output-devel.

Archive -- coredump was generated from binary that has been superseded

All binaries ever shipped are archived in a huge (100 GB) repository.
The coredumps from corewatch have the package and git revision encoded in their file name,
so the binary that matches the coredump can be retrieved from the archive automatically.

$ ./utilites/coredecode.sh dss_1.22.2+gitr2786+30a69e3-r1.96#1382332040_signal4.gz
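
To illustrate what is encoded in such a name, the following sketch splits it into its parts. The layout <package>_<version>+gitr<n>+<commit>-r<rev>#<timestamp>_signal<signo>.gz is an assumption inferred from the single example above, and parse_archive_core_name is a hypothetical helper, not the logic used by coredecode.sh:

```shell
# Split an archive-style coredump name into package, version, commit,
# timestamp and signal. The layout is inferred from one example only.
parse_archive_core_name() {
    name=$(basename "$1")
    name=${name%.gz}
    signal=${name##*_signal}        # text after the last "_signal"
    name=${name%_signal*}
    timestamp=${name##*'#'}         # text after the last "#"
    pkgver=${name%'#'*}             # e.g. dss_1.22.2+gitr2786+30a69e3-r1.96
    package=${pkgver%%_*}           # text before the first "_"
    version=${pkgver#*_}            # text after the first "_"
    commit=${version#*+gitr*+}      # drop everything up to the abbreviated hash
    commit=${commit%%-*}            # drop the trailing "-r..." revision
    echo "package=$package version=$version commit=$commit timestamp=$timestamp signal=$signal"
}
```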

Enable coredump on production system

ds485p/ds485d

Modify the run script:

--- run.orig
+++ run

+ulimit -c unlimited
 exec 2>&1
 $DAEMON $OPTIONS
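
To check whether the limit is actually in effect, for example from within the run script, the current value of ulimit -c can be inspected. A sketch; core_limit_status is a made-up name, and the block size behind a numeric limit depends on the shell:

```shell
# Report whether the current shell allows core files to be written.
core_limit_status() {
    limit=$(ulimit -c)
    case "$limit" in
        0)         echo "disabled" ;;
        unlimited) echo "enabled (unlimited)" ;;
        *)         echo "enabled (up to $limit blocks)" ;;
    esac
}
```

The limit is inherited by child processes, which is why setting it in the run script before starting the daemon is sufficient.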

dSS coredumps

If ulimit is missing (rare nowadays), there is a fallback in dss to enable coredumps via the POSIX API:

--- run.orig
+++ run
@@ -80,6 +80,7 @@
          $OPT_DSM_FIRMWARE \
          $OPT_WEBSERVICE \
          -w /usr/share/dss-web/webroot \
+         --coredumps=true \
         " 

Redirect coredumps

The core_pattern is either null (discard) or it places the coredump in the daemon's current folder /etc/runit/<daemon>, which is inconvenient.
Redirect the coredumps to a central place instead.

Redirect to /var/log/coredumps/<progname>.core
Do not append a timestamp or any other distinguishing feature. Let coredumps overwrite each other; otherwise you need some removal scheme.

$ mkdir /var/log/coredumps
$ echo /var/log/coredumps/%e.core > /proc/sys/kernel/core_pattern

Create a few coredumps and make sure they overwrite the same file on the filesystem by checking its modification date.

$ killall -SEGV ds485p

Be careful here. If coredumps do not overwrite each other, the filesystem will start to fill up, and bad things will happen then. Consider turning coredumps off again once you have what you need. Also keep in mind that coredumps contain sensitive user data, such as passwords or tokens. Treat them with care.
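
If distinguishing features such as timestamps are kept in the file names after all, some removal scheme is needed. A minimal sketch that keeps only the newest N files in a directory; prune_cores is a hypothetical helper and assumes file names without newlines:

```shell
# Keep the newest $2 files in directory $1 and delete the rest.
prune_cores() {
    dir=$1
    keep=$2
    # ls -t lists newest first; tail skips the first $keep entries.
    ls -t "$dir" | tail -n +"$((keep + 1))" | while IFS= read -r name; do
        rm -f -- "$dir/$name"
    done
}
```

For example, prune_cores /var/log/coredumps 10 would leave at most ten files behind.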

Decode the coredump

See decoding section above

Decode coredump from a self-compiled binary / a binary not packaged in the default feeds

You still need the latest toolchain, since you need a cross-compiled gdb. It can be found here:
http://update.aizo.com/feeds/dss11-devel-eglibc/sdk/

. /usr/local/angstrom-eglibc-x86_64-armv5te/environment-setup-armv5te-angstrom-linux-gnueabi
gdb <dss-binary-matching-core> <core>
(gdb) set sysroot <your-path-here>/dss-oe/angstrom-devel-build-eglibc/sysroots/dss11-1gb
(gdb) thread apply all bt
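
The same steps can be run non-interactively. A sketch, assuming the environment-setup script exports the cross gdb as $GDB (plain gdb is used otherwise); decode_core is a hypothetical wrapper, while -batch and -ex are standard gdb options:

```shell
# Print backtraces of all threads from a core file without a gdb prompt.
# Arguments: <binary> <core> <sysroot>. Assumes the sysroot path has no spaces.
decode_core() {
    binary=$1
    core=$2
    sysroot=$3
    # -batch exits after the commands have run; each -ex runs one gdb command
    # after the binary and the core file have been loaded.
    "${GDB:-gdb}" -batch \
        -ex "set sysroot $sysroot" \
        -ex "thread apply all bt" \
        "$binary" "$core"
}
```

Redirecting the output to a file gives a backtrace that can be attached to a bug report.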

Remote debugging

Start with a core file of an up-to-date system:

./bin/coredecode-light.sh -i -t testing <corefile> <dss>

Mind the '-i' option: it will bring up a gdb prompt. Inspect the backtrace to check that all symbols are resolved correctly, then redirect to your target:

(gdb) target remote 192.168.1.17:234

Don't forget to run gdbserver on your target first:

$ gdbserver :234 --attach pid