Watchdog

Good introduction to watchdog paranoia
http://www.ganssle.com/watchdogs.htm

IMX

Watchdog daemon
http://www.sat.dundee.ac.uk/psc/watchdog/Linux-Watchdog.html
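
As a rough illustration of what such a daemon does at its core, the sketch below pets a watchdog device by writing to it at a fixed interval. It assumes the standard Linux /dev/watchdog character device behaviour (any write resets the timer, a final 'V' requests a "magic close"). The function name feed_watchdog and its parameters are hypothetical. The daemon linked above also runs health checks before each write, which this sketch does not attempt.

```python
import time


def feed_watchdog(device_path, beats, interval_s, sleep=time.sleep):
    """Pet the watchdog `beats` times, then request a clean stop.

    Assumption: device_path behaves like Linux /dev/watchdog.
    Opening the real device arms the hardware timer.
    """
    # Unbuffered, so every write reaches the driver immediately.
    with open(device_path, "wb", buffering=0) as wd:
        for _ in range(beats):
            wd.write(b"\0")      # any write resets the timer
            sleep(interval_s)    # must stay below the hardware timeout
        # "Magic close": asks the driver to disarm the timer on close.
        # Drivers configured with nowayout ignore this and keep counting.
        wd.write(b"V")
```

A call might look like feed_watchdog("/dev/watchdog", beats=10, interval_s=5). If the process dies inside the loop, the 'V' is never written and the hardware resets the board, which is the intended behaviour.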

Also enforce a reboot after a kernel oops
http://www.techrepublic.com/blog/linux-and-open-source/auto-reboot-linux-after-a-kernel-panic/

+CONFIG_PANIC_ON_OOPS=y
+CONFIG_PANIC_TIMEOUT=30
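
To check on a running target that these settings are actually in effect, one option is to read the matching sysctl files. The sketch below assumes the usual procfs locations /proc/sys/kernel/panic_on_oops and /proc/sys/kernel/panic. The function names and the proc_root parameter are hypothetical and exist only to make the example testable off-target.

```python
from pathlib import Path


def read_panic_settings(proc_root="/proc"):
    """Return the oops/panic sysctl values of the running kernel."""
    base = Path(proc_root) / "sys" / "kernel"
    return {
        "panic_on_oops": int((base / "panic_on_oops").read_text().strip()),
        "panic_timeout": int((base / "panic").read_text().strip()),
    }


def reboots_after_oops(settings):
    """True if an oops leads to a panic and the panic leads to a reboot.

    kernel.panic == 0 means "stay halted"; any other value reboots
    (positive: after that many seconds, negative: immediately).
    """
    return settings["panic_on_oops"] == 1 and settings["panic_timeout"] != 0
```

With the config above, read_panic_settings() on the target would be expected to report panic_on_oops as 1 and panic_timeout as 30, unless something overrides them at boot or runtime.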

Possibly also trigger a panic, and hence a reboot, when a task hangs or a CPU locks up. Not enabled yet, because we want to avoid too many reboots.

+CONFIG_LOCKUP_DETECTOR=y
+CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y
+CONFIG_BOOTPARAM_HUNG_TASK_PANIC=y

We should also monitor the main loops of all our daemons to verify that they make progress.
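
One way this could work is a heartbeat scheme: each main loop reports in once per iteration, and a supervisor pets the watchdog only while every loop has reported recently. The sketch below shows just the bookkeeping. The class name ProgressMonitor, its methods, and the deadline value are all hypothetical, and how daemons deliver their heartbeats (socket, file, shared memory) is left open.

```python
import time


class ProgressMonitor:
    """Tracks the last heartbeat time of each registered main loop."""

    def __init__(self, deadline_s, clock=time.monotonic):
        self.deadline_s = deadline_s  # max allowed gap between heartbeats
        self.clock = clock            # injectable so tests can fake time
        self.last_beat = {}

    def register(self, name):
        # Registration counts as the first heartbeat.
        self.last_beat[name] = self.clock()

    def beat(self, name):
        # Called by a daemon once per main loop iteration.
        self.last_beat[name] = self.clock()

    def stalled(self):
        """Return the sorted names of loops that missed their deadline."""
        now = self.clock()
        return sorted(
            name for name, t in self.last_beat.items()
            if now - t > self.deadline_s
        )
```

A supervisor would then pet the hardware watchdog only when stalled() returns an empty list, so a single stuck daemon eventually leads to a reset.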

AT91

The AT91 has a hardware bug that can cause the hardware to hang on reboot. Regular reboots therefore take a complicated path: the reset code is moved into the internal SRAM and executed from there. A watchdog reset does not go through this path, since the watchdog issues a hard reset.

The question is which is worse: a system hung because of a software problem or one hung because of the hardware problem. In either case we need human intervention to reset the device, but the hardware failure might not strike on every reboot. If the device does hang in reset, the question becomes whether the watchdog fired for a good reason or was just a false alarm.