[maemo-developers] debugging spontaneous reboot issues with N800/N810

From: Kalle Valo Kalle.Valo at nokia.com
Date: Thu Nov 22 16:23:34 EET 2007

"ext Aleksandr Koltsoff" <czr at iki.fi> writes:

> cat 32wd_to:
> 2
> Question: Does this count the times that the hw-watchdog was triggered
> so far (after last flash)?


> cat sw_rst:
> 1
> Question: Number of times "critical 'system' application crashing or
> being killed by the kernel OOM-killer" happened? (quoted text from the
> wiki page)

More like the number of reboots issued after a critical system
application crashed. More than one critical application can crash at
the same time, and that still counts as one reboot :)

> cat lifeguard_restarts:
> /usr/bin/esd : 1
> /usr/sbin/multimediad  : 1
> /usr/sbin/dsp_dld -p --disable-restart \
>   -c /lib/dsp/dsp_dld_avs.conf : 1 *
> (line continuation in the last one was made by me)
> Question: Now this is the interesting part:
> 1) Am I correct to assume that each line records the name of the program
> that was running when the hw watchdog triggered?

No. This file tells how many times each application has been
restarted. These are not "critical system applications", in the sense
that there is no need to reboot the whole device; only the
application is restarted. If the application crashes many times in a
row, though, the device will be rebooted.

Or at least I think that's how things are. DSME terms are a bit
confusing and I always mix them up. But here's how I see them:

reset = whole device is rebooted
restart = application is restarted but device is not rebooted

> 2) And the number after each records the times that the programs were
> running when the watchdog tripped?

No. It's the number of times an application has been restarted.
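
To make the format concrete, here is a rough sketch of how the
lifeguard_restarts lines quoted above could be parsed. This is not DSME
code, just an illustration in Python; the names RestartEntry and
parse_lifeguard_restarts are made up for this example. The meaning of
the trailing asterisk is not explained in this thread, so the sketch
only records whether it is present.

```python
import re
from typing import List, NamedTuple


class RestartEntry(NamedTuple):
    """One line of lifeguard_restarts (hypothetical helper type)."""
    command: str   # command line of the restarted application
    count: int     # number of times it has been restarted
    marked: bool   # True if the line ends with '*' (meaning not stated here)


# Assumed layout, based only on the sample above: "<command> : <count> [*]"
_LINE_RE = re.compile(
    r"^(?P<command>.*?)\s*:\s*(?P<count>\d+)\s*(?P<mark>\*)?\s*$"
)


def parse_lifeguard_restarts(text: str) -> List[RestartEntry]:
    """Parse the contents of lifeguard_restarts into a list of entries."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = _LINE_RE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        entries.append(
            RestartEntry(
                command=match.group("command"),
                count=int(match.group("count")),
                marked=match.group("mark") is not None,
            )
        )
    return entries
```

Each entry's count is a restart count for that application only; it
says nothing about watchdog reboots.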

> 3) Asterisk marks the application that caused the last wd timeout
> operation? (so in this case, the spontaneous reboot was caused by dsp_dld
> or it at least seems so?)

There is no way to know what caused a hardware watchdog reboot. It can
be a problem in the kernel, or some userspace application taking all
the CPU.

So, in summary:

the number of times a watchdog reboot has happened, reason unknown

the number of times the device has been rebooted due to a critical
application

detailed statistics on which applications crashed and how many times
they have caused a device reboot (excluding watchdog reboots)

statistics about which applications have crashed but have not caused
a device reboot
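
For the two plain counters, a minimal sketch of reading them could look
like the following. It assumes each of the files 32wd_to and sw_rst
holds a single integer, as in the quoted output. The directory where
the files live is not given in this thread, so the caller has to pass
it in; read_counter and summarize_reboot_counters are names invented
for this example.

```python
from pathlib import Path
from typing import Dict, Optional


def read_counter(stats_dir: str, name: str) -> Optional[int]:
    """Return the integer stored in stats_dir/name, or None if unavailable."""
    path = Path(stats_dir) / name
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # File missing, unreadable, or not a plain integer.
        return None


def summarize_reboot_counters(stats_dir: str) -> Dict[str, Optional[int]]:
    """Collect the two reboot counters discussed above.

    stats_dir is an assumption supplied by the caller; this thread does
    not say where the statistics files are stored.
    """
    return {
        # 32wd_to: watchdog reboots, reason unknown
        "watchdog_reboots": read_counter(stats_dir, "32wd_to"),
        # sw_rst: reboots issued after a critical application crashed
        "critical_app_reboots": read_counter(stats_dir, "sw_rst"),
    }
```

Comparing the two numbers over time at least tells you whether a
reboot was counted as a watchdog one or as a critical application one.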

Kalle Valo

