[maemo-developers] debugging spontaneous reboot issues with N800/N810

From: Kalle Valo Kalle.Valo at nokia.com
Date: Thu Nov 22 16:23:34 EET 2007
"ext Aleksandr Koltsoff" <czr at iki.fi> writes:

> cat 32wd_to:
> 2
> Question: Does this count the times that the hw-watchdog was triggered
> so far (after last flash)?

Yes.

> cat sw_rst:
> 1
> Question: Number of times "critical 'system' application crashing or
> being killed by the kernel OOM-killer" happened? (quoted text from the
> wiki page)

More like the number of reboots issued after a critical system
application crashed. More than one critical application can crash at
the same time :)

> cat lifeguard_restarts:
> /usr/bin/esd : 1
> /usr/sbin/multimediad  : 1
> /usr/sbin/dsp_dld -p --disable-restart \
>   -c /lib/dsp/dsp_dld_avs.conf : 1 *
> (line continuation in the last one was made by me)
>
> Question: Now this is the interesting part:
> 1) Am I correct to assume that each line records the name of the program
> that was running when the hw watchdog triggered?

No. This file tells you how many times each application has been
restarted. These are not "critical system applications", in the sense
that there is no need to reboot the whole device when one of them
crashes; only the application itself is restarted. If the application
crashes many times in a row, though, the device will be rebooted.

Or at least I think that's how things are. The DSME terms are a bit
confusing and I always mix them up. But here's how I see them:

reset = whole device is rebooted
restart = application is restarted but device is not rebooted

> 2) And the number after each records the times that the programs were
> running when the watchdog tripped?

No. It's the number of times the application has been restarted.
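
As a rough sketch, the entries quoted above could be parsed like this
in Python. The format "<command line> : <count>" with an optional
trailing "*" is only inferred from that sample, and the function name
parse_lifeguard is made up for illustration. The meaning of the
asterisk is not established here, so the sketch just records whether
it is present.

```python
import re

# Assumed entry format, inferred from the quoted sample only:
#   <command line> : <count> [*]
ENTRY_RE = re.compile(r"^(?P<cmd>.+?)\s*:\s*(?P<count>\d+)\s*(?P<mark>\*)?\s*$")


def parse_lifeguard(text):
    """Return {command line: (count, marked)} for each recognised entry."""
    entries = {}
    for line in text.splitlines():
        match = ENTRY_RE.match(line.strip())
        if match:  # lines that do not fit the assumed format are skipped
            marked = match.group("mark") is not None
            entries[match.group("cmd")] = (int(match.group("count")), marked)
    return entries


# Sample taken from the quoted output, with the continuation line joined.
sample = """/usr/bin/esd : 1
/usr/sbin/multimediad  : 1
/usr/sbin/dsp_dld -p --disable-restart -c /lib/dsp/dsp_dld_avs.conf : 1 *
"""

for cmd, (count, marked) in parse_lifeguard(sample).items():
    flag = " [*]" if marked else ""
    print(f"{cmd}: restarted {count} time(s){flag}")
```

Keying on the full command line keeps two instances of the same
binary with different arguments apart.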

> 3) Asterisk marks the application that caused the last wd timeout
> operation? (so in this case, the spontanous reboot was caused by dsp_dld
> or it at least seems so?)

There is no way to know what caused a hardware watchdog reboot. It can
be a problem in the kernel, or some userspace application taking all
the CPU time.

So, in summary:

32wd_to:
the number of times a watchdog reboot has happened, reason unknown

sw_rst:
the number of times the device has been rebooted due to a critical
application crashing

lifeguard_resets:
detailed statistics on which applications crashed and how many times
each has caused a device reboot (excluding watchdog reboots)

lifeguard_restarts:
statistics about which applications have crashed and been restarted
without causing a device reboot
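
To make the two plain counters concrete, here is a small Python sketch
that reads 32wd_to and sw_rst from a directory and prints them with
the meanings given above. The stats_dir parameter and both function
names are hypothetical; the location of these files is not given in
this mail, so the caller has to supply it. Treating a missing or
unparsable file as 0 is also just an assumption.

```python
import tempfile
from pathlib import Path


def read_reboot_counters(stats_dir):
    """Read the two integer reboot counters found in stats_dir."""
    counters = {}
    for name in ("32wd_to", "sw_rst"):
        try:
            counters[name] = int((Path(stats_dir) / name).read_text().strip())
        except (FileNotFoundError, ValueError):
            counters[name] = 0  # assumption: unreadable counter counts as 0
    return counters


def describe(counters):
    """Turn the counters into human-readable lines."""
    return [
        f"watchdog reboots (cause unknown): {counters['32wd_to']}",
        f"reboots after a critical application crash: {counters['sw_rst']}",
    ]


# Demo on a temporary directory filled with the values quoted in this thread.
with tempfile.TemporaryDirectory() as demo_dir:
    Path(demo_dir, "32wd_to").write_text("2\n")
    Path(demo_dir, "sw_rst").write_text("1\n")
    for line in describe(read_reboot_counters(demo_dir)):
        print(line)
```

On a real device you would pass the directory that actually holds
these files instead of the temporary one.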

-- 
Kalle Valo
