[maemo-developers] defective memory? (was: problem with dspmp3sink)

From: Siarhei Siamashka siarhei.siamashka at gmail.com
Date: Sun Sep 10 23:14:05 EEST 2006
On Sunday 10 September 2006 11:36, Olivier ROLAND wrote:

> Your test works fine on my device.
> I see that you run it from /media/mmc1 so I guess you formatted your memory
> card with ext2.
> Mine is still vfat so I can't. If you get the same error when running from
> internal memory then your device is broken.

Thanks a lot for finding time and running the test.

This morning I could not reproduce this bug. The device battery had just
been recharged overnight. As nothing else had changed (I checked uptime to
be sure that the device had not rebooted or something), I see three possible
explanations (I may be wrong, I'm not a hardware expert):
* the page with the faulty memory bit was allocated to some other process
* the CPU or memory chip had overheated under heavy use, and the
bug disappeared once the temperature got back to normal
* maybe the bug is somewhat related to low battery charge level, maybe the
battery was unable to provide enough voltage or something for reliable

I did some searching and found this utility for testing memory on non-x86
hardware: http://pyropus.ca/software/memtester/
For those who are too lazy to compile it, the binary is here:

After playing with the device for some time, I got the same problem with the
lzma program this evening. And memtester also confirms that the memory
really is defective :(

# ./memtester 20
memtester version 4.0.5 (32-bit)
Copyright (C) 2005 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffff000
want 20MB (20971520 bytes)
got  20MB (20971520 bytes), trying mlock ...locked.
Loop 1:
  Stuck Address       : testing   0FAILURE: possible bad address line at offset 0x0037e9a5.
Skipping to next test...
  Random Value        : FAILURE: 0xdeb98374 != 0xdeb90000 at offset 
FAILURE: 0xd04629fc != 0xd046aa88 at offset 0x000fe9a4.
  Compare XOR         : FAILURE: 0x50467c54 != 0x50460000 at offset 
  Compare SUB         : FAILURE: 0xb069e1c0 != 0xdc200000 at offset 
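
To see which bits disagree in a report like the one above, the two values on a
FAILURE line can be XORed. This is only a minimal sketch: the function name
differing_bits and the regular expression are my own and are not part of
memtester, and the code assumes 32-bit values as in this log.

```python
import re

# Matches the value pair in lines such as
# "FAILURE: 0xd04629fc != 0xd046aa88 at offset 0x000fe9a4."
# (pattern written for this log; not taken from memtester itself)
FAILURE_RE = re.compile(r"FAILURE: (0x[0-9a-fA-F]+) != (0x[0-9a-fA-F]+)")


def differing_bits(line):
    """Return (xor, list of differing bit positions), or None if the
    line holds no value pair (e.g. the "bad address line" message)."""
    match = FAILURE_RE.search(line)
    if match is None:
        return None
    first = int(match.group(1), 16)
    second = int(match.group(2), 16)
    diff = first ^ second
    # Assumes 32-bit words, as reported by "memtester ... (32-bit)".
    bits = [bit for bit in range(32) if (diff >> bit) & 1]
    return diff, bits


if __name__ == "__main__":
    sample = "FAILURE: 0xd04629fc != 0xd046aa88 at offset 0x000fe9a4."
    diff, bits = differing_bits(sample)
    print(hex(diff), bits)
```

Applied to the two Random Value lines in the log, both pairs give the same
difference, 0x8374.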

By the way, I have seen some reports about random device reboots; maybe
these people also suffer from defective memory. So it may be a
good idea for everyone to test their memory. Use it at your own risk, though: I
can't be sure that this test program works correctly and always gives
valid results (I only found it today).

Well, now that the problem is identified, it is time to think about how to solve it.

The first task is making a proper memory testing utility. As memtester needs
to allocate the memory it tests, and a lot of memory is already taken by IT OS
software and libraries, we can only test a small part of memory (only ~1/3 in
the test above). Maybe it is possible to patch the kernel (or maybe it already
provides such functionality) to allocate any given physical memory page for us
(relocating its data elsewhere if the page is already occupied by some other
process). If that is possible, we would be able to check all the physical memory
except, probably, for the part occupied by the kernel itself.
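
The checking loop of such a utility is the easy part; the hard part is the
access to every physical page discussed above. A minimal sketch of the loop
only, run against a simulated memory: the FaultyMemory class is a hypothetical
stand-in with one bit stuck at zero, and the single pattern plus its
complement is a simplification, not memtester's actual test set.

```python
class FaultyMemory:
    """Hypothetical stand-in for RAM: one word has one bit stuck at zero."""

    def __init__(self, words, bad_index, stuck_bit):
        self.cells = [0] * words
        self.bad_index = bad_index
        self.stuck_mask = ~(1 << stuck_bit)

    def __len__(self):
        return len(self.cells)

    def __getitem__(self, index):
        return self.cells[index]

    def __setitem__(self, index, value):
        if index == self.bad_index:
            value &= self.stuck_mask  # simulate the stuck bit
        self.cells[index] = value


def pattern_test(mem, pattern):
    """Fill mem with pattern, read it back, then repeat with the
    32-bit complement. Returns the sorted indices that failed."""
    failures = set()
    for value in (pattern, ~pattern & 0xFFFFFFFF):
        for index in range(len(mem)):
            mem[index] = value
        for index in range(len(mem)):
            if mem[index] != value:
                failures.add(index)
    return sorted(failures)


if __name__ == "__main__":
    mem = FaultyMemory(words=1024, bad_index=300, stuck_bit=5)
    print(pattern_test(mem, 0xAAAAAAAA))
```

Writing the complement as well matters: a bit stuck at zero is only caught by a
value that has that bit set, and a bit stuck at one only by a value that has it
clear.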

The next task would be to find a way to use the BadRAM kernel patch on the
Nokia 770: http://rick.vanrein.org/linux/badram/
Preferably the physical addresses of the defective parts of memory should be
stored somewhere where they survive reflashing (R&D mode and other flags
are stored in such a way, right?). If the BadRAM patch becomes part of the
standard Nokia 770 kernel, it can help to make use of memory chips that
would otherwise have to be replaced. I wonder how much a Nokia 770
memory chip costs?
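
As far as I understand the BadRAM documentation, the patch takes a list of
address,mask pairs on the kernel command line (badram=...), where the mask
selects the address bits that must match. A minimal sketch of producing such a
list at page granularity, assuming the faulty physical addresses are already
known: the function name badram_pairs is hypothetical, and the offsets
memtester prints are relative to its own buffer, so they are not physical
addresses and cannot be fed in directly.

```python
PAGE_SIZE = 4096  # matches "pagesize is 4096" in the memtester output


def badram_pairs(physical_addresses, page_size=PAGE_SIZE):
    """Turn faulty physical addresses into a comma-separated list of
    address,mask pairs, one pair per distinct faulty page."""
    # Mask keeps the page number and drops the offset within the page:
    # 0xfffff000 for 4 KiB pages on a 32-bit address space (assumed).
    mask = 0xFFFFFFFF & ~(page_size - 1)
    pages = sorted({address & mask for address in physical_addresses})
    return ",".join("0x%08x,0x%08x" % (page, mask) for page in pages)


if __name__ == "__main__":
    # Made-up addresses, for illustration only.
    print("badram=" + badram_pairs([0x01234567, 0x01234abc, 0x02000010]))
```

Excluding whole pages wastes a little memory per defect, but the kernel manages
memory in pages anyway, so a finer mask would not gain anything.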

By the way, maybe Nokia already has some utility for hardware diagnostics
that could be made available for download? There would be no need to
reinvent the wheel in that case.
