[maemo-developers] defective memory? (was: problem with dspmp3sink)

From: Siarhei Siamashka siarhei.siamashka at gmail.com
Date: Tue Sep 19 03:06:47 EEST 2006
On Tuesday 19 September 2006 00:03, you wrote:


> An interesting observation is that you need to gradually increase the size
> of tested memory block. You need to start with testing 20MB first, then you
> can try 25MB and so on up to 43MB. If you try to allocate and test a large
> block of memory too early, memtester will just get killed.
> As for the failures, only the last two hex digits of the faulty address stay
> the same (always 'a5'), which is a bit strange. I expected the offset within
> a page to remain the same (I changed malloc to mmap in order to always
> allocate the memory buffer at a page boundary), and unless pages are 256
> bytes in size, this is inconsistent.

A small update: according to the manual [1], the minimum page size for the
ARM926EJ-S CPU is in fact 1KB (tiny page), so the inconsistency is now resolved.

I have patched memtester to allocate memory gradually, starting from 20MB and
growing up to the size specified on the command line, so larger blocks can be
checked without any extra tricks. You can download the modified memtester
here: http://ufo2000.xcomufo.com/files/memtester-n770.tar.gz

If you are going to try it (and it may be a really good idea), it should be
run as root. The first argument is the size of the memory block to be tested
(in megabytes); the second, optional argument is the number of passes.
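
The gradual allocation described above could be sketched roughly as follows.
This is not the code from the patched memtester; the function names
next_size_mb and alloc_gradually, the step size and the choice to release each
intermediate block before requesting a larger one are all assumptions made for
illustration.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on strict compilers */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define MB (1024UL * 1024UL)

/* Hypothetical helper: the next size to try when growing from current_mb
 * toward target_mb in steps of step_mb, never overshooting the target. */
size_t next_size_mb(size_t current_mb, size_t target_mb, size_t step_mb)
{
    assert(step_mb > 0);
    if (current_mb >= target_mb)
        return target_mb;
    return (target_mb - current_mb < step_mb) ? target_mb
                                              : current_mb + step_mb;
}

/* Hypothetical sketch: map and lock progressively larger blocks, releasing
 * each intermediate one, and return the final block of target_mb megabytes.
 * Returns MAP_FAILED if any mmap() or mlock() call fails. */
void *alloc_gradually(size_t start_mb, size_t target_mb, size_t step_mb)
{
    size_t mb = (start_mb < target_mb) ? start_mb : target_mb;

    for (;;) {
        /* mmap() returns page-aligned memory, as mentioned in the post */
        void *buf = mmap(NULL, mb * MB, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED)
            return MAP_FAILED;

        /* Locking usually needs root or a large enough RLIMIT_MEMLOCK */
        if (mlock(buf, mb * MB) != 0) {
            munmap(buf, mb * MB);
            return MAP_FAILED;
        }

        if (mb == target_mb)
            return buf;          /* caller tests it, then calls munmap() */

        munmap(buf, mb * MB);    /* drop this block and try a larger one */
        mb = next_size_mb(mb, target_mb, step_mb);
    }
}
```

With a start of 20MB, a step of 5MB and a target of 43MB, this sketch would
try 20, 25, 30, 35, 40 and finally 43MB, which mirrors the manual sequence
from the quoted message.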

Here is the result of running the patched memtester on my device:

Nokia770-26:/media/mmc1# ./memtester 40 1
memtester version 4.0.5 (32-bit)
Copyright (C) 2005 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffff000
want 40MB (41943040 bytes)
got  40MB (41943040 bytes), virtual address=0x40128000, trying 
mlock ...locked.
Loop 1/1:
  Stuck Address       : testing   0FAILURE: possible bad address line at 
offset 0x009899a5 (page offset 1a5).
Skipping to next test...
  Random Value        : FAILURE: 0x3f770c1e != 0x3f770000 at offset 0x004899a5 
(page offset 1a5).
FAILURE: 0xc50dee8d != 0xc50d0000 at offset 0x004899a5 (page offset 1a5).
  Compare XOR         : FAILURE: 0x0e119ff2 != 0x0e100000 at offset 0x004899a5 
(page offset 1a5).
  Compare SUB         : FAILURE: 0x7d558974 != 0x5ca00000 at offset 0x004899a5 
(page offset 1a5).
  Compare MUL         :   Compare DIV         : ok
FAILURE: 0x7febf0e8 != 0x7feb0000 at offset 0x004899a5 (page offset 1a5).
  Compare OR          : FAILURE: 0x7b69b068 != 0x7b690000 at offset 0x004899a5 
(page offset 1a5).
  Compare AND         :   Sequential Increment: ok
  Solid Bits          : testing   1FAILURE: 0xffffffff != 0xffff0000 at offset 
0x004899a5 (page offset 1a5).
  Block Sequential    : testing   1FAILURE: 0x01010101 != 0x01010000 at offset 
0x004899a5 (page offset 1a5).
  Checkerboard        : testing   0FAILURE: 0xaaaaaaaa != 0xaaaa0000 at offset 
0x004899a5 (page offset 1a5).
  Bit Spread          : testing   0FAILURE: 0xfffffffa != 0xffff0000 at offset 
0x004899a5 (page offset 1a5).
  Bit Flip            : testing   0FAILURE: 0x00000001 != 0x00000000 at offset 
0x004899a5 (page offset 1a5).
  Walking Ones        : testing   0FAILURE: 0xfffffffe != 0xffff0000 at offset 
0x004899a5 (page offset 1a5).
  Walking Zeroes      : testing   0FAILURE: 0x00000001 != 0x00000000 at offset 
0x004899a5 (page offset 1a5).

So the faulty address is reported at offset 1a5 within a page on every run.
The next step is to identify the physical address for use with the BadRAM
kernel patch.
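
The log does not show how the "page offset" value is computed. One reading
that reproduces the logged numbers is to keep the low 10 bits of the reported
offset. That would fit either a 1KB tiny page counted in bytes or a 4KB page
counted in 4-byte words; which of the two applies is an assumption here. The
helper name low_bits is made up for this sketch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: keep the lowest `bits` bits of an offset. */
unsigned long low_bits(unsigned long offset, unsigned bits)
{
    assert(bits < sizeof(unsigned long) * 8);
    return offset & ((1UL << bits) - 1UL);
}

/* Print both offsets from the log masked to 10 and to 12 bits. */
void show_offsets(void)
{
    const unsigned long stuck = 0x009899a5UL;  /* "Stuck Address" offset */
    const unsigned long other = 0x004899a5UL;  /* offset in all other tests */

    printf("10 bits: %lx %lx\n", low_bits(stuck, 10), low_bits(other, 10));
    printf("12 bits: %lx %lx\n", low_bits(stuck, 12), low_bits(other, 12));
    printf("difference: 0x%lx\n", stuck - other);
}
```

Masking to 10 bits gives 1a5 for both offsets, while masking to 12 bits gives
9a5. The two offsets differ by 0x500000, which equals half of the 40MB buffer
counted in 4-byte words (41943040 / 2 / 4). If memtester splits the buffer
into two halves for its comparison tests (an assumption, not confirmed by the
log), both reports would point at the same memory location.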

> I also wanted to detect physical address of a faulty memory region. I tried
> to open '/dev/mem', read it one page at a time and compare its content with
> the data from a faulty page. Unfortunately this does not work on Nokia 770
> and segfaults on reading from '/dev/mem'. The same code works fine on
> desktop x86 pc and has no problems identifying physical address for any
> page. Test programs were always run as root.
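
The page-by-page comparison described in the quoted text could look roughly
like the sketch below. It is not the original test program, and the function
name find_page is hypothetical. The path is a parameter so the same code can
be tried on an ordinary file; pointing it at '/dev/mem' needs root and, as
noted above, does not work on the Nokia 770.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sketch: read `path` one page at a time and return the byte
 * offset of the first page whose contents equal `needle`, or -1 if no page
 * matches or the file cannot be opened or read. */
long long find_page(const char *path, const unsigned char *needle,
                    size_t pagesize)
{
    long long found = -1;
    long long offset = 0;
    unsigned char *buf;
    int fd;

    assert(pagesize > 0);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    buf = malloc(pagesize);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    /* Stop at the first short read or error; only full pages are compared */
    while (read(fd, buf, pagesize) == (ssize_t)pagesize) {
        if (memcmp(buf, needle, pagesize) == 0) {
            found = offset;
            break;
        }
        offset += (long long)pagesize;
    }

    free(buf);
    close(fd);
    return found;
}
```

For this to identify a physical page, the faulty page would first have to be
filled with a pattern that is unlikely to appear anywhere else in memory.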

I would really like to hear something from Nokia regarding this problem. There
may be a few other devices with faulty memory, considering the browser crash
reports, reboots and instability that some people have seen. A possible example
can be seen here (though the reporter did not run the memory test as advised): 

This is not a tragedy, and a software solution can probably resolve the problem.
As you know, bad blocks are common in flash, and the jffs2 file system handles
this issue. RAM can probably be treated in a similar way by using something
like the BadRAM kernel patch [2].

[1] http://www.arm.com/pdfs/DDI0198D_926_TRM.pdf
[2] http://rick.vanrein.org/linux/badram/
