
Quoting Russell Coker (russell@coker.com.au):
A memory test is not just testing the memory. It's testing the motherboard, the socket, and the connection. You can plug a DIMM into a system twice and have it not work the first time but work the second time.
In my experience, some subtle RAM problems get properly exposed only by 'make -j $BIGNUM' kernel compiles inside a non-terminating while loop, as Memtest86/Memtest86+ sometimes can't catch them.

http://linuxmafia.com/pipermail/conspire/2006-December/002662.html
http://linuxmafia.com/pipermail/conspire/2006-December/002668.html
http://linuxmafia.com/pipermail/conspire/2007-January/002743.html
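For anyone wanting to try the compile-loop approach, here is a minimal sketch in Python, not a definitive tool. It assumes an already-configured kernel source tree; the function name run_until_failure and its parameters are hypothetical, made up for illustration. The idea is that a build which fails at some random point, on a tree that otherwise builds cleanly, is worth treating as a hint of flaky hardware.

```python
import subprocess


def run_until_failure(build_cmd, clean_cmd=None, max_rounds=None, cwd=None):
    """Run build_cmd over and over until it fails.

    Returns (round_number, exit_status) for the first failing round, or
    (rounds_completed, 0) if max_rounds is reached with no failure.
    With max_rounds=None the loop is non-terminating until a failure.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        rounds += 1
        if clean_cmd is not None:
            # Clean first so that every round recompiles everything
            # instead of finding the tree already up to date.
            subprocess.run(clean_cmd, cwd=cwd, check=False)
        status = subprocess.run(build_cmd, cwd=cwd).returncode
        if status != 0:
            # On POSIX, a negative status means the command itself was
            # killed by a signal; compiler crashes inside make show up
            # as an ordinary non-zero exit from make.
            return rounds, status
    return rounds, 0


# Example call (assumption: run from the top of a configured kernel tree;
# 16 is an arbitrary stand-in for $BIGNUM). Interrupt with Ctrl-C:
#   run_until_failure(["make", "-j", "16"], clean_cmd=["make", "clean"])
```

The clean step is a design choice of this sketch: without it, the second and later rounds would do almost no work and so would exercise almost no RAM.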
If a system with multiple DIMMs reports a memory error and you don't know which DIMM is at fault (AFAIK only servers allow you to determine which DIMM is at fault) then the procedure is to remove DIMMs one at a time until the problem stops. Then you add the DIMM back again to see if the problem happens again.
Exactly. Quoting from the second of the two above-cited mailing list postings:

In hindsight, there's something else, easy to do, that I should have done / checked right about then: If you have reason to suspect RAM, but for whatever reason can't get a consistent, reproducible symptom, try shuffling around the position of the sticks in their various sockets. Also, if possible, try individual sticks one at a time (i.e., remove the others from the machine for testing purposes). Sometimes, the problem will manifest clearly with the sticks in some configurations but not others, and apparently I'd accidentally stumbled onto one of _those_ configurations where the RAM wasn't reliable but still tested clean.

Also, remember that you must consider other not-known-good parts as suspects. E.g., at a later point in my testing, when I'd seen fairly compelling evidence of both 512 MB sticks having problems in J0 with no other RAM present, I had to consider the possibility that socket J0 itself on the motherboard was intermittent or bad.

--
Cheers,              "Two women walk into a bar and discuss the Bechdel Test."
Rick Moen                                                      -- Matt Watson
rick@linuxmafia.com
McQ! (4x80)