RE: comparing bulk files

23 Mar 2012

      ...
I don't think that there is any evidence that a hash collision occurred.  An
error on the wire or a memory error are both more likely.
The drive itself is failing. It isn't reporting any errors under Linux (although Windows was reporting timeouts), but certain files were taking an incredibly long time to read while the drive silently retried to obtain a good read. It would be an astonishing coincidence for a wire error or memory error to occur at the same time as I was trying to extract data from a failing disk.

I think it is much more reasonable to assume that there was a checksum collision on that sector, so that the single-bit error wasn't detected. Either the head or the media is failing, so the drive will be returning incorrect data most of the time, which the controller rejects because the checksum doesn't match. Given enough read attempts, though, it is possible that a checksum-valid combination of random bytes could sneak past the checksum verification. I don't know what checksum algorithm the drive uses internally, but I bet it isn't nearly as strong as MD5.
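
As a rough sketch of that argument (not a model of any real drive): assume each corrupted read passes an n-bit check independently with probability 2^-n, so the chance that at least one of k bad reads slips through is 1 - (1 - 2^-n)^k. The bit widths and read counts below are purely illustrative assumptions. I don't know what the drive actually uses, and real drives typically use ECC rather than a plain checksum, so treat this only as an order-of-magnitude comparison against a 128-bit hash like MD5.

```python
import math


def p_false_accept(check_bits: int, bad_reads: int) -> float:
    """Chance that at least one of `bad_reads` corrupted reads passes an
    integrity check of `check_bits` bits.

    Assumption (hypothetical model): each corrupted read passes
    independently with probability 2**-check_bits.
    """
    p_single = 2.0 ** -check_bits
    # 1 - (1 - p)^k, computed with log1p/expm1 so a tiny p does not round to 0.
    return -math.expm1(bad_reads * math.log1p(-p_single))


if __name__ == "__main__":
    # Illustrative widths only: 16 and 32 bits are guesses, 128 bits is MD5's size.
    for bits in (16, 32, 128):
        for reads in (1_000, 1_000_000):
            p = p_false_accept(bits, reads)
            print(f"{bits:3d}-bit check, {reads:>9,} bad reads: {p:.3g}")
```

Under these assumptions a short check becomes quite likely to let something through after enough failed reads, while a 128-bit digest stays negligible, which is the asymmetry I'm relying on above.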

One thing I guess I haven't considered is a more widespread fault on the drive that could cause data errors past the point of the checksum (e.g. nearer the IDE interface side). A failing capacitor causing unclean DC could do this, I suppose.

The test rig (USB to IDE disk adapter plugged into a Linux machine) has been used many times before without issue, just never on a disk that is failing this badly, so I'm reluctant to suspect that as the culprit.

James

James Harper