
I have just finished recovering the data from a failing NT4 installation. The dd took _ages_ but reported no I/O errors. Booting then failed with a checksum failure, and investigation revealed a single bit error in a critical file, which I subsequently replaced from a download of NT4 SP6a. A single bit error in one file obviously raises the possibility of more such errors...

So I have a directory tree of NT4 files in the image I took of the original disk, mounted under Linux, and a flat directory containing the files from service pack 6a, and I want to compare them. There are heaps more files on the disk but these are the only ones I have the originals of.

So the disk image fs looks like (for illustrative purposes only - actual filenames may not be correct):

/WINDOWS/welcome.exe
/WINDOWS/system32/mem.exe
/WINDOWS/system32/WINSRV.dll
/WINDOWS/file_that_does_not_exist_in_sp6a.dll

and the sp6a directory looks like:

/sp6a/WELCOME.EXE
/sp6a/mem.exe
/sp6a/winsrv.exe
/sp6a/file_that_does_not_exist_on_installation.dll

The files are possibly in different directories, and are possibly in different cases. I could write some horribly inefficient shell script to just go through each file in the sp6a directory, find a file with the same name in the /WINDOWS tree and diff it, but maybe there is a better way? This would at least give me a very rough idea of how many errors I might be looking at.

Thanks

James

Hi James,

Okay, here is one method for bulk comparing two directory trees that will give you a start, as you can then eliminate some files from further analysis.

Run md5sum over all files of each tree, sending the output to one file per tree, then sort those files. Once you have the sorted lists, you can start eliminating files that share the same basename and md5 checksum.

You could use something other than md5sum, such as sha256sum, in the same manner.

Cheers

--
Kind Regards
AndrewM

Andrew McGlashan
Broadband Solutions now including VoIP
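The checksum-list approach described above could be sketched roughly as follows. This is only an illustration, not Andrew's actual commands: the function names hash_manifest and suspects are made up here, basenames are lower-cased so that WELCOME.EXE and welcome.exe pair up, and it assumes GNU md5sum and file names without tabs or newlines.

```sh
#!/bin/sh
# Sketch only: hash_manifest and suspects are hypothetical helper names.

# Print "lowercased-basename<TAB>md5" for every regular file under directory $1.
hash_manifest() {
    find "$1" -type f -exec md5sum {} + |
    while read -r sum path; do
        name=$(basename "$path" | tr '[:upper:]' '[:lower:]')
        printf '%s\t%s\n' "$name" "$sum"
    done | sort
}

# Print names found in both trees for which the image tree ($2) holds
# no copy with the same checksum as the reference tree ($1).
suspects() {
    ref_list=$(mktemp) && img_list=$(mktemp) || return 1
    hash_manifest "$1" > "$ref_list"
    hash_manifest "$2" > "$img_list"
    awk -F '\t' '
        NR == FNR { seen[$1] = 1; same[$0] = 1; next }  # first file: image tree
        seen[$1] && !same[$0] { print $1 }              # second file: reference tree
    ' "$img_list" "$ref_list"
    rm -f "$ref_list" "$img_list"
}
```

With the layout from the original question, something like `suspects sp6a C/WINNT` would list the service pack files that have a same-named but different counterpart in the image. Files that exist in only one tree are ignored.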

On 22 March 2012 23:34, Andrew McGlashan <andrew.mcglashan@affinityvision.com.au> wrote: <...>
You could use something other than md5sum, such as sha256sum in the same manner.
Using strong crypto would probably be overkill for this particular use-case. -- Joel Shea <jwshea@gmail.com>

On 22 March 2012 23:34, Andrew McGlashan <andrew.mcglashan@affinityvision.com.au> wrote: <...>
You could use something other than md5sum, such as sha256sum in the same manner.
Using strong crypto would probably be overkill for this particular use-case.
Those are just hashes, and I already know that whatever internal crc the drive uses has missed a single bit error, so I'm settling for nothing less than a byte for byte comparison :) (there is less than 200MB of data there, so a comparison isn't expensive in any way) James

James Harper <james.harper@bendigoit.com.au> wrote:
Those are just hashes, and I already know that whatever internal crc the drive uses has missed a single bit error, so I'm settling for nothing less than a byte for byte comparison :)
MD5 and SHA256 won't miss single bit errors. As I remember, they're designed so that single bit or other small differences in the input yield very large differences in the hash value.
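A quick way to see this for yourself, as a minimal sketch assuming GNU md5sum is installed: the characters 'A' (0x41) and 'C' (0x43) differ in exactly one bit, yet their MD5 digests have nothing obvious in common.

```sh
#!/bin/sh
# 'A' is 0x41 and 'C' is 0x43, so the two inputs differ in exactly one bit.
hash_a=$(printf 'A' | md5sum | cut -d ' ' -f 1)
hash_c=$(printf 'C' | md5sum | cut -d ' ' -f 1)

# Print both digests one above the other for a visual comparison.
printf '%s\n%s\n' "$hash_a" "$hash_c"
```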

I don't think that there is any evidence that a hash collision occurred. An error on the wire or a memory error are both more likely. -- My blog http://etbe.coker.com.au Sent from an Xperia X10 Android phone

I don't think that there is any evidence that a hash collision occurred. An error on the wire or a memory error are both more likely.
The drive itself is failing. It isn't reporting any errors under Linux (although Windows was reporting timeouts) but certain files were taking an incredibly long time to read as the drive silently tries to obtain a good read. It would have to be an astonishing coincidence that there was a wire error or memory error at the same time as I was trying to extract data from a failing disk.

I think it is much more reasonable to assume that there was a checksum collision on that sector such that the single bit error wasn't detected. Either the head or the media is failing, so it will be returning incorrect data most of the time, which is rejected by the controller due to an incorrect checksum, but given enough read attempts it is possible that a checksum-valid combination of random bytes could sneak past the checksum verification. I don't know what checksum algorithm the drive uses internally but I bet it isn't nearly as strong as MD5.

One thing I guess I haven't considered is that there is a more widespread fault on the drive that could cause data errors past the point of the checksum (e.g. nearer the IDE interface side)... a failing capacitor causing unclean DC could do this I suppose.

The test rig (USB to IDE disk adapter plugged into a Linux machine) has been used many times before without issue, just never on a disk that is failing this badly, so I'm reluctant to suspect that as the culprit.

James

On Fri, 23 Mar 2012, James Harper <james.harper@bendigoit.com.au> wrote:
The test rig (USB to IDE disk adapter plugged into a Linux machine) has been used many times before without issue, just never on a disk that is failing this badly, so I'm reluctant to suspect that as the culprit.
My experience with IDE-USB adapters and dodgy disks hasn't been positive. They tend to obscure or mis-report errors. When I suspect a disk of having problems I connect it to the motherboard in its native manner. It's quite inconvenient to do that, but it's worth the effort IMHO.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/

In the end I did this:

#!/bin/sh
# For each service pack file, compare it with every same-named file in the image.
find sp6a -type f | while read -r filename
do
    name=`basename "$filename"`
    # -iname matches regardless of case; the quotes stop the shell splitting or expanding the name
    find C/WINNT -type f -iname "$name" | while read -r othername
    do
        if ! diff "$filename" "$othername"
        then
            ls -al "$filename" "$othername"
        fi
    done
done

which showed me a few files that have a few possible 0xF0 -> 0x90 differences (comparing the hex dumps)... but that's a double bit transition vs the files from the service pack and I don't know if that's the way the files are supposed to be...

James
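For the follow-up step of finding out exactly which bytes differ, `cmp -l` can replace the manual hex dump comparison. The sketch below is an illustration rather than what James ran: byte_diffs is a made-up name, and it relies on `cmp -l` printing a 1-based decimal offset followed by the two byte values in octal.

```sh
#!/bin/sh
# Sketch only: byte_diffs is a hypothetical helper name.

# Print "offset  byte1 -> byte2" (all in hex) for each byte that differs
# between file $1 and file $2.
byte_diffs() {
    cmp -l "$1" "$2" | while read -r offset a b; do
        # cmp -l gives a 1-based decimal offset and octal byte values;
        # the leading 0 makes the shell read $a and $b as octal.
        printf '0x%08x  %02x -> %02x\n' "$((offset - 1))" "$((0$a))" "$((0$b))"
    done
}
```

Calling `byte_diffs sp6a/somefile C/WINNT/system32/somefile` (paths here are placeholders) prints one line per differing byte, so a pattern such as f0 versus 90 shows up directly with its zero-based file offset.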

On 23.03.12 09:06, James Harper wrote:
which showed me a few files that have a few possible 0xF0 -> 0x90 difference (comparing the hex dumps)... but that's a double bit transition vs the files from the service pack and I don't know if that's the way the files are supposed to be...
Now that's more like it. I wondered how a single bit error was supposed to have remained undetected by any half-decent CRC algorithm. Two in the right spots can cancel, however, depending on the algorithm.

Erik

--
Wizards had always known that the act of observation changed the thing that was observed, and sometimes forgot that it also changed the observer too.
Terry Pratchett - Interesting times
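To make the "two can cancel" point concrete, here is a toy illustration using a one-byte XOR checksum. This is an assumption chosen for simplicity and certainly not what the drive uses internally (real drives use much stronger error-correcting codes). The sample data bytes and the name xor_sum are made up for the example.

```sh
#!/bin/sh
# Sketch only: a toy XOR checksum, NOT the algorithm any real drive uses.

# XOR all arguments together and print the result as one hex byte.
xor_sum() {
    acc=0
    for byte in "$@"; do
        acc=$((acc ^ $byte))
    done
    printf '0x%02x\n' "$acc"
}

# The bits that differ between the two values seen in the hex dumps:
printf '0x%02x\n' "$((0xF0 ^ 0x90))"   # 0x60, i.e. two bits set

xor_sum 0xF0 0x12 0x34   # 0xd6: checksum of the original data
xor_sum 0xF1 0x12 0x34   # 0xd7: one flipped bit is detected
xor_sum 0xF1 0x13 0x34   # 0xd6: the same bit flipped in two bytes goes unnoticed
```

With this weak checksum any single bit error changes the result, but two flips in the same bit position of different bytes cancel out. Whether a given pair of errors slips past a real CRC or ECC depends on the algorithm and on where the errors fall.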

Andrew McGlashan wrote:
Hi James,
Okay, here is one method for bulk comparing two directory trees that will give you a start as you can then eliminate some files from further analysis.
If you do an md5sum of all files of each tree sending the output to a file for each tre, then you sort those files. Once you have those files, you can start eliminating files that share the same basename and md5 checksum.
You could use something other than md5sum, such as sha256sum in the same manner.
finddup (from the perforate package) might likewise help.

$ ls
good-copy/ bad-copy/
$ finddup -l       # links duplicate files
$ find -links 1    # find files that have no duplicate

AFAIK it checks length and sum. You can use finddup -n -v instead to have it merely report duplicates, but IIUC you want to find the NON-duplicates. finddup theoretically can compare dirs in separate areas, but IME it only works properly on $PWD.

On 22 March 2012 21:33, James Harper <james.harper@bendigoit.com.au> wrote:
The files are possibly in different directories, and are possibly different cases. I could write some horribly inefficient shell script to just go through each file in the sp6a directory and find a file with the same name in the /WINDOWS tree and diff it, but maybe there is a better way?
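Alternatively, you could probably use 'rsync' to compare two trees, using the following flags in particular:

-c, --checksum    skip based on checksum
-n, --dry-run     show what would have been transferred

Add -r (--recursive) so that rsync descends into the directories instead of skipping them, and -v (--verbose) so that the files it would transfer are listed, e.g.

rsync -rcnv /WINDOWS/ /sp6a/

--
Joel Shea <jwshea@gmail.com>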
participants (7)
- Andrew McGlashan
- Erik Christiansen
- James Harper
- Jason White
- Joel W Shea
- Russell Coker
- Trent W. Buck