same exe, same input, different output on different servers

I have an application I wrote that takes a Xen core dump and builds a Windows crash dump from it. With the same input file, it runs to completion on one server, while on the other it reports errors from the read and write functions. I have md5sum'd the exe and the input file, and both are identical on each server. Does anyone have a hint on where I might start looking for the problem? It's not a disk space issue, both systems are 64-bit, and libc is the same version. Is it possible to run strace in such a way that pointers are removed, so I can compare the output between runs on the different machines?
Thanks
James

James Harper <james.harper@bendigoit.com.au> writes:
[same exe, same input, different output on different servers]
Obviously a dynamic binary uses code outside itself; perhaps one host has a buggy version of a library? Unless you have some reason to believe the two servers are consistent (e.g. they are both Debian 6), it's generally a bad idea to compile a binary in one place and try to run it in another. Have you tried compiling the binary from source on each host?
Is it possible to run strace in such a way that pointers are removed so I can compare the output between runs on the different machines?
You could certainly start by running strace on the broken one and looking at the last page of output. Recompiling the binary with debugging symbols and using gdb would also be a good idea.
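On the question of comparing strace output between machines: one possible approach, sketched here rather than taken from the thread, is to capture each run to a file (for example with strace's `-o` option) and mask the run-specific values before diffing. The function names below are hypothetical, and the regular expressions are assumptions about what the logs look like. Every hexadecimal value is masked, so hex-printed flags are hidden along with the pointers.

```python
import difflib
import re

# Any hexadecimal value, e.g. 0x7ffd1c2a9b40 (pointers, mmap addresses).
# Assumption: masking hex-printed flags as well is acceptable.
HEX_VALUE = re.compile(r"0x[0-9a-fA-F]+")
# Leading process id, as printed when strace follows child processes.
LEADING_PID = re.compile(r"^(?:\[pid\s+\d+\]|\d+)\s+")


def normalize_strace_line(line):
    """Strip the leading pid and mask hex values in one strace line."""
    line = LEADING_PID.sub("", line)
    return HEX_VALUE.sub("ADDR", line)


def normalize_strace_log(text):
    """Normalize every line of a captured strace log."""
    return [normalize_strace_line(line) for line in text.splitlines()]


def diff_strace_logs(good_text, bad_text):
    """Return unified-diff lines between two normalized logs."""
    good = normalize_strace_log(good_text)
    bad = normalize_strace_log(bad_text)
    return list(difflib.unified_diff(good, bad, "good", "bad", lineterm=""))
```

With the two logs read into strings, `diff_strace_logs(good, bad)` should show the first system call whose arguments or return value differ between the servers, which is often a quicker starting point than reading either trace on its own.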

I have an application I wrote to take a xen core dump and build a windows crash dump from it, and with the same input file, on one server it runs to completion and on the other it reports errors to the read and write functions...
I have md5sum'd the exe and the input file, and both are identical on each server.
Actually it turns out they aren't equal - the original file is sparse but loses its sparseness when copied to the other server. Strange, though: shouldn't sparse blocks be transparent to read(), which should just return zeros? Instead, the read call returns a length of 0 when reading the sparse block. I think.
James
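For reference, the expected POSIX behaviour can be checked with a small experiment. This is a minimal sketch, not the application from this thread: it creates a temporary file with a hole, reads from inside the hole, and then reads at end of file. The function name and sizes are made up for illustration.

```python
import os
import tempfile


def read_from_hole(size=1 << 20, chunk=4096):
    """Read inside a hole and at EOF; return both results."""
    fd, path = tempfile.mkstemp()
    try:
        # Extending the file without writing leaves a hole on file
        # systems that support sparse files. On others the range is
        # filled with real zero blocks; read() sees the same data.
        os.ftruncate(fd, size)
        os.lseek(fd, size // 2, os.SEEK_SET)
        in_hole = os.read(fd, chunk)   # bytes from the middle of the hole
        os.lseek(fd, size, os.SEEK_SET)
        at_eof = os.read(fd, chunk)    # read attempted at end of file
        return in_hole, at_eof
    finally:
        os.close(fd)
        os.unlink(path)
```

A read inside a hole should return the requested number of zero bytes, while a zero-length result normally means end of file. So if read() really returns 0 in the failing run, it may be worth checking whether the file offset is at or beyond the file's size on that server.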

I have an application I wrote to take a xen core dump and build a windows crash dump from it, and with the same input file, on one server it runs to completion and on the other it reports errors to the read and write functions...
I have md5sum'd the exe and the input file, and both are identical on each server.
Actually it turns out they aren't equal - the original file is sparse but loses its sparseness when copied to the other server.
This is turning out even stranger... original xen core dump file is w2k3test.dump:

  cp --sparse=auto w2k3test.dump w2k3test2.dump
  cp w2k3test.dump w2k3test3.dump
  ls -lsk *.dump
   528240 -rw------- 1 root root 528234 Feb 7 22:36 w2k3test.dump
  1048576 -rw------- 1 root root 528234 Feb 8 11:12 w2k3test2.dump
   528240 -rw------- 1 root root 528234 Feb 8 11:13 w2k3test3.dump

How can the file consume 2x as many blocks on disk as its actual file size? Or is xfs mis-reporting things?
James

James Harper writes:
  cp --sparse=auto w2k3test.dump w2k3test2.dump
  cp w2k3test.dump w2k3test3.dump
  ls -lsk *.dump
   528240 -rw------- 1 root root 528234 Feb 7 22:36 w2k3test.dump
  1048576 -rw------- 1 root root 528234 Feb 8 11:12 w2k3test2.dump
   528240 -rw------- 1 root root 528234 Feb 8 11:13 w2k3test3.dump
Possibly also try du & du --apparent-size ?
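The same comparison that du and du --apparent-size make can be done from a program, which also shows why md5sum cannot tell the files apart: it only sees the byte contents, not how many blocks back them. The sketch below assumes a POSIX system where `st_blocks` counts 512-byte units, as on Linux. The helper names are hypothetical.

```python
import os


def size_report(path):
    """Return (apparent_bytes, allocated_bytes) for a file."""
    st = os.stat(path)
    apparent = st.st_size
    # Assumption: st_blocks is in 512-byte units (true on Linux).
    allocated = st.st_blocks * 512
    return apparent, allocated


def allocation_ratio(allocated, apparent):
    """Allocated size relative to apparent size (same units for both)."""
    if apparent == 0:
        return 0.0
    return allocated / apparent
```

Taking the two numeric columns of the ls -lsk listing in the thread as being in the same unit, `allocation_ratio(1048576, 528234)` is a little under 2 for w2k3test2.dump and essentially 1 for the other two files. A ratio well below 1 would indicate a sparse file. A ratio well above 1 suggests the file system has allocated space beyond the data. Whether that is what XFS is doing here is not established in this thread.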

James Harper writes:
  cp --sparse=auto w2k3test.dump w2k3test2.dump
  cp w2k3test.dump w2k3test3.dump
  ls -lsk *.dump
   528240 -rw------- 1 root root 528234 Feb 7 22:36 w2k3test.dump
  1048576 -rw------- 1 root root 528234 Feb 8 11:12 w2k3test2.dump
   528240 -rw------- 1 root root 528234 Feb 8 11:13 w2k3test3.dump
Possibly also try du & du --apparent-size ?
Too late. I'm giving up on xfs and moving the OS to ext3 :)
Thanks
James
Participants (2): James Harper, trentbuck@gmail.com