
On Mon, Nov 25, 2013 at 09:47:02PM +1100, Craig Sanders wrote:
On Mon, Nov 25, 2013 at 06:57:14PM +1100, Andrew Greig wrote:
OK. I used the log files I had generated from each machine as output from rpm -qa. The output was not sorted alphabetically, so line for line the lists were in no way parallel.
I used diff --normal software.log software2.log and got a difference of 1700 files. My expectation was around 100 files difference, but I have to go through a few thousand by hand to find them.
btw, you may find the colordiff program useful. it nicely colourises diff output. you can use it in place of diff, or you can pipe output from diff into it.
if you pipe colordiff into less, though, use less's -R option so that it properly interprets the ANSI color codes.
also BTW, diff's -u or --unified output format tends to be more readable and shorter than the default (--normal), and is especially nice to read when colourised with colordiff.
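A minimal sketch of the colordiff and unified-diff suggestions above. The file contents are made-up stand-ins for the two rpm -qa logs, and the temporary directory is only there to keep the example self-contained. It assumes colordiff may or may not be installed, so it falls back to plain output.

```shell
# Hypothetical sample data standing in for the two rpm -qa logs.
tmp=$(mktemp -d)
printf 'bash\ncoreutils\nvim\n' > "$tmp/software.log"
printf 'bash\nemacs\nvim\n' > "$tmp/software2.log"

# Unified diff. diff exits with status 1 when the files differ,
# so "|| true" keeps that from being treated as a failure.
unified=$(diff -u "$tmp/software.log" "$tmp/software2.log" || true)

# Colourise with colordiff if it is installed, otherwise print as-is.
if command -v colordiff >/dev/null 2>&1; then
    printf '%s\n' "$unified" | colordiff
else
    printf '%s\n' "$unified"
fi
```

For interactive use the same thing would be something like `diff -u software.log software2.log | colordiff | less -R`, where -R lets less pass the ANSI colour codes through.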
Should I pre-sort the files alphabetically ascending before trying
yes. sorting both files will make them more similar to each other, so diff will find fewer uninteresting differences. the more similar you can make the files to each other, the better - so strip out any extraneous
Another option if going the 'diff' route is 'git diff --no-index'. One nice thing about git's diff (especially if you have really, really large files) is that it's clever enough to check whether the lines are in sorted order as it goes, providing a massive increase in speed. Also has all the usual goodies (colour, ignore whitespace, ignore case, patience algorithm, etc.).

Karl
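Putting the thread's advice together, a rough sketch might look like the following. The package names and the a.sorted / b.sorted file names are invented for illustration. The git step is guarded because git may not be installed; it runs outside any repository thanks to --no-index.

```shell
tmp=$(mktemp -d)
# Hypothetical unsorted package lists standing in for rpm -qa output.
printf 'vim\nbash\ncoreutils\n' > "$tmp/software.log"
printf 'bash\nvim\nemacs\n' > "$tmp/software2.log"

# Sort each list so that identical packages line up.
sort "$tmp/software.log" > "$tmp/a.sorted"
sort "$tmp/software2.log" > "$tmp/b.sorted"

# Keep only the added/removed lines, dropping the ---/+++ file headers.
changes=$(diff -u "$tmp/a.sorted" "$tmp/b.sorted" | grep '^[-+][^-+]' || true)
printf '%s\n' "$changes"

# The same comparison via git, if available. Like diff, it exits with
# status 1 when the files differ, hence the "|| true".
if command -v git >/dev/null 2>&1; then
    git --no-pager diff --no-index --color=never "$tmp/a.sorted" "$tmp/b.sorted" || true
fi
```

With the lists sorted first, only the packages that really differ between the two machines should show up as - or + lines.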