
On Mon, Nov 25, 2013 at 06:57:14PM +1100, Andrew Greig wrote:
> OK. I used the log files I had generated on each machine from the output of rpm -qa. The output was not sorted alphabetically, so the two lists were in no way parallel line for line.
> I ran diff --normal software.log software2.log and got a difference of about 1700 entries. I expected a difference of around 100, but now I have to go through a few thousand lines by hand to find them.
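A rough sketch of getting two raw rpm -qa lists to line up before diffing them. It assumes each line is name-version-release, optionally followed by .arch, and relies on RPM not allowing '-' inside the version or release, so the package name is everything before the second-to-last hyphen. to_name_version and compare_lists are hypothetical helper names, not existing commands, and the sketch has only been exercised on small sample lists.

```shell
#!/bin/sh
# Sketch: turn raw `rpm -qa` output (one name-version-release[.arch] per line)
# into sorted "name version-release[.arch]" lines, then diff two machines.
# Assumption: version and release contain no '-', so the name is everything
# before the second-to-last hyphen (package names themselves may contain '-').

# to_name_version FILE: split each line at its last two hyphens into
# "name version-release", then sort. LC_ALL=C uses plain byte order, so
# both machines sort identically regardless of their locale settings.
to_name_version() {
    sed 's/^\(.*\)-\([^-]*-[^-]*\)$/\1 \2/' "$1" | LC_ALL=C sort
}

# compare_lists FILE1 FILE2: unified diff of the two normalised lists.
# Returns diff's exit status (0 = identical, 1 = different, 2 = trouble).
compare_lists() {
    a=$(mktemp) && b=$(mktemp) || return 2
    to_name_version "$1" > "$a"
    to_name_version "$2" > "$b"
    diff -u "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

Usage would be something like `compare_lists software.log software2.log`. The normalised "name version" lines are also a convenient input format for a program that compares the packages directly instead of diffing text.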
BTW, you may find the colordiff program useful. It nicely colourises diff output. You can use it in place of diff, or you can pipe the output of diff into it. If you pipe colordiff into less, though, use less's -R option so that less passes the ANSI colour codes through to the terminal instead of displaying them as raw escape characters.

Also BTW, diff's -u (--unified) output format tends to be more readable than the default (--normal) format, and it is especially nice to read when colourised with colordiff.
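If you end up typing that pipeline a lot, a small shell function can wrap it. This is only a sketch assuming a POSIX shell; show_diff is a hypothetical name. When stdout is a terminal and colordiff is on the PATH, it pipes the unified diff through colordiff into less -R; otherwise (e.g. when the output is redirected to a file) it just prints the plain diff -u output.

```shell
#!/bin/sh
# show_diff FILE1 FILE2 (hypothetical helper): unified diff of two files.
# Interactive use with colordiff installed: colourise and page with less -R,
# which passes the ANSI colour codes through instead of showing them raw.
# Otherwise: plain `diff -u` output, suitable for redirecting or piping.
show_diff() {
    if [ -t 1 ] && command -v colordiff >/dev/null 2>&1; then
        diff -u "$1" "$2" | colordiff | less -R
    else
        diff -u "$1" "$2"
    fi
}
```

For example, `show_diff software.log software2.log`.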
> Should I pre-sort the files into ascending alphabetical order before trying
Yes. Sorting both files will make them more similar to each other, so diff will find fewer uninteresting differences. The more similar you can make the files to each other, the better, so strip out any extraneous data: it is just extra irrelevant stuff for diff to notice is different. If all you care about is the package names, then make both files contain only package names (one per line) and nothing else. If you care about package name plus version, then strip out everything except the package names and version numbers, and either consistently use a single space or tab between the two fields, or use diff's -b (--ignore-space-change) option, which ignores changes in the amount of whitespace.

Even with pre-sorting, diff will still show more differences than you might expect, because the two files aren't merely different versions of the same file (as is common when using diff on program source code); they're different files containing similar data. That is why I suggested a simple program that compares the packages themselves, using associative arrays to store the package names and versions for each computer, so that you're comparing the actual data rather than the text files.

A very simple version in Perl might look something like this (warning: untested, written in this email and not executed. It does compile with 'perl -c', but expect bugs. It is intended more as an example of one simple way to approach the problem than as functioning code):
#!/usr/bin/perl
use strict;
use warnings;

my $f1 = 'filename1.txt';
my $f2 = 'filename2.txt';

# format of both files is assumed to be one entry per line, with
# fields separated by any amount of whitespace (spaces, tabs),
# with the fields being: packagename version
# e.g.:
#
#   coreutils 8.21-1
#   findutils 4.4.2-6
#   psutils 1.17.dfsg-1
#   sharutils 1:4.14-1

# read a package list into a hash (associative array) of
# packagename => version. a line with no version field gets ''.
sub read_packages {
    my ($file) = @_;
    my %pkgs = ();
    open(my $fh, '<', $file) || die "couldn't open $file: $!\n";
    while (my $line = <$fh>) {
        chomp $line;
        my ($package, $version) = split ' ', $line;
        next unless defined $package;    # skip blank lines
        $pkgs{$package} = defined $version ? $version : '';
    }
    close($fh);
    return %pkgs;
}

# declare associative arrays a and b, one per machine:
my %a = read_packages($f1);
my %b = read_packages($f2);

foreach my $p (sort keys %a) {
    unless (exists $b{$p}) {
        print "package $p is in a but not in b\n";
        next;    # nothing to compare the version against
    }
    # if you don't care about version differences, comment out the next line
    print "a has $p $a{$p} but b has $p $b{$p}\n" unless $a{$p} eq $b{$p};
    # note the 'eq' above is a string equality comparison which should
    # be adequate given that versions are strings not numbers. if you
    # need to do greater-than or less-than style comparisons on version
    # strings, there are several perl modules to choose from including
    # Sort::Versions.
}

foreach my $p (sort keys %b) {
    print "package $p is in b but not in a\n" unless exists $a{$p};
}

craig

-- 
craig sanders <cas@taz.net.au>