Re: Determining the software list on machines in my network

25 Nov 2013

      On Mon, Nov 25, 2013 at 06:57:14PM +1100, Andrew Greig wrote:
...
> OK.  I used the log files I had generated from each machine as output
> from rpm -qa, the output was not alphabetical ascending so line for
> line the lists were in no way parallel.
> I used diff --normal software.log software2.log and got a difference
> of 1700 files, my expectation was around 100 files difference but I
> have to go through a few thousand to find them in an analog way.

btw, you may find the colordiff program useful. it nicely colourises
diff output. you can use it in place of diff, or you can pipe output
from diff into it.

if you pipe colordiff into less, though, use less's -R option so that it
properly interprets the ANSI color codes.

also BTW, diff's -u or --unified output format tends to be more readable
and shorter than the default (--normal). and is especially nice to read
when colourised with colordiff.
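
as a rough sketch of the above: the sample files and their contents here are made up for illustration, and colordiff is only used if it happens to be installed.

```shell
# made-up sample lists standing in for the two log files
tmp=$(mktemp -d)
printf '%s\n' bash-4.2 coreutils-8.21 zlib-1.2.8 > "$tmp/software.log"
printf '%s\n' bash-4.2 findutils-4.4.2 zlib-1.2.8 > "$tmp/software2.log"

# unified diff. diff exits with status 1 when the files differ,
# so "|| true" keeps that from being treated as a failure
udiff=$(diff -u "$tmp/software.log" "$tmp/software2.log" || true)

# colourise if colordiff is available, otherwise print plain output
if command -v colordiff >/dev/null 2>&1; then
    printf '%s\n' "$udiff" | colordiff
else
    printf '%s\n' "$udiff"
fi

# interactively, the same thing paged through less would be:
#   diff -u software.log software2.log | colordiff | less -R
```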
...
> Should I pre-sort the files alphabetically ascending before trying

yes. sorting both files will make them more similar to each other, so
diff will find fewer uninteresting differences. the more similar you can
make the files to each other, the better - so strip out any extraneous
data. if all you care about is the package names then just have both
files contain package names (one per line) and nothing else. if you
care about packagename+version then strip out everything except package
names and version numbers (and either consistently use a single space or
tab between the fields, or use diff's -b or --ignore-space-change option
to ignore only whitespace differences).

extraneous data is just extra irrelevant stuff for diff to flag as
different.
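
a minimal sketch of that clean-up, assuming each list already has "packagename version" on every line (the sample data and the .sorted file names are made up). on the real machines, something like rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE}\n' should produce that layout directly.

```shell
tmp=$(mktemp -d)
# made-up sample lists: unsorted, with inconsistent whitespace between fields
printf 'zlib\t1.2.8-3\nbash   4.2.45-1\ncoreutils 8.21-13\n' > "$tmp/software.log"
printf 'bash 4.2.45-1\nzlib 1.2.8-3\nfindutils  4.5.11-1\n' > "$tmp/software2.log"

# keep only the first two fields, joined by a single space, then sort
for f in software software2; do
    awk '{ print $1, $2 }' "$tmp/$f.log" | sort > "$tmp/$f.sorted"
done

# diff exits with status 1 when the files differ, hence the "|| true"
sorted_diff=$(diff -u "$tmp/software.sorted" "$tmp/software2.sorted" || true)
printf '%s\n' "$sorted_diff"
```

with the fields normalised and sorted, only the lines that really differ (coreutils and findutils in this sample) show up as changes.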

even with pre-sorting diff will still show more differences than you
might expect because the two files aren't merely different versions of
the same file (as is common when using diff for program source code),
they're different files containing similar data.

which is why i suggested a simple program comparing the packages, using
associative arrays to store the package names & versions for each
computer - so you're comparing the actual data rather than the text
files.

a very simple version in perl might look something like this:

(warning, untested, written in this email and not executed. it does
compile with 'perl -c' but expect bugs. intended more as an example
of one simple way to approach the problem than as functioning code).

#!/usr/bin/perl

use strict;
use warnings;

my $f1 = 'filename1.txt';
my $f2 = 'filename2.txt';

# format of both files is assumed to be one entry per line, with 
# fields separated by any amount of whitespace (spaces, tabs),
# with the fields being: packagename version
# e.g.:
#
#coreutils 8.21-1
#findutils 4.4.2-6
#psutils 1.17.dfsg-1
#sharutils 1:4.14-1

# declare associative arrays a and b:
my %a = (); 
my %b = ();

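# read the first file into %a, using a lexical filehandle
open(my $fh1, "<", $f1) || die "couldn't open $f1: $!\n";
while (<$fh1>) {
    chomp;
    my ($package,$version) = split;
    next unless defined $package;    # skip blank lines
    $a{$package} = defined $version ? $version : '';
}
close($fh1);

# read the second file into %b
open(my $fh2, "<", $f2) || die "couldn't open $f2: $!\n";
while (<$fh2>) {
    chomp;
    my ($package,$version) = split;
    next unless defined $package;    # skip blank lines
    $b{$package} = defined $version ? $version : '';
}
close($fh2);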

foreach my $p (sort keys %a) {
    # report packages missing from b and skip the version comparison
    # for them, since b has no version to compare against
    unless (exists $b{$p}) {
        print "package $p is in a but not in b\n";
        next;
    }

    # if you don't care about version differences, comment out the next line
    print "a has $p $a{$p} but b has $p $b{$p}\n" unless $a{$p} eq $b{$p};

    # note the 'eq' above is a string equality comparison which should
    # be adequate given that versions are strings not numbers. if you
    # need to do greater-than or less-than style comparisons on version
    # strings, there are several perl modules to choose from including
    # Sort::Versions.
}

foreach my $p (sort keys %b) {
    print "package $p is in b but not in a\n" unless exists $a{$p};
}
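
for a quick check without saving a script, the same associative-array comparison can be sketched with awk from the shell. this is only an illustration, not a replacement for the perl above: the a.txt and b.txt sample files are made up, and the input format is assumed to be the same "packagename version" layout.

```shell
tmp=$(mktemp -d)
# made-up sample data in "packagename version" format
printf '%s\n' 'coreutils 8.21-1' 'findutils 4.4.2-6' 'psutils 1.17.dfsg-1' > "$tmp/a.txt"
printf '%s\n' 'coreutils 8.21-1' 'findutils 4.4.2-7' 'sharutils 1:4.14-1' > "$tmp/b.txt"

report=$(awk '
    NF == 0   { next }               # skip blank lines
    FNR == NR { a[$1] = $2; next }   # first file fills array a
              { b[$1] = $2 }         # second file fills array b
    END {
        for (p in a) {
            if (!(p in b)) {
                print "package " p " is in a but not in b"
            } else if ((a[p] "") != (b[p] "")) {
                # appending "" forces a string comparison of the versions
                print "a has " p " " a[p] " but b has " p " " b[p]
            }
        }
        for (p in b) {
            if (!(p in a)) {
                print "package " p " is in b but not in a"
            }
        }
    }
' "$tmp/a.txt" "$tmp/b.txt" | sort)
printf '%s\n' "$report"
```

the FNR == NR test is the usual awk idiom for "still reading the first file". it assumes the first file is not empty.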

craig

-- 
craig sanders <cas@taz.net.au>
