[luv-main] Two questions: bash script review and awk/sed/grep

Hi,

Following on from the recent bash script reviews, if anyone would like to comment on this one, please do:

https://code.vpac.org/gitorious/patchman/patchman/blobs/master/client/patchm...

The above script attempts to obtain a list of installed packages and enabled repos on debian/centos hosts, and sends that list to a central server.

However, I'm having some sed/grep/awk issues. I'm trying to parse the output of "yum repolist enabled --verbose" and return each repo on a line as so

    'rpm' 'this-repo-name' 'http://url1' 'http://url2'

The output is similar to the following:

    Repo-id : extras
    Repo-name : CentOS-5 - Extras
    Repo-updated : Tue Oct 25 03:18:58 2011
    Repo-pkgs : 237
    Repo-size : 124 M
    Repo-baseurl : http://ftp.swin.edu.au/centos/5/extras/x86_64/,
     : http://ftp.monash.edu.au/pub/linux/CentOS/5/extras/x86_64/,
     : http://mirror.aarnet.edu.au/pub/centos/5/extras/x86_64/
    Repo-expire : 3,600 second(s) (last: Wed Nov 16 12:39:48 2011)

So I just need the Repo-name and the Repo-baseurls for each repo. If there is one baseurl, my version works, but if there are multiple, it breaks:

    # yum repolist enabled --verbose | egrep "Repo-name|Repo-baseurl" | sed -e ':a;N;$!ba;s/\nRepo-baseurl//g' -e "s/Repo-name[ ]*: /'/g" -e "s/[ ]*:[ ]\+/' '/g" | sed -e "s/$/'/g" -e "s/'/ ${host_arch}'/2" -e "s/\/'/'/g" -e "s/ ' '/' '/"
    'CentOS-5 - Base' 'http://ftp.swin.edu.au/centos/5/os/x86_64/,'
    'CentOS-5 - Extras' 'http://ftp.swin.edu.au/centos/5/extras/x86_64/,'
    'CentOS-5 - Updates' 'http://ftp.swin.edu.au/centos/5/updates/x86_64/,'

The extra comma at the end of the url is not meant to be there, and the additional baseurls are missing. Are there any awk or sed gurus that can help me get the final part of this?

Thanks,
Marcus.
--
Marcus Furlong
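
One likely reason the additional baseurls go missing: the continuation lines in the yum output contain neither "Repo-name" nor "Repo-baseurl", so the egrep stage discards them before sed ever sees them. The sketch below illustrates this on the sample output quoted in the message. The `sample` helper and the widened pattern are assumptions made for illustration and are not part of the patchman script.

```shell
# sample is a hypothetical helper that replays the yum output quoted above.
sample() {
  cat <<'EOF'
Repo-id : extras
Repo-name : CentOS-5 - Extras
Repo-updated : Tue Oct 25 03:18:58 2011
Repo-pkgs : 237
Repo-size : 124 M
Repo-baseurl : http://ftp.swin.edu.au/centos/5/extras/x86_64/,
 : http://ftp.monash.edu.au/pub/linux/CentOS/5/extras/x86_64/,
 : http://mirror.aarnet.edu.au/pub/centos/5/extras/x86_64/
Repo-expire : 3,600 second(s) (last: Wed Nov 16 12:39:48 2011)
EOF
}

# Original filter: only the two lines that carry a keyword survive.
sample | grep -c -E 'Repo-name|Repo-baseurl'

# Widened filter: also keep lines whose first non-blank character is a colon.
sample | grep -c -E 'Repo-name|Repo-baseurl|^[[:space:]]*:'
```

The first count is 2 and the second is 4, so the two mirror URLs are already lost at the grep stage. Widening the filter only restores the lines; the sed expressions would still need to join them and strip the trailing commas.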

Hi Marcus,

I copied your example into a file (because I do not have a RPM based Linux at hand)

    Repo-id : extras
    Repo-name : CentOS-5 - Extras
    Repo-updated : Tue Oct 25 03:18:58 2011
    Repo-pkgs : 237
    Repo-size : 124 M
    Repo-baseurl : http://ftp.swin.edu.au/centos/5/extras/x86_64/,
     : http://ftp.monash.edu.au/pub/linux/CentOS/5/extras/x86_64/,
     : http://mirror.aarnet.edu.au/pub/centos/5/extras/x86_64/
    Repo-expire : 3,600 second(s) (last: Wed Nov 16 12:39:48 2011)

and piped into the following quick&dirty awk script:

    #!/usr/bin/awk -f
    $1=="Repo-name" {printf "'"; for (i=3; i<NF; i++) printf $i" "; printf $NF"' "}
    $1=="Repo-baseurl" {url=1; comma=match($NF,","); if (comma) out=substr($NF,1,comma-1); else out=$NF
      printf "'"out"' "}
    url=1 {if ($1==":") { comma=match($NF,","); if (comma) out=substr($NF,1,comma-1); else out=$NF
      printf "'"out"' "} else url=0}

The result is:

    $ cat /tmp/mom | /tmp/mom.awk
    'CentOS-5 - Extras' 'http://ftp.swin.edu.au/centos/5/extras/x86_64/' 'http://ftp.monash.edu.au/pub/linux/CentOS/5/extras/x86_64/' 'http://mirror.aarnet.edu.au/pub/centos/5/extras/x86_64/'

(One line actually)

Happy with it?

Sorry for not debugging your sed scripts, I have a problem to understand my own;-)
(That's why I prefer awk as long as it seems doable by awk, it is easier to understand and read than sed)

Regards
Peter

On Wed, Nov 16, 2011 at 15:52, Peter Ross <Peter.Ross@bogen.in-berlin.de> wrote:
> Happy with it?

Hi Peter,

Thanks, yes! It almost does the trick. If there were multiple repos (e.g. the above text repeated again a few times), how would I go about adding a newline after each one? I've tried to modify your example but my awk-fu is not good! E.g. the following is on two lines:

    'CentOS-5 - Extras' 'http://ftp.swin.edu.au/centos/5/extras/x86_64/' 'http://ftp.monash.edu.au/pub/linux/CentOS/5/extras/x86_64/' 'http://mirror.aarnet.edu.au/pub/centos/5/extras/x86_64/'
    'CentOS-5 - Updates' 'http://ftp.swin.edu.au/centos/5/updates/x86_64/' 'http://ftp.monash.edu.au/pub/linux/CentOS/5/updates/x86_64/' 'http://mirror.aarnet.edu.au/pub/centos/5/updates/x86_64/'

> Sorry for not debugging your sed scripts, I have a problem to understand my own;-)
> (That's why I prefer awk as long as it seems doable by awk, it is easier to understand and read than sed)

Haha, yeah I often cringe when I have to go back months later and try to understand what I wrote in sed.

Thanks,
Marcus.
--
Marcus Furlong

Hi Marcus,

first I had an error (spot the url=1 instead of url==1 condition?) so it did not work that well for multiple appearances.

However, this one should do:

    #!/usr/bin/awk -f
    {
      if ($1=="Repo-name") {
        printf "'"; for (i=3; i<NF; i++) printf $i" "; printf $NF"' ";
      }
      if ($1=="Repo-baseurl") {
        url=1;
        comma=match($NF,",");
        if (comma) out=substr($NF,1,comma-1); else out=$NF;
        printf "'"out"' ";
      } else {
        if (url==1) {
          if ($1==":") {
            comma=match($NF,",");
            if (comma) out=substr($NF,1,comma-1); else out=$NF;
            printf "'"out"' ";
          } else {url=0; print "";}
        }
      }
    }

I made it a bit more "C-like" in appearance but the logic is the same.

Regards
Peter
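
The url=1 versus url==1 slip mentioned above can be reproduced in isolation. In awk, an assignment used as a pattern is evaluated for its value, so `url=1` sets the variable and is always true, while `url==1` is true only once the variable has actually been set to 1. The helper names below are hypothetical and exist only to contrast the two patterns.

```shell
# Hypothetical helper names, used only to contrast the two patterns.
assign_pattern()  { awk 'url=1  { print "matched:", $0 }'; }
compare_pattern() { awk 'url==1 { print "matched:", $0 }'; }

printf 'a\nb\n' | assign_pattern    # prints "matched: a" and "matched: b"
printf 'a\nb\n' | compare_pattern   # prints nothing: url is never set to 1
```

This is why the first script appeared to work on a single repo: its last rule ran on every input line regardless of state.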

On Thu, Nov 17, 2011 at 12:36, Peter Ross <Peter.Ross@bogen.in-berlin.de> wrote:
> I made it a bit more "C-like" in appearance but the logic is the same.

Thanks Peter, this one works perfectly! Is it possible to run the above on the command line so I can process through the pipeline without an external awk script? I've been playing with the formatting but keep getting "unexpected newline or end of string" along with plenty of syntax errors...

Marcus.
--
Marcus Furlong

Hi Marcus,

in principle it works, you can put all stuff in one line and run it between single quotes as an awk command line.

There is a little script doing this for you:

    cat $my_awk_file | awk '{if (NR>1) for (i=1; i<=NF; i++) printf $i" "}'

In general it works, however, you have to escape the single quotes _inside_ the script.

"'" has to be written as "'"'"'"

Well, that syntax hurts;-) especially if you want to write an awk line that processes an awk script to get an awk line..

And then you get:

    cat $my_awk_file | \
    awk -F"'" '{for (i=1; i<NF; i++) printf $i"'"'"'""\"""'"'"'""\"""'"'"'"; print $NF}' | \
    awk '{if (NR>1) for (i=1; i<=NF; i++) printf $i" "}'

The resulting output is a command line that does what the awk script does:

    cat $my_file | awk '{ if ($1=="Repo-name") {printf "'"'"'"; for (i=3; i<NF; i++) printf $i" "; printf $NF"'"'"' "} if ($1=="Repo-baseurl") { url=1; comma=match($NF,","); if (comma) out=substr($NF,1,comma-1); else out=$NF; printf "'"'"'"out"'"'"' "; } else { if (url==1) { if ($1==":") { comma=match($NF,","); if (comma) out=substr($NF,1,comma-1); else out=$NF; printf "'"'"'"out"'"'"' "; } else {url=0; print "";} } } }'

Well, that's all in one line now - but who can read that?

Feel free to use whatever you like;-)

Regards
Peter
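
One way to sidestep the quote escaping, offered as a sketch and not as what the thread settled on: pass the single quote into awk as a variable with -v, so the program text itself contains no single quotes and can sit inside ordinary shell single quotes. The function name `repo_lines` and the variable names `q` and `started` are chosen here for illustration. The logic follows the script above with two small, deliberate differences: the newline is emitted when the next repo starts or at end of input, and no trailing space is printed.

```shell
# repo_lines is a hypothetical helper name; q holds a literal single quote.
repo_lines() {
  awk -v q="'" '
    $1 == "Repo-name" {
      if (started) print ""                 # finish the previous repo line
      name = $3                             # fields 3..NF make up the name
      for (i = 4; i <= NF; i++) name = name " " $i
      printf "%s%s%s", q, name, q
      started = 1; url = 0
      next
    }
    $1 == "Repo-baseurl" || (url && $1 == ":") {
      url = 1
      out = $NF
      sub(/,$/, "", out)                    # strip a trailing comma, if present
      printf " %s%s%s", q, out, q
      next
    }
    { url = 0 }                             # any other line ends the URL list
    END { if (started) print "" }
  '
}

# Example run on two lines of sample input.
printf '%s\n' 'Repo-name : CentOS-5 - Extras' \
              'Repo-baseurl : http://ftp.swin.edu.au/centos/5/extras/x86_64/' | repo_lines
```

In a pipeline this would be used as `yum repolist enabled --verbose | repo_lines`. The octal escape "\047" inside an awk string is another common way to write a single quote without leaving the shell quoting.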

On Thu, Nov 17, 2011 at 15:56, Peter Ross <Peter.Ross@bogen.in-berlin.de> wrote:
> Well, that's all in one line now - but who can read that?

Agreed, it's a bit unwieldy!

> Feel free to use whatever you like;-)

Will continue to investigate options, thanks again for your help, at least I have something that works now :)

Marcus.

On 22/11/11 14:06, Marcus Furlong wrote:
> > Feel free to use whatever you like;-)
>
> Will continue to investigate options, thanks again for your help, at least I have something that works now :)

Another option is to use perl. sed, awk and bash are all useful tools for simple tasks, but as the complexity of what you are doing rises, they get unwieldy fast. Perl tends to scale better.

You're at least bordering on where perl would be cleaner if treated as a text processing exercise. Alternatively, it looks like there's some CPAN modules for binding to the RPM API directly, so you would have a data structure to navigate rather than a text processing exercise.

If what you have is working, then I don't expect it's worth re-doing it, but if you're building further on this, it might be worth thinking about.

Andrew

On Thu, Nov 24, 2011 at 14:23, Andrew McNaughton <andrewmcnnz@gmail.com> wrote:
> Another option is to use perl. sed, awk and bash are all useful tools for simple tasks, but as the complexity of what you are doing rises, they get unwieldy fast. Perl tends to scale better.
>
> You're at least bordering on where perl would be cleaner if treated as a text processing exercise. Alternatively, it looks like there's some CPAN modules for binding to the RPM API directly, so you would have a data structure to navigate rather than a text processing exercise.
>
> If what you have is working, then I don't expect it's worth re-doing it, but if you're building further on this, it might be worth thinking about.

Yes we were thinking about this, but for the client program, we were trying to keep external dependencies to a minimum. Similar projects (like pakiti) also try to keep dependencies on the client to a minimum, but they don't deal with uploading information about repos, only packages.

There are also python modules for yum, rpm, deb and apt, and from a cursory glance at /usr/share/yum-cli/yumcommands.py it would seem relatively easy to get the information from the yumrepo data structures in there. We can also be sure that if yum/apt are installed then the relevant python libraries are already installed on the client. Given that the rest of the project is python this might be one way of doing it.

Another way I was thinking of doing it was to upload the full output of "yum repolist" directly to the server, and parse it server-side using python. This would be ok for centos hosts, however for debian hosts, the output is harder to decipher. "apt-cache policy" on squeeze seems to give exactly what I want, but on lenny, the repo urls are incomplete. It seems I would need to parse "apt-cache policy" and combine it with the output of "apt-cache dump | grep -A10 "^File"". The nice thing about using apt-cache policy is that it gives the repo priorities, along with package priorities if there are any (i.e. pinned packages), and having this information would be great.

Currently we use "apt-get update --print-uris" but this loses information about each repo that could be used to determine if a given repo is a mirror of another (e.g. release v=6.0.3,o=Debian,a=stable,n=squeeze,l=Debian,c=main). It also (in lenny/squeeze, fixed in wheezy) deletes the gpg files associated with a repo if run as root. Grr.

Another alternative would be to upload the /etc/apt/*.list and /etc/yum.repos.d/*.repo files directly and again perform server-side parsing.

Plenty of options, not enough time to try them all! :)

Marcus.
--
Marcus Furlong
participants (3)
- Andrew McNaughton
- Marcus Furlong
- Peter Ross