
Re screen-scraping URLs: the right way to do it is with XSLT, e.g.
http://cyber.com.au/~twb/.bin/fortune-snarf

The quick-and-dirty approach I would normally adopt is:

    curl -sL example.net/page.html | egrep -oi "[^'\"]+\.png" | wget -i-

For relative URLs, wget's --base doesn't work for me, so I insert
something like this before the wget:

    sed 's,^,http://example.net/,'

If the source is split over multiple pages:

    map curl -fsL -- example.net/?page={0..999} |

where map is http://cyber.com.au/~twb/.bin/map -- assuming that "bad"
pages return an HTTP 4xx (the -f makes that failure propagate upwards).

You will also often have to spoof the User-Agent (-U for wget, -A for
curl) and/or set wget's --referer.  The latter usually only needs to
match the original domain, e.g. --referer=http://example.net/ will
usually suffice, rather than the full
--referer=http://example.net/foo/bar/baz.html.
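
On the curl side the matching flags are -A for the User-Agent and -e
for the Referer, so a fetch that spoofs both looks something like this
('Mozilla/5.0' is just a placeholder string):

    curl -sL -A 'Mozilla/5.0' -e 'http://example.net/' example.net/page.html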
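
Putting the pieces above together, a rough sketch of the whole
quick-and-dirty run, with the relative-URL fixup and the header
spoofing folded in (example.net, page.html, the .png pattern and the
UA string are all placeholders, and the sed step assumes every
extracted path is relative):

    # 1. fetch the page; 2. pull out anything that looks like a .png path;
    # 3. bolt the base URL onto the front; 4. hand the list to wget,
    #    spoofing Referer and User-Agent as discussed above.
    base=http://example.net
    curl -sL "$base/page.html" |
        egrep -oi "[^'\"]+\.png" |
        sed "s,^,$base/," |
        wget --referer="$base/" -U 'Mozilla/5.0' -i-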
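
If you don't have that map script to hand, a plain while loop that
stops at the first failing page does much the same job, under the same
assumption that bad pages return a 4xx (the ?page= scheme, the 0-based
counter and the UA string are guesses to adapt):

    # Fetch ?page=0, ?page=1, ... until curl -f reports an error,
    # then scrape the combined output as before.
    base=http://example.net
    n=0
    while curl -fsL -A 'Mozilla/5.0' -- "$base/?page=$n"; do
        n=$((n + 1))
    done |
        egrep -oi "[^'\"]+\.png" |
        sed "s,^,$base/," |
        wget --referer="$base/" -i-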