
On Tue, 29 May 2012, Carl Turney <carl@boms.com.au> wrote:
> I often like to edit out stuff (e.g. ads etc.) from the page.

There is nothing preventing you from editing that out of the copy that wget provides. HTML files are text, so you can edit them with your favourite text editor (vi, emacs, etc.). You can also use one of the many web editing programs. I don't use such programs, but I'm sure someone here can recommend a good one.

For common types of advert (of which Google Adsense is the best example) there are usually only a few ways of formatting them. It shouldn't be difficult to write a little Perl program that goes through the output of "wget -r" and removes all Adsense codes. Such a program wouldn't necessarily even need to parse HTML properly; it could just look for blocks of Adsense code. Even if you didn't make it smart enough to handle line-breaks in unexpected places, it would catch more than 90% of all Adsense adverts.

Someone has probably already written such a program; a Google search might find it.

-- 
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
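
As a rough illustration of the kind of program described above, here is a minimal sketch. It is written in Python rather than the Perl suggested in the post, and it rests on assumptions: that Adsense blocks are <script> elements mentioning "google_ad_client" or "googlesyndication.com", and that the mirrored pages end in .html or .htm. The names strip_adsense and clean_tree are made up for this example. It does not parse HTML; it only looks for script blocks containing those markers, so unusual markup will slip through.

```python
import os
import re

# A whole <script>...</script> element, allowing line breaks inside it.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>",
                       re.IGNORECASE | re.DOTALL)

# Assumed markers of an Adsense block; adjust for what the pages really contain.
AD_MARKERS = ("google_ad_client", "googlesyndication.com")


def strip_adsense(html):
    """Return html with every script element that mentions an ad marker removed."""
    def drop_if_ad(match):
        block = match.group(0)
        return "" if any(marker in block for marker in AD_MARKERS) else block
    return SCRIPT_RE.sub(drop_if_ad, html)


def clean_tree(root):
    """Rewrite the HTML files under root in place; return how many were changed."""
    changed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith((".html", ".htm")):
                continue
            path = os.path.join(dirpath, name)
            # latin-1 maps every byte to a character, so files in any
            # encoding survive the read/write round trip unchanged;
            # newline="" keeps the original line endings.
            with open(path, encoding="latin-1", newline="") as f:
                original = f.read()
            cleaned = strip_adsense(original)
            if cleaned != original:
                with open(path, "w", encoding="latin-1", newline="") as f:
                    f.write(cleaned)
                changed += 1
    return changed
```

After something like "wget -r http://www.example.com/", calling clean_tree("www.example.com") would rewrite the mirrored pages in place, so it is worth running it on a copy first.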