Copying part of web page into LibreOffice Writer document

Hi All,

Wondering if there are any LibreOffice wizards out there...

Have copied and pasted part of a web page (text and pictures) into a LibreOffice Writer document. After saving and reopening the document, I note that the pictures will NOT appear, unless I am on-line. They need to be downloaded from the net =each= time I access the document.

How can I overcome that? Have tried right-click and paste, as well as Edit > Paste Special > HTML. Am I not exercising some option in the Save As command? Should I tweak some setting in Tools > Options?

Note: I'm saving in MS Word 97-2003 format, and have set my default that way.

Thanks very much,
Carl Turney
Melbourne

Carl Turney <carl@boms.com.au> wrote:
Have copied and pasted part of a web page (text and pictures) into a LibreOffice Writer document.
After saving and reopening the document, I note that the pictures will NOT appear, unless I am on-line. They need to be downloaded from the net =each= time I access the document.
That's presumably because it's saving the links in the document rather than downloading the images and embedding them in the document. I don't know how to fix this - I generally don't use LibreOffice except as a back-end for running file format conversions from the shell.
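
For reference, the kind of shell-driven file format conversion mentioned above looks roughly like this (the filename is just a placeholder):

    # convert a locally saved HTML page to ODF Writer format without opening the GUI
    soffice --headless --convert-to odt saved-page.html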

On Mon, 28 May 2012 03:24:48 pm Carl Turney wrote:
Have copied and pasted part of a web page (text and pictures) into a LibreOffice Writer document.
After saving and reopening the document, I note that the pictures will NOT appear, unless I am on-line. They need to be downloaded from the net =each= time I access the document.
The browser is working as expected: it saves the currently selected object (typically the HTML document), and not any other documents on the page. There is no simple way to get around this using just your browser, other than downloading each item manually.

To begin with, Save As wouldn't know which objects to ignore and which to download. It could also slow your connection if the page had a lot of large linked files or other resources, so it's reasonable for only the selected object to be downloaded - worse still if you only need one object, say the HTML source. The Save As authors are also saving themselves potential issues if third-party linked content were downloaded where permission is not otherwise granted.

LibreOffice is working as I would expect too - after all, it doesn't know whether or not you actually meant to embed content instead of linking to it.

I daresay there's no easy way to get around it. You may need to try right-clicking on each image on the website, downloading it, and doing an Import into your LibreOffice document.
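
As a partial shortcut to right-clicking every image, wget can fetch a page together with the images and stylesheets it needs; a rough sketch, with a made-up URL:

    # -p fetches the page requisites (images, CSS); -k rewrites their links to the local copies
    wget -p -k "http://www.example.com/article.html"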

Mark Johnson <mark@chronopia.net> wrote:
I daresay there's no easy way to get around it. You may need to try right-clicking on each image on the website, downloading it, and doing an Import into your LibreOffice document.
If you use wget to download the Web page, then LibreOffice to convert it to ODF, does it still link to the images or are they embedded in the ODF document then?

On Mon, 28 May 2012 04:39:45 pm Jason White wrote:
Mark Johnson <mark@chronopia.net> wrote:
I daresay there's no easy way to get around it. You may need to try right-clicking on each image on the website, downloading it, and doing an Import into your LibreOffice document.
If you use wget to download the Web page, then LibreOffice to convert it to ODF, does it still link to the images or are they embedded in the ODF document then?
Firstly, you need to make sure wget will only fetch the appropriate content. The images will also still be links in the converted document, but you will already have the images and other content downloaded, ready to Import into the document and replace or remove the links.
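
One quick way to check whether images actually ended up embedded: an .odt file is a zip archive, and embedded images are stored in a Pictures/ directory inside it. A sketch, assuming a file called page.odt:

    # list the archive contents; embedded images show up under Pictures/
    unzip -l page.odt | grep Pictures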

Mark Johnson <mark@chronopia.net> wrote:
They will also still be links but you will already have the images and other content to Import into the document and replace or remove the links.
Relevant thread here: http://listarchives.libreoffice.org/global/users/msg09563.html

Hi All,

SOLVED

Thanks to Mark Johnson for that link (below) to the LibreOffice site. It gives a good solution.

Basically, I simply copy and paste any or all of a web page from my browser into a LibreOffice document. Then, in LibreOffice, I choose Edit > Links > Break Link for each link in the document. Finally, I save the document in whatever format is desired.

I've tested it by re-loading the document while off-line, and it works great. (Faster, too.) This is excellent, as I keep finding that "the best" (for me) web pages often disappear from the net, and I like having a local copy.

Cheers,
Carl Turney

On 28/05/12 17:12, Jason White wrote:
Mark Johnson <mark@chronopia.net> wrote:
They will also still be links but you will already have the images and other content to Import into the document and replace or remove the links.
Relevant thread here: http://listarchives.libreoffice.org/global/users/msg09563.html

On Tue, 29 May 2012, Carl Turney <carl@boms.com.au> wrote:
This is excellent, as I keep finding that "the best" (for me) web pages often disappear from the net, and I like having a local copy.
If what you want is a local copy of a web site then why not just use "wget -r"? From memory that doesn't fix up hrefs, but that can be done with sed.

--
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/
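
A rough idea of the sed step mentioned above, assuming you wanted to turn absolute links for one (made-up) site into relative ones; real pages would need the pattern adjusted:

    # rewrite absolute links to the mirrored site into relative ones (example.com is a placeholder)
    sed -i 's|href="http://www.example.com/|href="|g' www.example.com/*.html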

Hi,

I often like to edit out stuff (e.g. ads etc.) from the page.

Cheers,
Carl

On 29/05/12 08:25, Russell Coker wrote:
On Tue, 29 May 2012, Carl Turney <carl@boms.com.au> wrote:
This is excellent, as I keep finding that "the best" (for me) web pages often disappear from the net, and I like having a local copy.
If what you want is a local copy of a web site then why not just use "wget -r"? From memory that doesn't fix up hrefs, but that can be done with sed.

On Tue, 29 May 2012, Carl Turney <carl@boms.com.au> wrote:
I often like to edit out stuff (e.g. ads etc.) from the page.
There is nothing preventing you from editing that out of the copy that wget provides. You can edit it with your favourite text editor (vi, emacs, etc) as HTML files are text. You can edit it with one of the many web editing programs; I don't use such programs, but I'm sure someone here can recommend a good one.

For common types of advert (of which Google Adsense is the best example) there are usually only a few ways of formatting them. It shouldn't be difficult to write a little Perl program that goes through the output of "wget -r" and then removes all Adsense codes. Such a program wouldn't necessarily even need to be able to properly parse HTML; it could just look for blocks of Adsense codes. Even if you didn't make it smart enough to handle line-breaks in unexpected places, it would catch more than 90% of all Adsense adverts. Someone has probably already written such a program; Google might find it.

--
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/
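
A rough sketch of the sort of thing described above, as a perl one-liner run over a "wget -r" mirror. The regex assumes the usual two-script AdSense pattern of the time (a config script containing google_ad_client, followed by one loading show_ads.js from googlesyndication.com), and the mirror/ directory name is made up; anything unusual would need the pattern adjusted:

    # strip typical AdSense blocks from every page in the mirror (sketch only)
    find mirror/ -name '*.html' -exec perl -0777 -pi -e \
      's|<script[^>]*>\s*(?:<!--)?\s*google_ad_client.*?googlesyndication[^>]*>\s*</script>\s*||gis' {} +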

Hi,
There is nothing preventing you from editing that out of the copy that wget provides.
I'm keeping such info only for personal reference, and prefer Word docs, so no need to keep them in HTML format.
You can edit it with your favourite text editor (vi, emacs, etc) as HTML files are text.
I remember being pretty good at those editors, long ago and far away. Then I got sucked into the world of EDLIN, NotePad, MSWrite, MSWord, GEdit, etc. So, inevitably, my brain turned to mush and I forgot all there was to know about those powerful/flexible terminal-based commands and utilities.
You can edit it with one of the many web editing programs, I don't use such programs but I'm sure someone here can recommend a good one.
I was quite happy with Kompozer for web editing, until one of its updates started messing up any pages it altered, and active support disappeared. After =much= searching I finally found SeaMonkey Composer. (I don't use SeaMonkey for anything else. Too bad Composer isn't available separately.) Free. Linux compatible. Good switching between WYSIWYG and source code modes. Lots of features. Currently supported.
It shouldn't be difficult to write a little Perl program that goes through the output of "wget -r" and then removes all Adsense codes.
There are many different kinds of items on most web pages that are superfluous to me, so I just copy and paste the "gems" that I am focused on. Besides, I'm trying very hard NOT to get back into being a computer techie. I'm just doing enough to my system so I can use it as a tool to achieve non-computing goals.

Cheers,
Carl Turney
Bottom-line Ownership and Management Services
www.boms.com.au

Carl Turney <carl@boms.com.au> wrote:
I remember being pretty good at those editors, long ago and far away. Then I got sucked into the world of EDLIN, NotePad, MSWrite, MSWord, GEdit, etc. So, inevitably, my brain turned to mush and I forgot all there was to know about those powerful/flexible terminal-based commands and utilities.
I would find the step backwards from Emacs, Vi, the shell, etc., very painful and my frustration levels would go straight up. It would be like going back to preschool after having grown up.

On Tue, May 29, 2012 at 11:45:53AM +1000, Russell Coker wrote:
You can edit it with your favourite text editor (vi, emacs, etc) as HTML files are text.
You can edit it with one of the many web editing programs, I don't use such programs but I'm sure someone here can recommend a good one.
bluefish is a reasonably good GUI HTML/CSS/etc editor if you like that sort of thing. (i don't, i prefer vi...but i've occasionally used bluefish and it's OK)
For common types of advert (of which Google Adsense is the best example) there are usually only a few ways of formatting them. It shouldn't be difficult to write a little Perl program that goes through the output of "wget -r" and then removes all Adsense codes. [...]
yep. doing stuff like this manually is crazy when you can automate it with a shell, sed, awk, perl, python, or whatever script. perl and python even have very comprehensive libraries for easily parsing, searching, and manipulating HTML files.

it takes time to write the script, of course, but it's a lot more interesting (and usefully educational) to do than repetitive manual editing work, and once the script is written, it's:

a) nearly instantaneous to run
b) consistent and predictable in its effects
c) not prone to human error, tiredness, distraction, or just plain forgetting to do something, etc

craig

--
craig sanders <cas@taz.net.au>

BOFH excuse #21: POSIX compliance problem

On Tue, May 29, 2012 at 08:39:52AM +1000, Carl Turney wrote:
I often like to edit out stuff (e.g. ads etc.) from the page.
you can edit html easily enough with vi. or emacs. even nano if you're perverse :)

certainly easier than editing it in a word processor like open office.

craig

--
craig sanders <cas@taz.net.au>

BOFH excuse #414: tachyon emissions overloading the system

On Tue, May 29, 2012 at 08:25:39AM +1000, Russell Coker wrote:
If what you want is a local copy of a web site then why not just use "wget -r"?
agreed. good solution.
From memory that doesn't fix up href's, but that can be done with sed.
there are options to make wget convert links (e.g. img src hrefs) to local/relative urls when mirroring. from wget --help:

  -k,  --convert-links      make links in downloaded HTML or CSS point to local files.

more details in the man pages / docs.

craig

--
craig sanders <cas@taz.net.au>

BOFH excuse #285: Telecommunications is upgrading.
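
Putting the pieces of this thread together, a typical mirroring invocation might look like the sketch below (the URL is a placeholder, and the flags may need tweaking for a given site):

    # mirror part of a site for offline reading:
    #   -r  recurse into linked pages          -np don't ascend to the parent directory
    #   -p  fetch the images/CSS a page needs  -k  convert links to point at the local copies
    wget -r -np -p -k "http://www.example.com/section/"
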
Participants (5): Carl Turney, Craig Sanders, Jason White, Mark Johnson, Russell Coker