LCA videos, OpenWorm, Scicast etc

LCA2014 videos are up at http://mirror.linux.org.au/pub/linux.conf.au/2014/ (~12-13 GB).

Short audio interviews by Otto Benschop are at https://googledrive.com/host/0B2KTxndVFSKuSjZKTDRaTlpJcmM

I'll burn some DVDs for the picnic.

OpenWorm, an open source project dedicated to creating a virtual C. elegans nematode, may interest: http://www.openworm.org/index.html

Have any LUV Raspbian users tried Mathematica / the Wolfram Language yet?

https://scicast.org/#/about - a prediction market study that may interest.

On 12/01/14 14:19, Rodney Brown wrote:
LCA2014 videos are up at http://mirror.linux.org.au/pub/linux.conf.au/2014/ (~12-13 GB)
For Internode customers it is also on their mirror.

Hi, On 12/01/2014 2:19 PM, Rodney Brown wrote:
LCA2014 videos are up at http://mirror.linux.org.au/pub/linux.conf.au/2014/ (~12-13 GB). Short audio interviews by Otto Benschop at https://googledrive.com/host/0B2KTxndVFSKuSjZKTDRaTlpJcmM
Is there a single archive file for all of those mp3 files? (If they are on mirror.linux.org.au then I couldn't see them.) Thanks, A.

Andrew McGlashan <andrew.mcglashan@affinityvision.com.au> wrote:
Is there a single archive file for all of those mp3 files?
Do you mean a tar file? No there isn't, but you could use rsync, ftp or whatever you wish to transfer the video files to another machine. I'm not sure that I understand your question though.

Jason White <jason@jasonjgw.net> writes:
Andrew McGlashan <andrew.mcglashan@affinityvision.com.au> wrote:
Is there a single archive file for all of those mp3 files?
Do you mean a tar file? No there isn't, but you could use rsync, ftp or whatever you wish to transfer the video files to another machine. I'm not sure that I understand your question though.
Obs.

rsync mirror.internode.on.net::linux.conf.au/2014/Wednesday/

And variations thereof. Passing a destination will copy instead of listing the contents. YMMV, etc.
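Concretely, something like this (a sketch only; the local destination directory name is just an example):

  # With no destination, rsync just lists the remote module directory:
  rsync mirror.internode.on.net::linux.conf.au/2014/Wednesday/

  # With a destination, it copies instead; -a preserves attributes,
  # -P shows progress and resumes partial transfers:
  rsync -aP mirror.internode.on.net::linux.conf.au/2014/Wednesday/ ./lca2014-wednesday/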

On 13/01/2014 3:53 PM, Trent W. Buck wrote:
Obs. rsync mirror.internode.on.net::linux.conf.au/2014/Wednesday/ And variations thereof. Passing a destination will copy instead of listing the contents. YMMV, etc.
I haven't got an Internode service, but I do have unlimited data on my DSL service. rsync / ftp is no problem; as I mentioned, the problem is the mp3 files that are stored in Google Drive -- if I could ftp or rsync them, it would be good, but it would be much better if those files were added to the mirror.linux.org.au directory for this year. Cheers, A.

Andrew McGlashan <andrew.mcglashan@affinityvision.com.au> wrote:
rsync / ftp is no problem, as I mentioned, the problem is the mp3 files that are stored in Google drive -- if I could ftp or rsync them, it would be good,
Based on the package description, grive may be capable of meeting your need.

On 13/01/2014 4:23 PM, Jason White wrote:
Andrew McGlashan <andrew.mcglashan@affinityvision.com.au> wrote:
rsync / ftp is no problem, as I mentioned, the problem is the mp3 files that are stored in Google drive -- if I could ftp or rsync them, it would be good,
Based on the package description, grive may be capable of meeting your need.
Thanks, but I'm not sure it would suit. From what I can tell, grive works /somewhat/, and most likely only for files owned by a single account, not for those shared by other accounts. Cheers, A.

Andrew McGlashan <andrew.mcglashan@affinityvision.com.au> writes:
On 13/01/2014 3:53 PM, Trent W. Buck wrote:
Obs. rsync mirror.internode.on.net::linux.conf.au/2014/Wednesday/ And variations thereof. Passing a destination will copy instead of listing the contents. YMMV, etc.
I haven't got an Internode service, but I do have unlimited data on my DSL service.
rsync / ftp is no problem, as I mentioned, the problem is the mp3 files that are stored in Google drive -- if I could ftp or rsync them, it would be good, but it would be much better if those files were added to the mirror.linux.org.au directory for this year.
Ah, sorry. I saw MP4s on that mirror and thought that's what you were talking about. I guess the Google doodad has different content.

On 13/01/2014 3:02 PM, Jason White wrote:
Andrew McGlashan <andrew.mcglashan@affinityvision.com.au> wrote:
Is there a single archive file for all of those mp3 files?
Do you mean a tar file? No there isn't, but you could use rsync, ftp or whatever you wish to transfer the video files to another machine. I'm not sure that I understand your question though.
I'm not talking about the mirror.linux.org.au files; they are simple to get -- the problem is with the Google Drive mp3 files. Can you ftp or rsync that area? Or better still, can those mp3 files be placed on the linux.org.au server in the 2014 directory? Thanks, A.

On Mon, Jan 13, 2014 at 1:47 AM, Andrew McGlashan <andrew.mcglashan@affinityvision.com.au> wrote:
Hi,
On 12/01/2014 2:19 PM, Rodney Brown wrote:
LCA2014 videos are up at http://mirror.linux.org.au/pub/linux.conf.au/2014/ (~12-13 GB). Short audio interviews by Otto Benschop at https://googledrive.com/host/0B2KTxndVFSKuSjZKTDRaTlpJcmM
Is there a single archive file for all of those mp3 files? (If they are on mirror.linux.org.au then I couldn't see them.)
Thanks A.
You should be able to use HTTrack.

hiddensoul@qball:~$ apt-cache show httrack
Package: httrack
Priority: optional
Section: universe/web
Installed-Size: 98
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Original-Maintainer: Xavier Roche <roche@httrack.com>
Architecture: amd64
Version: 3.47.21-1
Depends: libc6 (>= 2.14), libhttrack2 (>= 3.47.21)
Suggests: webhttrack, httrack-doc
Filename: pool/universe/h/httrack/httrack_3.47.21-1_amd64.deb
Size: 19986
MD5sum: 3f4e801ccc35472f42da01b1eee5a482
SHA1: ba033f6f13202dd7cdcc5d017aa3e67588822c04
SHA256: 68f7803787154bb1efc110ef4898e6d791e90f84e0b71ece6db6bca14bc29637
Description-en_AU: Copy websites to your computer (Offline browser)
 HTTrack is an offline browser utility, allowing you to download a World Wide
 Web site from the Internet to a local directory, building recursively all
 directories, getting html, images, and other files from the server to your
 computer.
 .
 HTTrack arranges the original site's relative link-structure. Simply open a
 page of the "mirrored" website in your browser, and you can browse the site
 from link to link, as if you were viewing it online. HTTrack can also update
 an existing mirrored site, and resume interrupted downloads. HTTrack is
 fully configurable, and has an integrated help system.

You can apply arguments to only download certain file types, e.g. .mp3 (see the sketch after this message). I do use HTTrack but have not tried it on a shared Google Drive folder; looking at the links for each mp3 shows https://googledrive.com/host/0B2KTxndVFSKuSjZKTDRaTlpJcmM/nameoffile.mp3 so it should work. YMMV.

--
Mark "Pockets" Clohesy
Mob Phone: (+61) 406 417 877
Email: hiddensoul@twistedsouls.com
G-Talk: mark.clohesy@gmail.com
GNU/Linux..Linux Counter #457297
"I would love to change the world, but they won't give me the source code"
"Linux is user friendly...its just selective about who its friends are"
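A sketch of what such an HTTrack invocation might look like (untested against the shared folder; the output directory name is just an example, and the "-*" then "+*.mp3" scan rules rely on later rules taking precedence):

  # Mirror only the mp3 files from the shared folder into ./lca2014-audio.
  httrack "https://googledrive.com/host/0B2KTxndVFSKuSjZKTDRaTlpJcmM/" \
      -O ./lca2014-audio "-*" "+*.mp3"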

This does it (all a single line):

curl -Ss "https://googledrive.com/host/0B2KTxndVFSKuSjZKTDRaTlpJcmM/"|sed 's`<a href`\n<a href`g'|sed 's`</a>.*``; s`>.*$``; s`^<a href="`wget -c "https://googledrive.com`'|grep ^wget > a.a;chmod +x a.a; ./a.a;rm a.a

A little more thinking and I'm sure it could be greatly improved, but I'm getting the files fine. Cheers, A.

On 13/01/2014 5:19 PM, Andrew McGlashan wrote:
This does it:
curl -Ss "https://googledrive.com/host/0B2KTxndVFSKuSjZKTDRaTlpJcmM/"|sed 's`<a href`\n<a href`g'|sed 's`</a>.*``; s`>.*$``; s`^<a href="`wget -c "https://googledrive.com`'|grep ^wget > a.a;chmod +x a.a; ./a.a;rm a.a
The following is neater and cleaner:

#!/bin/bash
( curl -Ss "https://googledrive.com/host/0B2KTxndVFSKuSjZKTDRaTlpJcmM/" | \
  sed 's`<a href`\n<a href`g' | \
  grep '^<a href' | \
  awk -F\" '{print "wget -c \"https://googledrive.com"$2"\""}' ) | tee y.y
chmod +x y.y
./y.y
rm y.y

Cheers, A.

On 13.01.14 17:44, Andrew McGlashan wrote:
The following is neater and cleaner:
#!/bin/bash
( curl -Ss "https://googledrive.com/host/0B2KTxndVFSKuSjZKTDRaTlpJcmM/" | \
  sed 's`<a href`\n<a href`g' | \
  grep '^<a href' | \
  awk -F\" '{print "wget -c \"https://googledrive.com"$2"\""}' ) | tee y.y
chmod +x y.y
./y.y
rm y.y
Ah, that is much easier to read. Just one annoying suggestion from a backseat driver: the following are equivalent:

grep '^foo' | \
awk '{print "wget -c bar "}'

and

awk '/^foo/ {print "wget -c bar "}'

I.e., the core of awk is that it is a line processor which runs blocks of C-like text processing code against those input lines which match a set of regex or literal text triggers.

Also, to simplify quoting, inclusion of shell variables can be done with:

awk '/^foo/ {print "wget -c bar '$2' "}'

A simple demo of that:

$ x=fred
$ echo | awk '{print "$x is '$x'"}'
$x is fred

IIUC, the sed line is just adding line breaks at href tags. Setting RS to a regex (in awk) would allow awk to see the input as lines broken only at those tags, obviating the need for sed as well (a sketch follows below).

Hopefully that's interesting and/or useful.

Erik

--
The future is a race between education and catastrophe. - H. G. Wells
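A minimal sketch of that RS idea (assuming GNU awk, which accepts a multi-character/regex RS, and assuming the page still exposes plain <a href="..."> links -- it has since changed, as noted further down):

  # Split records at the href attribute itself, so neither sed nor grep
  # is needed; $1 of each record after the first is the link target.
  curl -Ss "https://googledrive.com/host/0B2KTxndVFSKuSjZKTDRaTlpJcmM/" | \
    gawk -v RS='<a href="' -F'"' \
         'NR > 1 {print "wget -c \"https://googledrive.com" $1 "\""}'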

On 13/01/2014 6:58 PM, Erik Christiansen wrote:
Ah, that is much easier to read. Just one annoying suggestion from a backseat driver; The following are equivalent:
grep '^foo' | \
awk '{print "wget -c bar "}'
and
awk '/^foo/ {print "wget -c bar "}'
I.e. The core of awk is that it is a line processor which runs blocks of C-like text processing code against those input lines which match a set of regex or literal text triggers.
Also, to simplify quoting, inclusion of shell variables can be done with:
awk '/^foo/ {print "wget -c bar '$2' "}'
Yes, I did that too, but didn't re-post again.
IIUC, the sed line is just adding line breaks at href tags. Setting RS to a regex (in awk) would allow awk to see the input as lines broken only at those tags, obviating the need for sed as well.
A bit of playing with RS didn't bear fruit, but then I found the page has completely changed -- so that's probably why.
Hopefully that's interesting and/or useful.
Yes, useful, but the entire page has changed now -- much different to what it was -- so everything broke. Cheers, A.

Re screen-scraping URLs,

The right way to do it is with XSLT, e.g.
http://cyber.com.au/~twb/.bin/fortune-snarf

The quick-and-dirty approach I would normally adopt is:

  curl -sL example.net/page.html | egrep -oi [^\'\"]+.png | wget -i-

For relative URLs, --base doesn't work for me, so something like this before the wget:

  sed s,^,http://example.net/,g

If the source is split over multiple pages,

  map curl -fsL -- example.net/?page={0..999} |

where map is http://cyber.com.au/~twb/.bin/map -- assuming that "bad" pages return an HTTP 4xx (the -f makes that propagate upwards).

You will also often have to spoof User-Agent (-U/-A) and/or set wget --referer -- the latter usually only needs to match the original domain, e.g. --referer=http://example.net/ will usually suffice, rather than --referer=http://example.net/foo/bar/baz.html
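Putting those quick-and-dirty pieces together as one hedged sketch (example.net, page.html and the .png pattern are placeholders carried over from above; it assumes the page uses relative links, since the sed prefix would mangle absolute ones):

  # Extract candidate links, prepend the base URL, then hand them to wget.
  curl -sL http://example.net/page.html \
    | egrep -oi "[^'\"]+\.png" \
    | sed 's,^,http://example.net/,' \
    | wget -i-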

On Mon, Jan 13, 2014, at 17:44, Andrew McGlashan wrote:
#!/bin/bash
( curl -Ss "https://googledrive.com/host/0B2KTxndVFSKuSjZKTDRaTlpJcmM/" | \
  sed 's`<a href`\n<a href`g' | \
  grep '^<a href' | \
  awk -F\" '{print "wget -c \"https://googledrive.com"$2"\""}' ) | tee y.y
chmod +x y.y
./y.y
rm y.y
You could also avoid the temporary executable file by piping the generated commands directly into the shell, sh or bash or whatever. Something I do all the time:

  generate-commands | less

to check that the generated commands look OK, then up-arrow to get the command back, and edit it to

  generate-commands | sh

to run them, where "generate-commands" is stuff like the above (though I mostly use Perl).

-- Smiles, Les.
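Applied to the pipeline from earlier in the thread, that might look like this (a sketch, assuming the Google Drive page still had its original <a href> layout; "sh" could equally be "bash"):

  # Generate the wget commands and pipe them straight into a shell,
  # skipping the temporary y.y file. To review first, replace "sh"
  # with "less".
  curl -Ss "https://googledrive.com/host/0B2KTxndVFSKuSjZKTDRaTlpJcmM/" | \
    sed 's`<a href`\n<a href`g' | \
    awk -F\" '/^<a href/ {print "wget -c \"https://googledrive.com"$2"\""}' | \
    sh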
participants (8)
- Allan Duncan
- Andrew McGlashan
- Erik Christiansen
- Hiddensoul (Mark Clohesy)
- Jason White
- Les Kitchen (LUV)
- Rodney Brown
- trentbuck@gmail.com