
Message: 5
Date: Fri, 23 Dec 2016 15:12:05 +1100
From: Craig Sanders <cas@taz.net.au>
To: Luv Main <luv-main@luv.asn.au>
Subject: Re: /usr/bin/env
Message-ID: <20161223041205.qbkvfszdiy4tpakh@taz.net.au>
Content-Type: text/plain; charset=us-ascii
On Fri, Dec 23, 2016 at 02:44:28PM +1100, Andrew Mather wrote:
Module files are generally set up by the admins, so they don't require anything more from the user than including the appropriate loading statements in their scripts. It's not unlike a wrapper script really.
it sounds similar to (but quite a bit more advanced than) what i've done in the past with wrapper scripts and collections of environment setting files sourced (#included) as needed.
Yep. Pretty much.

It's not uncommon in scientific computing to need multiple versions of
compilers and various bits of software compiled against a range of different
libraries and the like. You have to retain old versions of software, often
long past its use-by date, in case someone queries a scientific paper based
on using that particular version.

By using a chain of module load commands the user can easily set up an
environment very different from the current OS state (apart from the kernel
itself), repeatably, across an entire cluster if needs be. They can even swap
environments around between various steps in a script if that is needed.

Obviously it's overkill for some requirements and won't suit everyone's
modus operandi, but well worth knowing about if that's the sort of thing you
need to do.

--
- https://picasaweb.google.com/107747436224613508618
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
"Voting is a lot like going to Bunnings really: You walk in confused, you
stand in line, you have a sausage on the way out and at the end, you wind up
with a bunch of useless tools" Joe Rios
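For a concrete flavour of the "chain of module load commands" idea, a cluster
job script might pin its tool-chain along the lines of the sketch below. The
module names and versions are hypothetical, the exact syntax depends on
whether the site runs Environment Modules or Lmod, and the script assumes the
site's module shell functions are already initialised for non-interactive
shells.

    #!/bin/sh
    # sketch only: module names/versions below are made-up examples

    module purge                      # start from a clean, known state

    module load gcc/4.9.4             # pin the exact tool-chain the paper used
    module load openmpi/1.10.2
    module load fftw/3.3.4

    module list 2>&1 | tee modules-used.txt   # record what was loaded

    ./run_simulation input.dat        # hypothetical analysis step

    # a later step can swap to a different environment if needed
    module swap gcc/4.9.4 gcc/6.2.0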

On Friday, 23 December 2016 8:13:02 PM AEDT Andrew Mather via luv-main wrote:
It's not uncommon in scientific computing to need multiple versions of compilers and various bits of software compiled against a range of different libraries and the like. You have to retain old versions of software, often long past its use-by date in case someone queries a scientific paper based on using that particular version.
If you are particularly concerned about such things you wouldn't want to have
a system where lots of different versions of the software were installed side
by side. You would want a VM/chroot image with the exact software in
question. The amount of storage space isn't an issue by today's standards. A
plain text representation of a human DNA scan is 3G, which is probably larger
than the complete OS and all software needed to analyse it.

But if you really want to reproduce things you need a copy of the same
hardware (different releases of CPU families can give different floating
point answers etc) and the same OS kernel.

I've heard a lot of scientific computing people talk about a desire to
reproduce calculations, but I haven't heard them talking about these issues,
so I presume that they haven't got far in this regard.

http://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-...

Not that it matters, minor issues like these pale into insignificance when
compared to the above.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
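As a rough sketch of the VM/chroot-image approach: the paths, release name,
mirror and package list below are placeholders, reproduce.sh is a
hypothetical entry point, and the exact debootstrap invocation varies between
hosts and releases.

    #!/bin/sh
    # sketch only: build a minimal Debian chroot containing the analysis
    # software, archive it alongside the raw data, and unpack it years later
    sudo debootstrap wheezy /srv/paper-env http://archive.debian.org/debian

    # install the analysis tools inside the chroot (package names are examples)
    sudo chroot /srv/paper-env sh -c \
        'apt-get update && apt-get install -y gcc gfortran make'

    # snapshot the whole userland; a few GB is small next to the raw data
    sudo tar -C /srv -czf paper-env.tar.gz paper-env

    # later: restore and re-run inside the same userland (the kernel and
    # CPU will still differ, which is the caveat raised above)
    sudo tar -C /srv -xzf paper-env.tar.gz
    sudo chroot /srv/paper-env /bin/sh -c './reproduce.sh'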

On Fri, Dec 23, 2016 at 08:22:45PM +1100, russell@coker.com.au wrote:
I've heard a lot of scientific computing people talk about a desire to reproduce calculations, but I haven't heard them talking about these issues so I presume that they haven't got far in this regard.
it was a big issue when i was at unimelb (where i built a HPC cluster for the
chemistry dept and later worked on the nectar research cloud).

depending on the funding source or the journal that papers were published in,
raw data typically had to be stored for at least 7 or 12 years, and the exact
same software used to process it also had to be kept available and runnable
(which was an ongoing problem, especially with some of the commercial
software like gaussian...but even open source stuff is affected by bit-rot
and also by CADT-syndrome. we had a source license for gaussian, but that
didn't guarantee that we could even compile it with newer compilers. it might
have changed now, but iirc it would only compile with a specific intel
fortran compiler. numerous efforts to compile it with gfortran ended in
failure)

and some of the data sets that had to be stored were huge - dozens or
hundreds of terabytes or more. and while it wasn't something i worked on
personally, i know that for some of the people working with, e.g., the
synchrotron, that's a relatively piddling quantity of data.

craig

--
craig sanders <cas@taz.net.au>

On Friday, 23 December 2016 8:55:05 PM AEDT Craig Sanders via luv-main wrote:
On Fri, Dec 23, 2016 at 08:22:45PM +1100, russell@coker.com.au wrote:
I've heard a lot of scientific computing people talk about a desire to reproduce calculations, but I haven't heard them talking about these issues so I presume that they haven't got far in this regard.
it was a big issue when i was at unimelb (where i built a HPC cluster for the chemistry dept and later worked on the nectar research cloud).
depending on the funding source or the journal that papers were published in, raw data typically had to be stored for at least 7 or 12 years, and the exact same software used to process it also had to be kept available and runnable (which was an ongoing problem, especially with some of the commercial software like gaussian...but even open source stuff is affected by bit-rot and also by CADT-syndrome. we had a source license for gaussian, but that didn't guarantee that we could even compile it with newer compilers.
Debian/Unstable has a new version of GCC that has deprecated a lot of the
older STL interfaces. It also has a kernel that won't work with the amd64
libc from Wheezy. These are upstream issues, so other distributions may have
dealt with them in some ways. It should be possible to change the Wheezy libc
to the newer amd64 system call interface without changing much, and using KVM
or Xen is a possibility too. Also, compiling against the old STL isn't that
hard to do, and Debian has good support for multiple versions of GCC.

The STL isn't necessarily a trivial issue, though. I recall that Wheezy
stopped getting security support for Chromium because upstream (Google)
decided to just make new releases which depended on new C++ features that
weren't in the Wheezy version of GCC.

Supporting old versions of software is the usual requirement, and that's
usually a lot easier. But I can imagine a situation where part of the
tool-chain for scientific computing had a bug that was only fixed in a new
upstream release that required a new compiler.

Speaking for myself, I'm having enough trouble making the software I'm
responsible for work on all the newer versions of compilers etc. I don't give
much thought to backwards compatibility.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
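For what it's worth, a small sketch of what "good support for multiple
versions of GCC" can look like in practice on Debian. The version numbers are
only examples, the file names are hypothetical, and which compiler packages
are co-installable depends on the release in question.

    # co-install two compiler versions from the normal package archive
    sudo apt-get install g++-4.9 g++-6

    # build legacy code with the older compiler and the language standard
    # it was written against
    g++-4.9 -std=c++03 -o legacy_tool legacy_tool.cpp

    # newer code on the same machine can use the newer compiler independently
    g++-6 -std=c++14 -o new_tool new_tool.cpp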
participants (3)

- Andrew Mather
- Craig Sanders
- Russell Coker