
On Fri, Dec 23, 2016 at 08:22:45PM +1100, russell@coker.com.au wrote:
> I've heard a lot of scientific computing people talk about a desire to
> reproduce calculations, but I haven't heard them talking about these
> issues so I presume that they haven't got far in this regard.
it was a big issue when i was at unimelb (where i built an HPC cluster for the chemistry dept and later worked on the nectar research cloud). depending on the funding source or the journal the papers were published in, raw data typically had to be stored for at least 7 or 12 years, and the exact same software used to process it also had to be kept available and runnable.

that was an ongoing problem, especially with some of the commercial software like gaussian... but even open source stuff is affected by bit-rot and by CADT-syndrome. we had a source license for gaussian, but that didn't guarantee we could even compile it with newer compilers. it might have changed now, but iirc it would only compile with one specific intel fortran compiler, and numerous attempts to compile it with gfortran ended in failure.

some of the data sets that had to be stored were huge, too - dozens or hundreds of terabytes or more. and while it wasn't something i worked on personally, i know that for some of the people working with, e.g., the synchrotron, that's a relatively piddling quantity of data.

craig

--
craig sanders <cas@taz.net.au>