
On Fri, Dec 23, 2016 at 08:22:45PM +1100, russell@coker.com.au wrote:
> I've heard a lot of scientific computing people talk about a desire to
> reproduce calculations, but I haven't heard them talking about these
> issues so I presume that they haven't got far in this regard.
it was a big issue when i was at unimelb (where i built an HPC cluster for the chemistry dept and later worked on the nectar research cloud). depending on the funding source or the journal the papers were published in, raw data typically had to be stored for at least 7 or 12 years, and the exact same software used to process it also had to be kept available and runnable.

that was an ongoing problem, especially with some of the commercial software like gaussian... but even open source stuff is affected by bit-rot and by CADT-syndrome. we had a source license for gaussian, but that didn't guarantee we could even compile it with newer compilers. it might have changed now, but iirc it would only compile with one specific intel fortran compiler, and numerous attempts to compile it with gfortran ended in failure.

some of the data sets that had to be stored were huge, too - dozens or hundreds of terabytes or more. and while it wasn't something i worked on personally, i know that for some of the people working with, e.g., the synchrotron, that's a relatively piddling quantity of data.

craig

--
craig sanders <cas@taz.net.au>