Sorry, I made a mistake: the job also fails to run with the lmp_linux
build, which is on an entirely separate parallel system with enough disk
space. What could it be?
miscompilation due to non-standard syntax in the /opt pair class.
if you are using a gcc-4 compiler, it will choke on some of the
constructs in the opt package. this is due to the use of, e.g., restrict
in ways that are not compatible with the c++ standard.
try the non-opt version of the same class (btw, how much speedup do you
see with /opt?).
Thanks Axel!! That was it. Sorry for the delay in replying: I wanted to run the timings, and I needed to iron out another bug. (I had actually tried the non-opt potential before, but it still conked out; that turned out to be a different error to do with fix ave/atom, which I think was purely an out-of-memory error.)
So using the non-opt version works, but at what cost? Comparing the pair times for a box of 300K atoms on 8 cores:
optimised:     Pair time (%) = 106.26 (54.9112)
not optimised: Pair time (%) = 140.285 (61.1297)
So the difference is reasonably significant: the optimised potential takes about three quarters of the time.
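The three-quarters figure can be checked with a few lines of C++. This is only a sketch: the struct and function names are made up for illustration, and it assumes that the first number is the pair time in seconds and the number in parentheses is the pair share (%) of the total loop time, which also lets one estimate the effect on the whole run.

```cpp
#include <cassert>
#include <cstdio>

// Pair timings from the two runs above (300K atoms, 8 cores).
// Assumption: first value = pair time in seconds, value in parentheses =
// pair share (%) of the total loop time.
struct PairTiming {
  double seconds;
  double percent_of_total;
};

constexpr PairTiming kOpt{106.26, 54.9112};
constexpr PairTiming kPlain{140.285, 61.1297};

// Fraction of the plain pair time that the opt version needs.
double time_fraction(PairTiming opt, PairTiming plain) {
  return opt.seconds / plain.seconds;
}

// Speedup of the pair computation alone.
double pair_speedup(PairTiming opt, PairTiming plain) {
  return plain.seconds / opt.seconds;
}

// Total loop time implied by the pair time and its percentage share.
double implied_total(PairTiming t) {
  assert(t.percent_of_total > 0.0);
  return t.seconds * 100.0 / t.percent_of_total;
}

// Prints the derived numbers for both runs.
void print_summary() {
  std::printf("opt/plain pair time: %.3f\n", time_fraction(kOpt, kPlain));
  std::printf("pair speedup:        %.3f\n", pair_speedup(kOpt, kPlain));
  std::printf("implied total time:  %.1f s vs %.1f s\n",
              implied_total(kOpt), implied_total(kPlain));
}
```

These work out to a pair-time fraction of about 0.76 (a pair speedup of about 1.32x) and implied total loop times of roughly 194 s versus 229 s, so under the stated assumption the whole run is about 1.19x faster.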
Steve, there really isn't much more detail than that: the process crashes with the error message I included in the body of the mail. If your objection is to using a tar file to wrap up multiple files, that is of course easily remedied!
Thanks for your help. I will use the non-optimised version of the potential whenever I have geometries that cause this kind of error, and of course leave it to the LAMMPS team to decide whether anything needs to be done!
please try the attached file on your system.
this uses the main (and safe) trick that the
OPT package uses to speed up the compute method:
a function template lets the compiler optimize
out if statements whose outcome is already known
before you enter the inner loops, without
having to write essentially the same code multiple
times or use a preprocessor or a script/program to
generate the inner loop.
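A minimal sketch of that trick, assuming a toy 12-6 Lennard-Jones loop. The names (Atom, NeighPair, LJParams, eval_lj, compute_lj) are hypothetical and far simpler than the real LAMMPS classes, and the energy tally ignores the special handling LAMMPS gives to ghost pairs. The runtime flags are tested once in compute_lj; inside each of the four template instantiations they are compile-time constants.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, stripped-down stand-ins for the real LAMMPS data structures.
struct Atom {
  double x, y, z;     // position
  double fx, fy, fz;  // accumulated force
};
struct NeighPair { int i, j; };  // one entry of a half neighbor list
struct LJParams { double epsilon, sigma, cutsq; };

// Inner loop for 12-6 Lennard-Jones. EFLAG and NEWTON_PAIR are template
// parameters, so each instantiation sees them as constants and the compiler
// can drop the untaken branches instead of testing them for every pair.
template <bool EFLAG, bool NEWTON_PAIR>
double eval_lj(std::vector<Atom>& atoms, const std::vector<NeighPair>& list,
               int nlocal, const LJParams& p) {
  const double s2 = p.sigma * p.sigma;
  const double sig6 = s2 * s2 * s2;
  const double lj1 = 48.0 * p.epsilon * sig6 * sig6;  // force prefactors
  const double lj2 = 24.0 * p.epsilon * sig6;
  const double lj3 = 4.0 * p.epsilon * sig6 * sig6;   // energy prefactors
  const double lj4 = 4.0 * p.epsilon * sig6;
  double evdwl = 0.0;

  for (const NeighPair& n : list) {
    assert(n.i < nlocal && n.j < static_cast<int>(atoms.size()));
    Atom& a = atoms[n.i];
    Atom& b = atoms[n.j];
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    const double rsq = dx * dx + dy * dy + dz * dz;
    if (rsq >= p.cutsq) continue;

    const double r2inv = 1.0 / rsq;
    const double r6inv = r2inv * r2inv * r2inv;
    const double fpair = r6inv * (lj1 * r6inv - lj2) * r2inv;

    a.fx += dx * fpair; a.fy += dy * fpair; a.fz += dz * fpair;
    // With NEWTON_PAIR true this condition is constant and the test vanishes;
    // otherwise ghost atoms (j >= nlocal) get no reaction force here.
    if (NEWTON_PAIR || n.j < nlocal) {
      b.fx -= dx * fpair; b.fy -= dy * fpair; b.fz -= dz * fpair;
    }
    if (EFLAG) evdwl += r6inv * (lj3 * r6inv - lj4);  // simplified tally
  }
  return evdwl;
}

// Runtime flags are checked once, outside the loop, to pick an instantiation.
double compute_lj(bool eflag, bool newton_pair, std::vector<Atom>& atoms,
                  const std::vector<NeighPair>& list, int nlocal,
                  const LJParams& p) {
  if (eflag) {
    return newton_pair ? eval_lj<true, true>(atoms, list, nlocal, p)
                       : eval_lj<true, false>(atoms, list, nlocal, p);
  }
  return newton_pair ? eval_lj<false, true>(atoms, list, nlocal, p)
                     : eval_lj<false, false>(atoms, list, nlocal, p);
}
```

The loop body is written once, yet the compiler emits four specialized loops. In C++17 the same intent can be spelled with if constexpr, but a plain if on a template parameter is normally folded away when optimization is enabled.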
the attached version removes all the non-portable code
and some code that is, IMO, completely wrong.
on the micelle example input i (still) get a 15%
speedup from using this version over the plain lj/cut.
pair_lj_cut_opt.h (3.47 KB)