Another difference in the output was in the number of special bonds;
16Mar18:
18 = max # of special neighbors
7Aug19:
17 = max # of special neighbors
I noticed that "special.cpp" changed substantially between the two versions. Recompiling LAMMPS-7Aug19 with special.cpp and special.h from LAMMPS-16Mar18 results in a PotEng of 4949.1186 again.
What is the cause of this difference? For this case, coronene crystals are no longer stable with the new version.
The major difference is the switch from a ring communication to a rendezvous algorithm, to improve parallel performance when running on many MPI ranks. The code has passed many tests, so seeing a difference like the one you are noting is a bit of a surprise. The only unusual command in your input is "box tilt large". Is this absolutely required? Could you try a system that doesn't require this command?
Does the failure happen consistently, regardless of the number of MPI ranks used?
In general, the best way to make certain that an issue like this one is looked into properly is to file a report, with suitable files to (quickly!) reproduce the issue, at the LAMMPS issue tracker on GitHub: https://github.com/lammps/lammps/issues
Mails to the mailing list have a tendency to be overlooked if nobody has time to react to them immediately.
I’ll second Axel: if you post (mailing list or GitHub) a small-as-possible example that quickly gives a different answer between the two versions, we will look at it quickly. If this is in fact a bug in the new version of special.cpp, we would want to know.
I reduced the input file a bit.
The result is independent of the number of cores. I typically use Intel compilers in combination with OpenMPI, but I get the same results if I just use GNU compilers. The Ubuntu version does not make a difference either.
I tried different "special_bonds" settings. This of course changes the outcomes, but the two versions keep giving different results.
The data file can be found here:
Herma
___________
units real
atom_style full
bond_style harmonic
angle_style harmonic
kspace_style ewald 1e-6
dihedral_style harmonic
improper_style cvff
pair_style buck/coul/long 10.0
pair_modify shift yes
special_bonds lj 0.0 0.0 0.0 coul 0.0 0.0 0.0
read_data supercell.data
replicate 2 4 1
velocity all create 100 12
thermo_style custom time vol pe evdwl ecoul elong
thermo 1000
timestep 0.5
fix 1 all nvt temp 300 300 40
run 0
I made a mistake in my previous email: using a GNU compiler does make a difference. This morning I made several new builds using different compilers, and it appears to be an Intel problem. I use icc in combination with OpenMPI, since the Intel MPI (version 19.0.1.144) gives a memory leak. I have now also compiled with a newer version of the Intel compiler (make intel_cpu_intelmpi), but I have not tested for the memory leak yet, since these test runs take an hour or so. This build still gives wrong results. If I make the simulation cell a little larger, "max # of special neighbors" starts to deviate as well.
Herma
16Mar18 - intel openmpi (Version 19.0.1.144)
3 = max # of 1-2 neighbors
6 = max # of 1-3 neighbors
18 = max # of 1-4 neighbors
18 = max # of special neighbors
Time Volume PotEng Temp E_vdwl E_coul E_long
0 5616.5709 412.43142 100 -519.71207 275.52319 -740.74556
7Aug19 - mpi
3 = max # of 1-2 neighbors
6 = max # of 1-3 neighbors
18 = max # of 1-4 neighbors
18 = max # of special neighbors
Time Volume PotEng Temp E_vdwl E_coul E_long
0 5616.5709 412.43142 100 -519.71207 275.52319 -740.74556
7Aug19 - intel openmpi (Version 19.0.1.144)
3 = max # of 1-2 neighbors
6 = max # of 1-3 neighbors
18 = max # of 1-4 neighbors
18 = max # of special neighbors
Time Volume PotEng Temp E_vdwl E_coul E_long
0 5616.5709 2368.5426 100 1202.6081 509.31419 -740.74556
7Aug19 - intel_cpu_intelmpi (Version 19.0.5.281)
3 = max # of 1-2 neighbors
6 = max # of 1-3 neighbors
18 = max # of 1-4 neighbors
18 = max # of special neighbors
Time Volume PotEng Temp E_vdwl E_coul E_long
0 5616.5709 2368.5426 100 1202.6081 509.31419 -740.74556
replicate 4 8 3 instead of replicate 2 4 1 (7Aug19 - intel_cpu_intelmpi)
3 = max # of 1-2 neighbors
6 = max # of 1-3 neighbors
18 = max # of 1-4 neighbors
17 = max # of special neighbors
Time Volume PotEng Temp E_vdwl E_coul E_long
0 67398.851 38802.71 100 23298.102 7625.2225 -8889.0052
replicate 4 8 3 instead of replicate 2 4 1 (7Aug19 - mpi)
3 = max # of 1-2 neighbors
6 = max # of 1-3 neighbors
18 = max # of 1-4 neighbors
18 = max # of special neighbors
Time Volume PotEng Temp E_vdwl E_coul E_long
0 67398.851 4949.1186 100 -6236.5448 3306.2783 -8889.0052
This is puzzling, because special.cpp does not do anything numerical. It is simply figuring out the topology of molecules, so I wouldn’t think the compiler matters. Also, if you "make the simulation cell larger" by simply replicating more, then replicating doesn’t make individual molecules any larger, just more of them. So it is not clear how replication can change the max special neighbor count.
Some versions of the the Intel compilers have known issues where they miscompile correct code, especially with high optimization settings. Outside of the USER-INTEL package, which benefits from the superior vectorization capabilities of the Intel compilers, there is little reason to use them over GCC or CLang. The performance difference compared to LAMMPS executables compiled with suitably recent GCC or CLang compilers is often negligible and dwarfed by the performance gains from properly tuning runtime settings in the LAMMPS input file (neighbor list, load balancing, MPI vs. multi-threading, coulomb cutoff vs. kspace, processor mapping etc.).
Mind you, this is not limited to Intel compilers, but in my personal experience, they have been more prone to this in recent years. My guess is that this is due to more aggressive attempts at vectorization, which seems to collide with some of the programming style employed in LAMMPS.
> This is puzzling, because special.cpp does not do anything numerical.
> It is simply figuring out the topology of molecules, so I wouldn’t
> think the compiler matters.

But if the compiler misses data dependencies when trying to vectorize loops, it can produce incorrect neighbor lists, which in turn will result in incorrect energies and forces.

> Also, if you "make the simulation cell larger" by simply replicating
> more, then replicating doesn’t make individual molecules any larger,
> just more of them. So it is not clear how replication can change the
> max special neighbor count.

Again, this would be due to miscompiled code. With more atoms/molecules, the neighbor lists would be different, loops would process atoms in a different order and in different subdomains, and thus it is quite possible to see different results, exclusion counts, etc.
While it’s not necessarily the compiler, since undefined behavior takes many sneaky forms and manifests differently between compilers, we can probably shed light on how aggressive the Intel compiler is being by seeing the makefile being used. Normally the Intel compiler’s auto-vectorization is very conservative at the default -O2 and -O3 settings, as far as I’m aware (in my experience the auto-vectorization report for LAMMPS fails for most primary loops, such as the force computation, due to pointer access and function calls), so it would probably have to be a more aggressive setting.
I just use the standard makefile (Makefile.intel_cpu_openmpi), which uses -O2 with some additional flags.
I did some timings and noticed some benefit when using the Intel compilers in combination with USER-INTEL; that was initially my reason for using them. We also use quantum chemistry codes in our group, where the difference is more substantial.
Since we got the new 2019 compilers, I have had many problems. I will stick to GNU compilers in the future.
Perhaps the Intel makefiles could come with some warning?
where temporary objects from dereferencing with the inline ++ operation result in hidden aliases that break code compiled with the “-fno-alias” option.
Options are:
1. Use the Intel 2018 compilers.
2. Use the Intel 2019 compilers and remove "-fno-alias" from the Makefile flags, with some performance impact to the USER-INTEL package.
As Axel noted, if you are using the USER-INTEL package, the Intel compilers really are a benefit due to a lot of compiler-generated vector code. They should perform significantly better on supported simulations.
Sorry that you (also) had to fight with this issue.
I guess option 3 also requires some additional changes to special.cpp as Steve suggests?
LAMMPS versions later than Sep19 2019 shouldn’t have any issues. Code changes to allow the same performance without the “-fno-alias” flag for USER-INTEL are included in these versions and the Makefiles have this flag removed.
Should we just convert these single lines in special.cpp
I’m not really an expert in C++ legalese, but this does seem like more of a compiler issue to me. Since there are current workarounds, we probably don’t need to change the current version of LAMMPS. I’ll dig a little more to see whether it makes sense to change the code in LAMMPS. Our compilers are in a transition right now, which makes things a little tricky. Thanks for offering to help here.