[lammps-users] Unit tests failing with Intel 19.1 and patch_31Aug2021

Hello,

I’m finding that multiple unit tests fail due to small numerical differences:

The following tests FAILED:
    101 - MolPairStyle:lj_charmm_coul_msm (Failed)
    364 - FixTimestep:aveforce_variable (Failed)
    373 - FixTimestep:momentum (Failed)
    375 - FixTimestep:nph (Failed)
    378 - FixTimestep:npt_iso (Failed)
    404 - FixTimestep:rigid_npt_small (Failed)
    416 - FixTimestep:shake_angle (Failed)
    449 - DihedralStyle:table_linear (Failed)
    450 - DihedralStyle:table_spline (Failed)
    458 - ImproperStyle:harmonic (Failed)

An example of the contents of LastTest.log is:

[==========] Running 7 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 7 tests from PairStyle
[ RUN      ] PairStyle.plain
/home/jdh4/software/lammps-patch_31Aug2021/unittest/force-styles/test_pair_style.cpp:556: Failure
Expected: (err) <= (epsilon)
Actual: 5.3501425542118778e-14 vs 5.0000000000000002e-14
[  FAILED  ] PairStyle.plain (80 ms)
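
For reference, the failing tests can be re-run individually from the build directory with the full output shown. The commands below are standard ctest usage ("--rerun-failed" and "--output-on-failure" are stock ctest options); the test name is just the first one from the list above:

cd build
ctest --rerun-failed --output-on-failure                          # re-run everything that failed last time
ctest -R "MolPairStyle:lj_charmm_coul_msm" --output-on-failure    # or pick a single test by name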

Below is the procedure for building LAMMPS and running the tests:

This points to the fact that you have optimization turned on, while the unit tests are tuned to pass without optimization; this is mentioned in the documentation. The tests are very sensitive, and aggressive compiler optimization can significantly change the order of operations, which in turn affects some values. The differences you quote are small and thus negligible. On top of that, the Intel compilers are often the most aggressive in modifying code and thus produce the largest differences, sometimes to the point of being incorrect.
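
As a rough sketch (not an exact recipe: ENABLE_TESTING is the switch from the "Code Coverage and Unit Testing" section of the manual, the rest are standard CMake variables, and the exact flags are only illustrative), you can configure a separate build with the optimization dialed back and run ctest again:

# from the top of the LAMMPS source tree
mkdir build-noopt && cd build-noopt
cmake ../cmake \
      -D CMAKE_CXX_COMPILER=icpc \
      -D ENABLE_TESTING=on \
      -D CMAKE_BUILD_TYPE=Release \
      -D CMAKE_CXX_FLAGS_RELEASE="-O0"
# a gentler alternative with the Intel compiler is to keep -O2 and
# add "-fp-model precise" instead of going all the way down to -O0
cmake --build . --parallel 8
ctest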

axel.

Thanks, Axel. I find all the tests pass with “-O0”.

I’d like a way to tell whether the optimization is breaking anything in a significant way without having to dig through LastTest.log. Is there an easy way for me to relax the tolerance by a few percent? I see “epsilon” in “3.11.5. Code Coverage and Unit Testing (CMake only)” in the docs. Can that be set through an environment variable or something?

Jon

No. It is specific to each pair style. Some are noisier than others, and we are not just comparing forces but also forces after a few MD steps, which makes them diverge even more.
Furthermore, there are some internal “fudge factors” for checking the OPENMP or INTEL package styles (which are compared to the same plain-style reference data), and even more so for GPUs when compiled for single or mixed precision. Factor in multiple compilers and platforms and you have to make some choices. I’d rather have the reference data checked more tightly with non-optimized code (particularly during development), and thus have a better chance of catching subtle side effects, than relax the tolerance too much and let some “lesser” errors slip through the cracks.
The divergence you see on some systems with optimization enabled gives you a gauge of how much compiler optimizations impact the results. As long as the result stays close to the reference, this is still acceptable. If it diverges too much, then a second look is required, or the compiler flags need to be made less aggressive (the default settings for the Intel compilers are rather aggressive, at the request of the contributor working at Intel).
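
If you want to see how the per-style tolerances compare, they are stored in the reference YAML files next to the test sources under unittest/force-styles/; something like the following lists the loosest ones (assuming the key is still called “epsilon” there):

# per-style tolerances from the reference files, loosest last
grep -r --include="*.yaml" "epsilon:" unittest/force-styles | sort -t: -k3 -g | tail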

That makes sense. I’ll work out some grep command for quick checks. Thanks.
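
Something along these lines seems to do it (LastTest.log ends up under Testing/Temporary/ in the build directory):

# show each over-tolerance comparison together with the test it came from
grep -E "\[ RUN|: Failure|Actual:" Testing/Temporary/LastTest.log
# quick count of how many comparisons exceeded their tolerance
grep -c "Actual:" Testing/Temporary/LastTest.log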

Jon