Hi all,
My special thanks to Axel and Steve, amongst others, for their input on my previous queries.
The issue: Errors introduced in a simulation when multiple processors are used.
Description of the issue:
Our group is currently trying to simulate crack propagation in metals. We recently observed a complete change in the crack propagation behavior depending on the number of processors used, for the same input script. This occurred with both the Jan 15, 2010 and Feb 18, 2011 versions. The input script is listed below my signature.
We tested it on a 1x1x1 configuration (1 processor), 8x8x1 (64 processors), 8x12x1 (96 processors), and 16x16x1 (256 processors). Openmpi_gcc-1.2.5 was used in all runs.
Deviations in temperature and the other computed parameters (dumped to the log.lammps file) begin at the 200th step, which is the first thermo output (the thermo interval is 200 steps).
The differences in the computed parameters are so drastic that in some runs the crack splits the specimen into two halves, while in others it is prematurely arrested at the mid-length of the specimen. Short of running the entire job on a single processor, it is difficult to know which result is correct.
We wonder whether such artifacts are due to round-off errors introduced when computed quantities are combined across different compute nodes. If so, we would like to know of possible ways to eliminate or reduce the issue.
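For what it is worth, our working hypothesis is the non-associativity of floating-point addition: a reduction over 64 ranks accumulates partial sums in a different order than a single-rank sum, so results can differ in the last bits, and a chaotic system like crack propagation can amplify that. A toy Python sketch of the effect (not LAMMPS or MPI code; the chunking into 64 "ranks" is only an illustration):

```python
# Floating-point addition is not associative, so summing the same
# numbers in a different order can give a (slightly) different result.
import random

random.seed(42)
values = [random.uniform(-1.0, 1.0) for _ in range(100000)]

# One "processor": a single left-to-right accumulation.
serial_sum = sum(values)

# 64 "processors": each sums its own subset, then the partials are combined,
# which changes the order of the additions.
chunks = [values[i::64] for i in range(64)]
parallel_sum = sum(sum(c) for c in chunks)

# The two results agree to high precision but need not be bitwise identical.
print(serial_sum - parallel_sum)
```

The discrepancy per step is tiny, but in a dynamical simulation the trajectories can diverge exponentially from it, which would be consistent with what we see after step 200.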
We appreciate any thoughts and experiences concerning this issue.
Thanks,
Vijay