strange behavior of thermo

LAMMPS 14Mar13,
simulation on an M2090 cluster with 1.5 million atoms (20-second test runs).

Using "thermo 100", I get the "Lost atoms" error within a few steps.
Using "thermo 10", everything seems normal, but
with "thermo 1" the simulation is 4 times faster than with "thermo 10" (800 steps vs. 200 steps).

About the "4 times": I was wrong, there is no difference.
But I am losing atoms with "thermo 46" at 300 steps and with "thermo 45" at 5000 steps.

17.03.2013, 02:19, "Maxim" <[email protected]>:

impossible to say anything with so little information and no way to reproduce it. could be anything from a messed up initial configuration over bad choice of simulation parameters to broken hardware. Axel.

Results on 1 node with 3 GPUs; input file below.

with thermo 47:

Step Temp E_pair E_mol TotEng Press
0 256.11017 -5440986 0 -5387300.4 -2226.3813
47 269.63128 -5442394.8 0 -5385875 -1360.0107
ERROR: Lost atoms: original 1621683 current 1621682 (thermo.cpp:389)

With "thermo 100":

Step Temp E_pair E_mol TotEng Press
0 256.11017 -5440986 0 -5387300.4 -2226.3813
ERROR: Lost atoms: original 1621683 current 1621653 (thermo.cpp:389)
Sun Mar 17 16:43:55 NOVT 2013

with thermo 10:

4970    203.63584   -5429739.6            0   -5387053.7    259.59766
4980      204.805   -5429984.7            0   -5387053.7     204.7459
4990    206.66191     -5430374            0   -5387053.8    155.22829

ERROR: Lost atoms: original 1621683 current 1621643 (thermo.cpp:389)
Sun Mar 17 16:55:49 NOVT 2013

with s s s boundaries

segmentation fault

input file

your system seems to be behaving strangely and cannot run reliably on the CPU either. only that it takes longer until the simulation loses atoms. this would be consistent with a compilation for single precision, for example. the different behavior in response to thermo settings could be caused by on demand calculations (including the check for lost atoms).

in any case, you need to first resolve why your simulation loses atoms. everything else is just a secondary effect.

axel.

I found that the error is induced by
compute strs all stress/atom

and visualization confirmed this: several atoms fly out of the system when this command is in the script.

17.03.2013, 16:40, “Axel Kohlmeyer” <[email protected]…24…>:

As already stated, you must look into why your simulations are losing atoms. Try reducing your timestep first.

You may also try excluding the kinetic energy contribution from the stress tensor and see what happens…

How can it affect the simulation? I suppose it's just an output calculation.

18.03.2013, 19:17, “Sagar Chandra” <[email protected]…24…>:

How can it affect the simulation? I suppose it's just an output calculation.

no. but that is beside the point. your input deck doesn't even run reliably
on CPUs, *regardless* of what output you use. you have to fix that first.

axel.

Please post to the forum and not to me personally.

The timestep affects the simulation; you should read a little more. Reducing the timestep lets your atoms follow a more physically reasonable path. I suggested this because if you are losing atoms, then probably two or more atoms are coming too close to each other, and the strong repulsive forces accelerate them to such high speeds that you lose them. Since this is not physically realistic, you should first tackle the problem by reducing your timestep (assuming you have checked your geometry) so that the atom trajectories follow a more realistic path.

Sagar
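
The timestep argument above can be illustrated with a toy model outside LAMMPS. The sketch below is only an illustration under stated assumptions: a single 1D harmonic oscillator (unit mass and unit spring constant, both made up for the example) integrated with velocity Verlet, the scheme that fix nve uses. For this model the integration stays bounded only while omega*dt < 2, so a stiffer interaction, such as two atoms pushed very close together, tolerates only a smaller timestep.

```python
def max_amplitude(dt, steps=20, k=1.0, m=1.0):
    """Integrate x'' = -(k/m) x with velocity Verlet and return max |x|.

    k, m and the initial condition are illustrative assumptions,
    not values taken from the simulation discussed in this thread.
    """
    x, v = 1.0, 0.0
    a = -k / m * x
    peak = abs(x)
    for _ in range(steps):
        x += v * dt + 0.5 * a * dt * dt   # position update
        a_new = -k / m * x                # force at the new position
        v += 0.5 * (a + a_new) * dt       # velocity update
        a = a_new
        peak = max(peak, abs(x))
    return peak


# omega = sqrt(k/m) = 1, so the stability limit is dt < 2.
stable = max_amplitude(dt=0.1)
unstable = max_amplitude(dt=2.5)
print(f"dt=0.1: max |x| = {stable:.3f}")
print(f"dt=2.5: max |x| = {unstable:.3e}")
```

With dt = 0.1 the amplitude stays at its initial value, while with dt = 2.5 it grows by many orders of magnitude within 20 steps. A real EAM system is not harmonic, so this shows the mechanism only, not a safe timestep for this input.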

Hi there,
a strong minimization seems to solve the problem with the initial configuration; I now get a stable system over many steps.

But when I set a velocity for the piston (I want to simulate an impact),
velocity udar set 0 -10 0 sum yes units box
I get this error:

USER-CUDA mode is enabled (lammps.cpp:394)
package cuda gpu/node 3
units metal
boundary s s s
atom_style atomic

#lattice fcc 4.04527
#region box block -300 300 0 200 -300 300 units box
#create_box 2 box

#region target block -300 300 0 100 -300 300 units box
#create_atoms 1 region target

#region udar block -50 50 120 200 -300 300 units box
#create_atoms 1 region udar

read_data data
orthogonal box = (-298.848 -0.0590201 -298.848) to (298.848 198.26 298.848)
3 by 1 by 3 MPI processor grid
2489009 atoms
2489009 velocities

region udar block -50 50 120 200 -300 300 units box
group udar region udar
283784 atoms in group udar

pair_style eam/fs
pair_coeff * * Al_mm.eam.fs Al Al
thermo 10
timestep 0.001
fix 1 all nve
velocity udar set 0 -10 0 sum yes units box
dump myDump all image 10 dump.*.jpg type type
run 30000
Step Temp E_pair E_mol TotEng Press Volume
0 123.33104 -8452201.4 0 -8412522.2 518.11542 70847373
10 123.33104 -8452201.4 0 -8412522.2 518.1158 70847373
20 123.33104 -8452201.4 0 -8412522.2 518.11697 70847373
30 123.33104 -8452201.4 0 -8412522.2 518.11895 70847373
40 123.33104 -8452201.4 0 -8412522.2 518.12179 70847373
50 123.33104 -8452201.4 0 -8412522.2 518.12553 70847373
60 123.33104 -8452201.4 0 -8412522.2 518.1302 70847373
70 123.33104 -8452201.4 0 -8412522.2 518.1358 70847373
80 123.33104 -8452201.4 0 -8412522.2 518.14232 70847373
90 123.33104 -8452201.4 0 -8412522.2 518.14973 70847373
Out of v-range atoms:
1930167 1 // 298.784514 48.544989 -76.697497 // -11168834391857.648438 -265.812540 -243.582958
1930165 1 // 294.830912 48.545310 -76.697085 // -9198.799928 -7.274922 -7.810658
1927981 1 // 296.840958 48.545168 -78.715565 // -56231.197049 -115.300530 -127.972388


110807 1 // -294.813719 56.619814 -149.348350 // 6859.678563 38.728745 -16.736480
110804 1 // -296.822492 58.639575 -149.348911 // 41602.098948 794.061348 -264.760326
110803 1 // -298.767806 56.622126 -149.349525 // 6989381334373.493164 1347.928669 -549.070141
ERROR: Temperature out of range. Simulations will be abortet.
(compute_temp_cuda.cpp:161)

18.03.2013, 20:53, “Sagar Chandra” <[email protected]…24…>:

If you look at the line indicated in the source code,
it is printing the velocity of "bad" atoms as the last
3 fields. Your velocities are huge and your temp
is > 10^15. So the code stops. You have an error
in your dynamics.

Steve
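
To inspect such output more easily, the lines can be parsed into numbers. This is a minimal sketch, assuming the format "id type // x y z // vx vy vz": the last three fields being velocities follows the explanation above, while reading the middle three as positions is an assumption (they do fit inside the box reported by read_data). The function name is hypothetical.

```python
import math


def parse_v_range_line(line):
    """Split 'id type // x y z // vx vy vz' into its parts (assumed format)."""
    head, pos, vel = (part.split() for part in line.split("//"))
    return {
        "id": int(head[0]),
        "type": int(head[1]),
        "pos": tuple(float(s) for s in pos),   # assumed to be positions
        "vel": tuple(float(s) for s in vel),   # velocities, per the reply above
    }


# The six lines quoted in the log of this thread.
lines = [
    "1930167 1 // 298.784514 48.544989 -76.697497 // -11168834391857.648438 -265.812540 -243.582958",
    "1930165 1 // 294.830912 48.545310 -76.697085 // -9198.799928 -7.274922 -7.810658",
    "1927981 1 // 296.840958 48.545168 -78.715565 // -56231.197049 -115.300530 -127.972388",
    "110807 1 // -294.813719 56.619814 -149.348350 // 6859.678563 38.728745 -16.736480",
    "110804 1 // -296.822492 58.639575 -149.348911 // 41602.098948 794.061348 -264.760326",
    "110803 1 // -298.767806 56.622126 -149.349525 // 6989381334373.493164 1347.928669 -549.070141",
]

atoms = [parse_v_range_line(line) for line in lines]
for atom in atoms:
    speed = math.sqrt(sum(c * c for c in atom["vel"]))
    print(f"atom {atom['id']}: x = {atom['pos'][0]:.3f}, |v| = {speed:.3e}")
```

For the six lines quoted in the log, every listed atom lies within about 4 Å of the x faces of the box (±298.848), and the runaway component is vx.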

I understand that, but I see no reason for it.

The simplest example is below: just a minimized box of atoms. If I set the same velocity vector for all of them, the error occurs. I don't see any reason for the system to blow up so dramatically within a single step from a stable state.

package cuda gpu/node 1
units metal
boundary s s s
atom_style atomic
newton off

##lattice bcc 2.855312
##region box block -50 50 0 50 -50 50 units box
##create_box 1 box
##create_atoms 1 region box

read_data data #minimized box

pair_style eam/alloy
pair_coeff * * Fe.set Fe
thermo 1
timestep 0.001
fix 1 all nve
velocity all set 0 -25 0 sum yes units box
dump myDump all image 1 dump.*.jpg type type adiam 2 up 0 1 0 view 0 0
run 60000

sl003
LAMMPS (14 Mar 2013)
# Using LAMMPS_CUDA
USER-CUDA mode is enabled (lammps.cpp:394)
# CUDA: Activate GPU
# Using device 0: Tesla M2090
Reading data file ...
  orthogonal box = (-50.0163 -0.242496 -50.0163) to (50.0163 50.2096 50.0163)
# Using device 1: Tesla M2090
# Using device 2: Tesla M2090
  1 by 1 by 3 MPI processor grid
  45378 atoms
  45378 velocities
# CUDA: VerletCuda::setup: Allocate memory on device for maximum of 16638 atoms...
# CUDA: Using precision: Global: 8 X: 8 V: 8 F: 8 PPPM: 8
Setting up run ...
# CUDA: VerletCuda::setup: Upload data...
# CUDA: Total Device Memory useage post setup: 104.070312 MB
Memory usage per processor = 5.87241 Mbytes
Step Temp E_pair E_mol TotEng Press Volume
       0 13993.73 -190308.65 0 -108229.3 173588.12 504850.28
       1 13993.73 -190308.65 0 -108229.3 173588.12 504850.28
       2 13993.73 -190308.65 0 -108229.3 173588.12 504850.28
       3 13993.73 -190308.65 0 -108229.3 173588.12 504850.28
       4 13993.73 -190308.65 0 -108229.3 173588.12 504850.28
       5 13993.73 -190308.65 0 -108229.3 173588.12 504850.28
       6 13993.73 -190308.65 0 -108229.3 173588.13 504850.28
       7 13993.73 -190308.65 0 -108229.3 173588.13 504850.28
       8 13993.73 -190308.65 0 -108229.3 173588.13 504850.28
       9 13993.73 -190308.65 0 -108229.3 173588.13 504850.28
      10 13993.73 -190308.65 0 -108229.3 173588.13 504850.28
      11 13993.73 -190308.65 0 -108229.3 173588.14 504850.28
      12 13993.73 -190308.65 0 -108229.3 173588.14 504850.28
      13 13993.73 -190308.65 0 -108229.3 173588.14 504850.28
      14 13993.73 -190308.65 0 -108229.3 173588.15 504850.28
      15 13993.73 -190308.65 0 -108229.3 173588.15 504850.28
      16 13993.73 -190308.65 0 -108229.3 173588.15 504850.28
      17 13993.73 -190308.65 0 -108229.3 173588.16 504850.28
      18 13993.73 -190308.65 0 -108229.3 173588.16 504850.28
      19 13993.73 -190308.65 0 -108229.3 173588.17 504850.28
      20 13993.73 -190308.65 0 -108229.3 173588.17 504850.28
      21 13993.73 -190308.65 0 -108229.3 173588.18 504850.28
      22 13993.73 -190308.65 0 -108229.3 173588.18 504850.28
      23 13993.73 -190308.65 0 -108229.3 173588.19 504850.28
      24 13993.73 -190308.65 0 -108229.3 173588.19 504850.28
      25 13993.73 -190308.65 0 -108229.3 173588.2 504850.28
      26 13993.73 -190308.65 0 -108229.3 173588.21 504850.28
      27 13993.73 -190308.65 0 -108229.3 173588.21 504850.28
      28 13993.73 -190308.65 0 -108229.3 173588.22 504850.28
      29 13993.73 -190308.65 0 -108229.3 173588.23 504850.28
      30 13993.73 -190308.65 0 -108229.3 173588.23 504850.28
      31 13993.73 -190308.65 0 -108229.3 173588.24 504850.28
      32 13993.73 -190308.65 0 -108229.3 173588.25 504850.28
      33 13993.73 -190308.65 0 -108229.3 173588.26 504850.28
      34 13993.73 -190308.65 0 -108229.3 173588.26 504850.28
      35 13993.73 -190308.65 0 -108229.3 173588.27 504850.28
      36 13993.73 -190308.65 0 -108229.3 173588.28 504850.28
      37 13993.73 -190308.65 0 -108229.3 173588.29 504850.28
      38 13993.73 -190308.65 0 -108229.3 173588.3 504850.28
      39 13993.73 -190308.65 0 -108229.3 173588.31 504850.28
      40 3.2114757e+08 3208480.7 0 1.8868795e+09 3.9114411e+09 514856.28
      41 1.3235179e+09 310309.81 0 7.7633219e+09 1.6105769e+10 514856.28
      42 1.448048e+09 173830.94 0 8.4936089e+09 1.7620991e+10 514856.28
      43 1.48981e+09 635535.29 0 8.7390232e+09 1.8129599e+10 514856.28
      44 1.5476517e+09 -32706.462 0 9.0776217e+09 1.8832883e+10 514856.28
      45 1.5476753e+09 -18734.717 0 9.0777742e+09 1.8833179e+10 514856.28
      46 1.5648003e+12 738864.72 0 9.1782393e+12 1.9041118e+13 514856.28
      47 6.2545116e+12 -57544.114 0 3.6685447e+13 7.6107403e+13 514856.28
      48 6.254512e+12 -57904.316 0 3.668545e+13 7.6107409e+13 514856.28
      49 6.2545124e+12 -107590.78 0 3.6685452e+13 7.6107413e+13 514856.28
ERROR on proc 0: Too many neighbor bins (neighbor.cpp:1561)
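
As a sanity check on the log above, the thermo values before step 40 can be reproduced by hand. The sketch below assumes that all 45378 atoms move with exactly the velocity set by the velocity command (that is, the velocities in the data file were zero), that the Fe mass in the potential file is 55.847 g/mol (not shown in the thread), and that the metal-units conversion constants are the ones hard-coded here. The function name is hypothetical.

```python
# Assumed metal-units constants (not taken from the thread):
MVV2E = 1.0364269e-4   # converts (g/mol)*(Angstrom/ps)^2 to eV
KB = 8.617343e-5       # Boltzmann constant in eV/K


def uniform_velocity_thermo(n_atoms, mass, velocity):
    """Kinetic energy (eV) and temperature (K) of a rigidly translating box."""
    v2 = sum(c * c for c in velocity)
    ke = 0.5 * MVV2E * n_atoms * mass * v2
    dof = 3 * n_atoms - 3            # 3 degrees of freedom removed, as assumed here
    temp = 2.0 * ke / (dof * KB)
    return ke, temp


# 45378 atoms and velocity (0, -25, 0) come from the input and log;
# the mass 55.847 is an assumption.
ke, temp = uniform_velocity_thermo(45378, 55.847, (0.0, -25.0, 0.0))
print(f"KE = {ke:.2f} eV, T = {temp:.2f} K")
```

Under these assumptions the result is about 82079 eV and 13994 K, matching TotEng - E_pair and Temp in the log. In other words, the temperature reported before the blow-up is rigid translation of the whole box, not thermal motion.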

Maxim,
Did you notice that when you set the velocity of the group to -10, the crash happens at timestep 100, and when the velocity is set to -25, the crash is at timestep 40? The system seems to keep crashing after traveling the same distance.
Maybe this observation will help you somehow.
Carlos
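
Carlos's observation can be checked with one line of arithmetic. In metal units velocities are in Å/ps and the timestep is in ps, so the distance covered before the crash is |v| * dt * steps. A minimal sketch (the function name is hypothetical; the step counts are the ones quoted above):

```python
def distance_before_crash(speed, timestep, steps):
    """Distance in Angstrom covered at constant speed (metal units)."""
    return abs(speed) * timestep * steps


# Values from the two runs in this thread: timestep 0.001 ps in both.
run_a = distance_before_crash(-10.0, 0.001, 100)   # piston run
run_b = distance_before_crash(-25.0, 0.001, 40)    # whole-box run
print(f"run A: {run_a:.3f} A, run B: {run_b:.3f} A")
```

Both runs give about 1 Å. For what it is worth, 1 Å is half of the default 2.0 Å neighbor skin in metal units, which is the displacement that triggers a neighbor-list rebuild when neigh_modify check yes is in effect; the thread does not establish whether that is the cause.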

I think this happens when LAMMPS adjusts the boundary positions. Visualization slides are attached as a PDF. At step 99, when the boundaries are reset, atoms start moving unexpectedly.

22.03.2013, 05:27, “Carlos Campana” <[email protected]…24…>:

I forgot the attachment; it is added now.

images.pdf (3.32 MB)

With CPUs and with 1 GPU everything works fine. The trouble appears only with several GPUs in use…
Could these compilation warnings be affecting this?

So I set

fix 2 all setforce 0 0 0 (no velocity set, so no atoms should move)
neigh_modify delay 1 every 1 check no

and the result:

Step Temp E_pair E_mol TotEng Press Volume
0 0 -1741.987 0 -1741.987 -2.0722151 4094.8701

1 0 483102.06 0 483102.06 64429085 4094.8701
2 0 483102.06 0 483102.06 64429085 4094.8701
3 0 483102.06 0 483102.06 64429085 4094.8701
4 0 483102.06 0 483102.06 64429085 4094.8701
etc

It happens only with 2 or more GPUs in use (I tried 2 nodes with 1 GPU each and 1 node with 2 GPUs). It does not happen on CPUs. Are the “ptxas info” messages errors? The cudalib compile log is attached.

input file

package cuda gpu/node 2
units metal
boundary s s s
atom_style atomic
newton off
#lattice bcc 2.855312
#region box block -10 10 -10 10 -10 10
#create_box 1 box
#create_atoms 1 region box
read_data data # load minimized box
pair_style eam/alloy
pair_coeff * * Fe.set Fe
neigh_modify delay 1 every 1 check no
thermo 1
timestep 0.001
fix 1 all nve
fix 2 all setforce 0 0 0
run 50

M2090_Tst.o38177 (74 KB)

log.lammps (5.63 KB)

Unfortunately I have zero experience with lammps on GPU.
Hopefully someone familiar with the topic will reply.
Carlos

With the USER-CUDA package you can only use one CPU per GPU.
I assume that is what you are doing.

Christian can likely take a look at this.

Steve