14 Mar 2013, lammps:
Simulation on an M2090 cluster with 1.5 million atoms (20-second test runs).
using "thermo 100", I get the Lost atoms with several steps.
using "thermo 10", everything seems normal, but
with "thermo 1", simulation is 4 times faster than the thermo 10. (800 steps vs. 200 steps)
Impossible to say anything with so little information and no way to reproduce it. Could be anything from a messed-up initial configuration to a bad choice of simulation parameters to broken hardware. Axel.
The result on 1 node with 3 GPUs; the input file is below.
With "thermo 47":
Step Temp E_pair E_mol TotEng Press
0 256.11017 -5440986 0 -5387300.4 -2226.3813
47 269.63128 -5442394.8 0 -5385875 -1360.0107
ERROR: Lost atoms: original 1621683 current 1621682 (thermo.cpp:389)
With "thermo 100":
Step Temp E_pair E_mol TotEng Press
0 256.11017 -5440986 0 -5387300.4 -2226.3813
ERROR: Lost atoms: original 1621683 current 1621653 (thermo.cpp:389)
Your system seems to be behaving strangely and cannot run reliably on the CPU either; it just takes longer until the simulation loses atoms. This would be consistent with a compilation for single precision, for example. The different behavior in response to thermo settings could be caused by on-demand calculations (including the check for lost atoms).
In any case, you need to first resolve why your simulation loses atoms. Everything else is just a secondary effect.
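For reference, the lost-atom check runs only on steps where thermodynamic output is generated, which is why a larger thermo interval can let a run go further before the error appears. While debugging, the check's behavior can be adjusted with the standard thermo_modify command (a minimal sketch; the warn setting is illustrative, not a fix):

thermo 10
thermo_modify lost warn    # warn instead of aborting when atoms are lost (default is error)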
How can it affect the simulation? I suppose it's just output calculations.
No. But that is beside the point. Your input deck doesn't even run reliably
on CPUs, *regardless* of what output you use. You have to fix that first.
The timestep affects the simulation; you should read a little more. Reducing the timestep lets your atoms follow a more realistic path. I suggested this because if you are losing atoms, then probably two or more atoms are coming too close to each other, and the high repulsive forces move them at such high speed that you lose them. Since such speeds are not physically realistic, you should first attack the problem by reducing your timestep (assuming you have checked your geometry) so that the atom trajectories follow a more realistic path.
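A minimal sketch of that suggestion, assuming the metal-units inputs shown in this thread (the exact value is illustrative):

timestep 0.0001    # 0.1 fs instead of the 1 fs used above
run 1000           # short test run to see whether atoms are still lost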
If you look at the line indicated in the source code, it is printing the velocity of the "bad" atoms as the last 3 fields. Your velocities are huge and your temperature is > 10^15, so the code stops. You have an error in your dynamics.
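To see for yourself which atoms pick up the runaway velocities before the check aborts, one possible diagnostic (a sketch, not something proposed in the thread; "vcheck" and "dump.vel" are placeholder names) is a per-step custom dump:

dump vcheck all custom 1 dump.vel id type vx vy vz    # write per-atom velocities every step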
I understand that, but I see no reason for it. The simplest example is below: just a minimized box of atoms, and if I set a single velocity vector for all of them, the error appears. I don't see any reason for the system to blow up so spectacularly just one step away from a stable state.
package cuda gpu/node 1
units metal
boundary s s s
atom_style atomic
newton off
pair_style eam/alloy
pair_coeff * * Fe.set Fe
thermo 1
timestep 0.001
fix 1 all nve
velocity all set 0 -25 0 sum yes units box
dump myDump all image 1 dump.*.jpg type type adiam 2 up 0 1 0 view 0 0
run 60000
Maxim,
Did you notice that when you set the velocity of the group to -10, the crash happens at timestep 100, and when the velocity is set to -25, the crash is at timestep 40? The system seems to keep crashing after traveling the same distance.
Maybe this observation will help you somehow.
Carlos
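A quick check with the 0.001 ps metal-units timestep from the input bears this out: 10 Å/ps × 0.001 ps/step × 100 steps = 1 Å, and 25 Å/ps × 0.001 ps/step × 40 steps = 1 Å, so both runs fail after the atoms travel about 1 Å.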
I think this happens when LAMMPS corrects the boundary positions; visualization slides are in the attached PDF. At step 99, when the boundaries are recomputed, the atoms start moving unexpectedly.
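Given the shrink-wrapped "boundary s s s" in the input, one way to test whether the box-face update is the trigger (an assumption, not a confirmed diagnosis) is the minimum shrink-wrap style, which never lets a face move inside the bounds stored in the data file:

boundary m m m    # shrink-wrapped, but never smaller than the box in the data file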
It happens only with 2 or more GPUs in use (I tried 2 nodes with 1 GPU each, and 1 node with 2 GPUs). It does not happen on CPUs. Are the "ptxas info" messages errors? The cudalib compile log is attached.
Input file:
package cuda gpu/node 2
units metal
boundary s s s
atom_style atomic
newton off
#lattice bcc 2.855312
#region box block -10 10 -10 10 -10 10
#create_box 1 box
#create_atoms 1 region box
read_data data    # load the minimized box
pair_style eam/alloy
pair_coeff * * Fe.set Fe
neigh_modify delay 1 every 1 check no
thermo 1
timestep 0.001
fix 1 all nve
fix 2 all setforce 0 0 0
run 50