[lammps-users] p4_error: net_recv read: probable EOF on socket: 1

Dear everyone,

I am running a simulation from an input script. The script first runs
20000 timesteps. After that run finishes, an error occurs while the
second run of 20000 timesteps is being set up.

The output and the error are as follows:
LAMMPS (21 May 2008)
Reading restart file …
orthogonal box = (-36.6375 -19.9841 -19.9841) to (36.6375 19.9841 19.9841)
2 by 2 by 2 processor grid
6960 atoms
6640 bonds
9120 angles
11020 dihedrals
Finding 1-2 1-3 1-4 neighbors …
4 = max # of 1-2 neighbors
6 = max # of 1-3 neighbors
18 = max # of 1-4 neighbors
17 = max # of special neighbors
60 atoms in group C60cent
60 atoms in group C60win
6840 atoms in group IL
Resetting global state of Fix 1 Style npt from restart file info
PPPM initialization …
G vector = 0.123534
grid = 10 8 8
stencil order = 5
RMS precision = 0.000407645
brick FFT buffer size/proc = 810 80 810
Setting up run …
Memory usage per processor = 13.9493 Mbytes
Step Temp E_pair E_bond KinEng E_mol Volume Press Epp
220000 275.36959 -36708.121 1888.6238 5712.1172 4508.7312 117053.13 -245.43537 -16.683765
220500 283.19271 -36007.347 1243.6702 5874.3959 2926.9273 123601.3 -240.98533 -8.5403246


420000 274.27526 -36738.893 1896.3985 5689.4171 4465.1822 117158.07 283.29729 -15.903971
Loop time of 27919.7 on 8 procs for 200000 steps with 6960 atoms
0.4 ns run: 27919.7 s / 3600 = 7.75 hours

Pair  time (%) = 16911.8 (60.5732)
Bond  time (%) = 258.017 (0.924138)
Kspce time (%) = 8132.59 (29.1285)
Neigh time (%) = 819.54 (2.93535)
Comm  time (%) = 1350.19 (4.83599)
Outpt time (%) = 1.73096 (0.00619977)
Other time (%) = 445.787 (1.59668)

FFT time (% of Kspce) = 1230.47 (15.1301)
FFT Gflps 3d (1d only) = 0.0134431 inf

Nlocal: 870 ave 897 max 836 min
Histogram: 1 0 0 0 2 1 2 1 0 1
Nghost: 12672.9 ave 12741 max 12604 min
Histogram: 1 1 0 1 1 1 0 2 0 1
Neighs: 631068 ave 651013 max 616281 min
Histogram: 2 1 1 0 0 2 0 1 0 1

Total # of neighbors = 5048546
Ave neighs/atom = 725.366
Ave special neighs/atom = 6.84483
Neighbor list builds = 7568
Dangerous builds = 1
Respa levels:
0 = bond angle dihedral improper
1 =
2 = pair-inner
3 = pair-outer kspace
WARNING: One or more respa levels compute no forces
PPPM initialization …
G vector = 0.123513
grid = 10 8 8
stencil order = 5
RMS precision = 0.000407838
brick FFT buffer size/proc = 810 80 810
Setting up run …
p0_27933: p4_error: net_recv read: probable EOF on socket: 1
rm_l_7_2090: (27920.242188) net_send: could not write to fd=5, errno = 32
p6_2054: (27920.332031) net_send: could not write to fd=5, errno = 32
rm_l_3_27997: (27921.347656) net_send: could not write to fd=5, errno = 32
p7_2073: (27930.253906) net_send: could not write to fd=5, errno = 32
p3_27980: (27931.359375) net_send: could not write to fd=5, errno = 32
p0_27933: (27933.519531) net_send: could not write to fd=4, errno = 32

I ran the parallel simulation on 4 processors. When the input script
contains only one run command, no error occurs.

The LAMMPS version is lammps-24Oct08.

I compiled with pgCC and mpich1.2.7 on AMD machines.

I would appreciate your help.

Fred

Can you isolate the problem? I.e., can you run for
fewer initial steps before the 2nd run? Can you run
a smaller problem on fewer processors? Did you
change anything in your input script between the two
runs?

Steve

i would suggest to check the input file.

from the log file it looks as if in the
second run, r-RESPA is turned on with 4 levels,
but only levels 0, 2, and 3 are actually assigned
some calculations. i suspect that LAMMPS does
not handle this gracefully and that one of the
processes thus dies and causes the resulting mpi errors.
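
a rough sketch of what to look for in the input file. the original
script was not posted, so the loop factors and the inner cutoffs below
are placeholders, and the first line is only a guess at a setup that
would give the level assignment shown in the log. note that the log
numbers the levels from 0, while run_style numbers them from 1.

```
# guessed form matching the log: 4 levels declared, nothing assigned to
# level 2 (printed as "1 =" in the log). all numbers are placeholders.
# run_style respa 4 2 2 2 bond 1 angle 1 dihedral 1 improper 1 inner 3 4.0 6.0 outer 4 kspace 4

# possible alternative: 3 levels, so that every level computes some force
run_style respa 3 2 2 bond 1 angle 1 dihedral 1 improper 1 inner 2 4.0 6.0 outer 3 kspace 3
```

if the warning about empty respa levels is gone with a setup like the
second line and the crash is gone as well, that would support the guess.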

cheers,
    axel.

[...]