Dear all,
I have been trying to set LAMMPS parallel on an Athlon cluster. On the website it states that I should ask such questions to a local expert but we have been working on it for weeks and had a lot of troubles. I just want to make sure if it is rather a simple command problem.
For the same system that lammps runs on a single processor with the same inputs without any error, I get three types of errors when I run it parallel.
- ERROR on proc 0: Failed to reallocate 204955488 bytes for array atom:dihedral_atom1
mpiexec: Warning: accept_abort_conn: MPI_Abort from IP 10.0.0.22, killing all.
[0] MPI Abort by user Aborting program !
[0] Aborting program!
It runs on a single processor so I do not understand why it would run out of memory. Below is my batch processing file:
#PBS -l walltime=300:05:10
#PBS -l nodes=2:ppn=2
#PBS -N nptpolymer2
#PBS -S /bin/ksh
#PBS -j oe
cd $HOME/systems/polymer/parallel
mpiexec -np 4 ./lmp_linux < polymer2.in
In my input file I put this command: processors 2 2 1
I tried with a smaller system (much less atoms), it worked fine.
- To confirm that it works fine for a smaller system, I ran another small system with different morphology on 4 processors again. In my output file, there is the following error:
Dihedral problem: 1 2 55 57 84 83
1st atom: 1 nan nan nan
2nd atom: 1 nan nan nan
3rd atom: 1 nan nan nan
4th atom: 1 nan nan nan
Dihedral problem: 1 2 77 79 80 53
1st atom: 1 nan nan nan
2nd atom: 1 nan nan nan
3rd atom: 1 nan nan nan
4th atom: 1 nan nan nan
It continues like this. Eventough the second dihedral is a different set of atoms it again numbers them as 1st dihedral. It is not like this in my data file.
The log file does not have this error message but it prints nan for the energy, temperature …
- For larger systems I tried writing processors 1 4 1 . It complains of bad grid. For any other combinations of number of nodes and processors, it fails to calculate anything. For instance:
nodes =2: processors per node =1
-np 2
or
nodes = 4: ppn =2
-np 8
In this cluster there are two processors per node. It looked like there is a mapping problem and I do not know hot to get it working for large systems(such as 30,000 atoms) for extended runs(100ps). The only succesful run was for 3000 atoms so far.
I hope it was not very long of a question. Thank you for taking the time to read it. Looking forward for responses.
Best regards,
Burcu Eksioglu