speed up of simulations

This is the system I am simulating.
Two LJ gases are contained in a rectangular box. The two gases are separated by a "piston" composed of many rigid bodies (20 in my case).
Each of these rigid bodies is made up of around 100 particles, constrained with

fix 3 pistone01 rigid/nve single force * on off off torque * off off off
fix 4 pistone02 rigid/nve single force * on off off torque * off off off
.....

Each rigid body is linked to the two adjacent ones with a spring. The piston is kept, on average, at the center of the box with

fix 23 pistone spring tether 100000 500 50 0 0

I use

.....
neigh_modify exclude type 10 12
.....

to reduce the length of the neighbor list.

These are the performance results from a typical simulation:

Loop time of 274089 on 8 procs for 250000000 steps with 4400 atoms
799.6% CPU use with 1 MPI tasks x 8 OpenMP threads

MPI task timings breakdown:
Section | min time | avg time | max time |%varavg| %total

Pair | 86209 | 86209 | 86209 | 0.0 | 31.45
Bond | 237.11 | 237.11 | 237.11 | 0.0 | 0.09
Neigh | 11889 | 11889 | 11889 | 0.0 | 4.34
Comm | 1201.9 | 1201.9 | 1201.9 | 0.0 | 0.44
Output | 8.8642 | 8.8642 | 8.8642 | 0.0 | 0.00
Modify | 1.6943e+05 | 1.6943e+05 | 1.6943e+05 | 0.0 | 61.82
Other | | 5109 | | | 1.86

Nlocal: 4400 ave 4400 max 4400 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost: 0 ave 0 max 0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs: 10147 ave 10147 max 10147 min
Histogram: 1 0 0 0 0 0 0 0 0 0

Total # of neighbors = 10147
Ave neighs/atom = 2.30614
Ave special neighs/atom = 0.0245455
Neighbor list builds = 8174530
Dangerous builds = 0

I am using Open MPI to run the simulations.

No, you are not using MPI, but OpenMP.

It turns out that more than 60% of the time is spent in the Modify section. So I guess this is linked to the use of neigh_modify exclude type, is this right?

No, wrong. More likely it is due to the rigid integrator or other things you are doing in your input.
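
As a quick sanity check on the numbers in the log, here is a minimal Python sketch that recomputes the share of each timing section, the overall throughput, and the average number of steps between neighbor list builds. All values are copied from the output quoted in this thread; the variable and function names are only illustrative and are not part of LAMMPS.

```python
# Values copied from the LAMMPS log quoted in this thread.
loop_time_s = 274089.0        # "Loop time of 274089"
n_steps = 250_000_000         # "for 250000000 steps"
n_neigh_builds = 8_174_530    # "Neighbor list builds = 8174530"

# Average time per section in seconds, from the timing breakdown table.
section_time_s = {
    "Pair": 86209.0,
    "Bond": 237.11,
    "Neigh": 11889.0,
    "Comm": 1201.9,
    "Output": 8.8642,
    "Modify": 1.6943e5,
    "Other": 5109.0,
}


def section_percentages(times, total):
    """Return each section's share of the total loop time, in percent."""
    return {name: 100.0 * t / total for name, t in times.items()}


percent = section_percentages(section_time_s, loop_time_s)
steps_per_second = n_steps / loop_time_s
steps_per_rebuild = n_steps / n_neigh_builds
dominant = max(percent, key=percent.get)

# Print sections from most to least expensive.
for name, p in sorted(percent.items(), key=lambda kv: -kv[1]):
    print(f"{name:<7s}{p:6.2f} %")
print(f"steps per second: {steps_per_second:.0f}")
print(f"steps per neighbor list build: {steps_per_rebuild:.1f}")
print(f"dominant section: {dominant}")
```

Read this way, the Neigh section accounts for only a few percent of the loop time, so changes to the neighbor list settings cannot by themselves explain the time spent in Modify, which is where the fixes (including the rigid-body integration) are accounted for.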