> I have been using SGE and MPICH for my LAMMPS simulations in our clusters
> but next month we will be getting new machines and I have the opportunity
> to either stick to MPICH or change to OpenMPI. You mentioned that it would
> be better to use OpenMPI and dump MPICH completely.

yes, that is my personal opinion. i am certain at least the mpich
developers will disagree.

> If I were to migrate to OpenMPI, should I stick to SGE or do you
> know of a better "free" Grid?

that would be basically an independent decision. it has no impact
on the MPI performance. most machines i'm running on use Maui/Torque
or their commercial counterparts, but SGE has been working fine
for me as well. it is more a question of how convenient and flexible
the setup is. for the typical workload in our group (a mix of serial
and parallel jobs of different sizes, usually broken into segments
of no longer than 24 hours) the default configuration of Maui/Torque
works very well.
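
to give an idea, one of those 24-hour segments looks roughly like
the sketch below (the job name, node counts, and the lmp_mpi /
in.segment names are made-up examples; the mpirun line is the
classic form that works without any batch system integration):

  #!/bin/sh
  #PBS -N lammps-segment
  #PBS -l nodes=2:ppn=4
  #PBS -l walltime=24:00:00
  # torque starts jobs in $HOME, so go back to the submit directory
  cd $PBS_O_WORKDIR
  # 2 nodes x 4 cores = 8 tasks; lammps reads its input from stdin
  mpirun -np 8 -machinefile $PBS_NODEFILE lmp_mpi < in.segment
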
this gives me another opportunity to advertise openmpi.
in both cases you don't even need ssh/rsh to start your parallel
job: openmpi supports several startup schemes, including launching
parallel jobs directly through SGE or Torque/PBS. this has several
advantages from the administrative point of view:
- you actually keep track of the cpu time used by the parallel job
(and not the cpu time used by mpirun, which is next to nothing and
thus makes fair share scheduling impossible unless the scheduler is
configured to use wall time instead of cpu time).
- the batch system has control over all processes, so if you delete
a job (or its wall time expires) they all get killed, and you may
not need to write a cleanup script (which can be tricky to get
right unless nodes are handed out for exclusive access only).
- you don't have to worry about making sure that the -np argument
matches the number of processors/cores assigned to the job:
you can run mpirun without -np and it will start one mpi task
per allocated core (see the example script below).
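
to make the last point concrete: with openmpi compiled with the
torque (or sge) support, the same kind of script gets simpler
(again, lmp_mpi and in.segment are placeholder names):

  #!/bin/sh
  #PBS -l nodes=2:ppn=4
  #PBS -l walltime=24:00:00
  cd $PBS_O_WORKDIR
  # no -np and no machinefile: openmpi queries the torque tm
  # interface for the allocation and starts one mpi task per
  # assigned core (8 here), all under batch system control.
  mpirun lmp_mpi < in.segment

whether a given openmpi build has the torque or gridengine support
compiled in can be checked with something like
'ompi_info | grep -i -e tm -e gridengine'; when compiling from
source you may have to pass --with-tm=/path/to/torque (and,
depending on the openmpi version, --with-sge) to configure.
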
hope that helps,