Running two MPI threaded LAMMPS jobs on the same node

Matthew_Wander · November 1, 2016, 1:40pm

So we had it tested and it turns out there was no memory leak in our part of the code. We are quadruple checking this (already ran it on three different checkers) but I am pretty confident this is not the problem. Does anyone have any other ideas?

thanks
matthew

_James_Kress · November 1, 2016, 2:49pm

Matthew,

Are you running both jobs out of the same directory?

Jim

_Diaz_Adrian · November 1, 2016, 4:03pm

You’ve made sure there’s no memory leak when running a single job, correct?

Matthew_Wander · November 1, 2016, 4:37pm

James: Yes. The files (including output and other files) are named so that none of them should overwrite one another. I can run two jobs out of the same directory on different nodes and there is no problem.

Diaz: we had two different people run memory leak checks and neither came up with anything.

The only new information I have is that there are small differences in the force computation (on the order of 0.01 kcal/Å²), which are present whether the newton flag is on or off.

matthew

akohlmey · November 1, 2016, 10:21pm

> James: Yes. The files (including output and other files) are named so that
> none of them should overwrite one another. I can run two jobs out of the
> same directory on different nodes and there is no problem.

this is irrelevant. LAMMPS doesn't use any temporary scratch files and
thus you cannot corrupt internal memory structures this way.

> Diaz: we had two different people run memory leak checks and neither came up
> with anything.
>
> The only new information I have is that there are small differences in the
> force computation (on the order of 0.01 kcal/Å²), which are present whether
> the newton flag is on or off.

this is irrelevant.

there is no way to debug this from remote without specific information
and the opportunity to reproduce it.
many people have claimed that they checked their code or inputs over a
hundred times and there were still errors in them.

at the moment, you don't even provide dependable information
confirming the unexpectedly large memory consumption.
if this is a shared use node, it could also be a different user
creating problems.

axel.

Stan_Moore · November 1, 2016, 11:03pm

Let's take a step back: what node are you running on? Is this on a supercomputer? What do you mean by difficulty? Is it slow, does it crash?

Stan

Matthew_Wander · November 2, 2016, 12:03pm

I think I have actually figured out the problem (among others that this discussion has revealed, so thank you all for the helpful comments). It turns out that a file I thought was accessed only once at the start of the calculation is, I believe, in fact accessed at every time step. Running the jobs in separate directories appears to solve the problem.

This leads me to a general question of the design of pair styles in the code. Most appear to have functions for running (compute etc.), and setup (init_one, settings etc.) How often are the latter accessed and run? If more often than once then why?

thank you all very much again
matthew
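
If a custom pair style really does touch a file on every time step, one common remedy is to read the file once during setup and work from an in-memory copy afterwards. The sketch below is only an illustration of that idea: `CachedTable` and its methods are hypothetical names, not part of LAMMPS and not the code discussed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <utility>

// Hypothetical helper, not part of LAMMPS: holds the contents of a
// parameter file so that the per-timestep code never reopens it.
class CachedTable {
 public:
  explicit CachedTable(std::string path) : path_(std::move(path)) {}

  // Setup phase: open and read the file exactly once.
  void init() {
    std::ifstream in(path_);
    ++opens_;
    contents_.assign(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
    loaded_ = true;
  }

  // Per-timestep phase: works only on the in-memory copy and returns
  // the number of cached bytes as a stand-in for real work.
  std::size_t compute() const {
    assert(loaded_ && "init() must be called before compute()");
    return contents_.size();
  }

  // Number of times the file has been opened so far.
  int opens() const { return opens_; }

 private:
  std::string path_;
  std::string contents_;
  int opens_ = 0;
  bool loaded_ = false;
};
```

With this layout, calling `init()` once and `compute()` many times opens the file a single time, so two jobs sharing a directory would no longer contend for it during the run.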

akohlmey · November 2, 2016, 1:55pm

> I think I have actually figured out the problem (among others that this
> discussion has revealed, so thank you all for the helpful comments).
> It turns out that a file I thought was accessed only once at the start
> of the calculation is, I believe, in fact accessed at every time step.

please stop making such claims without proof.

> Running the jobs in separate directories actually appears to solve the
> problem.
>
> This leads me to a general question of the design of pair styles in the
> code. Most appear to have functions for running (compute etc.), and setup
> (init_one, settings etc.) How often are the latter accessed and run? If more
> often than once then why?

again, this is irritatingly vague. what is called when depends very
much on the specific input and calculation style. LAMMPS is extremely
flexible, so it is better to avoid any kind of blanket statements, and
explaining each and every item would be a massive writing effort and
beyond the scope of this mailing list.

thus please have a look at: http://lammps.sandia.gov/doc/Developer.pdf
where some of the inner workings and specifically the general flow of
control during a time step is described.

i suggest you ask more specific questions based on that and give
examples of what kind of calculations and specific scenarios you have
questions about.

axel.

sjplimp · November 2, 2016, 4:31pm

> This leads me to a general question of the design of pair styles in the code. Most appear to have functions for running (compute etc.), and setup (init_one, settings etc.) How often are the latter accessed and run? If more often than once then why?

Compute() is called every timestep and once by Verlet::setup() before a run starts.

The init() methods are called once per run, before the run starts.

Settings() and coeff() are called when the pair style or pair coeffs command is processed in the input script.

Steve
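
The call pattern Steve describes can be sketched with a toy class that only logs which method runs. This is a simplified model under stated assumptions: `ToyPair`, `simulate_one_run`, and `count_calls` are hypothetical names, the methods take no arguments (unlike the real ones), and the driver assumes exactly one pair style command, one pair coeff command, and a single run.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a pair style. It is NOT the LAMMPS Pair class;
// each method just records its own name.
struct ToyPair {
  std::vector<std::string> log;
  void settings() { log.push_back("settings"); }
  void coeff()    { log.push_back("coeff"); }
  void init()     { log.push_back("init"); }
  void compute()  { log.push_back("compute"); }
};

// Hypothetical driver mimicking the order described in the thread.
inline std::vector<std::string> simulate_one_run(int nsteps) {
  assert(nsteps >= 0);
  ToyPair pair;
  pair.settings();  // when the pair style command is processed
  pair.coeff();     // when the pair coeff command is processed
  pair.init();      // once per run, before the run starts
  pair.compute();   // once during setup, before the first timestep
  for (int step = 0; step < nsteps; ++step) {
    pair.compute(); // once on every timestep
  }
  return pair.log;
}

// Count how often a given method name appears in the log.
inline int count_calls(const std::vector<std::string>& log,
                       const std::string& name) {
  int n = 0;
  for (const auto& entry : log) {
    if (entry == name) ++n;
  }
  return n;
}
```

For a run of N timesteps, this model calls `compute()` N + 1 times and each of the other methods once, which is why any file access placed in `compute()` would repeat on every step.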