Atom property calculation in source code

Hello,

I have coupled LAMMPS to a program I am writing in C++. I did this to create a single LAMMPS instance while doing multiple MD runs, calculating the forces on atoms via the "dump" command. I am now interested in reducing overhead in my code by eliminating the need for LAMMPS to read data files and write dump files. I would therefore like to compute the forces directly, however LAMMPS does it internally, and store them as variables within my own code.

My main questions: Which .cpp file in the LAMMPS source code is responsible for computing atom properties? How can I use this source code to store atom properties in variables in my own code without writing them to dump files?

Thanks for your time.

> Hello,
>
> I have coupled LAMMPS to a program I am making in C++. I am doing this to
> create a single instance of LAMMPS while doing multiple MD runs to calculate
> the forces on atoms using the "dump" command. I am now interested in reducing
> overhead in my code by eliminating the need for LAMMPS to read data files and
> write dump files. I would therefore just like to directly compute the forces,
> however LAMMPS does it, and then store the forces as variables within my own
> code.

wow. this is a weird way to do such a thing, particularly coming from
a person who already does programming and not scripting.

that being said, do you have a measure of how much overhead you incur
from reading/writing files or from LAMMPS' startup, or in general where
the time is spent?

> My main questions: What .cpp file in the LAMMPS source code is responsible for computing atom properties? How can I use this source code to make atom properties variables in my code without writing them to dump files?

there is no single file. different properties are determined in
different ways. some are input data, some are part of the regular MD
time step, some need to be requested explicitly. i cannot go into too
much detail without giving you a long lecture on MD and repeating what
is written in the LAMMPS developer's guide.

why don't you use the library interface? or perform the equivalent
operations directly from C++ after creating a LAMMPS class instance.
that setup is made particularly for such purposes. please pay
attention to the MPI library setup when you roll your own.

axel.


As Axel indicates, the library interface to LAMMPS in src/library.cpp has many functions that allow you to access LAMMPS internals. You can retrieve a pointer to atom coords and forces, for example, or invoke computes and return their values to the caller.

Steve
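For anyone finding this thread later, a minimal sketch of what Steve describes might look like the following. This assumes LAMMPS was built as a library and linked in; the input script name is hypothetical, and the exact signature of lammps_open has changed between LAMMPS versions, so check src/library.h for your version:

```cpp
#include <mpi.h>
#include "library.h"   // LAMMPS C interface (src/library.h)

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    // create one LAMMPS instance on MPI_COMM_WORLD
    void *lmp = NULL;
    lammps_open(0, NULL, MPI_COMM_WORLD, &lmp);

    // set up the system once, instead of re-reading a data file each cycle
    lammps_file(lmp, "in.setup");               // hypothetical input script

    // trigger a force evaluation without writing any dump files
    lammps_command(lmp, "run 0 pre no post no");

    // "x" and "f" are per-atom arrays; the pointers refer to LAMMPS' own
    // memory, so no copy is made and they stay valid until the next run
    double **x = (double **) lammps_extract_atom(lmp, "x");
    double **f = (double **) lammps_extract_atom(lmp, "f");
    // ... use x[i][0..2] and f[i][0..2] for the atoms owned by this rank ...
    (void) x; (void) f;

    lammps_close(lmp);
    MPI_Finalize();
    return 0;
}
```

Note that in parallel each rank only sees its own atoms through these pointers; lammps_gather_atoms can assemble a full, ordered copy when needed.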

Thanks for your advice. I used the interface library to retrieve pointers to atom coordinates and forces and this has slightly reduced the overhead in my code since LAMMPS is no longer reading data files and writing dump files.

After this I have realized that ~95% of my overhead is coming from the “run” command, even after using “pre no post no”.

I am calling “run 0 pre no post no” up to 10^8 times in my C++ code, and it takes about 7 days. I know this is the “run” command because my code executes in less than a minute when not calling “run” at all.

I am calling “run 0 pre no post no” so many times because I am fitting the forces of many configurations to ab initio forces, so I need a large population and many trials to get more accurate fits. My fitting algorithm is sufficiently fast (less than a minute for many parameters), but the “run” command is killing the time.

Is there a faster way to get the forces from the “0th” time step?

I know I can expect some overhead calling the run command this many times, but are there any other ways to reduce it? The only way I can think of is to inhibit LAMMPS from outputting data to the screen. Please let me know if there are any other ways.

Thanks for your time.

If you ran 10^8 timesteps in a single run, how long would it take for your system? If it's roughly 7 days, then the issue isn't the overhead of "run 0".

If you use run 0 with pre no, then LAMMPS is doing very little before it invokes your pair style and computes forces.

You could always write your own library method that invokes the pair style compute() method directly, if you do it appropriately.

Steve
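A sketch of that last suggestion, calling the pair style's compute() through the internal C++ classes. This is not part of the documented library interface; the member names below are taken from the LAMMPS headers (atom.h, force.h, pair.h) and may differ between versions, and the call is only valid while the existing neighbor lists still match the current coordinates:

```cpp
// Sketch only: assumes you hold a pointer to the internal LAMMPS_NS::LAMMPS
// object and that neighbor lists are still valid for the coordinates in use.
#include "lammps.h"
#include "atom.h"
#include "force.h"
#include "pair.h"

using namespace LAMMPS_NS;

void recompute_pair_forces(LAMMPS *lmp)
{
    // zero the force array, as the integrator normally would before a step
    double **f = lmp->atom->f;
    int nall = lmp->atom->nlocal + lmp->atom->nghost;
    for (int i = 0; i < nall; ++i)
        f[i][0] = f[i][1] = f[i][2] = 0.0;

    // invoke the pair style's force kernel directly (eflag=1, vflag=0)
    lmp->force->pair->compute(1, 0);

    // caveat: this skips neighbor-list rebuilds, ghost-atom communication,
    // bonded terms, and kspace -- it is only valid under narrow assumptions
}
```

Whether this is actually faster than "run 0 pre no post no" depends on how much of the skipped work matters for your systems, which is exactly what profiling would tell you.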

> Thanks for your advice. I used the interface library to retrieve pointers to
> atom coordinates and forces and this has slightly reduced the overhead in my
> code since LAMMPS is no longer reading data files and writing dump files.
>
> After this I have realized that ~95% of my overhead is coming from the "run"
> command, even after using "pre no post no".
>
> I am calling "run 0 pre no post no" up to 10^8 times in my C++ code, and it
> takes about 7 days. I know this is the "run" command because my code
> executes in less than a minute when not calling "run" at all.
>
> I am calling "run 0 pre no post no" so many times because I am fitting the
> forces of many configurations to ab initio forces, so I need a large
> population and many trials to get more accurate fits. My fitting algorithm
> is sufficiently fast (less than a minute for many parameters), but the "run"
> command is killing the time.
>
> Is there a faster way to get the forces from the "0th" time step?

how large is your system? what is your LAMMPS input? have you done
some profiling to see where the time is spent?
most likely, it is spent actually computing the forces, and thus the
only way to speed this up would be to parallelize the computation.

> I know I can expect some overhead calling the run command this many times,
> but are there any other ways to reduce it? The only way I can think of is to

i don't think there is much overhead.

> inhibit LAMMPS from outputting data to the screen. Please let me know if

this is pure speculation and i seriously doubt it.

> there are any other ways.

the only way to give meaningful advice and make informed choices about
how to address performance problems is to do proper profiling of the
application, not the empirical guessing that you have done so far.

axel.

>> inhibit LAMMPS from outputting data to the screen. Please let me know if

> this is pure speculation and i seriously doubt it.

Sorry if I misunderstood what you meant, but in my experience, dumping lots of stuff to screen is really slow. I agree that it is speculation, but it is easily tested by redirecting the screen output to /dev/null. Other than that, yes, profiling is the only way to be certain.

>>> inhibit LAMMPS from outputting data to the screen. Please let me know if

>> this is pure speculation and i seriously doubt it.

> Sorry if I misunderstood what you meant, but in my experience, dumping lots
> of stuff to screen is really slow. I do agree that it is speculation though,

why would there be lots of output? with sane settings in your input
there should be very little output, at least very little compared to
the amount of work being done.

> but it is easily tested by redirecting the screen output to /dev/null. Other

that is not correct. when sending data to /dev/null, you *still* have
to create the formatted output and copy it. only if you created a
massive output stream producing far more data than your OS can absorb,
buffer in RAM, and write out asynchronously would this have some
impact. the formatting of floating point numbers in particular is
quite time consuming, as it requires calls to the log() function. of
course, reading/parsing is even worse.

> than that, yes, profiling is the only way to be certain.

please let me add that i find it quite irritating that you don't
provide us with essential information that would remove a lot of the
speculation. both steve and i have asked for it, yet you consistently
continue with speculation and do not provide hard data. e.g. if you
provided a representative example input deck that you generate and
run, plus the corresponding output, we could have a *much* better
estimate of what is going on. people with sufficient MD experience can
quickly assess how much computational effort an input deck will
create. also, with kernel level profiling, a simple profile requires
next to zero additional effort and would make most of our arguing
superfluous.

axel.

Erm… I think (hope) you mistook me for the original poster?

> Erm... I think (hope) you mistook me for the original poster?

yes. i am not used to having people on both sides of the discussion at
the same time. ;-)

Haha, ok.

Just to clarify what I meant: I found that when my program dumped many lines to screen each second, there was a sort of micropause after each line because the output to the terminal emulator was flushed, which throttled the entire program. When redirecting, this went away because the output to a file was buffered. I am not sure whether this was a std::cout thing (and, for that matter, whether fprintf behaves differently), a terminal emulator thing, or an OS thing.

As you already pointed out, these points are moot if there is little output to screen.