# [lammps-users] Efficiently Generating Samples By Selecting Sections From Larger Sample

Hi all,

I am running MD simulations of tensile tests on amorphous silica in LAMMPS. I would like to run 100s of trials of these tensile tests with samples of the same dimensions but different initial conditions. Generating 100s of these samples independently (starting with random atom velocities, then heating, holding at high temperature, cooling, and holding at low temperature) would take a very long time. I am considering instead generating a single very large sample and taking random sections of it, with each section serving as a sample. (I plan to load the final atom positions into MATLAB and use MATLAB to select the atoms inside a randomly positioned prism.) I would then run each sample through a tensile test.

Is this approach ever used? Would it lead to issues (I’m concerned that the samples would be too similar because they are taken from the same larger sample)? How could I go about checking whether the large sample is sufficiently large?

My LAMMPS version is 9 Oct 2020.

Thank you!

How will you maintain periodicity in the small sample when you cut it from the large amorphous sample?

Sanjib

That’s a very good point which I had not considered. However, I believe the boundary conditions for the small sample will be shrink-wrapped in all 3 directions.

Two points:

Tensile tests without periodic boundary conditions are problematic and more difficult to set up.

Your 100s of calculations are an embarrassingly parallel problem and thus the time to solution can be easily reduced by a large factor.

Axel

With shrink-wrapped BCs, you have to use a sample with sufficiently large dimensions to minimize surface and finite-geometry effects. Otherwise, you won’t get accurate bulk mechanical properties. I am guessing you are interested in structure and mechanical properties.

In the following paper, I noticed that a length of at least 10 nm is necessary to minimize finite-geometry effects.

https://www.sciencedirect.com/science/article/pii/S0013794418308877

But if you use a periodic sample, a ~3 nm sample would probably be good enough.

So I suspect your approach with shrink-wrapped BCs wouldn’t be computationally efficient, since you would have to simulate 100s of relatively large models instead of small periodic samples.

HTH.

Sanjib

Thank you Dr. Kohlmeyer!

"Your 100s of calculations are an embarrassingly parallel problem and thus the time to solution can be easily reduced by a large factor."

I did not realize this: it should be very useful if I can figure out how to parallelize the problem properly (I’m using my university’s Seawulf cluster).

as I mentioned, this is embarrassingly parallel. you can just write a python (or bash or other scripting language) script that generates your 100s of inputs for the 100s of calculations and puts each in a different folder. then you can write a script that generates the submission scripts for the cluster, where each submission script bundles multiple runs.
if your cluster supports job arrays, you can exploit that feature as well. in the simplest case you would run each calculation in serial; you would then place as many calculations as you have CPU cores into a single submit script and have in it:

```shell
(cd job###; lmp_serial -in in.job### -log log.job### ) &
(cd job###; lmp_serial -in in.job### -log log.job### ) &
(cd job###; lmp_serial -in in.job### -log log.job### ) &
# ... one such line per CPU core ...
(cd job###; lmp_serial -in in.job### -log log.job### ) &
wait
```

submit and wait until all are done and collect the results.
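A minimal sketch of the generation script Axel describes, assuming each trial differs only in the random seed handed to LAMMPS; the folder naming, the `variable seed` convention, and the `lmp_serial` invocation are illustrative assumptions, not from this thread:

```shell
#!/bin/sh
# generate one folder per trial, each with its own random seed
NJOBS=8   # use 100s in practice
i=1
while [ "$i" -le "$NJOBS" ]; do
  dir=$(printf 'job%03d' "$i")
  mkdir -p "$dir"
  # each input is identical except for the seed variable
  printf 'variable seed equal %d\n' "$((12345 + i))" > "$dir/in.job"
  cat >> "$dir/in.job" <<'EOF'
# ... common setup (units, read_data, pair_style, etc.) ...
# velocity all create 5000.0 ${seed}
EOF
  i=$((i + 1))
done

# bundle all runs into one submit script, one background process per run
{
  echo '#!/bin/sh'
  for d in job*; do
    echo "(cd $d; lmp_serial -in in.job -log log.job) &"
  done
  echo 'wait'
} > run_all.sh
chmod +x run_all.sh
```

With a job-array scheduler, the loop over folders can instead be replaced by one array job whose task index selects the folder.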

“With shrink-wrapped BCs, you have to use a sample with sufficiently large dimensions to minimize surface and finite-geometry effects. Otherwise, you won’t get accurate bulk mechanical properties. I am guessing you are interested in structure and mechanical properties.”

“Tensile tests without periodic boundary conditions are problematic and more difficult to set up.”

I’m basing the boundary condition on previous work done in the research group I’m in (it was actually a fixed BC, not shrink-wrapped, although I imagine the problem would be the same). We are running these tests on samples of different sizes to see the effect of size on structural and mechanical properties. Our goal is to determine the properties at each size, as opposed to the properties of a bulk sample.

However, the sizes are still less than the 10 nm length, with the smallest being 2 nm.

Would using non-periodic BC be an issue for what we are attempting?

with PBC you can use fix deform and easily deform the entire sample homogeneously (because of the PBC, the system will pull/push on itself when you change the box size). with fixed or shrink-wrap boundaries you don’t have this option (changing the box size does not affect the sample at all); instead you have to pull on each side explicitly. that means you have surface and boundary effects, need to wait until the changes due to the pulling have propagated throughout the sample, and get different behavior near the faces being pulled. hence you need the larger sample size and have to be careful about whether the results are really comparable, i.e. you first need to do a series of test calculations to confirm which geometry reproduces the results you are looking for.
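As a minimal sketch of what the fix deform approach looks like under full PBC; the ensemble settings, rates, and values below are illustrative assumptions, not a recommendation from this thread:

```
# uniaxial tension along x with fully periodic boundaries
boundary        p p p
# barostat only the transverse directions; x is controlled by fix deform
fix             relax all npt temp 300.0 300.0 0.1 y 0.0 0.0 1.0 z 0.0 0.0 1.0
# pull at a constant engineering strain rate; remap keeps atoms moving with the box
fix             pull all deform 1 x erate 0.00001 remap x
```

Because the box itself is strained, the sample deforms homogeneously, with no explicit grips and no free surfaces.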

at the length and time scales of atomistic simulations, the construction of the system always needs to be taken into account during planning. things are not as straightforward as with physical experiments, while at the same time you have more control and can do things that are impossible in the real world.

axel.

In the work we did previously, the argument made was that PBC would model a material in which the cracks formed during the tensile test are repeated infinitely, so it wouldn’t be realistic for determining tensile strength. Would this be a reasonable concern?

it all depends on what you are comparing to experimentally, what kind of information you want to obtain, and what kind of conclusions you want to derive from your simulations. at the length scales available to atomic-scale models, you are limited in what you can compare to.

the concern about replicated cracks is indeed an issue, but then again, if you do not use PBC you are looking at a tiny sample where the amount of surface is significant compared to the bulk. thus, unless you have carefully checked how well you can represent what you want to study with the model you choose, you cannot be certain of making the best choice. there is no simple “do this not that” answer. there should be plenty of similar previous studies of other materials in the published literature, and you will find that different people use different approaches with different justifications.

This is often influenced by what kind of background or training they have. People with a background in mechanical engineering tend to apply their understanding of macroscopic systems to the microscopic scale (and sometimes overlook that things can be different at the atomic scale), while people with a background in condensed matter physics or quantum chemistry tend to put too much emphasis on the interactions and dynamics of individual atoms or groups of atoms, and overlook that on the macroscopic scale much of the microscopic dynamics and fluctuations will cancel, and only (subtle) net effects (which require very good statistical sampling to identify correctly) remain relevant.

please note that you can also have PBC only in the direction of loading and not orthogonal to it, but then you are studying an ultra-thin stick as a sample and not a bulk or macroscopic-size material. I have also seen people use full PBC but relax the x and y dimensions while loading in z.

in short, you have to “pick your poison” and choose the approach where you have a better representation for what you need, and be aware of the limitations of that choice.

axel.

“In the work we did previously, the argument made was that a PBC would be modelling a material in which the cracks formed during the tensile test are repeated infinitely, so it wouldn’t be realistic for determining tensile strength. Would this be a reasonable concern?”

No. It makes no sense at all. First you would spend a lot of CPU resources to equilibrate a large system, only to cut it into pieces and thus destroy much of the benefit of having such a system. Even worse, you would then have to re-equilibrate every chunk you cut from it, since the cutting is a very disruptive process that causes shock waves, because the surface atoms are far from equilibrium.

You could just start from the individual chunks and decorrelate/prepare them the same way and do this in parallel.

The longer I think about it, the worse it looks to me. I cannot predict the results, but if I were betting money, I would rather put it on using periodic boundaries. My expectation is that the problems with fixed boundaries, and the extra effort (and atoms) needed to minimize surface effects, would be worse than what you incur from PBC. But that is just a hunch and not a proper scientific assessment. My primary MD experience is with liquids, where PBC are essential.

Axel.

Two things:

1. you are discounting that the cost of the simulation scales not with the number of atoms but with the number of pairs of atoms within the cutoff. For a larger block you have more of those per atom than for small ones (hence my concern about surface effects).

2. you can build a large variety of initial small-block geometries with LAMMPS using the lattice, region, displace_atoms, and velocity commands. You can use many different lattice orientations, randomly displaced positions, and different initial random velocities. Since you need to equilibrate after cutting anyway, cutting from a large sample doesn’t save you anything.
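As a rough illustration of point 2, a small block could be built directly for each trial. Everything below (lattice type and constant, region size, jitter amplitude, seeds, temperature) is a placeholder sketch, not a working silica setup, which would also need atom types, charges, and a proper potential:

```
# one small block per trial; change the two seeds to get independent samples
units           metal
boundary        p p p
lattice         diamond 7.16                    # placeholder lattice and constant
region          box block 0 4 0 4 0 8
create_box      1 box
create_atoms    1 box
displace_atoms  all random 0.1 0.1 0.1 482793   # random jitter; vary seed per trial
velocity        all create 5000.0 583722        # random velocities; vary seed per trial
# ... melt, hold, cool, hold, then run the tensile test ...
```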

Thank you! I will make sure to discuss this with my PI.

If we do use fixed BC, would the method in my original post make sense? I realize the problem would no longer be embarrassingly parallel, but I also wouldn’t have to model nearly as many silica atoms.

I realize this approach may not make as much sense as I thought it did.

I do want to clarify it a bit, though. The benefit of doing this, in my mind, is that you don’t have to simulate nearly as many atoms as if you generated the samples separately. For example, if the samples are 1×1×2 and the large sample is 2×2×2, the large sample is only four times the size, yet I could cut 72 different samples out of it (with the samples in different positions and orientations within the large sample). By cutting, I just mean selecting the atoms in a certain region (using MATLAB). My fear, though, is that the samples would be too similar because they would of course share many of the same atoms.

I have tried my best, but it is apparently impossible to convince you to do some proper exploration and collect hard data. instead you keep sticking to concepts that are unproven and tangential, and where arguments can be made either way. you have to look at the big picture and keep in mind that if you focus on getting just one detail done perfectly, you will likely mess up a whole bunch of others that are also important.

a very simple (and very obvious) way to make your preparation of systems even more efficient would be to somewhat extend the period during which the system is melted, collect multiple sufficiently decorrelated geometries from it, and then start a cool-down run from each of those, thus multiplying the number of independent starting geometries.
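One way to implement that melt-snapshot idea in LAMMPS, assuming the snapshot interval is longer than the melt's decorrelation time; the interval, temperatures, and file names are illustrative:

```
# during one extended melt run, write snapshots far enough apart to be decorrelated
restart         50000 melt.*.restart
run             500000                      # produces 10 restart files

# then each trial gets its own input that cools one snapshot independently, e.g.:
# read_restart  melt.100000.restart
# fix           cool all nvt temp 5000.0 300.0 0.1
# run           200000
```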

trying to throw AI at such a rather simple and straightforward procedure sounds more like adding “fairy dust” or “snake oil”.

this is my last response to this topic. in the end it is you who has to defend the choices you made, so make the choices that you are willing to defend. after all, “some dude on the internet told me to do it” is not a valid justification.

good luck,
axel.

Even exploiting the fact that the problem is embarrassingly parallel, I’m still afraid that generating samples may take a prohibitively long time on my university’s cluster. Do you know of any tools that generate the positions of silicon and oxygen atoms in amorphous silica? There are papers which seem to use machine learning to successfully reproduce amorphous silicon (unfortunately not silica), but I can’t find any publicly available tools for this task. Thank you!