OpenCL error when increasing the simulation box size

Hi,
I tried to simulate a single-layer graphene ribbon (the length direction is x) using the Tersoff potential.

First I ran with CPUs. There I noted that increasing the empty space in both the y and z directions of the box significantly reduces the simulation speed. However, the CPU is able to run the simulation.

Then I ran with a GPU. There I noted the same thing with increasing box size. However, at the largest box size, an OpenCL error occurred:

OpenCL error in file ‘…/lib/gpu/geryon/ocl_memory.h’ in line 645 : -4.

Then I changed the default value (2) of the neighbor command to 3 (and 4, and so on) and ran with the GPU. With those values it runs fine with no errors.
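
To make the change concrete, this is roughly what it looks like in the input (a minimal sketch; the skin value is the only thing I changed, and 3.0 is just one of the values I tried):

    # default: neighbor 2.0 bin
    neighbor 3.0 bin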

I would like to know why the provided default value of the neighbor skin did not work with the GPU.

Also, increasing the skin distance from the default to 5 (in order to perform the simulation with GPU acceleration) significantly affects how accurately the system dynamics are captured.

Is there any way to use GPU acceleration while still accurately capturing the system dynamics
(for example, by using neigh_modify or atom_modify)?

Thank You
Wenu

When reporting issues, please always report which version of LAMMPS you are using and what platform you are running on.

How many atoms does your system have? What are the dimensions of the ribbon, and what are the dimensions of the box? How many CPUs (via MPI) are you using?

Similar information is needed: How many MPI processes? How many GPUs? Box and ribbon dimensions?

That is most certainly not solving the real problem. It just “blows up” your neighbor list.

There is some other - yet unidentified - issue. Changing the neighbor list skin is just masking it.

What do you mean by that? How does this manifest? How did you compile the GPU support? What kind of GPU hardware do you have?

It is impossible to comment on this without knowing any specifics. We would not include features in LAMMPS if we knew that they do not work correctly. The most likely issues are that either your expectations are not accurate or that there is a problem you have not identified.

Apologies for the insufficient simulation details.

I am using LAMMPS (23 Jun 2022 - Update 2) on Ubuntu 20.04

The system has 135,000 atoms; the ribbon is 800 nm long (in the x direction) and 5 nm wide (in the y direction).
The simulation box spans from (0, -50 nm, -50 nm) to (810 nm, 50 nm, 50 nm).
An 8 by 1 by 1 MPI processor grid was used.

For the GPU run, a 4 by 1 by 1 MPI processor grid was used.
The simulation failed at the same system size and simulation box size for the same ribbon.
However, after increasing the skin distance from 2 to 5, the simulation ran without generating an error.

After changing the skin distance from 2 to 5 (in order to run with GPU acceleration), the obtained thermodynamic data were significantly different from the thermodynamic data obtained on the CPU with a skin distance of 2.

The additional information is still very terse and does not allow a detailed assessment because it is not possible to reproduce the issue with the information provided.

Please try updating to LAMMPS version 28 March 2023.
This includes significant improvements for the GPU package and bugfixes, specifically a few that you have been working around by increasing the skin distance (which is not the correct solution for that specific issue, but rather you would have needed to increase the ghost atom cutoff).
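
For reference, the ghost atom cutoff can be set explicitly with comm_modify instead of inflating the neighbor skin; the value below is only illustrative and would need to be at least your largest pairwise cutoff plus the skin:

    # set the ghost atom communication cutoff explicitly
    # (12.0 is only an illustrative value, larger than the default of pair cutoff + skin)
    comm_modify cutoff 12.0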


Thank you very much for your reply.

Here I upload the script and the output log:
graphene.in (2.6 KB)
log.lammps (10.9 KB)

The issue is that when increasing the box size it generates:
OpenCL error in file ‘…/lib/gpu/geryon/ocl_memory.h’ in line 645 : -4.

In which direction? by how much?

In the z direction, by changing the lattice vector to
a3 0 0 500
(I tried to replace a3 0 0 100 with a3 0 0 500; then the issue occurred. The change in z length is 400 × 2.49 Å ≈ 99.6 nm.)

I have no problem running your input with OpenCL when using the latest LAMMPS version, removing the (aggressive) “neigh_modify delay 10 every 2” line, and instead sticking with the (safe) default “neigh_modify delay 0 every 1 check yes” setting.
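
In input-file terms, that change amounts to:

    # removed from the input:
    # neigh_modify delay 10 every 2
    # safe default setting recommended above:
    neigh_modify delay 0 every 1 check yes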

Also, the force values are the same. Due to the difference in “volume”, the computed pressure is different (what the correct “pressure” is for such an open system is a difficult question to answer), and thus fix npt will result in different changes of the box size. But beyond that, the energies and forces are the same for the different box sizes.

This confirms my hunch that the issue in question here was indeed fixed by this commit: change warning message and get the comm cutoff properly · lammps/lammps@2eb125d · GitHub
from October 2022, in combination with the other updates and improvements to the GPU package.

I tried the LAMMPS versions “Jun 2022 - update 2” and “28 Mar 2023”, but the issues persisted.

The system has ~150K atoms forming a single-layer graphene sheet, and the Tersoff potential was employed.

The dimensions of the graphene ribbon are ~7500 Å × 50 Å × 1.5 Å and the dimensions of the simulation box are ~7500 Å × 2160 Å × 3456 Å.

Both the y and z directions have free space, with the sample placed in the center of the box. Fixed boundary conditions are applied in those two directions, and periodic boundary conditions are applied in the x direction.

The following problems occurred:

  1. LAMMPS execution with 2 MPI tasks terminated with the error “Too many atom sorting bins”.

However, with an increased number of MPI tasks, LAMMPS was able to handle the system.

  2. Even though LAMMPS was able to handle the situation with a greater number of MPI tasks, the RAM utilization was quite high, over 50 GB.

Sometimes LAMMPS exits without an error after exhausting the available RAM capacity. I have seen silicon systems of over 500K atoms (without this much free space in the simulation box, though) run without any issue while using less than 2 GB of RAM.

So my concerns are:

  1. Is there a maximum number of neighbor bins that a single MPI task can handle?

  2. Is there a way to prevent LAMMPS from assigning neighbor bins to free space? My system with a lot of free space had (3651 1054 1687) neighbor bins, while the silicon system I mentioned before had (151 500 152) bins. Of course, the reason for the higher RAM utilization is the increased number of neighbor bins in my system (see the rough estimate below). But I need this extra free space to prevent the graphene ribbon from deforming out of the simulation box due to flexural phonon modes.
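
(As a rough estimate, and assuming memory on the order of at least one integer per bin, 3651 × 1054 × 1687 ≈ 6.5 × 10^9 bins already corresponds to tens of GB before any neighbor lists are stored, which seems consistent with the RAM usage I observe.)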

Your assistance would be greatly appreciated.

How about neighbor nsq then?
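
That is, something along these lines (the skin value shown is only illustrative):

    # quadratic N^2 neighbor list build instead of binning
    neighbor 2.0 nsq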

If you have non-periodic boundaries, there should be no need to leave empty space. With “m” or “s” boundaries, the box can be expanded as needed (but only as far as needed) and losing atoms is avoided, yet the physics of the system remains the same.

Leaving lots of empty space is quite inefficient, especially with a large number of MPI tasks, since LAMMPS uses a volume-based domain decomposition to parallelize over MPI tasks. There is not much help coming from adding MPI tasks if they have no “owned” atoms.

Sorry, but this makes no sense. Any center-of-mass drift during the production calculation should be avoided by resetting it after equilibration. If the ribbon still shows a drift, it can be prevented by applying a global restraint to the entire ribbon using fix spring in tether mode. With “s” or “m” boundaries, the box will be expanded as far as atoms travel.
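
As a sketch of that kind of setup (the group name, spring constant, and boundary choices below are placeholders, not taken from your input):

    # shrink-wrapped boundaries in y and z instead of a large fixed box
    boundary p m m
    # restrain the center of mass of the ribbon group to y = 0, z = 0 (no restraint in x)
    fix hold ribbon spring tether 10.0 NULL 0.0 0.0 0.0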


Thank you for the explanatory reply.

Hi,
I would like to know what the effect of using neighbor nsq on the system is, and what it specifically changes.
However, using nsq for this system has not helped to resolve the issues.

The effect of using the nsq setting to construct neighbor lists is documented in the neighbor command documentation. However, for GPU package runs that setting only applies if you do the neighbor list build on the CPU via -pk gpu 0 neigh no.
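
As a sketch of what that looks like on the command line (the executable name is a placeholder for however your LAMMPS binary is called):

    # use the GPU package for the pair style, but build neighbor lists on the CPU
    lmp -sf gpu -pk gpu 0 neigh no -in graphene.in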

But, as I already explained, the issue is due to bad system setup choices. The pointless use of empty space should be stopped; then the neighbor list construction should work normally, and LAMMPS will also exhibit better performance due to better load balancing.