Problem with a user-defined fix in parallel

Hello, everyone.

I'm having some trouble with my fix in parallel.

The concept of MY FIX is:

  1. Allocate memory for data.
  2. Call MY_FIX's member function from pair_XXX::compute() via modify->fix[ifix]->function(i, j, nlocal, …), since I need the pair forces to calculate the virial term (see the sketch after this list).
  3. Dump data.
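To make the idea concrete, the pattern from step 2 looks roughly like this (all class, function, and fix-ID names below are placeholders, not my actual code):

    // fix_my_fix.h (placeholder): the fix allocates its own arrays and
    // exposes one extra public member function for the pair style to call
    class FixMyFix : public Fix {
     public:
      FixMyFix(class LAMMPS *, int, char **);
      void tally_pair(int i, int j, int nlocal,
                      double delx, double dely, double delz, double fpair);
      // allocation happens in the constructor, the dump in end_of_step()
    };

    // in pair_XXX::compute(), before the loops:
    int ifix = modify->find_fix("myfix");   // "myfix" = fix ID from the input script

    // in pair_XXX::compute(), inside the i,j loops:
    if (ifix >= 0)
      ((FixMyFix *) modify->fix[ifix])->tally_pair(i, j, nlocal, delx, dely, delz, fpair);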

It worked in serial, but it gets stuck in parallel. The diagnostic output is as follows
(the printf of i, j, nlocal is at the end of my function; in.test runs in parallel on 2 MPI processes):
i j nlocal
2361 2364 2401
2395 2397 2399

2364 2366 2401
2397 2398 2399

2364 2365 2401
2398 2960 2399

2365 2070 2401
2398 2963 2399

2365 2363 2401
It was stuck here. It seems one process has completed the compute while the other one hasn't; CPU usage is 100% on both.
I'm not familiar with MPI. Can anyone explain why it hangs and suggest a solution? Thanks.

Best Regards,

M.C. Wang
School of Mechanical Science & Engineering
Huazhong University of Science & Technology
Tel: +86 152 7181 0218

Hello, everyone.

I'm having some trouble with my fix in parallel.

The concept of MY FIX is:

  1. Allocate memory for data.
  2. Call MY_FIX's member function from pair_XXX::compute() via modify->fix[ifix]->function(i, j, nlocal, …), since I need the pair forces to calculate the virial term.
  3. Dump data.

This makes no sense. Why not just create a derived class of the pair style? And which pair style? Most pair styles do tally the virial contributions, either through the ev_tally*() functions or, where possible, through Pair::virial_fdotr_compute(), and store them for access by compute pressure.
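For illustration only, the derived-class route needs little more than this (the class and file names are invented; PairTersoff and the PairStyle() registration macro are the real ones):

    // pair_tersoff_grid.h (invented name); register it under a new style
    // name with the usual PairStyle(tersoff/grid,PairTersoffGrid) macro
    #include "pair_tersoff.h"

    class PairTersoffGrid : public PairTersoff {
     public:
      PairTersoffGrid(class LAMMPS *lmp) : PairTersoff(lmp) {}
      virtual void compute(int, int);   // reimplement compute() and add the extra tallying there
    };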

It worked in serial, but it gets stuck in parallel. The diagnostic output is as follows
(the printf of i, j, nlocal is at the end of my function; in.test runs in parallel on 2 MPI processes):
i j nlocal
2361 2364 2401
2395 2397 2399

2364 2366 2401
2397 2398 2399

2364 2365 2401
2398 2960 2399

2365 2070 2401
2398 2963 2399

2365 2363 2401
It was stuck here. It seems one process has completed the compute while the other one hasn't; CPU usage is 100% on both.
I'm not familiar with MPI. Can anyone explain why it hangs and suggest a solution? Thanks.

Impossible to say based on just your say-so.

axel.

Axel Kohlmeyer <[email protected]> wrote on Mon, May 13, 2019 at 8:25 PM:

Hello, everyone.

I'm having some trouble with my fix in parallel.

The concept of MY FIX is:

  1. Allocate memory for data.
  2. Call MY_FIX's member function from pair_XXX::compute() via modify->fix[ifix]->function(i, j, nlocal, …), since I need the pair forces to calculate the virial term.
  3. Dump data.

This makes no sense. Why not just create a derived class of the pair style? And which pair style? Most pair styles do tally the virial contributions, either through the ev_tally*() functions or, where possible, through Pair::virial_fdotr_compute(), and store them for access by compute pressure.

Thanks for your reply, Axel. Morse and Tersoff are the pair styles I have modified so far. What I want to do is tally the virial term onto spatial points (not into atom->vatom[]), so ev_tally() and virial_fdotr_compute() do not do what I need.
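Roughly, the kind of tallying I have in mind is the following (a sketch with invented names; binned along x only for brevity):

    void FixMyFix::tally_pair(int i, int j, int nlocal,
                              double delx, double dely, double delz, double fpair)
    {
      double **x = atom->x;
      // nlocal lets the fix tell owned atoms (index < nlocal) from ghosts;
      // the midpoint of the pair decides which spatial bin gets the contribution
      double xm = 0.5 * (x[i][0] + x[j][0]);
      int bin = static_cast<int>((xm - domain->boxlo[0]) / binwidth);  // binwidth set in the constructor
      bin_virial[bin][0] += delx * delx * fpair;
      bin_virial[bin][1] += dely * dely * fpair;
      bin_virial[bin][2] += delz * delz * fpair;
      bin_virial[bin][3] += delx * dely * fpair;
      bin_virial[bin][4] += delx * delz * fpair;
      bin_virial[bin][5] += dely * delz * fpair;
    }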

Actually, for pair style morse you can still use the mechanism used by the computes in the USER-TALLY package to hook a callback into the ev_tally() call; that way you avoid making any modifications to the pair style and only have to implement a new compute.

Sadly, for pair style tersoff the situation is a bit more complicated, so the callback mechanism is not available there.
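In rough outline, the ev_tally() hookup for the morse case looks like this (the exact signatures are in the compute_*_tally.cpp sources of the USER-TALLY package; ComputeMyTally is a made-up name for the new compute):

    void ComputeMyTally::init()
    {
      // register with the active pair style; ev_tally() will then invoke
      // the callback below for every tallied i,j pair
      force->pair->add_tally_callback(this);
    }

    void ComputeMyTally::pair_tally_callback(int i, int j, int nlocal, int newton,
                                             double evdwl, double ecoul, double fpair,
                                             double delx, double dely, double delz)
    {
      // the per-pair force and separation arrive here, so the virial can be
      // tallied onto spatial points without touching pair_morse at all
    }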

axel.

Axel Kohlmeyer <[email protected]> wrote on Mon, May 13, 2019 at 8:25 PM:

It worked in serial, but it gets stuck in parallel. The diagnostic output is as follows
(the printf of i, j, nlocal is at the end of my function; in.test runs in parallel on 2 MPI processes):
i j nlocal
2361 2364 2401
2395 2397 2399

2364 2366 2401
2397 2398 2399

2364 2365 2401
2398 2960 2399

2365 2070 2401
2398 2963 2399

2365 2363 2401
It was stuck here. It seems one process has completed the compute while the other one hasn't; CPU usage is 100% on both.
I'm not familiar with MPI. Can anyone explain why it hangs and suggest a solution? Thanks.

Impossible to say based on just your say-so.

axel.

I still have no clue about it. The following details may help to clarify the problem.

void PairTersoff::compute(int eflag, int vflag) {
  // ... original tersoff code ...
  for (ii = 0; ii < inum; ii++) {
    // ...
    for (jj = 0; jj < jnum; jj++) {
      // ...
      modify->fix[ifix]->My_Function(i, j, nlocal /* , ... */);  // added line: call into my fix
      printf("%d + %d + %d\n", i, j, nlocal);                    // for testing (inside the inner loop)
    }
  }
  printf("%d - %d - %d\n", i, j, nlocal);                        // for testing (after the loops)
}

The function works fine in serial, but the test gets stuck in parallel.
in.test has 4800 atoms and runs on 2 MPI processes, named P0 and P1.

Output of P0              Output of P1
2372 + 2947 + 2400        2396 + 2397 + 2400
2372 + 2374 + 2400        2396 + 2399 + 2400
2372 + 2951 + 2400        2397 + 2395 + 2400
                          2399 - 2396 - 2400

No more lines were printed after that; the run was stuck and CPU usage was 100% for both P0 and P1.
P1 had completed force->pair->compute(), while P0 was still working through its pairs.
I would expect P1 to simply wait for P0, so something seems to go wrong only after adding the call in tersoff (modify->fix[ifix]->My_Function).

M.C.

Axel Kohlmeyer <[email protected]> wrote on Mon, May 13, 2019 at 8:25 PM:

It worked in serial, but it gets stuck in parallel. The diagnostic output is as follows
(the printf of i, j, nlocal is at the end of my function; in.test runs in parallel on 2 MPI processes):
i j nlocal
2361 2364 2401
2395 2397 2399

2364 2366 2401
2397 2398 2399

2364 2365 2401
2398 2960 2399

2365 2070 2401
2398 2963 2399

2365 2363 2401
It was stuck here. It seems one process has completed the compute while the other one hasn't; CPU usage is 100% on both.
I'm not familiar with MPI. Can anyone explain why it hangs and suggest a solution? Thanks.

Impossible to say based on just your say-so.

axel.

I still have no clue about it. The following details may help to clarify the problem.

No, it doesn't help, and I already mentioned that what you are doing is very bad design and should be done differently.
Fixes are not features that should be called from within inner loops; fixes are classes that are called at regular intervals, at specific reference points in the MD or minimizer loop.
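For reference, the usual shape of a fix is something like this (FixMyFix is just a placeholder; the mask bits and method names are the actual hook points):

    int FixMyFix::setmask()
    {
      int mask = 0;
      mask |= POST_FORCE;     // called once per step, after all forces have been computed
      mask |= END_OF_STEP;    // called at the end of selected timesteps
      return mask;
    }

    void FixMyFix::post_force(int vflag)
    {
      // per-step work goes here, e.g. processing whatever the pair style
      // (or a tally callback) has accumulated during force->pair->compute()
    }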

Also, for performance reasons, you want to avoid non-inline function calls in inner loops altogether.

axel.