Hi,
I'm trying to run the GPU benchmarks using both the OMP and CUDA packages.
I tried the command below, intending to use 1 GPU, 1 MPI process, and 4 OMP threads.
$ env OMP_NUM_THREADS=4 mpirun -np 1 ../../src/lmp_hp6 -sf omp -sf cuda -v g 1 -v x 2 -v y 2 -v z 1 -v t 100 < ../GPU/in.rhodo.cuda
However, it looks like the OMP setting is ignored.
MPI seems to work well with CUDA; I do see performance differences there.
For OMP, though, changing the value of OMP_NUM_THREADS doesn't affect the performance at all.
Any ideas?
Thanks,
David