Sorry, I don’t have experience with those cards, so I have no specific settings to recommend for them.
You can always experiment yourself, starting from the settings used for OCL_TUNE = -DCYPRESS_OCL, as defined in lal_preprocessor.h. For newer cards with more compute units and more (global and local) memory, I would try increasing BLOCK_PAIR and BLOCK_NBOR_BUILD; the former is likely to affect performance more than the latter. After making the changes, you will need to do a clean build of libgpu.a.
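As a rough sketch of what such an experiment could look like, the fragment below adds a separate tuning block rather than editing the existing one. The guard name MYCARD_OCL is hypothetical (it is not in the LAMMPS sources), and the two values are illustrative starting points only, not measured settings and not copied from the CYPRESS_OCL block. Any other macros that the CYPRESS_OCL block defines would need to be copied into the new block unchanged.

```c
/* Sketch of a custom tuning block for lal_preprocessor.h.
   MYCARD_OCL is a hypothetical guard name. It would normally come from
   the Makefile (OCL_TUNE = -DMYCARD_OCL); it is defined here only so
   that this fragment stands alone. */
#ifndef MYCARD_OCL
#define MYCARD_OCL
#endif

#ifdef MYCARD_OCL
/* Illustrative values only (assumptions, not benchmarked). Change one
   at a time, rebuild, and time a representative run. */
#define BLOCK_PAIR 256        /* try this one first: likely the larger effect */
#define BLOCK_NBOR_BUILD 128  /* try this one second */
/* ...copy the remaining macros from the CYPRESS_OCL block here... */
#endif
```

With a block like this in place, you would point OCL_TUNE at the new guard in the Makefile you use for lib/gpu, then do the clean rebuild of libgpu.a before comparing timings.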