Calibrating GULP results against results from similar software?

To be sure I do not make errors in my work, I try to calculate the same thing twice with different programs and check that the results are sufficiently close. In this way I was able, for example, to identify problems in how I generated UFF and Dreiding input for GULP (by comparison with Materials Studio GULP and Forcite results) …
I am now trying to calibrate gaff/gaff2 results generated by GULP in the same way.
In the supplementary data of the following publication:

I found 21 CIF files optimized with gaff + RESP charges; the calculation was done with the CHARMM engine.
My idea was to recalculate these structures in GULP with settings as similar as possible. So I used the gaff force field + RESP charges calculated with the ORCA QM code. I expected to end up in the same local minima, with only minor numerical differences.
Unfortunately, the results differ a lot. The GULP results make sense (they do not look wrong), but the torsion angles, the molecule positions, and the lattice parameters all differ significantly. The differences in torsion angles reach up to about 30 degrees …
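To put numbers on such a comparison, something like the following minimal sketch could be used. It is only an illustration: the function names are hypothetical, it assumes Cartesian coordinates (in Å) with the same atom ordering in both structures, and it is not taken from GULP or from the publication.

```python
import math

def _sub(a, b):
    return [x - y for x, y in zip(a, b)]

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dihedral_deg(p0, p1, p2, p3):
    """Signed torsion angle in degrees for four Cartesian points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def angle_diff_deg(a, b):
    """Difference a - b wrapped into [-180, 180), so 170 vs -170 gives -20."""
    return (a - b + 180.0) % 360.0 - 180.0

def rel_dev_percent(reference, value):
    """Relative deviation of a lattice parameter from the reference, in %."""
    return 100.0 * (value - reference) / reference
```

Applying `angle_diff_deg` to the same torsion measured in the reference CIF and in the GULP output avoids spurious 340-degree "differences" caused by the ±180 degree wrap-around, and `rel_dev_percent` gives a per-parameter deviation for a, b, c and the cell angles.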
I would be interested to know:
a) whether my GULP input is wrong
b) whether the original article data are wrong
or what else is happening …
Questions:

  1. Does anybody have a molecular crystal that is known to be correctly minimized with gaff or gaff2 in any software (GULP, LAMMPS, CHARMM, whatever), so that I can check whether I can reproduce the results?

  2. Any advice on how to investigate this situation is welcome.

If anybody is interested, I can send the GULP inputs I generated for these 21 structures for investigation (give me your e-mail address).