KIM_API_allocate() and memory allocation by a test.

Hi,

I am building OpenKIM support into my simulation software Asap. I have a question or two about memory management, in particular relating to parallel MD where atoms will migrate between processors, and the total number of atoms will vary.

I see the following steps in the code.

1) Match the model with KIM_API_string_init()

2) Allocate memory.

The example tests use KIM_API_allocate(), but the documentation warns against using it with KIM_API_set_data to avoid memory leaks. Also, KIM_API_allocate() takes the number of particles as an argument, and can only be called once, so it appears to be unsuitable for parallel simulations with varying particle number. As I understand the docs, I can use KIM_API_set_data instead (hence the warning against doing both), but which data structures does KIM_API_allocate allocate (i.e. what should I do instead)?

I would assume that KIM_API_allocate allocates both some scalar data and some arrays, the arrays probably being the ones mentioned in the .kim file of the model. Is that a correct assumption, and should I instead allocate all data types mentioned in the .kim file (or the Test's similar large string variable) one at a time? And do I need to specify the number of atoms both by setting numberOfParticles (and numberContributingParticles) AND by specifying the shape of all arrays?

3) Calculate energies and forces.

4) Move atoms. If a migration occurs, go to 2, else go to 3.

Does this make sense?

Best regards

Jakob

I'm working on an answer (a long one) and hope to have it to you soon...

Cheers,

Ryan

Hi Jakob,

Thanks for your questions! I've taken a bit of time so that I could respond comprehensively. Please see below.

1) Match the model with KIM_API_string_init()

2) Allocate memory.

The example tests use KIM_API_allocate(), but the documentation warns against using it with KIM_API_set_data to avoid memory leaks. Also, KIM_API_allocate() takes the number of particles as an argument, and can only be called once, so it appears to be unsuitable for parallel simulations with varying particle number. As I understand the docs, I can use KIM_API_set_data instead (hence the warning against doing both), but which data structures does KIM_API_allocate allocate (i.e. what should I do instead)?

To respond to this, I've written a new version of the KIM_API_init() and KIM_API_allocate() function documentation that describes in detail what they do. I'll paste this information below. I expect to release this updated documentation with the next openkim-api release. If you have any questions/suggestions/comments/corrections to help improve the documentation, they are always welcome.

int KIM_API_init(void *kimmdl, char *testname, char *modelname);

     This routine creates the KIM API object that can store every pointer to the
     data described in the KIM descriptor file for the model. It also checks if
     arguments described in descriptor files (Tests and Models) are compatible
     with KIM standard naming convention (stored in the file `standard.kim') and
     if Models are compatible with Tests. It will return KIM_STATUS_OK upon
     successful completion or KIM_STATUS_FAIL otherwise. In addition, if it is
     unsuccessful, kimmdl will have a NULL value.

     Here the process of matching a Test KIM descriptor file and a Model KIM
     descriptor file is described in detail. For a final positive match result
     to occur, a positive match must be obtained in EACH AND EVERY ONE of the
     below steps:

     (1) Syntax and standard conformance: The Test and Model KIM descriptor
         files are read. Each item in the file is parsed and compared for
         correct syntax and conformance with the `standard.kim' file. If both
         the files are syntactically correct and conform to the `standard.kim'
         file definitions, a positive match is obtained. Otherwise, a negative
         match is obtained and the routine returns KIM_STATUS_FAIL.

     (2) Unit matching: If the Model's Unit_Handling value is `flexible', a
         positive match is obtained and the routine skips to the next step
         below. Otherwise, the Test's base unit values are compared, one by
         one, to the Model's base unit values. If all five values are equal, a
         positive match is obtained. Otherwise, a negative match is obtained
         and the routine returns KIM_STATUS_FAIL.

     (3) Particle type matching: For each particle type symbol name listed by
         the Test, the Model's list of particle type symbol names is searched
         for the corresponding entry. If every particle type symbol listed by
         the Test is found in the Model's symbol list, a positive match is
         obtained. If both the Model's and Test's lists are empty, then the
         model and test support a single "unnamed" particle type and a positive
         match is obtained. Otherwise, a negative match is obtained and the
         routine returns KIM_STATUS_FAIL.

     (4) Neighbor list access matching: The following is a list of the possible
         cases and their matching result.

               Test        Model         Matching Result
               ----------- ------------- ---------------
               Iter        Iter          positive match
               Iter        Loca          negative match
               Iter        Iter&Loca     positive match
               Iter        Both          negative match
               Loca        Iter          negative match
               Loca        Loca          positive match
               Loca        Iter&Loca     positive match
               Loca        Both          negative match
               Iter&Loca   Iter          positive match
               Iter&Loca   Loca          positive match
               Iter&Loca   Iter&Loca     positive match
               Iter&Loca   Both          positive match
               Both        Iter          positive match
               Both        Loca          positive match
               Both        Iter&Loca     positive match
               Both        Both          positive match

     (5) NBC method matching: If one or more of the Test's listed NBC methods is
         also in the Model's listed NBC methods, a positive match is obtained.
         When more than one common NBC method exists, the NBC method listed
         closest to the top of the Test's NBC method list is selected as the
         "active" NBC method. If no common NBC methods are listed, a negative
         match is obtained and the routine returns KIM_STATUS_FAIL.

     (6) Matching for MODEL_INPUT arguments required by active NBC method:
         Depending on the active NBC method (determined in step (5) above), the
         Model's and Test's MODEL_INPUT section must include the following
         arguments:

               CLUSTER:
                     `coordinates'

               NEIGH_PURE_H:
                     `coordinates'
                     `numberContributingParticles'
                     `neighObject'
                     `get_neigh'

               NEIGH_PURE_F:
                     `coordinates'
                     `neighObject'
                     `get_neigh'

               NEIGH_RVEC_H:
                     `coordinates'
                     `numberContributingParticles'
                     `neighObject'
                     `get_neigh'

               NEIGH_RVEC_F:
                     `coordinates'
                     `neighObject'
                     `get_neigh'

               MI_OPBC_H:
                     `coordinates'
                     `boxSideLengths'
                     `numberContributingParticles'
                     `neighObject'
                     `get_neigh'

               MI_OPBC_F:
                     `coordinates'
                     `boxSideLengths'
                     `neighObject'
                     `get_neigh'

         If all required arguments are found in both the Model's and Test's list
         of MODEL_INPUT arguments, then a positive match is obtained.
         Otherwise, a negative match is obtained and the routine returns
         KIM_STATUS_FAIL.

     (7) Mandatory arguments matching: The following arguments are mandatory and
         must be included in any KIM descriptor file:

               MODEL_INPUT section
                 `numberOfParticles'
                 `numberParticleTypes'
                 `particleTypes' (not required if the
                                       SUPPORTED_ATOM/PARTICLES_TYPES
                                       section is empty. That is, if the Model
                                       and Test only support a single "unnamed"
                                       particle type.)
                 `coordinates' (this has already been confirmed in step 6)
               MODEL_OUTPUT section
                 `compute'
                 `reinit'
                 `destroy'
                 `cutoff'

         If all of the above arguments (except possibly, as described above,
         `particleTypes') are present in both the Model and Test KIM descriptor
         files, then a positive match is obtained. Otherwise, a negative match
         is obtained and the routine returns KIM_STATUS_FAIL.

     (8) MODEL_INPUT matching: If the Model's MODEL_INPUT list of arguments
         includes `process_dEdr' or `process_d2Edr2', then its MODEL_OUTPUT list
         of arguments may not contain `virial', `particleVirial', or `hessian'.
         If it does, a negative match is obtained and the routine returns
         KIM_STATUS_FAIL.

         If the Test's MODEL_INPUT list of arguments includes `process_dEdr' or
         `process_d2Edr2', then its MODEL_OUTPUT list of arguments may not
         contain `virial', `particleVirial', or `hessian'. If it does, a
         negative match is obtained and the routine returns KIM_STATUS_FAIL.

         If the Model's MODEL_INPUT list of arguments contains `process_dEdr'
         and the Test's MODEL_INPUT list of arguments does not contain
         `process_dEdr', then (in memory) the routine adds `virial' and
         `particleVirial' arguments, with the `optional' keyword, to the Model's
         MODEL_OUTPUT list of arguments.

         If the Model's MODEL_INPUT list of arguments contains `process_dEdr'
         and `process_d2Edr2', then (in memory) the routine adds `hessian', with
         the `optional' keyword, to the Model's MODEL_OUTPUT list of arguments.

         For each non-optional argument in the Model's MODEL_INPUT list of
         arguments, a corresponding entry must exist in the Test's MODEL_INPUT
         list of arguments. If at least one such argument is missing from the
         Test's list, a negative match is obtained and the routine returns
         KIM_STATUS_FAIL.

         For each argument in the Test's MODEL_INPUT list of arguments, a
         corresponding entry (either "optional" or required) must exist in the
         Model's MODEL_INPUT list of arguments. If all such arguments are
         present, then a positive match is obtained. If at least one such
         argument is missing from the Model's list, a negative match is obtained
         and the routine returns KIM_STATUS_FAIL.

         Finally, all optional arguments in the Model's MODEL_INPUT list of
         arguments which are not contained in the Test's MODEL_INPUT list of
         arguments have their `compute' flag set to KIM_COMPUTE_FALSE.

     (9) MODEL_OUTPUT matching: For each non-optional argument in the Model's
         MODEL_OUTPUT list of arguments, a corresponding entry must exist in the
         Test's MODEL_OUTPUT list of arguments. If at least one such argument
         is missing from the Test's list, a negative match is obtained and the
         routine returns KIM_STATUS_FAIL.

         For each argument in the Test's MODEL_OUTPUT list of arguments, a
         corresponding entry (either "optional" or required) must exist in the
         Model's MODEL_OUTPUT list of arguments. If all such arguments are
         present, then a positive match is obtained. If at least one such
         argument is missing from the Model's list, a negative match is obtained
         and the routine returns KIM_STATUS_FAIL.

         For each argument in the Test's MODEL_OUTPUT list of arguments, the
         argument's `compute' flag is set to KIM_COMPUTE_TRUE. If the `virial'
         or `particleVirial' arguments were added to the Model's MODEL_OUTPUT
         list of arguments in step 8 and any of these arguments are contained
         in the Test's MODEL_OUTPUT list of arguments, then the `process_dEdr'
         argument's `compute' flag is set to KIM_COMPUTE_TRUE. If the `hessian'
         argument was added to the Model's MODEL_OUTPUT list of arguments in
         step 8 and this argument is in the Test's MODEL_OUTPUT list of
         arguments, then the `compute' flags of the `process_dEdr' and
         `process_d2Edr2' arguments are set to KIM_COMPUTE_TRUE.

Arguments:

       void *kimmdl
               reference pointer to KIM_API_model object (in C++ style, the
               definition will be (KIM_API_model **)).

       char *testname
               null-terminated character string that defines the Test name. The
               routine prepends the Tests' directory string (defined by the Make
               variable KIM_TESTS_DIR, which is specified in the file
               Makefile.KIM_Config) to obtain the file location of the Test
               descriptor (.kim) file.

       char *modelname
               null-terminated character string that defines the Model name.
               The routine uses this name to find the character string in memory
               that contains the Model's descriptor (.kim) file. At compile
               time the descriptor file is stored in the Model's binary library
               file. Thus, it is necessary to perform a `make' of the
               openkim-api package after editing a Model's .kim file.
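To make matching steps (4) and (5) easier to follow, here is a small self-contained C sketch. It only transcribes the neighbor-list access table and the NBC selection rule as written above; it is not the openkim-api implementation, and the names nbl_access, nbl_access_match and select_nbc_method are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical encoding of the neighbor-list access modes in the table
   above; these names are illustrative only, not part of the KIM API. */
typedef enum { NBL_ITER, NBL_LOCA, NBL_ITER_AND_LOCA, NBL_BOTH } nbl_access;

/* Step (4): look up the matching result in the table.
   Rows are the Test's mode, columns the Model's mode;
   1 = positive match, 0 = negative match. */
static int nbl_access_match(nbl_access test, nbl_access model)
{
    static const int table[4][4] = {
        /* Model:             Iter Loca Iter&Loca Both */
        /* Test Iter      */ { 1,   0,   1,        0 },
        /* Test Loca      */ { 0,   1,   1,        0 },
        /* Test Iter&Loca */ { 1,   1,   1,        1 },
        /* Test Both      */ { 1,   1,   1,        1 },
    };
    return table[test][model];
}

/* Step (5): walk the Test's NBC list from the top and return the first
   method that also appears anywhere in the Model's list (the "active"
   method), or NULL if the lists have nothing in common. */
static const char *select_nbc_method(const char *const *test_methods,
                                     size_t n_test,
                                     const char *const *model_methods,
                                     size_t n_model)
{
    for (size_t i = 0; i < n_test; ++i)
        for (size_t j = 0; j < n_model; ++j)
            if (strcmp(test_methods[i], model_methods[j]) == 0)
                return test_methods[i];
    return NULL;
}
```

For example, a Test listing NEIGH_RVEC_F, NEIGH_PURE_H, CLUSTER (in that order) and a Model listing CLUSTER, NEIGH_PURE_H would get NEIGH_PURE_H as the active NBC method, because it is the first common entry in the Test's list.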

Dear Ryan,

Thank you very much for your very comprehensive answers, and for the updated documentation. Just one more question, to make sure that I am doing the right thing.

I do the following:

1) Call KIM_API_string_init() to match the model.

2) Set the cutoff, numberOfParticles and numberParticleTypes pointers with KIM_API_set_data().

3) Initialize the model. I need to know the cutoff before I know how many ghost atoms will be on each processor, therefore I do not initialize arrays yet, and I have set the number of particles to 0.

4) Get the neighbor list type, and set the numberContributingParticles pointer if relevant.

5) Distribute atoms and prepare neighbor lists.

6) set coordinates, energy, forces etc etc with KIM_API_setm_data(). At this point the number of atoms is known. The total size of the arrays is given in the KIM_API_setm_data(). QUESTION: Do I also need to call KIM_API_set_shape() on each array? The examples do not do this, and the documentation seems to indicate that this overwrites some data (what really happens when this is called? Reallocation?) On the other hand, full info about the shape is not given in any other place, and KIM_API_set_shape() must serve some purpose :slight_smile:

7) Call the compute method and harvest the results. Do Molecular Dynamics. When atoms migrate between processors, jump to 5 (i.e. reallocate data and call KIM_API_setm_data() again).

Does this make sense?

Best regards

Jakob

Hi Jakob,

Some updated documentation:

int KIM_API_set_data(void *kimmdl, char *nm, intptr_t size, void *dt);
int KIM_API_set_method(void *kimmdl, char *nm, intptr_t size, func_ptr dt);

  These routines search for the string `nm' in the KIM API object `kimmdl'. If
  found, they store in the KIM API object the value of `dt', which points to the
  location in memory where the data/method associated with `nm' is to be stored.
  Upon successful completion this routine returns KIM_STATUS_OK. If `nm' is not
  in the KIM API object this routine returns KIM_STATUS_ARG_UNKNOWN. If an
  existing pointer in the KIM API object is overwritten by this operation, a
  memory leak may result. (This could indicate that storage for the same data
  has been allocated more than once.) Thus, care must be taken if this routine
  is used in conjunction with KIM_API_allocate for array data. This routine
  assumes that the shape of the associated argument has been fully specified,
  except possibly for the `fast-index' (see discussion of shape entries in
  standard.kim). The routine computes the product of the shape entries not
  including the fast-index and sets a new shape value for the fast-index by
  dividing `size' by this product.
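The fast-index rule above is simple arithmetic, so here is a minimal C sketch of it. It assumes a made-up arg_shape record that stores the fast-index in shape[0] and the remaining indices after it (matching the terminology used in this thread, where e.g. coordinates is rank 2 with slow-index 3). It is not the KIM API's internal data structure, and the error checks are additions for illustration; the documented routine simply assumes the other entries are already known.

```c
#include <stdint.h>

/* Hypothetical shape record for one argument; NOT the real KIM API object.
   shape[0] holds the fast-index, shape[1..rank-1] the remaining indices.
   A value <= 0 means "not yet known". */
typedef struct {
    int      rank;
    intptr_t shape[4];
} arg_shape;

/* Sketch of the documented rule: multiply all shape entries except the
   fast-index, then set fast-index = size / product.
   Returns 0 on success, -1 if the rule cannot be applied. */
static int set_fast_index(arg_shape *a, intptr_t size)
{
    intptr_t product = 1;

    if (a->rank == 0)
        return 0; /* rank-0 arguments have no shape to update */

    for (int i = 1; i < a->rank; ++i) {
        if (a->shape[i] <= 0)
            return -1; /* e.g. hessian: a non-fast index is still unknown */
        product *= a->shape[i];
    }
    if (size % product != 0)
        return -1; /* size is not a whole multiple of the product */

    a->shape[0] = size / product;
    return 0;
}
```

With coordinates described as {?, 3} and size = 3*N, the sketch recovers N as the fast-index. For the hessian (assumed here to be {?, ?, 3, 3}) it refuses until the second index is known, which is the situation where KIM_API_set_shape() has to be called first.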

So, as long as an argument's shape is given in terms of *constants*, except for the fast-index, in the kim file (or the argument is of rank 0), you can simply use the KIM_API_set_data() routine and it will update the argument's fast-index shape value accordingly. This will work, for example, with the following standard arguments:

   numberOfParticles (rank 0)
   numberContributingParticles (rank 0)
   numberParticleTypes (rank 0)
   particleTypes (rank 1)
   coordinates (rank 2 with slow-index of 3)
   particleCharge (rank 1)
   particleSize (rank 1)
   get_neigh (rank 0)
   process_dEdr (rank 0)
   process_d2Edr2 (rank 0)
   neighObject (rank 0)
   boxSideLengths (rank 1 with fast-index of 3)
   temperature (rank 0)

   compute (rank 0)
   reinit (rank 0)
   destroy (rank 0)
   cutoff (rank 0)
   energy (rank 0)
   forces (rank 2 with slow-index of 3)
   particleEnergy (rank 1)
   virial (rank 1 with fast-index of 6)
   particleVirial (rank 2 with slow-index of 6)

BUT, it will not work with the following standard argument:

   hessian (rank 4 with 2nd-fastest-index unknown)

I do the following:

1) Call KIM_API_string_init() to match the model.

Good.

2) Set the cutoff, numberOfParticles and numberParticleTypes pointers with KIM_API_set_data().

Good.

3) Initialize the model. I need to know the cutoff before I know how many ghost atoms will be on each processor, therefore I do not initialize arrays yet, and I have set the number of particles to 0.

Good.

4) Get the neighbor list type, and set the numberContributingParticles pointer if relevant.

Good.

5) Distribute atoms and prepare neighbor lists.

Good.

6) set coordinates, energy, forces etc etc with KIM_API_setm_data(). At this point the number of atoms is known. The total size of the arrays is given in the KIM_API_setm_data(). QUESTION: Do I also need to call KIM_API_set_shape() on each array? The examples do not do this, and the documentation seems to indicate that this overwrites some data (what really happens when this is called? Reallocation?) On the other hand, full info about the shape is not given in any other place, and KIM_API_set_shape() must serve some purpose :slight_smile:

As described above, this will work because everything you have said you are using has all but the fast-index predetermined. However, if you were using the `hessian' argument or some similar argument added in the future, then you would need to make a call to

   KIM_API_set_shape()

in order to specify all of the slow indices before you make a call to

   KIM_API_set_data()

7) Call the compute method and harvest the results. Do Molecular Dynamics. When atoms migrate between processors, jump to 5 (i.e. reallocate data and call KIM_API_setm_data() again).

Good.
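Steps 5 to 7 amount to a loop: whenever atoms migrate, the simulator's own arrays are resized and registered again. Since the KIM API object stores the pointer `dt' rather than a copy, and realloc() may move a buffer, a pointer registered earlier can become stale, so re-registering after every reallocation matters. Below is a minimal C sketch of that step, assuming the simulator owns the arrays. The KIM API object is replaced by a hypothetical mock struct (mock_kim_args), and local_atoms and reregister_after_migration are likewise made-up names; in real code the pointer assignments would be a KIM_API_setm_data() call.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mock of the pointers a KIM API object would hold after
   KIM_API_set_data()/KIM_API_setm_data(); NOT the real API. */
typedef struct {
    int      *numberOfParticles; /* registered once, points at local_atoms.n */
    double   *coordinates;
    intptr_t  coordinates_size;  /* total number of doubles */
    double   *forces;
    intptr_t  forces_size;
} mock_kim_args;

/* Hypothetical per-processor particle storage owned by the simulator. */
typedef struct {
    int     n;      /* local + ghost particles on this processor */
    double *coords; /* 3 * n doubles */
    double *forces; /* 3 * n doubles */
} local_atoms;

/* After a migration: resize the simulator-owned arrays and register them
   again, since realloc() may have moved them. Returns 0 on success. */
static int reregister_after_migration(local_atoms *la, int new_n,
                                      mock_kim_args *kim)
{
    if (new_n <= 0)
        return -1; /* keep the sketch simple: avoid realloc(ptr, 0) */

    size_t bytes = 3 * (size_t)new_n * sizeof(double);
    double *c = realloc(la->coords, bytes);
    if (c == NULL)
        return -1;
    la->coords = c;

    double *f = realloc(la->forces, bytes);
    if (f == NULL)
        return -1;
    la->forces = f;

    /* numberOfParticles was registered once (step 2) and points at la->n,
       so only its value changes here. */
    la->n = new_n;

    /* Stand-in for KIM_API_setm_data(): store the (possibly moved)
       pointers together with their total sizes. */
    kim->coordinates      = la->coords;
    kim->coordinates_size = 3 * (intptr_t)new_n;
    kim->forces           = la->forces;
    kim->forces_size      = 3 * (intptr_t)new_n;
    return 0;
}
```

In this sketch the simulator, not the mock, remains responsible for eventually freeing coords and forces, which is the ownership model that using set_data instead of KIM_API_allocate implies.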

I hope this is clear. If not, keep asking!

Ryan