The purpose of this thread is to facilitate discussion of methods for unique identification of interatomic models, to be adopted by NIST, KIM, etc. It has been requested that an end date be placed on discussions, after which a decision must be made. Thus, I propose that discussions remain open to any interested party until 5:00 PM EDT, September 12, 2014.
I will start the discussion. I propose a system that identifies both the description of an interatomic model and its subsequent instantiations:
Description Identifier: E_A_Y_U
E: Element or compound information
A: Author information
Y: Publication year or year of availability
U: Unique string, number, DOI, etc.
Instantiation Identifier: E_A_Y_U_I
I: Additional information relating to the specific implementation of the model.
Examples:
Ni-Al-Co_SmithJJ_2014_SAX1EYJ7
---> Ni-Al-Co_SmithJJ_2014_SAX1EYJ7_KIM_EAM_Dynamo__MO_123412341234_001
---> Ni-Al-Co_SmithJJ_2014_SAX1EYJ7_EAM_Alloy_setfl_MD5_99a175c11698c523c4e6d84dbbcbfd12
---> Ni-Al-Co_SmithJJ_2014_SAX1EYJ7_EAM_Phi_table_MD5_3d48be9074b91eabad33952ba71adc8f
ZnS-H2O_DoeJJ_2008_DOI:10.10/101010/1010
---> ZnS-H2O_DoeJJ_2008_DOI:10.10/101010/1010_REAX_MD5_00860e77f28b404d47e017f46b809953
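To make the relationship between the two identifier levels concrete, here is a minimal Python sketch. The function names `description_id` and `instantiation_id` are hypothetical, and hashing the raw bytes of the parameter file is an assumption: the proposal does not say what the MD5 is computed over.

```python
import hashlib


def description_id(elements, author, year, unique):
    """Join the E, A, Y and U components with underscores (E_A_Y_U)."""
    return "_".join(["-".join(elements), author, str(year), unique])


def instantiation_id(desc_id, fmt, file_bytes):
    """Append the I component: a format label plus an MD5 of the file.

    Assumption: the MD5 is taken over the raw file contents.
    """
    digest = hashlib.md5(file_bytes).hexdigest()
    return f"{desc_id}_{fmt}_MD5_{digest}"


desc = description_id(["Ni", "Al", "Co"], "SmithJJ", 2014, "SAX1EYJ7")
inst = instantiation_id(desc, "EAM_Alloy_setfl", b"placeholder file contents")
print(desc)
print(inst)
```

Any number of instantiation identifiers can share one description identifier as a common prefix, which is what the arrows in the examples above express.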
A couple of basic constraints from the OpenKIM side:
* The general form of our openkim.org ID is:
PREFIX__CC_DDDDDDDDDDDD_VVV
- PREFIX is limited to 100 characters maximum and only alpha-numeric
characters (including underscore) are allowed. (In particular, dashes,
dots, colons, slashes, etc. are NOT ALLOWED)
- The only "double underscore" allowed is the one between the PREFIX and CC
parts.
- CC is a two-letter alphabetical code describing the KIM Item Type:
MO - Model
MD - Model Driver
TE - Test
TD - Test Driver
RD - Reference Data
VZ - Visualizer
MV - Model Verification
TV - Test Verification
VV - Visualizer Verification
- The DDDDDDDDDDDD is a 12-digit unique decimal number randomly assigned to
each KIM Item
- The VVV is a 3-digit version number starting at 000
* Generally, I would advocate for keeping the entire ID as short as possible.
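The constraints above can be sketched as a small Python check. This is only an illustration, not code used by openkim.org; the name `is_valid_kim_id` is hypothetical, and requiring at least one character in PREFIX and forbidding a trailing underscore (which would produce a triple underscore) are my reading of the "only one double underscore" rule.

```python
import re

# PREFIX__CC_DDDDDDDDDDDD_VVV
KIM_ID = re.compile(
    r"(?P<prefix>[A-Za-z0-9_]{1,100})__"
    r"(?P<cc>MO|MD|TE|TD|RD|VZ|MV|TV|VV)_"
    r"(?P<number>[0-9]{12})_"
    r"(?P<version>[0-9]{3})"
)


def is_valid_kim_id(kim_id):
    """Return True if kim_id follows the constraints listed above."""
    match = KIM_ID.fullmatch(kim_id)
    if match is None:
        return False
    prefix = match.group("prefix")
    # The only double underscore allowed is the one before CC.
    return "__" not in prefix and not prefix.endswith("_")


proposed = "Ni-Al-Co_SmithJJ_2014_SAX1EYJ7_KIM_EAM_Dynamo__MO_123412341234_001"
print(is_valid_kim_id(proposed))                    # dashes in PREFIX
print(is_valid_kim_id(proposed.replace("-", "_")))  # underscores instead
```

The first call rejects the identifier because of the dashes in the element list; the second accepts it once they are replaced by underscores.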
Discussion areas:
Identifier components (E, A, Y, U, and I):
Are these sufficient, excessive?
* This seems like plenty.
* The order in which these appear is worth considering. These IDs will often show up in alphabetical lists (such as from 'ls -1'), and how useful such a list is depends on which component comes first.
Currently on openkim.org we recommend
<model type/name>_<developer name(s)>_<model_info>_<supported specie(s)>__CC_...
Roughly speaking this corresponds to your: U_A_Y_E_I
This "sorts by model type", whereas your ordering, E_A_Y_U_I, "sorts by species/elements".
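A small Python sketch of the difference, using made-up models. The field values are hypothetical, and placing a model-type label such as "EAM" in the U slot is an assumption made only so the two orderings can be compared. Note that Python sorts by code point, which can differ from 'ls' output under some locales.

```python
# Hypothetical models, one dict per model, keyed by component letter.
models = [
    {"E": "Ni_Al", "A": "SmithJJ", "Y": "2014", "U": "EAM"},
    {"E": "Cu", "A": "DoeJJ", "Y": "2008", "U": "LJ"},
    {"E": "Ni_Al", "A": "DoeJJ", "Y": "2010", "U": "LJ"},
    {"E": "Cu", "A": "SmithJJ", "Y": "2012", "U": "EAM"},
]


def make_id(model, order):
    """Join the components of one model in the given order, e.g. 'EAYU'."""
    return "_".join(model[key] for key in order)


species_first = sorted(make_id(m, "EAYU") for m in models)
type_first = sorted(make_id(m, "UAYE") for m in models)

print("\n".join(species_first))
print()
print("\n".join(type_first))
```

With species first, the two Cu models are listed next to each other; with type first, the two EAM models are.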
E: Element or compound information
Do we want to simply list elements (Zn-S-H-O) or include more information (Zn-ZnS-H2O-O-H)? Recommendations for convention?
* OpenKIM would require the use of underscores instead of "-".
* Having the "primary" species first is good since these can be ordered
alphabetically for easy search in a list (as in a file listing a la "ls -1")
* A simple list will be shorter, but has less information; I don't really have
any strong feeling here.
A: Author information
First or all?
* All is typically not possible. I would probably suggest first and second
with _et_al_ when more than two authors exist.
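As a minimal sketch of that suggestion in Python (the function name `author_field` is hypothetical, and joining the two names with an underscore is an assumption, since no separator has been agreed on):

```python
def author_field(surnames):
    """Build the A component from an ordered list of author names.

    One or two authors are listed in full; for three or more, only the
    first two are kept and 'et_al' is appended.
    """
    if not surnames:
        raise ValueError("at least one author is required")
    if len(surnames) <= 2:
        return "_".join(surnames)
    return "_".join(surnames[:2]) + "_et_al"


print(author_field(["Smith"]))
print(author_field(["Smith", "Doe"]))
print(author_field(["Smith", "Doe", "Roe", "Poe"]))
```

The underscore that follows "et_al" in the full identifier comes from the separator between the A and Y components.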
Y: Publication year or year of availability
People commonly refer to a model by the author and year, so I think it is important to include both. Thoughts?
* Agreed, the year is worth having
U: Unique string, DOI, number, etc
This can be useful in the situation that an author publishes multiple models in the same year and/or for more information, such as paper DOI. Thoughts?
* Good as long as we limit to alpha-numeric (with underscores)
I: Instantiation Identifier
String with few limitations used to describe instantiation. Thoughts?
I think you need to say more about what an "instantiation" is. If I understand your meaning, I think OpenKIM would treat each "instantiation" as a separate Item and assign different 12-digit DDDDDDDDDDDD codes to them.
Is that what you have in mind, or something different?
Ryan