Use extensive or intensive/normalized properties?

Hi there,
probably a short (maybe stupid) question, but I couldn’t find an answer in the ICET documentation or the papers.

The properties that we feed in order to train a cluster expansion, do they need to be provided as extensive properties? For example, if I want to train a CE on the total energy of a system, do I need to provide the total energy as calculated for every single training structure? Or does ICET rely on normalized quantities and I need to normalize the total energy by the number of atoms in the respective training structure first?

I ask because my first instinct was to use total energies or energies of mixing as they are, so that they scale with the system size: double the system = double the number of clusters = double the predicted total energy. But when I use the total energies as-is, the RMSE looks really unreasonable, and I am not sure whether the problem is the fitting or the data itself (because it is not normalized). After normalizing, the fit looks much better, but the numbers are then also much smaller, so the RMSE would be lower anyway.

Could anybody clarify?
Thanks a lot,

Hi @Marcel_S,

Not a stupid question at all! The properties you use need to be intensive, meaning energies have to be normalized by the number of atoms in the structure. The reason is that each structure is described by its cluster vector, and the cluster vector is itself normalized by the number of atoms – it carries no information about how large the structure used to calculate it was. For example, think about two structures, one being a primitive structure and the other being that primitive structure repeated three times along its periodic directions. They will have the exact same cluster vector, and therefore any property the cluster expansion predicts for them must be the same too. If you need the total energy of a structure, multiply the predicted per-atom value by the number of atoms.
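
To make the supercell argument concrete, here is a minimal sketch in plain Python. It is a toy model and not icet's implementation: it assumes a 1D periodic binary chain with spins +1/-1, a cluster vector made of a zerolet, a singlet, and a nearest-neighbour pair term, and made-up interaction values (`j_pair`, `e_site`). All function names are hypothetical.

```python
# Toy illustration only (not icet code): 1D periodic binary chain, spins +1/-1.

def cluster_vector(spins):
    """Return [zerolet, singlet, nearest-neighbour pair], each averaged per site."""
    n = len(spins)
    singlet = sum(spins) / n
    pair = sum(spins[i] * spins[(i + 1) % n] for i in range(n)) / n
    return [1.0, singlet, pair]


def total_energy(spins, j_pair=-0.05, e_site=0.02):
    """Hypothetical extensive energy: a plain sum over all sites and bonds."""
    n = len(spins)
    site_part = sum(e_site * s for s in spins)
    bond_part = sum(j_pair * spins[i] * spins[(i + 1) % n] for i in range(n))
    return site_part + bond_part


primitive = [1, -1]          # two-atom "primitive" chain
supercell = primitive * 3    # the same chain repeated three times

cv_prim = cluster_vector(primitive)
cv_super = cluster_vector(supercell)

e_prim = total_energy(primitive)
e_super = total_energy(supercell)

print("cluster vectors equal:", cv_prim == cv_super)
print("total energies:", e_prim, e_super)
print("energies per atom:", e_prim / len(primitive), e_super / len(supercell))
```

The two cluster vectors come out identical, while the total energies differ by a factor of three. A linear model that only sees the cluster vector cannot reproduce both total energies, but it can reproduce the per-atom energy, which is the same for both cells.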


Hi Magnus,
thanks for the fast response, that made everything perfectly clear!
