What features are used to train the model? Do we need to add extra features?


I have a question about the input features used in ICET.

When building training models, following the example in the ICET manual, we use only the lattice structures as input features and the mixing energy as the output. I’m not sure this is enough, since many papers add hundreds of features, such as atomic information and physical properties of the elemental substances.

Do you think we need to add extra features? If so, how can we do this in ICET? (Would it be done by adding them to the properties dictionary in StructureContainer?)

Thank you so much.

Is anyone familiar with this question? Looking forward to your reply!

icet deals with cluster expansions and does not add any additional features.

This sounds more like trying to describe or predict trends across alloys and materials, rather than predicting the properties of a single system from the occupation of its lattice, which is what cluster expansions are used for.

Hi Erik,

Thank you so much for your reply! It really helps!

Now I see the point. I read the source code of StructureContainer.get_fit_data(); it transforms the input structures into cluster vectors and target properties. At first I thought the generated cluster vectors might include other features such as elemental composition, bond lengths, and coordination numbers. Now it seems that they only encode the lattice structures and their occupations.

Slightly off topic: do you think adding extra features would help when predicting the properties of a single system, as we do with CE?

Thank you so much!

The cluster vector already encodes things like the concentration of each species, the number of nearest neighbours of each species, etc.
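To make this concrete, here is a minimal toy sketch in plain Python. It does not use icet and does not reproduce icet's basis functions or normalisation; the chain, the +1/-1 spin mapping, and the function name `cluster_vector` are assumptions made for illustration only. It shows how, for a binary alloy on a periodic 1D chain, the singlet term fixes the concentration and the nearest-neighbour pair term fixes the fraction of unlike bonds.

```python
# Toy illustration (NOT icet): binary A/B alloy on a periodic 1D chain.
# The names and the +1/-1 spin convention are assumptions for this sketch.

def cluster_vector(occupation):
    """Return [zerolet, singlet average, nearest-neighbour pair average]."""
    # Map species to spin variables: A -> +1, B -> -1.
    spins = [1 if species == "A" else -1 for species in occupation]
    n = len(spins)
    singlet = sum(spins) / n
    # Periodic boundary: the last site is a neighbour of the first.
    nn_pair = sum(spins[i] * spins[(i + 1) % n] for i in range(n)) / n
    return [1.0, singlet, nn_pair]


chain = list("AABABBAB")
cv = cluster_vector(chain)

# The singlet average determines the concentration of A.
conc_a = (1 + cv[1]) / 2

# The pair average determines the fraction of unlike (A-B) neighbour bonds.
unlike_fraction = (1 - cv[2]) / 2

print("cluster vector:", cv)
print("concentration of A:", conc_a)
print("fraction of A-B nearest-neighbour bonds:", unlike_fraction)
```

Composition and neighbour statistics therefore do not need to be added as separate features; in this toy model they are linear functions of the cluster-vector entries.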

I haven’t tried including additional features and I don’t think it would be necessary for most systems. Increasing the cutoffs and the order of the clusters will allow you to model more complex functions that depend on the occupation of the lattice.
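The effect of a larger cutoff or a higher order can be sketched with a toy model in plain Python. This is not icet code: the function `cluster_vector` and its parameters `max_pair_distance` and `include_triplet` are hypothetical, chosen only for this illustration. In icet itself, the corresponding choice is made through the list of cutoffs (one per cluster order) given when the ClusterSpace is constructed.

```python
# Toy illustration (NOT icet): binary A/B alloy on a periodic 1D chain.
# Function and parameter names are hypothetical.

def cluster_vector(occupation, max_pair_distance=1, include_triplet=False):
    """Averages of spin products over singlets, pairs and (optionally) triplets."""
    spins = [1 if species == "A" else -1 for species in occupation]
    n = len(spins)
    vector = [1.0, sum(spins) / n]  # zerolet and singlet
    # One pair term per neighbour distance up to the chosen cutoff.
    for d in range(1, max_pair_distance + 1):
        vector.append(sum(spins[i] * spins[(i + d) % n] for i in range(n)) / n)
    # Optionally add one third-order term: three consecutive sites.
    if include_triplet:
        vector.append(
            sum(spins[i] * spins[(i + 1) % n] * spins[(i + 2) % n]
                for i in range(n)) / n
        )
    return vector


config_1 = "AABBAABB"
config_2 = "AAABABBB"

# With nearest-neighbour pairs only, the two configurations look identical.
short = [cluster_vector(c, max_pair_distance=1) for c in (config_1, config_2)]

# Including second-neighbour pairs tells them apart.
longer = [cluster_vector(c, max_pair_distance=2) for c in (config_1, config_2)]

print("cutoff 1:", short)
print("cutoff 2:", longer)
```

A model restricted to the short vectors must assign both configurations the same energy, whereas the longer vectors let it assign different energies. The price is more parameters to fit, so more training structures are typically needed.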

Hi Erik,

Thanks for the reply!

I’m curious which type of clustering algorithm you are using in ICET: K-means or hierarchical clustering? And which features are included in this clustering algorithm?

Thank you so much!

No clustering algorithm is used; see here for more background on cluster expansions.
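For orientation, the word "cluster" here refers to a group of lattice sites (a single site, a pair, a triplet, and so on), not to a group of data points. In the commonly used notation, which is assumed here rather than quoted from the icet documentation, a cluster expansion writes a property $Q$ of a configuration $\sigma$ as:

```latex
Q(\sigma) = J_0 + \sum_{\alpha} m_{\alpha} \, J_{\alpha} \,
            \left\langle \Pi_{\alpha'}(\sigma) \right\rangle_{\alpha}
```

Here $\alpha$ runs over the symmetry-inequivalent clusters (orbits), $m_\alpha$ is the multiplicity of the orbit, $J_\alpha$ are the effective cluster interactions obtained from the fit, and $\langle \Pi_{\alpha'}(\sigma) \rangle_\alpha$ is the average of the basis-function products over all clusters $\alpha'$ in the orbit. These averages are the entries of the cluster vector, so fitting a cluster expansion is a linear regression for the $J_\alpha$.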