What features are used to train the model? Do we need to add extra features?

Calvin_Cui · March 18, 2023, 1:50pm

Hi,

I have a question about the input features used in ICET.

When building training models following the example in the icet manual, we basically just use lattice structures as input features and the mixing energy as the output. I’m not sure this is enough, since in many papers people add hundreds of features, such as atomic information, physical properties of the elemental substances, etc.

Do you think we need to add extra features? And if so, how can we achieve this in ICET? (Is it adding them to the properties dictionary in StructureContainer?)

Thank you so much.

Calvin_Cui · March 27, 2023, 1:25pm

Hi,
Is anyone familiar with this question? Looking forward to your reply!

erikfransson · March 31, 2023, 6:11am

icet deals with cluster expansions and does not add any additional features.

This sounds more like trying to describe or predict trends across alloys and materials, rather than predicting properties of a single system based on the occupation of the lattice, which is what cluster expansions are used for.

Calvin_Cui · March 31, 2023, 2:20pm

Hi Erik,

Thank you so much for your reply! It really helps!

Now I see the point. I read the source code of StructureContainer.get_fit_data(); it transforms the input structures into cluster vectors and target properties. At first I thought the generated cluster vectors might include other features such as the elemental composition, bond lengths, and coordination numbers. Now it seems that they only encode the lattice structures and their occupations.

Slightly off topic: do you think it would be helpful to add additional features when predicting the properties of a single system, as we do with CE?

Thank you so much!

erikfransson · March 31, 2023, 3:35pm

The cluster vector already encodes things like concentration of each species, number of nearest neighbours for each species etc.
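
To make this concrete, here is a minimal sketch of a toy cluster vector for a periodic 1D binary chain. It is not icet's implementation: the function name `cluster_vector`, the ±1 spin encoding, and the restriction to a zerolet, a singlet, and a nearest-neighbour pair are all assumptions made for illustration. The singlet is a linear function of the concentration, and the pair term is a linear function of the fraction of unlike nearest-neighbour bonds.

```python
import numpy as np

def cluster_vector(occupation):
    """Toy cluster vector for a periodic 1D binary chain (illustrative, not icet's code)."""
    # Assumed encoding: species A -> +1, species B -> -1.
    sigma = np.where(np.asarray(list(occupation)) == "A", 1.0, -1.0)
    zerolet = 1.0                                   # constant term
    singlet = sigma.mean()                          # equals 2 * c_A - 1
    nn_pair = (sigma * np.roll(sigma, 1)).mean()    # average over nearest-neighbour bonds
    return np.array([zerolet, singlet, nn_pair])

ordered = cluster_vector("ABABABAB")      # every bond is A-B
segregated = cluster_vector("AAAABBBB")   # only two of eight bonds are A-B
```

Both chains have the same concentration, so their singlets agree, but the pair term separates them. The fraction of unlike bonds can be recovered as `(1 - nn_pair) / 2`.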

I haven’t tried to include additional features and I don’t think it would be necessary for most systems. Increasing the cutoffs and order of clusters will allow you to model more complex functions that depend on the occupation of the lattice.
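
The effect of a longer cutoff can be sketched on a toy model. The example below is an assumption-laden illustration, not icet code: the helper names `correlations` and `fit_rmse`, the 6-site periodic chain, and the coefficients in `true_eci` are all hypothetical. The target depends on second-neighbour pairs, so a fit restricted to nearest-neighbour pairs leaves a residual, while including the second shell reproduces the target. In icet itself, the range and order of clusters are controlled through the `cutoffs` argument of `ClusterSpace`.

```python
import itertools
import numpy as np

def correlations(sigma, max_shell):
    """Zerolet, singlet, and pair correlations up to `max_shell` neighbours (periodic chain)."""
    sigma = np.asarray(sigma, dtype=float)
    pairs = [(sigma * np.roll(sigma, d)).mean() for d in range(1, max_shell + 1)]
    return np.array([1.0, sigma.mean(), *pairs])

# All occupations of a periodic 6-site binary chain, encoded as +1 / -1.
configs = list(itertools.product([1, -1], repeat=6))

# Hypothetical target "energy" that includes a second-neighbour interaction.
true_eci = np.array([0.0, 0.1, -0.3, 0.2])
target = np.array([correlations(s, 2) @ true_eci for s in configs])

def fit_rmse(max_shell):
    """Least-squares fit of the target using pairs up to `max_shell`; returns the RMSE."""
    A = np.array([correlations(s, max_shell) for s in configs])
    eci, *_ = np.linalg.lstsq(A, target, rcond=None)
    return np.sqrt(np.mean((A @ eci - target) ** 2))

rmse_short = fit_rmse(1)   # nearest-neighbour pairs only
rmse_long = fit_rmse(2)    # adds second-neighbour pairs
```

Here `rmse_long` vanishes to numerical precision because the target lies in the span of the larger basis, whereas `rmse_short` stays finite. No extra descriptors were added; only the set of clusters grew.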

Calvin_Cui · March 31, 2023, 3:45pm

Hi Erik,

Thanks for the reply!

I’m curious what type of clustering algorithm you are using in ICET: k-means or hierarchical clustering? And what features are included in this clustering algorithm?

Thank you so much!

erikfransson · March 31, 2023, 4:18pm

No clustering algorithm is used; see here for more background on cluster expansions.
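
For orientation, the "clusters" in a cluster expansion are groups of lattice sites (singlets, pairs, triplets, and so on), not groups of data points. A commonly used form of the expansion is sketched below; the notation is an assumption chosen for illustration and may differ in detail from the linked material. $Q$ is the property of interest, $\sigma$ the occupation of the lattice, $\alpha$ a symmetry-inequivalent cluster, $m_\alpha$ its multiplicity, $J_\alpha$ the effective cluster interaction obtained by fitting, and $\langle \Pi_{\alpha'}(\sigma) \rangle_\alpha$ the average of the basis-function products over all clusters equivalent to $\alpha$.

```latex
Q(\sigma) = Q_0 + \sum_{\alpha} m_\alpha \, J_\alpha \, \langle \Pi_{\alpha'}(\sigma) \rangle_\alpha
```

The averaged products are the entries of the cluster vector, so fitting a cluster expansion amounts to a linear regression for the $J_\alpha$.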