In my opinion such a technique could be risky. There is really no guarantee that the structure produced by the structure predictor will correspond to the one in your data set. The prediction could be incorrect, or, even if the predictor correctly predicts the ground-state structure (its job), that might not be the same structure that is in your data set. If you end up adding features for the wrong structure, it’s not clear that they will help your machine learning model.
Also note that structure predictions are unoptimized, so even when a prediction is correct and actually represents the material in your data set (the best-case scenario), quantities such as bond lengths and lattice parameters will not be exactly right. It is typically expected that one will feed the output of a structure predictor into a DFT run to refine these parameters. If you do end up featurizing based on a structure prediction, I would suggest preferring features that depend on the overall topology of the structure or are volume-independent over features that depend heavily on specific bond lengths or interatomic distances.
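To illustrate why topology-based features are more forgiving of an unrelaxed prediction, here is a toy sketch in plain Python (no pymatgen). It builds a simple cubic lattice at two lattice parameters, one standing in for a hypothetical unrefined prediction that is 5% too large. The nearest-neighbor distance shifts with the lattice parameter, while the coordination number, counted relative to the shortest distance, does not. The lattice type, the 5% error, and the 1.1 tolerance are all illustrative assumptions.

```python
import itertools
import math


def neighbor_distances(a: float, shells: int = 2) -> list[float]:
    """Distances from the atom at the origin to all other sites of a simple
    cubic lattice with lattice parameter `a`, within +/- `shells` cells."""
    dists = []
    for i, j, k in itertools.product(range(-shells, shells + 1), repeat=3):
        if (i, j, k) != (0, 0, 0):
            dists.append(a * math.sqrt(i * i + j * j + k * k))
    return dists


def nn_distance(a: float) -> float:
    """Bond-length-like feature: scales directly with the lattice parameter."""
    return min(neighbor_distances(a))


def coordination_number(a: float, tol: float = 1.1) -> int:
    """Topology-like feature: neighbors within `tol` times the nearest-neighbor
    distance. A uniform change in volume leaves this count unchanged."""
    dists = neighbor_distances(a)
    d_min = min(dists)
    return sum(1 for d in dists if d <= tol * d_min)


a_true = 4.00          # hypothetical relaxed lattice parameter (angstrom)
a_pred = 4.00 * 1.05   # hypothetical unrelaxed prediction, 5% too large

print(nn_distance(a_true), nn_distance(a_pred))                  # differ by ~5%
print(coordination_number(a_true), coordination_number(a_pred))  # 6 6
```

Real predicted structures can of course be wrong in topology too, which no choice of feature fixes; this only shows why volume-independent features tolerate the "right structure, unrefined geometry" case better.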
This could still be interesting to try, but I think it will require a lot of validation to see whether it works, so I’d certainly suggest some testing. I’m a little afraid that this might make things better on average while making the worst cases worse than they would have been without it.
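One way to check for the "better on average, worse in the worst cases" failure mode is to compare the whole distribution of per-sample errors, not just the mean, between a model trained without and with the predicted-structure features. This sketch assumes you already have held-out absolute errors for both models; the helper names and the choice of mean, nearest-rank 95th percentile, and maximum are illustrative, not part of matminer.

```python
import math
import statistics


def error_profile(abs_errors: list[float]) -> dict[str, float]:
    """Summarize per-sample absolute errors: average and tail behavior."""
    if not abs_errors:
        raise ValueError("need at least one error value")
    ordered = sorted(abs_errors)
    # Nearest-rank 95th percentile (an illustrative choice of tail statistic).
    idx = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "mean": statistics.fmean(ordered),
        "p95": ordered[idx],
        "max": ordered[-1],
    }


def compare(baseline: list[float], with_pred: list[float]) -> dict[str, float]:
    """Change in each statistic when predicted-structure features are added.
    Negative values mean the new model's errors are smaller."""
    b, w = error_profile(baseline), error_profile(with_pred)
    return {key: w[key] - b[key] for key in b}
```

A result where `mean` is negative but `p95` or `max` is positive would be exactly the pattern to watch for.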
Another idea, rather than picking a single structure from the structure predictor, is to take a weighted average of features from the top 10 structures it produces (weighted by probability). But this will add complexity and take more time to featurize.
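The weighted-average idea could look roughly like the following. It assumes that, for one composition, you already have a list of candidate structures, each with a probability from the structure predictor and a dict of features computed by whichever featurizer you choose. The `Candidate` class and `weighted_features` helper are hypothetical, and the probabilities are renormalized over the top-k candidates that are kept.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One predicted structure: its predictor probability and its features.
    Hypothetical container, not a pymatgen or matminer class."""
    probability: float
    features: dict[str, float]


def weighted_features(candidates: list[Candidate], top_k: int = 10) -> dict[str, float]:
    """Probability-weighted mean of each feature over the top_k candidates."""
    if not candidates:
        raise ValueError("no candidate structures")
    top = sorted(candidates, key=lambda c: c.probability, reverse=True)[:top_k]
    total = sum(c.probability for c in top)
    if total <= 0:
        raise ValueError("candidate probabilities must sum to a positive value")
    # Only features present in every kept candidate are averaged.
    keys = set.intersection(*(set(c.features) for c in top))
    return {
        key: sum(c.probability * c.features[key] for c in top) / total
        for key in sorted(keys)
    }
```

The cost mentioned above shows up directly here: each composition needs up to `top_k` featurizations instead of one.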
As for where to implement, I agree that conversions.py is a good place.
On Sunday, January 6, 2019 at 7:39:18 AM UTC-8, Logan Ward wrote:
I think this feature would be a nice addition to matminer instead of, or in addition to, automatminer.
I haven’t used the structure_prediction module (specifically, I’m looking at Substitutor), so I can’t say much about how easy it would be to automate. But the methods underlying it are very sound, and it would fill a real need by providing additional features for compounds that are crystalline but whose structure is not provided in the dataset.
Do you want to take a crack at it? It would be good to add it in to the conversions.py module.
From: thomas heiman
Sent: Saturday, January 5, 2019 4:36 PM
Subject: pymatgen structure_prediction?
Is it possible to use the pymatgen structure_prediction within automatminer to generate structural variables (given just the formula) to attach to the data frame? Or is there a better approach? Or is it a bad idea? Just wondered… Thank you!
You received this message because you are subscribed to the Google Groups “matminer” group.