Hello,
I’m trying to train a cluster expansion for the species Li, Mn, Ti, O and F, using the rocksalt primitive cell (primitive_structure = bulk('NaCl', 'rocksalt', a=3.0)).
All the structures are generated using icet.tools.enumerator.
I’m using cutoffs = [7.1,4,4].
My cluster space has 1032 parameters.
If I train my cluster expansion with fewer than 1032 structures, I get an underdetermined warning.
However, when I train it with more than 1032 structures, I get a "condition number is large" warning.
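For readers who want to see what the two warnings refer to, here is a minimal NumPy sketch. It assumes the fit matrix A has one row per structure and one column per parameter (for instance what icet's StructureContainer.get_fit_data is expected to return; that link is an assumption and is not exercised here). The function name diagnose_fit_matrix and the small random matrices are hypothetical and only for illustration.

```python
import numpy as np


def diagnose_fit_matrix(A):
    """Report shape, rank and condition number of a fit matrix A."""
    A = np.asarray(A, dtype=float)
    n_structures, n_parameters = A.shape
    # Singular values are returned in descending order.
    singular_values = np.linalg.svd(A, compute_uv=False)
    smallest = singular_values[-1]
    cond = np.inf if smallest == 0 else singular_values[0] / smallest
    rank = int(np.linalg.matrix_rank(A))
    return {
        "n_structures": n_structures,
        "n_parameters": n_parameters,
        "rank": rank,
        "underdetermined": n_structures < n_parameters,
        "rank_deficient": rank < n_parameters,
        "condition_number": float(cond),
    }


rng = np.random.default_rng(0)

# Fewer rows than columns: underdetermined.
wide = diagnose_fit_matrix(rng.normal(size=(20, 50)))

# More rows than columns, independent columns: well conditioned.
tall = diagnose_fit_matrix(rng.normal(size=(200, 50)))

# More rows than columns, but one column duplicates another:
# adding rows cannot repair this, so the condition number stays huge.
A_dup = rng.normal(size=(200, 50))
A_dup[:, 1] = A_dup[:, 0]
dup = diagnose_fit_matrix(A_dup)

for name, report in [("wide", wide), ("tall", tall), ("duplicate column", dup)]:
    print(name, report)
```

The third case is the relevant one here: having more structures than parameters removes the underdetermined warning, but it does not help if some columns of A are linearly dependent for every structure in the training set.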
I read in the documentation that the cluster expansion is unreliable if the condition number is large.
To mitigate this, I scanned the pair cutoff from 4 to 10 Å, and every case gives the "condition number is large" warning.
My rmse_validation is low (around 0.09 eV/atom), and if I test it on new structures, the test rmse is lower than rmse_validation.
In this case, can the cluster expansion be trusted?
Thank you.
Do you know why the condition number is large?
Do you have any restrictions on concentrations, or correlated concentrations, in your training set?
If a pair cutoff of 3.0 or 4.0 Å (without any higher-order clusters) still yields a large condition number, then maybe the training structures do not span the configurational space well?
If the test and validation errors are small, it is possibly fine, but it is hard to know without understanding why you get a large condition number.
Thank you for the reply.
I’m not sure why the condition number is large.
I do have some restrictions on concentrations:
- Considering only charge-neutral concentrations (based on a dictionary of possible oxidation states for each element).
- Considering only concentrations with a Li-to-other-cation ratio > 1.
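A minimal sketch of how restrictions like the two above can affect the fit matrix. It uses raw sublattice concentrations as stand-ins for the singlet cluster functions (an assumption: icet's actual point functions are different, although singlet correlations do depend linearly on concentrations), and it assumes fixed oxidation states Li+, Mn3+, Ti4+, O2-, F- purely for illustration. The helper sample_fit_matrix is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_fit_matrix(n_structures, charge_neutral):
    """Toy fit matrix with columns [1, x_Mn, x_Ti, x_F].

    x_Mn and x_Ti are fractions of the cation sublattice,
    x_F is a fraction of the anion sublattice.
    """
    rows = []
    while len(rows) < n_structures:
        x_mn, x_ti = rng.uniform(0.0, 0.5, size=2)
        x_li = 1.0 - x_mn - x_ti
        if x_li <= x_mn + x_ti:
            continue  # keep only Li : other cations > 1
        if charge_neutral:
            # Cation charge per site must equal anion charge per site:
            # x_Li + 3 x_Mn + 4 x_Ti = 2 x_O + x_F = 2 - x_F
            x_f = 2.0 - (x_li + 3.0 * x_mn + 4.0 * x_ti)
        else:
            x_f = rng.uniform(0.0, 1.0)
        if 0.0 <= x_f <= 1.0:
            rows.append([1.0, x_mn, x_ti, x_f])
    return np.array(rows)


A_free = sample_fit_matrix(2000, charge_neutral=False)
A_neutral = sample_fit_matrix(2000, charge_neutral=True)

for name, A in [("Li-rich only", A_free), ("Li-rich + charge neutral", A_neutral)]:
    print(name, "rank:", np.linalg.matrix_rank(A), "cond: %.3g" % np.linalg.cond(A))
```

In this toy model the charge-neutrality equality makes x_F an exact linear combination of the other columns, so the matrix loses one rank no matter how many structures are added. The Li-ratio inequality only narrows the sampled range and does not reduce the rank by itself.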
Hi,
Okay, possibly or even likely the charge compensation leads to this. A good indication would be if you still get a large condition number with all your 1000+ structures and cutoffs of 0.1 Å.
If this is the reason, then I wouldn’t worry about it, and the CE can likely be trusted if the test and validation errors are small.
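A rough numerical illustration of that last point, under stated assumptions: the features a, b, c are generic toy columns (not icet cluster vectors), w_true is a hypothetical set of "true" parameters, and the fit uses a truncated least-squares solve from NumPy rather than whatever fit method is used in the actual training.

```python
import numpy as np

rng = np.random.default_rng(1)
w_true = np.array([0.1, -0.5, 0.3, 0.8])  # hypothetical "true" parameters
noise = 1e-3


def make_data(n, constrained):
    """Columns [1, a, b, c]; if constrained, c = 1 - 2a - 3b exactly."""
    a = rng.uniform(0.0, 1.0, n)
    b = rng.uniform(0.0, 1.0, n)
    c = 1.0 - 2.0 * a - 3.0 * b if constrained else rng.uniform(0.0, 1.0, n)
    A = np.column_stack([np.ones(n), a, b, c])
    y = A @ w_true + rng.normal(0.0, noise, n)
    return A, y


def rmse(A, y, w):
    return float(np.sqrt(np.mean((A @ w - y) ** 2)))


# Train only on data that obey the constraint (rank-deficient fit matrix).
A_train, y_train = make_data(500, constrained=True)
# rcond discards the near-zero singular value, giving the minimum-norm solution.
w_fit = np.linalg.lstsq(A_train, y_train, rcond=1e-10)[0]

# Test on new data that obey the same constraint, and on data that do not.
A_in, y_in = make_data(200, constrained=True)
A_out, y_out = make_data(200, constrained=False)
rmse_in = rmse(A_in, y_in, w_fit)
rmse_out = rmse(A_out, y_out, w_fit)

print("cond(A_train): %.3g" % np.linalg.cond(A_train))
print("RMSE on constrained test data:   %.2e" % rmse_in)
print("RMSE on unconstrained test data: %.2e" % rmse_out)
```

In this toy setting the individual parameters are not uniquely determined, yet predictions stay accurate for new data that satisfy the same constraint as the training set, and degrade for data that violate it. By analogy, small test errors on charge-neutral structures would say little about structures outside the restrictions used for training.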
Hello,
I receive a large condition number warning whenever the number of structures exceeds the number of cluster expansion parameters, regardless of how small the cutoff is (I observed this when I performed cutoff tuning).
The charge compensation explanation seems reasonable.
Thank you.