Why does enumerate_structures stop working?

Calvin_Cui · March 12, 2023, 2:53pm

Hi,
When I use enumerate_structures to enumerate new structures, it stops working for large sizes. It works well for smaller sizes, but it gets stuck once the size range goes beyond 14. Why is this?

Here’s the code:

from ase.build import bulk
from ase.db import connect
from icet import ClusterSpace, StructureContainer, ClusterExpansion
from icet.tools import enumerate_structures, map_structure_to_reference
# In icet 2.x the fitting tools live in trainstation (older versions: icet.fitting)
from trainstation import CrossValidationEstimator

db = connect('vasp2db.db')
slab = bulk('VN', crystalstructure='rocksalt', a=4.44)
conc_restr = {'Ti': (0.05, 0.96), 'V': (0.05, 0.96), 'N': (0.45, 0.96)}
data = {'name': [], 'conc_Ti': [], 'conc_V': [], 'conc_N': [], 'predicted_energy': []}

db_pred = connect('predstruc.db')

cs = ClusterSpace(structure=slab,
                  cutoffs=[11.5, 5.5, 3.5],
                  chemical_symbols=[['Ti', 'V'], ['N', 'X']])

sc = StructureContainer(cluster_space=cs)
for row in db.select():
    sc.add_structure(structure=row.toatoms(),
                     properties={'mixing_energy': row.mixing_energy})

opt = CrossValidationEstimator(fit_data=sc.get_fit_data(key='mixing_energy'), fit_method='lasso')
opt.validate()
opt.train()
print(opt)

ce = ClusterExpansion(cluster_space=cs, parameters=opt.parameters, metadata=opt.summary)
ce.write('mixing_energy.ce')

i = 299034

for structure in enumerate_structures(structure=slab, sizes=range(14, 16),
                                      chemical_symbols=[['Ti', 'V'], ['N', 'C']],
                                      concentration_restrictions=conc_restr):
    i = i + 1
    name = int(i)
    conc_1 = structure.get_chemical_symbols().count('Ti') / (structure.get_chemical_symbols().count('Ti') + structure.get_chemical_symbols().count('V'))
    conc_2 = structure.get_chemical_symbols().count('N') / (structure.get_chemical_symbols().count('N') + structure.get_chemical_symbols().count('C'))
    with open('sublat.csv', 'a+') as f:
        f.write('{}\t{}\t{}\t\n'.format(name, conc_1, conc_2))
    structure = structure[~(structure.symbols == 'C')]
    db_pred.write(structure, name=int(i))

    ideal_structure, info = map_structure_to_reference(structure, slab, inert_species=['Ti', 'V'])
    concTi = structure.get_chemical_symbols().count('Ti') / len(structure)
    concV = structure.get_chemical_symbols().count('V') / len(structure)
    concN = structure.get_chemical_symbols().count('N') / len(structure)

    data['name'].append(int(i))
    data['conc_Ti'].append(concTi)
    data['conc_V'].append(concV)
    data['conc_N'].append(concN)
    data['predicted_energy'].append(ce.predict(ideal_structure))
    with open('predstrudata.csv', 'a+') as f:
        f.write('{}\t{}\t{}\t{}\t{}\t\n'.format(name, concTi, concV, concN, ce.predict(ideal_structure)))

erikfransson · March 13, 2023, 1:37pm

The number of structures produced by enumeration grows exponentially with size, so I think it is expected that for larger sizes it will simply never finish.

See documentation for additional info.
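
To get a feeling for this growth, here is a rough sketch in plain Python. It does not use icet and is not how enumerate_structures works internally; it only counts the distinct supercell shapes of a given size (Hermite normal forms) and the raw number of occupations per supercell before any symmetry reduction or concentration restriction. The function names are made up for illustration, and the defaults assume the rocksalt case above (2 sites per primitive cell, 2 species per site).

```python
def count_supercells(n: int) -> int:
    """Count 3x3 Hermite normal form matrices with determinant n.

    Each such matrix describes one supercell shape of size n
    (before reduction by the symmetry of the parent lattice).
    """
    total = 0
    for a in range(1, n + 1):
        if n % a:
            continue
        for c in range(1, n // a + 1):
            if (n // a) % c:
                continue
            f = n // (a * c)
            # Diagonal (a, c, f): one off-diagonal entry takes c values,
            # the other two take f values each.
            total += c * f * f
    return total


def labelings_upper_bound(n: int, sites_per_cell: int = 2,
                          species_per_site: int = 2) -> int:
    """Raw number of occupations of one supercell of size n (no symmetry)."""
    return species_per_site ** (sites_per_cell * n)


if __name__ == "__main__":
    for n in (2, 6, 10, 14, 16):
        print(n, count_supercells(n), labelings_upper_bound(n))
```

The actual number of symmetry-inequivalent structures is smaller than the product of the two columns, but it follows the same exponential trend, which is why sizes around 14 and above become impractical.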

Calvin_Cui · March 13, 2023, 4:26pm

Hi,

Thank you so much for the reply. It really helps!

After trying the method in the manual, it started to work. However, the generated structures didn’t follow the concentration regime I set, and it tends to generate many structures with the same concentration and composition. What’s the reason for this? Is there a way to overcome this?

Thank you so much.

Calvin_Cui · March 13, 2023, 9:25pm

Hi

I have another question:

If I trained my model on structures with sizes in the range (2, 12), do you think it’s okay to use the model to predict structures with larger sizes, e.g. in the range (13, 22)?

erikfransson · March 14, 2023, 7:43am

“it tends to generate many structures with the same concentration and composition. What’s the reason for this? Is there a way to overcome this?”

Enumeration generates all possible supercells and occupations, which can lead to a majority of the generated structures being at a specific concentration. Depending on what you want to do with these structures, it may or may not be a good approach.
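
A small illustration of why the structures pile up at certain concentrations, as a sketch in plain Python rather than anything icet does internally: for a single binary sublattice with n_sites sites, the number of ways to place k atoms of one species is the binomial coefficient, which peaks sharply around 50%. The function name is hypothetical, and symmetry is ignored.

```python
from math import comb


def occupations_per_concentration(n_sites: int) -> dict:
    """Map k (number of atoms of species A) to the number of raw occupations."""
    return {k: comb(n_sites, k) for k in range(n_sites + 1)}


if __name__ == "__main__":
    n_sites = 14
    counts = occupations_per_concentration(n_sites)
    total = sum(counts.values())
    for k, c in counts.items():
        # concentration, number of occupations, share of all occupations
        print(f"{k / n_sites:.2f}  {c:5d}  {c / total:.1%}")
```

If an even spread over concentrations is needed, narrowing the concentration restrictions and enumerating each window separately is one possible approach.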

“If I trained my model with structures of size range (2,12), do you think it’s okay to use the model to predict structures whose size range is larger than (2,12), like up to (13,22)?”

Yes, that should usually be fine as long as training went well.

Calvin_Cui · March 14, 2023, 1:54pm

Thank you for the reply!

I have another question regarding how to set the concentration range. I found that the structures generated didn’t follow the concentration regime I set.

For example, the structures I want to enumerate have two sub-lattices,

chemical_symbols=[['Ti', 'V'], ['N', 'C']]

Should I set a specific concentration range for each element type, i.e.

conc_restr = {'Ti': (0.7, 0.81), 'V': (0.2, 0.3), 'C': (0.10, 0.20), 'N': (0.8, 0.91)}

Or should I only set one concentration range for each sub-lattice, like:

conc_restr = {'Ti': (0.7, 0.81), 'N': (0.8, 0.91)}

I’ve tried both ways, but sometimes it didn’t work well, and I’m not sure what the reason is.

Thank you so much for your time and attention.

erikfransson · March 15, 2023, 5:57pm

If you read the documentation, it states:

Concentration is here always defined as the number of atoms of the specified element divided by the total number of atoms in the structure, without respect to site restrictions.

So the concentrations in the restriction dict refer to the total system and not the sublattice. I think it should be enough to set the restriction for one of the species on each sublattice.
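
As a sketch of what this means in practice: in rocksalt the two sublattices have the same number of sites, so each one makes up half of all atoms, and a concentration given per sublattice has to be multiplied by 0.5 to get the total-system value. The helper below is hypothetical (it is not part of icet) and assumes the ranges quoted above were meant per sublattice.

```python
def to_total_concentration(sublattice_range, sublattice_fraction):
    """Convert a (low, high) per-sublattice range to a total-system range.

    sublattice_fraction is the share of all sites that belong to the
    sublattice, e.g. 0.5 for either sublattice in rocksalt.
    """
    low, high = sublattice_range
    return (low * sublattice_fraction, high * sublattice_fraction)


# One species per sublattice is restricted; the other follows automatically.
conc_restr = {
    'Ti': to_total_concentration((0.7, 0.81), 0.5),  # 70-81% of the metal sites
    'N': to_total_concentration((0.8, 0.91), 0.5),   # 80-91% of the N/C sites
}

if __name__ == "__main__":
    print(conc_restr)
```

The resulting dict can then be passed as concentration_restrictions. Note that with this definition no species in this system can exceed a total concentration of 0.5.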