Why does enumerate_structures stop working?

Hi,
When I use enumerate_structures to enumerate new structures, it stops working for large size ranges. It works well for smaller sizes, but it gets stuck once the size goes beyond 14. Why is this?

Here’s the code:

from ase.build import bulk
from ase.db import connect
from icet import ClusterSpace, StructureContainer, ClusterExpansion
from icet.tools import enumerate_structures, map_structure_to_reference
# Assumption: the import location of the estimator depends on the icet version.
try:
    from trainstation import CrossValidationEstimator  # icet 2.x
except ImportError:
    from icet import CrossValidationEstimator  # older icet releases

db = connect('vasp2db.db')
slab = bulk('VN', crystalstructure='rocksalt', a=4.44)
conc_restr = {'Ti': (0.05, 0.96), 'V': (0.05, 0.96), 'N': (0.45, 0.96)}
data = {'name': [], 'conc_Ti': [], 'conc_V': [], 'conc_N': [], 'predicted_energy': []}

db_pred = connect('predstruc.db')

cs = ClusterSpace(structure=slab,
                  cutoffs=[11.5, 5.5, 3.5],
                  chemical_symbols=[['Ti', 'V'], ['N', 'X']])

# Collect the training structures and their mixing energies
sc = StructureContainer(cluster_space=cs)
for row in db.select():
    sc.add_structure(structure=row.toatoms(),
                     properties={'mixing_energy': row.mixing_energy})

opt = CrossValidationEstimator(fit_data=sc.get_fit_data(key='mixing_energy'), fit_method='lasso')
opt.validate()
opt.train()
print(opt)

ce = ClusterExpansion(cluster_space=cs, parameters=opt.parameters, metadata=opt.summary)
ce.write('mixing_energy.ce')

i = 299034

for structure in enumerate_structures(structure=slab, sizes=range(14, 16),
                                      chemical_symbols=[['Ti', 'V'], ['N', 'C']],
                                      concentration_restrictions=conc_restr):
    i = i + 1
    name = int(i)
    # Sublattice concentrations; 'C' is a placeholder for vacancies on the N sublattice
    symbols = structure.get_chemical_symbols()
    conc_1 = symbols.count('Ti') / (symbols.count('Ti') + symbols.count('V'))
    conc_2 = symbols.count('N') / (symbols.count('N') + symbols.count('C'))
    with open('sublat.csv', 'a+') as f:
        f.write('{}\t{}\t{}\t\n'.format(name, conc_1, conc_2))
    # Remove the placeholder atoms so that the sites become vacancies
    structure = structure[~(structure.symbols == 'C')]

    db_pred.write(structure, name=int(i))

    ideal_structure, info = map_structure_to_reference(structure, slab, inert_species=['Ti', 'V'])
    symbols = structure.get_chemical_symbols()
    concTi = symbols.count('Ti') / len(structure)
    concV = symbols.count('V') / len(structure)
    concN = symbols.count('N') / len(structure)
    predicted_energy = ce.predict(ideal_structure)

    data['name'].append(int(i))
    data['conc_Ti'].append(concTi)
    data['conc_V'].append(concV)
    data['conc_N'].append(concN)
    data['predicted_energy'].append(predicted_energy)
    with open('predstrudata.csv', 'a+') as f:
        f.write('{}\t{}\t{}\t{}\t{}\t\n'.format(name, concTi, concV, concN, predicted_energy))

The number of structures produced by enumeration grows exponentially with size, so I think it is expected that for larger sizes it will simply never finish.

See documentation for additional info.
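
To put rough numbers on the growth, here is a minimal sketch in plain Python (it does not call icet). It multiplies the number of supercell matrices of a given size by the number of ways to occupy two sublattices with two species each. This is only an upper bound under my own assumptions: symmetry-equivalent structures and concentration restrictions are ignored, so the enumeration yields fewer structures than this, but the trend is the same. The function names are made up for this illustration.

```python
def count_supercells(size):
    """Count 3x3 Hermite normal form matrices with determinant `size`.

    Each such matrix defines one supercell of `size` primitive cells
    (before removing symmetry-equivalent cells).
    """
    total = 0
    for a in range(1, size + 1):
        if size % a:
            continue
        rest = size // a
        for b in range(1, rest + 1):
            if rest % b:
                continue
            c = rest // b
            # For diagonal (a, b, c) there are b * c**2 choices of off-diagonal entries
            total += b * c * c
    return total


def count_labelings(size, species_per_sublattice):
    """Count occupations of one supercell, ignoring symmetry and concentrations."""
    n = 1
    for n_species in species_per_sublattice:
        n *= n_species ** size
    return n


# Two sublattices (metal and N), two species allowed on each
for size in range(2, 17, 2):
    bound = count_supercells(size) * count_labelings(size, [2, 2])
    print(f"size {size:2d}: at most {bound:.2e} candidate structures")
```

Every extra primitive cell multiplies the number of occupations by four in this setup, which is why a run that finishes quickly at size 10 or 12 can appear to hang at 14 and above.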

Hi,

Thank you so much for the reply. It really helps!

After trying the method in the manual, it started to work. However, the generated structures did not follow the concentration range I set, and many of them have the same concentration and composition. What is the reason for this? Is there a way to overcome it?

Thank you so much.

Hi

I have another question:

If I trained my model with structures in the size range (2,12), do you think it is okay to use the model to predict structures whose size is larger, e.g. in the range (13,22)?

many of them have the same concentration and composition. What is the reason for this? Is there a way to overcome it?

Enumeration generates all possible supercells and occupations. This can lead to a majority of the generated structures being at a specific concentration. Depending on what you want to do with these structures, it may or may not be a good approach.
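
A small counting sketch of why this happens, in plain Python without icet: the number of ways to put k Ti atoms on n metal sites is the binomial coefficient C(n, k), which peaks sharply near k = n/2. Symmetry is ignored here, so these are not the exact numbers the enumeration would give, only the trend. The function name is made up for this illustration.

```python
from math import comb


def occupations_per_count(n_sites):
    """Number of ways to place k atoms of one species on n_sites sites, for each k."""
    return [comb(n_sites, k) for k in range(n_sites + 1)]


n_sites = 14  # e.g. the metal sublattice of a supercell with 14 primitive cells
counts = occupations_per_count(n_sites)
total = sum(counts)
for k, c in enumerate(counts):
    print(f"{k:2d} Ti on {n_sites} sites: {c:5d} occupations ({100 * c / total:4.1f} %)")
```

If an even spread over concentrations matters more than completeness, one option is to keep only a fixed number of structures per composition while looping over the enumeration.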

If I trained my model with structures in the size range (2,12), do you think it is okay to use the model to predict structures whose size is larger, e.g. in the range (13,22)?

Yes, that should usually be fine as long as training went well.

Thank you for the reply!

I have another question regarding how to set the concentration range. I found that the structures generated didn’t follow the concentration regime I set.

For example, the structures I want to enumerate have two sub-lattices,

chemical_symbols=[['Ti', 'V'], ['N', 'C']]

Should I set a specific concentration range for each element type, i.e.

conc_restr = {'Ti': (0.7, 0.81), 'V': (0.2, 0.3), 'C': (0.10, 0.20), 'N': (0.8, 0.91)}

Or should I set only one concentration range for each sublattice, like:

conc_restr = {'Ti': (0.7, 0.81), 'N': (0.8, 0.91)}

I have tried both ways, but sometimes neither worked well, and I am not sure why.

Thank you so much for your time and attention.

If you read the documentation, it states:

Concentration is here always defined as the number of atoms of the specified element divided by the total number of atoms in the structure, without respect to site restrictions.

So the concentrations in the restriction dict refer to the total system and not to the sublattice. I think it should be enough to set the restriction for one of the species on each sublattice.
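
As a rough illustration in plain Python (not icet code), the ranges from the question can be converted from sublattice concentrations to total concentrations. In rock salt each sublattice holds half of all sites, so the total Ti concentration can never exceed 0.5 and a range like (0.7, 0.81) would match nothing. The helper names are made up, and treating the limits as inclusive is my assumption.

```python
def sublattice_to_total(conc_range, sublattice_fraction):
    """Convert a (min, max) sublattice concentration into a total concentration.

    sublattice_fraction is the share of all sites that belongs to the sublattice.
    """
    lo, hi = conc_range
    return (lo * sublattice_fraction, hi * sublattice_fraction)


def allowed_counts(total_range, n_atoms, tol=1e-9):
    """Atom counts k for which k / n_atoms lies inside total_range (limits inclusive)."""
    lo, hi = total_range
    return [k for k in range(n_atoms + 1) if lo - tol <= k / n_atoms <= hi + tol]


# Rock salt: the metal and the N sublattice each hold half of all sites
ti_total = sublattice_to_total((0.7, 0.81), 0.5)
n_total = sublattice_to_total((0.8, 0.91), 0.5)
conc_restr = {'Ti': ti_total, 'N': n_total}
print(conc_restr)

# Which atom counts can satisfy the ranges for a few supercell sizes?
for size in (2, 5, 10):
    n_atoms = 2 * size  # two sites per primitive cell
    print(size, allowed_counts(ti_total, n_atoms), allowed_counts(n_total, n_atoms))
```

Narrow ranges may also leave no valid atom count at all for small cells, which could be one reason the restrictions seem to work only sometimes.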