ICET tutorials

Hi there. I’m still curious about how to extract structures generated from ICET for the DFT calculations.
According to the tutorial, these mixing energies are extracted from the database, correct? So my question is (1) how do I get the ICET-generated structures, probably, as cif files so that I can run them in VASP, for example. (2) After the DFT calculations, how do I put the DFT-calculated energies back to the database (in this case, “reference_data.db”), so that I can proceed with the ECI fitting process, and so on.

Thank you in advance.

Hi @Wiwittawin,

The first step is to generate a set of inequivalent structures for your particular system. There are many ways to do that, and one way could be enumeration as described here. In that tutorial, each structure is saved to an ASE database, but you can also write them to file with ASE (to cif-format, or whatever is suitable for the DFT code you will use),

from ase.io import write
...
i = 0
for structure in enumerate_structures(...):
     i += 1
     write(f'structure-{i}.cif', structure)

Once you have done the DFT calculations, you should construct a StructureContainer. For that you need to have each structure in the form of an ASE Atoms object and its corresponding (mixing) energy. To create ASE Atoms object, you could, for example, read the previously generated .cif-files,

from ase.io import read
structure = read('structure-1.cif')

In other words, it is not necessary to work with databases as in the tutorial, but the crucial part is that you need to have the structure that you have calculated with DFT in the form of an ASE Atoms object before you can add it to a structure container. The exact way to do this is typically dependent on your DFT workflow.

Thank you @magnusrahm! It helps a lot. I can now export cif files for the DFT calculations. However, I still have no idea how to add structures and the corresponding mixing energies to StructureContainer. I tried manually adding things with the code as follows:

structure = read('structure-1.cif')
sc.add_structure(structure=structure, properties={'mixing_energy': 0.977})

It doesn’t work, obviously. Please excuse my ignorance. Your suggestions would be greatly appreciated.

In what sense does it not work? Could you please provide full script and any error messages?

Dear @magnusrahm,

I generated structures as you suggested, and then tried to construct a StructureContainer, with an arbitrary mixing energy . But it seems that I’ve missed something.



What do you think?

For some reason the volume of the loaded structure differs a little bit from the volume of the primitive cell. I’m not sure why but I guess some precision was lost when you wrote the cif file. Does the same thing happen if you write an xyz file instead (write(f'structure-{i}.xyz', structure))?

Btw, in your scripts above it seems like the sqlite database is actually not used. It is not necessary to deal with databases when working with icet, it is only used in the tutorials because it can make it a little easier to handle a large number of structures. If you still want to use databases I can strongly recommend ASE databases, because with them you don’t have to write custom SQL commands, that stuff is conveniently handled under the hood.

Dear @magnusrahm

The xyz file format works!

from ase.io import write 

primitive_structure = bulk('Ag')
connection = sqlite3.connect("AgPd-fcc.db")

i = 0
for structure in enumerate_structures(primitive_structure,
                                      range(1, 7),
                                      ['Pd', 'Ag']):
    i += 1
    write(f'structure-{i}.xyz', structure)

cs = ClusterSpace(structure=primitive_structure,
                  cutoffs=[13.5, 6.0, 5.5],  # Angstrom
                  chemical_symbols=['Ag', 'Pd'])

connection = sqlite3.connect("AgPd-fcc.db")
cursor = connection.cursor()
sc = StructureContainer(cluster_space=cs)
mixing_energy = [0, 0, -0.0398, -0.0286, -0.0485, -0.0178, -0.0557, -0.0297,
                -0.0477, -0.0173, -0.0356, -0.0331, -0.0139, -0.0459, -0.0478]

for i in range(len(mixing_energy)):
    structure = read('structure-{}.xyz'.format(i+1))
    sc.add_structure(structure=structure, user_tag=structure.get_chemical_formula(), 
                     properties={'mixing_energy': mixing_energy[i]})
print(sc)

opt = CrossValidationEstimator(fit_data=sc.get_fit_data(key='mixing_energy'),
                               fit_method='lasso')
opt.validate()
opt.train()

ce = ClusterExpansion(cluster_space=cs, parameters=opt.parameters, metadata=opt.summary)
ce.write('mixing_energy.ce')

ce = ClusterExpansion.read('mixing_energy.ce')
data = {'concentration': [], 'reference_energy': [], 'predicted_energy': []}

db = connect('AgPd-fcc.db')
for row in db.select():
    data['concentration'].append(row.concentration)
    # the factor of 1e3 serves to convert from eV/atom to meV/atom
    data['reference_energy'].append(1e3 * row.mixing_energy)
    data['predicted_energy'].append(1e3 * ce.predict(row.toatoms()))

fig, ax = plt.subplots(figsize=(5, 4))
ax.set_xlabel(r'Pd concentration')
ax.set_ylabel(r'Mixing energy (meV/atom)')
ax.set_xlim([0, 1])
ax.set_ylim([-69, 15])
ax.scatter(data['concentration'], data['reference_energy'],
           marker='o', label='reference')
ax.scatter(data['concentration'], data['predicted_energy'],
           marker='x', label='CE prediction')

plt.legend()
plt.show();

image

I’m really confused about the last part of the code. How do I get the reference_energy?

Reference energy refers to the DFT energies that you fitted your cluster expansion against and thus want to reproduce with your cluster expansion. In your case it would be the list mixing_energies. So to get this toy example working, I would replace the last loop with something like

for i in range(len(mixing_energy)):
    structure = read('structure-{}.xyz'.format(i+1))
    conc = structure.get_chemical_symbols().count('Pd') / len(structure)
    data['concentration'].append(conc)
    data['reference_energy'].append(1e3 * mixing_energy[i-1])
    data['predicted_energy'].append(1e3 * ce.predict(structure))

Dear @magnusrahm

It works! Thanks so much. You’re the saviour. One last question, though.

data['reference_energy'].append(1e3 * mixing_energy[i-1])

Should mixing_energy[i-1] be mixing_energy[i] instead?

No worries. You’re right, it should be just i.

Dear @magnusrahm

Thanks so much!

Anyway, I’ve been trying different other materials apart from just binary alloys, but the code gets me confused once again. So, I’m interested in the cubic CsPbI3, the structure of which is shown below.

What I want is to enumerate structures by substituting I with some other elements, let’s say Br.

i = 0
for structure in enumerate_structures(primitive_structure, range(5, 7), ['I', 'Br']):
    i += 1
    write(f'structure-{i}.xyz', structure)

It doesn’t seem to work. What do you think?

I think the error is probably related to the definition of the species you want to replace; you need to specify “inactive” species as well. My guess is that you should write something like

for structure in enumerate_structures(primitive_structure, range(5, 7), [['Cs'], ['Pb'], ['I', 'Br']]):

but I can only tell for sure if you provide the full script including the primitive structure (and preferably also error messages).

Dear @magnusrahm

I queried the structure by Materials Project as a cif file, and coded as suggested.

connection = sqlite3.connect("CsPbI.db")
primitive_structure = io.read('mp-1069538.cif')

i = 0
for structure in enumerate_structures(primitive_structure, range(5, 7), [['Cs'], ['Pb'], ['I', 'Br']]):
    i += 1
    write(f'structure-{i}.xyz', structure)

And I got this error message.

Oops, I should have written
for structure in enumerate_structures(primitive_structure, range(5, 7), [['Cs'], ['Pb'], ['I', 'Br'], ['I', 'Br'], ['I', 'Br']]):
i.e., for each atom in the primitive structure, you should specify the species allowed on that site.

Wow! That works! A ridiculous amount of structures are generated. Anyway, is it possible for the CE to predict energies of concentrations not generated by ICET? For example, can the algorithm predict energies of structures with concentrations 0.1, 0.55, etc?

The result from the tutorial seems to me that only the specified concentrations are predicted.
image

Great! Yes, icet will provide an energy regardless of concentration. It is, however, always best to train the model with concentrations that are similar to what you want to predict in the end. For example, if all of your training data has concentrations between 0 and 0.5, you shouldn’t use the model to predict energies at 0.8. Some interpolation is perfectly fine though; predicting energies at 0.1 or 0.55 in the example above would be alright.

That’s great! How do I implement that concept in the code though? According to the tutorial, the conc values are solely those of the generated structures. What if I want to get some interpolation of, let’s say, concentrations not included in conc (e.g. 0.1, 0.55, 0.9 etc), how am I supposed to do that?

ce = ClusterExpansion.read('mixing_energy.ce')
data = {'concentration': [], 'reference_energy': [], 'predicted_energy': []}
db = connect('AgPd-fcc.db')

for i in range(len(mixing_energy)):
    structure = read('structure-{}.xyz'.format(i+1))
    conc = structure.get_chemical_symbols().count('Pd') / len(structure)
    data['concentration'].append(conc)
    data['reference_energy'].append(1e3 * mixing_energy[i])
    data['predicted_energy'].append(1e3 * ce.predict(structure))

Then you need to create such structures one way or another. You could for example enumerate structures with a different size compared to those you used for training.