I am trying to compute the phonon spectrum of a silicon structure. The rmse_train is very low, but the rmse_test is high.

What should I do in this situation?

This is a sign of overfitting.

It is probably because you have too many parameters and too little data.

The linear problem is of the form `A * x = f`, where `f` is the target DFT forces, `x` the unknown parameters, and `A` the sensing matrix. The number of rows in `A` (the number of force components) should be larger than the number of columns in `A` (the number of parameters) to avoid too much overfitting.

How much larger is hard to say and depends on the desired accuracy, among other things, but a good starting point is about 5-10x larger.
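To make the rule of thumb concrete, here is a minimal sketch of the bookkeeping. It assumes every structure contributes three force components per atom as rows of `A`. The helper names and the numbers (216 atoms, 400 parameters) are hypothetical and only for illustration; use the parameter count reported for your own cluster space.

```python
import math


def n_force_components(n_structures, n_atoms):
    """Rows of A: three Cartesian force components per atom per structure."""
    return 3 * n_atoms * n_structures


def structures_needed(n_parameters, n_atoms, ratio=5.0):
    """Smallest number of structures for which rows >= ratio * columns."""
    return math.ceil(ratio * n_parameters / (3 * n_atoms))


# Hypothetical numbers, for illustration only
n_atoms = 216        # e.g. a 3x3x3 conventional silicon supercell (8 * 27)
n_parameters = 400   # placeholder for the number of columns in A
n_structures = 2

rows = n_force_components(n_structures, n_atoms)
print(f"rows / columns = {rows / n_parameters:.2f}")
print("structures for a 10x ratio:", structures_needed(n_parameters, n_atoms, ratio=10))
```

If the ratio comes out well below 5, either add structures or reduce the number of parameters.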

The number of parameters depends on the cutoff you choose. In general, one should select the cutoff carefully and do convergence testing with respect to it.
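One possible way to organize such a convergence test is sketched below. It assumes you have already fitted a model for each cutoff and recorded the train and test RMSE. The function `pick_cutoff`, the 10% tolerance, and the numbers in `scan` are made up for illustration and are not results for silicon.

```python
def pick_cutoff(results, tolerance=0.1):
    """Return the smallest cutoff whose test RMSE is within a relative
    `tolerance` of the best test RMSE found in the scan.

    `results` maps cutoff -> (rmse_train, rmse_test).
    """
    best = min(rmse_test for _, rmse_test in results.values())
    for cutoff in sorted(results):
        if results[cutoff][1] <= best * (1 + tolerance):
            return cutoff


# Placeholder values only: cutoff (Å) -> (rmse_train, rmse_test) in eV/Å
scan = {
    3.0: (0.030, 0.034),
    4.0: (0.015, 0.020),
    5.0: (0.008, 0.019),
    6.0: (0.002, 0.060),  # train error keeps falling, test error rises: overfitting
}
print("suggested cutoff:", pick_cutoff(scan))
```

The idea is to judge the cutoff by the test error rather than the training error, and to prefer the smallest cutoff that does about as well as the best one, since it needs fewer parameters.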

The number of parameters can also depend on the symmetry tolerance `symprec` given to the `ClusterSpace`, so maybe double-check that the `ClusterSpace` finds the expected space group for the primitive structure.

You can also try to solve the problem by adding more data, i.e. more training structures.

Dear Erik

Thank you very much for your reply.

I doubled the number of structures, but the result was the same as before.

What is the proper way to determine the number of training structures?

Also, I chose a cutoff value equal to half of the lattice parameter.

What is the right way to determine the cutoff value?

I have another question too: what should the unit of force be for the rattled structures?

best regards,

Mostafa,

Isfahan University of Technology

But how many force components and how many parameters do you have? As written above, the number of force components should ideally be much larger than the number of parameters.

I think it's good to use eV/Å. But I don't think the units of the forces will change the overfitting problem.
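If your DFT code reports forces in other units, a small conversion along these lines may help. This is only a sketch: the helper `convert_forces` is hypothetical, and it assumes forces are given as a list of per-atom `[fx, fy, fz]` values in Ry/bohr or Ha/bohr. The conversion factors are the standard CODATA values.

```python
# Standard conversion constants (CODATA)
EV_PER_RY = 13.605693123
EV_PER_HARTREE = 27.211386246
ANGSTROM_PER_BOHR = 0.529177210903

# Multiplicative factors that bring each unit to eV/Å
TO_EV_PER_ANGSTROM = {
    "eV/Å": 1.0,
    "Ry/bohr": EV_PER_RY / ANGSTROM_PER_BOHR,
    "Ha/bohr": EV_PER_HARTREE / ANGSTROM_PER_BOHR,
}


def convert_forces(forces, unit):
    """Convert a list of per-atom [fx, fy, fz] forces from `unit` to eV/Å."""
    factor = TO_EV_PER_ANGSTROM[unit]
    return [[component * factor for component in atom] for atom in forces]


# Example: one atom with a force given in Ry/bohr
print(convert_forces([[0.01, 0.0, -0.02]], "Ry/bohr"))
```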

Dear Erik

I reduced the cutoff and also changed the optimization method to ‘ardr’. I also used eV/Å as the unit of the forces.

The accuracy improved to the desired level.

I think the choice of cutoff values is crucial.

Thank you very much