How to generate Average Bond Length and Average Bond Angle features using matminer

Hi,
I am trying to use matminer to create Average Bond Length and Average Bond Angle (mean, minimum and maximum) features.
However, all of the results I get are “NaN”.
I do not really understand how to generate these two features.

Thanks!
CM

Using this script:

from matminer.datasets import load_dataset
from matminer.utils.io import store_dataframe_as_json, load_dataframe_from_json
from matminer.featurizers.structure import SiteStatsFingerprint
from matminer.featurizers.site import AverageBondAngle, AverageBondLength
from pymatgen.analysis.local_env import CrystalNN

df = load_dataset("matbench_dielectric").loc[:50]

nnalgo = CrystalNN()
bl = AverageBondLength(nnalgo)

sspbl = SiteStatsFingerprint(site_featurizer=bl)
df2 = sspbl.featurize_dataframe(df, ignore_errors=True)
print(df2)

I obtained:

                                           structure         n  mean Average bond length  std_dev Average bond length
0   [[4.29304147 2.4785886  1.07248561] S, [4.2930...  1.752064                  3.174907                 9.028351e-02
1   [[3.95051434 4.51121437 0.28035002] K, [4.3099...  1.652859                  2.206091                 5.008021e-01
2   [[-1.78688104  4.79604117  1.53044621] Rb, [-1...  1.867858                  2.589553                 4.302209e-01
3   [[4.51438064 4.51438064 0.        ] Mn, [0.133...  2.676887                  2.029381                 6.349086e-02
4   [[-4.36731958  6.8886097   0.50929706] Li, [-2...  1.793232                  1.903985                 1.013608e-01
5   [[0.04903784 0.0347292  0.08458426] Ca, [3.740...  1.677494                       NaN                          NaN
6   [[5.60800905 1.10640796 1.26351442] Se, [3.762...  5.083372                  2.675575                 2.518279e-01
7   [[ 4.5571751   2.77317895 16.01717369] O, [0.5...  2.032340                  2.247036                 1.465551e-01
8   [[ 3.1542105   1.89817452 12.52003705] Se, [ 3...  3.369174                  2.547492                 2.573790e-05
9   [[-2.13978374 -5.56226053 -1.13086951] Li, [-0...  2.077187                  2.186696                 4.027415e-02
10  [[0.28676293 2.7738996  2.7888571 ] Cl, [2.787...  1.908463                  2.710232                 6.008843e-01
11  [[3.96689625 3.09523996 3.68041023] O, [0.4683...  1.890488                  2.567017                 3.410198e-01
12  [[4.04253107 2.77535164 3.48909396] O, [0.4548...  2.010793                  2.450686                 3.920373e-01
13  [[-2.77324648  7.42816341  4.20843416] Na, [2....  2.275844                  3.152296                 1.405590e-01
14  [[ 4.43793908  0.         -2.555934  ] F, [-1....  1.451558                  2.665543                 2.034963e-01
15  [[1.03676842 6.45567085 2.45393308] N, [-2.204...  2.069268                  1.747689                 4.288707e-03
16  [[2.29244061 1.42818648 6.66173868] C, [1.8975...  2.323719                  1.878402                 3.840968e-01
17  [[ 1.48235765  3.87922721 -0.72097143] Na, [4....  4.248906                  3.318527                 8.907826e-02
18  [[3.77373076 0.1451147  1.45923688] Na, [0.702...  3.574819                  3.184361                 8.749273e-02
19  [[4.33915507 4.34897429 3.26957646] P, [1.4463...  2.553705                  3.085813                 4.917441e-01
20  [[2.25154928 4.50309856 0.        ] Si, [2.251...  1.375500                  1.617741                 1.110223e-16
21  [[0.35811096 2.54611182 4.63788551] H, [3.2070...  1.556806                  1.699835                 2.680102e-01
22  [[ 5.2731016   3.52871086 10.82659415] Li, [0....  1.901418                  2.038458                 1.062024e-01
23  [[ 4.40230261  2.69887041 14.46003091] O, [0.5...  2.207225                  2.125141                 1.773272e-01
24  [[-1.99860238  2.7793773   2.61570739] Tl, [3....  3.710341                       NaN                          NaN
25  [[-2.16282069e-06  2.50126067e+00  2.22027183e...  2.162603                  3.040063                 3.755259e-04
26  [[4.74389956 1.47960112 3.6077309 ] Os, [2.161...  1.610746                  1.853980                 6.653040e-02
27  [[3.81035768 0.39859479 3.9070261 ] F, [-0.075...  1.413846                  2.096349                 4.597056e-01
28  [[0.         3.28319495 3.25003588] Ag, [3.283...  2.317365                  2.874799                 2.832431e-03
29  [[-1.12603636e-08  1.68513498e+00  5.23808207e...  2.416126                  2.014180                 7.513147e-02
30  [[ 1.92625151  1.11212181 11.82194194] S, [-2....  2.421976                  2.360791                 8.409329e-04
31  [[0.21065474 5.01053914 0.38209325] Cs, [4.709...  1.893252                  1.931403                 5.006779e-01
32  [[1.91506173 1.23473956 4.58373805] P, [ 5.553...  2.697724                  2.864067                 5.373651e-01
33  [[3.4300433  3.64173896 7.84491758] N, [0.7844...  1.889802                  2.361532                 3.886955e-01
34  [[2.29482834 1.95321986 3.60716797] O, [5.1089...  1.980909                  1.933209                 8.725635e-02
35  [[4.99432705 2.12312189 5.44023504] P, [3.9276...  1.934425                  2.501269                 6.477121e-01
36  [[4.57168809 2.63946529 4.19495967] O, [-4.571...  1.797154                  2.870058                 2.038919e-01
37  [[3.27700627 4.15128652 3.96454891] P, [ 3.964...  2.189977                  3.253514                 1.608879e-01
38  [[ 3.62862279  3.06033074 -3.27564327] Si, [ 0...  2.338099                  3.076876                 4.142150e-01
39  [[2.73874594 2.66555844 4.22366261] O, [0.8324...  3.046883                  2.050832                 3.157393e-02
40  [[ 1.98441243  0.57253711 -0.32858193] O, [ 4....  1.706842                  2.226095                 2.200541e-01
41  [[1.03826453 7.26408314 3.1129093 ] Li, [2.070...  2.464983                  1.981486                 4.375429e-02
42  [[ 1.98702629  0.         -0.58759112] F, [-0....  1.465017                  2.629111                 2.532921e-01
43  [[0.         5.51382723 2.27087553] O, [0.    ...  4.168144                  2.161742                 1.077488e-01
44  [[1.58784365 0.93523576 9.12839572] Ga, [ 4.64...  7.057081                  2.615594                 8.240588e-02
45  [[2.84060786 2.90899216 4.93954299] Ga, [5.686...  3.032020                       NaN                          NaN
46  [[2.8208602  2.92136374 4.88574971] Al, [5.640...  2.878486                  2.633613                 3.371368e-01
47  [[1.91830399 1.10753331 2.25418556] Be, [-2.25...  2.868607                  2.530164                 4.456154e-01
48  [[1.12598761 3.77264171 2.51370455] O, [ 2.638...  1.820936                  2.609573                 3.014734e-01
49  [[4.57051006 2.58903739 5.5936041 ] K, [ 1.523...  2.928569                  3.341792                 1.774818e-01
50  [[2.37720422 1.37247947 2.93630557] Mg, [-2.37...  3.186465                  3.090715                 1.860550e-01

Most of the entries have the average bond length computed without error. A few of them (rows 5, 24 and 45 above) have problems with the nearest-neighbor (NN) algorithm and show up as NaN because ignore_errors=True is set. You can get around this by passing additional arguments to the NN algorithm you’re using.

For further help, please post your full code here along with your system specifications and the matminer version you’re using.
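For anyone unsure what these two site features measure, here is a minimal pure-Python sketch of the underlying geometry. It is not matminer's implementation: the function names are made up for illustration, the neighbor list is supplied by hand (matminer gets it from an NN algorithm such as CrystalNN), and averaging over every pair of neighbors and returning NaN when there is nothing to average are assumptions of this sketch.

```python
import math
from itertools import combinations


def average_bond_length(center, neighbors):
    """Mean distance from a central site to each of its neighbors."""
    if not neighbors:
        return math.nan  # sketch's own choice: nothing to average
    return sum(math.dist(center, n) for n in neighbors) / len(neighbors)


def average_bond_angle(center, neighbors):
    """Mean angle (radians) between every pair of bonds at the central site."""
    angles = []
    for a, b in combinations(neighbors, 2):
        # Bond vectors pointing from the central site to each neighbor.
        va = [x - c for x, c in zip(a, center)]
        vb = [x - c for x, c in zip(b, center)]
        cosine = sum(p * q for p, q in zip(va, vb)) / (
            math.hypot(*va) * math.hypot(*vb)
        )
        # Clamp to [-1, 1] to guard against floating-point round-off.
        angles.append(math.acos(max(-1.0, min(1.0, cosine))))
    return sum(angles) / len(angles) if angles else math.nan


# Hypothetical octahedral site: six neighbors at unit distance along the axes.
center = (0.0, 0.0, 0.0)
neighbors = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
bond_length = average_bond_length(center, neighbors)
bond_angle = average_bond_angle(center, neighbors)
```

For this octahedron, 12 of the 15 bond pairs are at 90 degrees and 3 are at 180 degrees, so the average angle comes out to 108 degrees (0.6π radians). Per-site values like these are what SiteStatsFingerprint then summarizes over all sites in a structure.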

Hi Ardunn,

Thank you very much for your clear guidance. I will try as you suggest.

Best Regards
CM

Hello, thank you both for this interesting topic.

I have a similar query. I want to calculate the average bond length for pure elements.
First I wanted to try your example, but it seems that the col_id argument of featurize_dataframe() is missing.

In my df from the Materials Project I tried the following values for col_id:
“task_id”, “pretty_formula” and “composition”.
I always get NaN as the result for the mean and the std.

What is the col_id that you used?
Thanks!

Hi @Theo_Langlois, you should use whichever col_id names the column containing the pymatgen Structure objects you’re trying to featurize. Usually that is “structure”.

Note that you can also set ignore_errors=False in order to see more easily what the errors are (i.e., why you’re getting NaNs!)
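As a rough sketch of both points, the helper below passes col_id explicitly and sets ignore_errors=False so that the first failing row raises instead of silently becoming NaN. The helper name featurize_bond_stats is made up for illustration, and the stats tuple assumes that "mean", "minimum" and "maximum" are accepted by SiteStatsFingerprint in your matminer version. The imports sit inside the function only so that the snippet can be loaded without matminer installed.

```python
def featurize_bond_stats(df, col_id="structure", stats=("mean", "minimum", "maximum")):
    """Add per-structure statistics of average bond length and bond angle.

    `df` must have a column named `col_id` holding pymatgen Structure objects.
    """
    # Lazy imports: matminer and pymatgen are only needed when this is called.
    from matminer.featurizers.site import AverageBondAngle, AverageBondLength
    from matminer.featurizers.structure import SiteStatsFingerprint
    from pymatgen.analysis.local_env import CrystalNN

    nn = CrystalNN()
    for site_featurizer in (AverageBondLength(nn), AverageBondAngle(nn)):
        fingerprint = SiteStatsFingerprint(site_featurizer=site_featurizer, stats=stats)
        # ignore_errors=False: raise on the first structure that fails,
        # so the underlying error message is visible.
        df = fingerprint.featurize_dataframe(df, col_id=col_id, ignore_errors=False)
    return df
```

Calling it as featurize_bond_stats(df) on a dataframe like the one in the script above should stop at the first row the NN algorithm cannot handle and show the traceback. Once you know the cause, you can switch back to ignore_errors=True to skip such rows.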

Hello @ardunn,
Thank you so much for the reply.

Indeed, I was trying to featurize with the wrong column, so I needed to add the “structure” column just as you said, and now it is working perfectly.

Best regards,
Theo
