BagofBonds featurizer

Hello,

Why does BagofBonds.featurize require a Structure object rather than a Molecule object, given that the method was developed for molecules?

Thanks

FR

Hey Francesco,

As I understand it, canonical BoB is just a way of getting uniform-length descriptors (ordered by bond) for a dataset of molecules. The descriptors are generated from the molecules' Coulomb matrices, which may have different dimensions. Since there are extensions of the Coulomb matrix intended for bulk phases (e.g., https://arxiv.org/pdf/1503.07406.pdf, which we have implemented as SineCoulombMatrix), the BagofBonds class is simply a way to get uniform-length descriptors ordered by bond for bulk phases, using either CoulombMatrix or SineCoulombMatrix.
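
To make the idea concrete, here is a minimal pure-Python sketch of a bag-of-bonds descriptor for isolated molecules. It is not matminer's implementation: the function names, the " - " label separator, the toy geometries, and the choice to keep one bag per element for the diagonal terms are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from itertools import combinations

SYMBOLS = {1: "H", 6: "C", 8: "O"}  # only the elements used in this toy example


def coulomb_matrix(numbers, coords):
    """Coulomb matrix: 0.5 * Z^2.4 on the diagonal, Z_i * Z_j / r_ij elsewhere."""
    n = len(numbers)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                m[i][j] = 0.5 * numbers[i] ** 2.4
            else:
                m[i][j] = numbers[i] * numbers[j] / math.dist(coords[i], coords[j])
    return m


def bags(numbers, coords):
    """Group matrix entries by element (diagonal) or element pair (off-diagonal)."""
    m = coulomb_matrix(numbers, coords)
    out = defaultdict(list)
    for i, z in enumerate(numbers):
        out[SYMBOLS[z]].append(m[i][i])
    for i, j in combinations(range(len(numbers)), 2):
        a, b = sorted((SYMBOLS[numbers[i]], SYMBOLS[numbers[j]]))
        out[f"{a} - {b}"].append(m[i][j])
    # Sorting inside each bag removes the dependence on atom ordering.
    return {k: sorted(v, reverse=True) for k, v in out.items()}


def bag_of_bonds(molecules):
    """Return (labels, vectors) with one equal-length vector per molecule."""
    all_bags = [bags(z, xyz) for z, xyz in molecules]
    # Each bag is as long as the largest bag of that type in the dataset.
    sizes = {}
    for b in all_bags:
        for k, v in b.items():
            sizes[k] = max(sizes.get(k, 0), len(v))
    labels = sorted(sizes)
    vectors = []
    for b in all_bags:
        vec = []
        for k in labels:
            vals = b.get(k, [])
            vec.extend(vals + [0.0] * (sizes[k] - len(vals)))  # zero-pad short bags
        vectors.append(vec)
    return labels, vectors


# Toy geometries in angstrom (approximate, for illustration only).
h2 = ([1, 1], [(0.0, 0.0, 0.0), (0.74, 0.0, 0.0)])
water = ([8, 1, 1], [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)])
labels, vectors = bag_of_bonds([h2, water])
```

Both molecules end up with vectors of the same length even though their Coulomb matrices are 2x2 and 3x3, because the bag sizes are fixed by the dataset rather than by any single molecule.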

However, BoB is not the only way of getting uniform-length descriptors from these nonuniform matrices. It would be nice to see an eigenvalue-based method, as we mentioned in https://github.com/hackingmaterials/matminer/issues/213 (Structure featurizers should return equal-length vector of features) and https://github.com/hackingmaterials/matminer/issues/300 (Provide Flat Outputs for All Structure Featurizers). That approach is more standard practice, although its output is not ordered by bond.
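
For comparison, an eigenvalue-based flattening could look roughly like the sketch below. It assumes NumPy is available, sorts eigenvalues by decreasing magnitude, and zero-pads to a chosen `max_atoms`; the function names and the toy geometries are hypothetical, and this is not the code discussed in the linked issues.

```python
import numpy as np


def coulomb_matrix(numbers, coords):
    """Coulomb matrix: 0.5 * Z^2.4 on the diagonal, Z_i * Z_j / r_ij elsewhere."""
    z = np.asarray(numbers, dtype=float)
    r = np.asarray(coords, dtype=float)
    d = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    np.fill_diagonal(d, 1.0)  # placeholder so the diagonal does not divide by zero
    m = np.outer(z, z) / d
    np.fill_diagonal(m, 0.5 * z ** 2.4)
    return m


def eigenvalue_descriptor(numbers, coords, max_atoms):
    """Eigenvalues sorted by decreasing magnitude, zero-padded to max_atoms.

    max_atoms must be at least the number of atoms in the molecule.
    """
    eig = np.linalg.eigvalsh(coulomb_matrix(numbers, coords))  # matrix is symmetric
    eig = eig[np.argsort(-np.abs(eig))]
    return np.pad(eig, (0, max_atoms - len(eig)))


# Toy geometries in angstrom (approximate, for illustration only).
h2 = ([1, 1], [(0.0, 0.0, 0.0), (0.74, 0.0, 0.0)])
water = ([8, 1, 1], [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)])
features = np.vstack([eigenvalue_descriptor(z, xyz, 3) for z, xyz in (h2, water)])
```

The result has one entry per atom slot rather than one per bond, so it is shorter than a bag-of-bonds vector but carries no bond labels.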

Thanks,

Alex

Yes, you’re right that it can be generalized to bulk phases.

But my point was that I would expect this featurizer to work with Molecule objects as well as Structure objects.

Given that the lattice does not play any role, the code could easily be edited to work with both object types.

Am I right?

I’m asking because I know people working on molecules who would be interested in applying this method.

Hey Francesco,

I actually have a PR open now that removes the unnecessary Structure-only restriction. I haven’t tried it with a Molecule yet, but I expect it will work once the restriction is gone. If it doesn’t, and you have a PR that lets the BoB featurizer generalize to molecules, we welcome it!

Not sure we are going to support molecule featurizers (as a separate featurizer type from crystal structures), but I see no problem with structure featurizers implicitly supporting molecules.
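
As a rough illustration of what "implicitly supporting molecules" can mean, the sketch below has a function that reads only atomic numbers and Cartesian coordinates, so it accepts any object that exposes those two attributes. `ToyMolecule`, `ToyStructure`, `HasSites`, and `pair_terms` are hypothetical stand-ins, not pymatgen or matminer classes, and the attribute names are assumptions. Note that ignoring the lattice also means ignoring periodic images, which a bulk-oriented matrix would not do.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import Protocol, Sequence


class HasSites(Protocol):
    """Minimal interface: atomic numbers plus Cartesian coordinates."""

    atomic_numbers: Sequence[int]
    cart_coords: Sequence[Sequence[float]]


@dataclass
class ToyMolecule:  # hypothetical stand-in for a molecule object
    atomic_numbers: Sequence[int]
    cart_coords: Sequence[Sequence[float]]


@dataclass
class ToyStructure:  # hypothetical stand-in for a periodic structure
    atomic_numbers: Sequence[int]
    cart_coords: Sequence[Sequence[float]]
    lattice: Sequence[Sequence[float]]  # present, but never read below


def pair_terms(obj: HasSites):
    """Z_i * Z_j / r_ij for every atom pair, ignoring any lattice."""
    z, r = obj.atomic_numbers, obj.cart_coords
    return [
        z[i] * z[j] / math.dist(r[i], r[j])
        for i, j in combinations(range(len(z)), 2)
    ]


coords = [(0.0, 0.0, 0.0), (0.74, 0.0, 0.0)]
mol = ToyMolecule([1, 1], coords)
box = ToyStructure([1, 1], coords, [(10, 0, 0), (0, 10, 0), (0, 0, 10)])
```

Because `pair_terms` never touches `lattice`, it returns the same values for `mol` and `box`; no explicit type check on the input is needed.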

Let me know how it goes.

Alex

While working on this PR, I also realized that CoulombMatrix and SineCoulombMatrix were written to implicitly support Molecule objects as well. We have also added eigenvalue-based methods of flattening, so I’d check those out too once the PR is merged.
