ElementFraction does not work on a pandas dataframe

Hi, I cannot use ElementFraction on a pandas dataframe, although it works when I apply it to a single variable. It fails even when I use the code provided by the Materials Project workshop (Machine Learning with MatMiner - The Materials Project Workshop 1).
When I run the workshop code, the program gets stuck in an infinite loop that repeatedly prints the following text:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:

    if __name__ == '__main__':
        freeze_support()
        ...

The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.

This is the code from the workshop that I try to run:

from pymatgen.core import Composition
from matminer.featurizers.composition.element import ElementFraction
from matminer.datasets.dataset_retrieval import load_dataset
ef = ElementFraction()
df = load_dataset("brgoch_superhard_training")
df.head()
df = ef.featurize_dataframe(df, "composition")
print(df.head())

Hello Amalie,

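It does look like you are running into problems with Python’s multiprocessing module. Matminer uses multiprocessing by default when featurizing dataframes. On platforms that start child processes by spawning a fresh interpreter instead of forking (Windows, and macOS by default on recent Python versions), each worker process re-imports your main script. Any unguarded top-level code then tries to start workers again, so the worker processes fail to launch and the program stalls.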

Have you already tried running your code inside an if __name__ == '__main__' block?
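
In case it helps, here is a minimal sketch of that idiom using only the standard library. It is not matminer code: count_characters and featurize_all are hypothetical stand-ins for the featurizer and for the dataframe call, and the worker count of 2 is arbitrary. The point is only the layout, with everything that starts worker processes reachable solely from the guarded block.

```python
import multiprocessing as mp


def count_characters(formula):
    # Hypothetical stand-in for a per-row featurizer call. It is defined at
    # module level so that worker processes can import it by name.
    return len(formula)


def featurize_all(formulas, n_jobs=2):
    # Start a pool of worker processes and map the stand-in over the inputs.
    with mp.Pool(processes=n_jobs) as pool:
        return pool.map(count_characters, formulas)


def main():
    # Anything that launches worker processes belongs here, not at top level.
    print(featurize_all(["Fe2O3", "SiO2", "WC"]))


if __name__ == "__main__":
    # Only the original process runs this branch. When a worker re-imports
    # this file, __name__ is not "__main__", so main() is skipped there.
    main()
```

Applied to your script, the imports can stay at the top, and the lines from ef = ElementFraction() down to print(df.head()) would move into main().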

Logan
