I am new to matminer and am slowly exploring its features.
However, I realised that it doesn’t seem to work with the Spyder IDE, which is what I mainly use for my work. After I run any featurizer command, there is no output. I suspect it has something to do with the display of the progress bar, which somehow doesn’t work in Spyder.
Right now I am working in Jupyter Notebook, but I would like to use Spyder and want to know whether that is possible.
I’ve used Spyder on Linux a little but have not had this problem. From a quick Google search, problems with the progress bar (we use the widely used tqdm package) seem to be fairly common.
To see whether this is a problem with tqdm or with matminer, could you try disabling the progress bar in matminer? You can do this by passing pbar=False when you call featurize_dataframe(...) or featurize_many(...).
Another way to check whether this is a tqdm problem with Spyder is to run a quick standalone tqdm test there.
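The original test snippet isn’t preserved here, so the following is a stand-in sketch of such a check, using only tqdm and the standard library. Run it in the Spyder console. If a progress bar appears and the loop finishes, tqdm itself works in that environment.

```python
import time

from tqdm import tqdm

total = 0
# Wrap a simple loop in tqdm; each iteration sleeps briefly so the bar is visible.
for i in tqdm(range(50)):
    time.sleep(0.01)
    total += i

print(total)  # sum of 0..49
```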
Thanks for the suggestion. I’ve tested tqdm in Spyder and it works as intended.
I also tried passing pbar=False to the featurizers, but it’s still not working: there is no output and the code just runs forever in Spyder.
Just to check, I also ran the "Matminer introduction - Predicting bulk modulus" commands in Spyder, which resulted in the same issue, though they run fine in Jupyter Notebook.
I am running Python 3.7.4, matminer 0.6.2, pymatgen 2020.1.28, tqdm 4.42.1, jupyter-core 4.6.1, spyder 4.0.1.
Note that I first installed pymatgen from conda and then installed matminer from pip, because the conda installer for matminer doesn’t work for me.
I figured out the problem: the featurizers don’t seem to work in Spyder 4.0 and later.
After downgrading to Spyder 3.3.6, the code runs, although the progress bar doesn’t show. I don’t think that is a big issue as long as the featurizer runs.
So, if I remember correctly, the “feature importance” in matminer (as per the notebook) is quite separate from the actual matminer code. It’s all done with sklearn once the dataframe is already populated with descriptors.
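To illustrate that this step is plain sklearn, here is a minimal sketch. It assumes a dataframe whose columns are already descriptors. The column names and values below are synthetic stand-ins, not matminer output. The target is built to depend strongly on feat_a, weakly on feat_b, and not at all on feat_c.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 200

# Synthetic descriptor table standing in for a featurized dataframe.
X = pd.DataFrame({
    "feat_a": rng.normal(size=n),
    "feat_b": rng.normal(size=n),
    "feat_c": rng.normal(size=n),
})
# Target: strong dependence on feat_a, weak on feat_b, none on feat_c.
y = 5.0 * X["feat_a"] + 0.5 * X["feat_b"] + rng.normal(scale=0.1, size=n)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Impurity-based importances, one per column, sorted from most to least important.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

matminer only fills in the descriptor columns. Everything from fitting onward is ordinary sklearn.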
You could extract feature importances from an automatminer pipeline, but the procedure would depend on the best model that was found (which cannot be known with certainty ahead of time). If the pipeline’s best model is a random forest (as in the matminer notebook), then essentially repeat the notebook’s procedure to extract the importances. However, many other possible models in the automatminer model space don’t have analogs for “feature importance”. For example, “feature importance” for a kNN or linear regression model does not mean the same thing as it does for a forest ensemble model.
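The model-dependence can be seen directly in sklearn. The sketch below assumes you already have a fitted sklearn estimator; how to pull it out of an automatminer pipeline is not shown here. describe_importance is a hypothetical helper. It reports which attribute, if any, a fitted model exposes. A random forest has feature_importances_. A linear regression only has coef_, which is a different quantity and depends on feature scaling. kNN has neither.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor


def describe_importance(model):
    """Hypothetical helper: return (attribute name, values) for a fitted model."""
    if hasattr(model, "feature_importances_"):
        # Tree ensembles: impurity-based importances normalized to sum to 1.
        return "feature_importances_", np.asarray(model.feature_importances_)
    if hasattr(model, "coef_"):
        # Linear models: coefficients, not comparable to forest importances.
        return "coef_", np.ravel(model.coef_)
    # e.g. kNN: no built-in per-feature importance attribute.
    return None, None


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

for est in (RandomForestRegressor(n_estimators=50, random_state=0),
            LinearRegression(),
            KNeighborsRegressor(n_neighbors=5)):
    est.fit(X, y)
    kind, values = describe_importance(est)
    print(type(est).__name__, kind, values)
```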
If you’re interested in further discussion, open a new forum topic specific to this and we can go over the details.