Automatminer stalls during benchmarking

Dear developers,
I updated to the latest version of automatminer last night and found that the frequent TPOT crashes have been resolved in this version. However, I’m facing a new problem while trying to benchmark a dataset with automatminer. I have almost 160 samples and I’m doing a 5-fold nested cross-validation for a regression problem, as described in the documentation. I’m using the predefined ‘heavy’ config because it gave high accuracy before the update. The first fold is fitted nicely, with a reasonable MSE value. However, when the code tries to featurize for the 2nd fold, OpenBLAS reports that a resource is temporarily unavailable and the run stalls. The error output is given below.
##################################################################################
XRDPowderPattern: 1%|▏ | 1/125 [00:00<00:26, 4.65it/s]OpenBLAS blas_thread_init: pthread_create failed for thread 1 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 2 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 3 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 4 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 5 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 6 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 7 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 8 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 9 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 10 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 11 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 12 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 13 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 14 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 15 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 16 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 17 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 18 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 19 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 20 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 21 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 22 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 23 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 24 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 25 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 26 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 27 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 28 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 29 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 30 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 31 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 32 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 33 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 34 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 35 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 36 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 37 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 38 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 39 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max
OpenBLAS blas_thread_init: pthread_create failed for thread 1 of 40: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 4096 current, 1540506 max

... and so on
###############################################################################

Note that I’m aware that running other processes at the same time could cause this, so I have been careful not to run any CPU-intensive process concurrently. I’ve also tried setting the environment variable OPENBLAS_NUM_THREADS, both from the Python script and from the shell. Still the problem persists. I’ve observed that it occurs at the XRDPowderPattern featurizer every time. After these errors are printed, I can see 2-6 CPU cores running at 100%, but leaving it in this stalled state for even 10-12 hours does not show any progress.

Regards,
Arnab

Well, I for one am surprised that the heavy preset actually gives good results for anything, but it is encouraging that it does for you. Have you tried simply specifying the number of processes? Usually 160 samples should be quite easy for even the heavy featurizers to handle, provided the structures aren’t huge.

How do I specify the number of processes for the featurizers? Using OPENBLAS_NUM_THREADS or n_jobs I can tell TPOT to use a certain number of threads, and it works as expected. But it seems the featurizers are always trying to use all available cores of the machine, irrespective of the value of OPENBLAS_NUM_THREADS.

One more piece of feedback: the digest() method seems to be missing from MatPipe in the new version. I got:

pipe.digest('pipe.digest')
AttributeError: 'MatPipe' object has no attribute 'digest'

Oh I see you’ve replaced digest with summarize and inspect! Nice!

Hey Arnab,

You can set n_jobs in the AutoFeaturizer constructor (and the same for the TPOTAdaptor) if you are using custom pipelines. You can also set it with the n_jobs powerup in MatPipe.from_preset (or get_preset_config, which is just the function that from_preset calls). The preset powerups set certain options across the entire pipeline while keeping the rest of the preset the same. So you could do something like pipe = MatPipe.from_preset("heavy", n_jobs=10) and it should work.
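As a minimal sketch of that suggestion: the MatPipe.from_preset call with the n_jobs powerup is the one described above, while pick_n_jobs and build_heavy_pipe are hypothetical helper names made up for illustration, and the reserve of 2 cores is an arbitrary assumption.

```python
import os

try:
    from automatminer import MatPipe
except ImportError:  # keeps the helper below usable without automatminer
    MatPipe = None


def pick_n_jobs(requested, reserve=2):
    """Hypothetical helper: cap the requested worker count at the number of
    cores on this machine minus a small reserve, never going below 1."""
    available = os.cpu_count() or 1
    return max(1, min(requested, available - reserve))


def build_heavy_pipe(n_jobs):
    """Build the "heavy" preset with the n_jobs powerup applied."""
    if MatPipe is None:
        raise RuntimeError("automatminer is not installed")
    return MatPipe.from_preset("heavy", n_jobs=n_jobs)
```

With automatminer installed, pipe = build_heavy_pipe(pick_n_jobs(10)) gives a pipeline to benchmark exactly as before; the only difference is that the job count is passed explicitly instead of being left to the defaults.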

I wouldn’t recommend using environment variables to set the number of parallel processes for automatminer or matminer. Let me know if this works for you.
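If it does not, it may help to compare the limit shown in the log with how many threads the run could be starting. On Linux, RLIMIT_NPROC caps the number of processes and threads one user may have, and pthread_create fails with "Resource temporarily unavailable" once that cap is reached. The sketch below reads the soft limit with the standard-library resource module (Unix only) and does the rough arithmetic. That each worker process opens its own 40-thread OpenBLAS pool is an assumption based on the "of 40" in the log, and max_workers_under_limit is a hypothetical helper, not part of automatminer.

```python
import resource  # Unix-only standard-library module


def nproc_soft_limit():
    """Return the soft RLIMIT_NPROC for this user, or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return None if soft == resource.RLIM_INFINITY else soft


def max_workers_under_limit(soft_limit, threads_per_worker, already_running=0):
    """Rough upper bound on worker processes that fit under the limit,
    assuming every worker holds threads_per_worker threads in total and
    already_running threads or processes of this user exist elsewhere."""
    budget = max(0, soft_limit - already_running)
    return budget // threads_per_worker
```

With the values from the log, max_workers_under_limit(4096, 40) is 102, so on the order of a hundred such workers (fewer if the account already has other processes running) would be enough to exhaust the limit.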