Hello, all
I get an error when launching a workflow to the queue, and after several attempts I still have no idea how to solve it. I would really appreciate any suggestions. Here is the specific information:
- error message
```
(atomate) [umjzhh@psn004 wf_test]$ slaunch
Successfully loaded your FW_config.yaml!
2018-03-11 22:16:25,403 INFO getting queue adapter
2018-03-11 22:16:25,406 INFO Found previous block, using /home/export/online1/umjzhh/calculations/block_2018-03-06-14-33-45-310798
2018-03-11 22:16:25,428 INFO 0 jobs in queue. Maximum allowed by user: 0
2018-03-11 22:16:25,461 INFO Launching a rocket!
2018-03-11 22:16:25,615 INFO reserved FW with fw_id: 4
2018-03-11 22:16:25,627 INFO Created new dir /home/export/online1/umjzhh/calculations/block_2018-03-06-14-33-45-310798/launcher_2018-03-11-14-16-25-615544
2018-03-11 22:16:25,631 INFO moving to launch_dir /home/export/online1/umjzhh/calculations/block_2018-03-06-14-33-45-310798/launcher_2018-03-11-14-16-25-615544
2018-03-11 22:16:25,648 INFO submitting queue script
2018-03-11 22:16:25,674 ERROR ----|vvv|----
2018-03-11 22:16:25,675 ERROR Error writing/submitting queue script!
2018-03-11 22:16:25,678 ERROR Traceback (most recent call last):
  File "/home/export/online1/umjzhh/opt/anaconda3/lib/python3.6/site-packages/fireworks/queue/queue_launcher.py", line 136, in launch_rocket_to_queue
    raise RuntimeError('queue script could not be submitted, check queue '
RuntimeError: queue script could not be submitted, check queue script/queue adapter/queue server status!
2018-03-11 22:16:25,679 ERROR ----|^^^|----
2018-03-11 22:16:25,680 INFO Un-reserving FW with fw_id, launch_id: 4, 2
2018-03-11 22:16:25,721 ERROR ----|vvv|----
2018-03-11 22:16:25,722 ERROR Error with queue launcher rapid fire!
2018-03-11 22:16:25,723 ERROR Traceback (most recent call last):
  File "/home/export/online1/umjzhh/opt/anaconda3/lib/python3.6/site-packages/fireworks/queue/queue_launcher.py", line 236, in rapidfire
    raise RuntimeError("Launch unsuccessful!")
RuntimeError: Launch unsuccessful!
2018-03-11 22:16:25,723 ERROR ----|^^^|----
```
- Queue management system: custom LSF
This system was developed to imitate standard LSF, so some of its commands differ from the standard ones.
- submit job by script: `bsub -f FW_submit.script`. The `-f` option is mandatory on this system; as far as I can tell, standard LSF does not need it.
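To see what the scheduler actually reports when submission fails, here is a minimal sketch of running the submit step by hand from Python and printing the return code and both output streams. It assumes `bsub` is on `PATH` and that the custom LSF wants `bsub -f <script>`; `build_submit_cmd` and `try_submit` are hypothetical helpers of my own, not FireWorks functions. It uses `stdout=PIPE` with `universal_newlines=True` instead of `capture_output`/`text`, because those two arguments only exist from Python 3.7 on.

```python
import shutil
import subprocess
import sys
from typing import List


def build_submit_cmd(script_path: str, submit_cmd: str = "bsub") -> List[str]:
    """Hypothetical helper: the custom LSF needs the script passed with -f."""
    return [submit_cmd, "-f", script_path]


def try_submit(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run a submit command and capture stdout/stderr as text (works on Python 3.6)."""
    return subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )


if __name__ == "__main__":
    if shutil.which("bsub") is None:
        print("bsub not found on PATH; run this on the login node", file=sys.stderr)
    else:
        # Run from the launcher directory that contains FW_submit.script.
        result = try_submit(build_submit_cmd("FW_submit.script"))
        print("return code:", result.returncode)
        print("stdout:", result.stdout)
        print("stderr:", result.stderr)
```

A non-zero return code or an unexpected stdout format would be worth comparing with what the queue adapter expects.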
- the format of FW_submit.script: I changed the template `fireworks/user_objects/queue_adapters/LoadSharingFacility_template.txt` to suit this custom LSF. The final FW_submit.script produced by atomate looks like this:

```
batch=1
nodes=1
ntasks-per-node=16
queue=q_x86_expr
job-name=Si-structure_optimiz
output=Si-structure_optimiz.out
command=rlaunch -c /home/export/online1/umjzhh/atomate/config singleshot --fw_id 4
```

In addition, when I submit it manually with `bsub -f FW_submit.script`, the job is submitted, but the firework soon becomes FIZZLED.
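To check the template edit on its own, without going through `qlaunch`, here is a minimal sketch that fills a key=value template like the one above. My understanding, which is an assumption I have not checked against the FireWorks source, is that the queue adapter fills templates with Python's `string.Template` using a `$$` delimiter and drops lines whose placeholders were not supplied. The template text and the placeholder names (`nnodes`, `ppnode`, `queue`, `job_name`, `rocket_launch`) are hypothetical.

```python
import string
from typing import Dict


class DoubleDollarTemplate(string.Template):
    # Use "$$" as the delimiter so ordinary "$VAR" shell syntax is left untouched.
    delimiter = "$$"


# Hypothetical template for the custom key=value format; placeholder names are assumptions.
CUSTOM_LSF_TEMPLATE = """batch=1
nodes=$${nnodes}
ntasks-per-node=$${ppnode}
queue=$${queue}
job-name=$${job_name}
output=$${job_name}.out
command=$${rocket_launch}"""


def render_script(template: str, params: Dict[str, object]) -> str:
    """Fill placeholders and drop any line whose placeholder was not supplied."""
    filled = DoubleDollarTemplate(template).safe_substitute(params)
    kept = [line for line in filled.split("\n") if "$$" not in line]
    return "\n".join(kept)
```

Rendering with `nnodes=1`, `ppnode=16`, `queue="q_x86_expr"`, `job_name="Si-structure_optimiz"` and the `rlaunch ... --fw_id 4` command reproduces the script above line for line.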
- query command: the options of the custom `bjobs` also differ from standard LSF. Its usage is:

```
(atomate) [umjzhh@psn004 launcher_2018-03-11-14-16-25-615544]$ bjobs -h
Usage: bjobs [-w] [-l] [-J jobname] [-q queue] [-u user] [-a|-d|-e|-p|-r] [jobid]
```
I also found that unless I change `fireworks/user_objects/queue_adapters/common_adapter.py` as follows, `slaunch` hangs and prints nothing. However, I don't know whether my change is correct or whether it could cause other problems.

```python
elif self.q_type == "LoadSharingFacility":
    # use no header and the wide format so that there is one line per job, and display only running and pending jobs
    # original FireWorks line, disabled because the custom bjobs does not list -o or -noheader:
    # status_cmd.extend(['-p', '-r', '-o', 'jobID user queue', '-noheader', '-u', username])
    status_cmd.extend(['-u', username])
```
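Because the custom `bjobs` usage lists neither `-o` nor `-noheader`, and shows `-p` and `-r` only as alternatives, the modified command may print a header line along with the jobs. Here is a minimal sketch for checking how jobs could be counted from `bjobs -u <user>` output. It assumes (unverified for this system) that every job line starts with a numeric job ID while header lines and status messages do not; `build_status_cmd` and `count_jobs` are hypothetical names, not the FireWorks parser.

```python
from typing import List


def build_status_cmd(username: str) -> List[str]:
    """Mirror of the modified status command: only '-u <user>' is passed."""
    return ["bjobs", "-u", username]


def count_jobs(output: str) -> int:
    """Count job lines in bjobs output.

    Assumes each job line starts with a numeric job ID; header lines
    (e.g. starting with "JOBID"), blank lines and status messages are skipped.
    """
    count = 0
    for line in output.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            count += 1
    return count
```

The text to feed `count_jobs` is whatever `bjobs -u $USER` prints on the login node.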
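Other information:

- about `slaunch`: I added the following function to `~/.bash-mpenvrc`:

```bash
function slaunch() {
    # go to the launch root; defaults to $HOME/online1/launcher if MP_LAUNCH_ROOT is unset
    cd ${MP_LAUNCH_ROOT:=$HOME/online1/launcher}
    # number of launches, taken from the first argument (default 1)
    n_launcher=${1:-1}
    qlaunch -r rapidfire --nlaunches $n_launcher -b 10000
    # return to the directory we started from
    cd $OLDPWD
}
```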
- platform: Linux version 2.6.32-431.29.2.lustre.el6.x86_64 (root@gio017) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-11) (GCC) ) #1 SMP Fri Jul 31 09:39:58 CST 2015
- Python version: Python 3.6.4 :: Anaconda, Inc.
- atomate: downloaded from GitHub last week
Thanks a lot for your help!
Best regards