qlaunch rapidfire problem

Dear AJ,

I am trying to launch a new HT application on Edison at NERSC. While testing my workflow, I found that it was inserted successfully into the FireWorks database (see attached picture), but when I tried to run qlaunch rapidfire on it, I ran into the following problem:

Do you have any idea how to solve this problem? By the way, I updated the FW code to its latest version.

Thank you!

2016-03-21 16:02:50,224 ERROR ----|vvv|----
2016-03-21 16:02:50,224 ERROR Error trying to get the number of jobs in the queue
The error response reads: Traceback (most recent call last):
  File "/global/u1/r/rongzq/sr_neb/codes/fireworks/fireworks/queue/queue_adapter.py", line 58, in target
    self.process = subprocess.Popen(self.command, **kwargs)
  File "/usr/common/usg/python/2.7.3/lib/python2.7/subprocess.py", line 679, in __init__
    errread, errwrite)
  File "/usr/common/usg/python/2.7.3/lib/python2.7/subprocess.py", line 1249, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
2016-03-21 16:02:50,254 ERROR ----|^^^|----
^C2016-03-21 16:03:03,659 ERROR ----|vvv|----
2016-03-21 16:03:03,680 ERROR Error with queue launcher rapid fire!
2016-03-21 16:03:03,681 ERROR Traceback (most recent call last):
  File "/global/u1/r/rongzq/sr_neb/codes/fireworks/fireworks/queue/queue_launcher.py", line 183, in rapidfire
    jobs_in_queue = _get_number_of_jobs_in_queue(qadapter, njobs_queue, l_logger)
  File "/global/u1/r/rongzq/sr_neb/codes/fireworks/fireworks/queue/queue_launcher.py", line 248, in _get_number_of_jobs_in_queue
    time.sleep(RETRY_INTERVAL)
KeyboardInterrupt
2016-03-21 16:03:03,696 ERROR ----|^^^|----

It’s difficult for me to say offhand. I think NERSC upgraded all their systems to SLURM instead of PBS. Did you switch your my_qadapter.yaml for SLURM? Otherwise it might be issuing PBS commands.

Best,

Anubhav
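
The `OSError: [Errno 2] No such file or directory` in the traceback is what `subprocess.Popen` typically raises when the executable it is asked to run cannot be found, which would fit a queue adapter still calling PBS commands on a machine that only provides SLURM. A minimal sketch for checking which scheduler commands are on the `PATH`, assuming Python 3 (for `shutil.which`) and the usual command names (`qsub`/`qstat` for PBS, `sbatch`/`squeue` for SLURM). The helper name `available_scheduler_commands` is made up for illustration and is not part of FireWorks:

```python
import shutil

# Usual command names for each scheduler (an assumption about the site setup).
SCHEDULER_COMMANDS = {
    "PBS": ["qsub", "qstat"],
    "SLURM": ["sbatch", "squeue"],
}


def available_scheduler_commands(which=shutil.which):
    """Map each scheduler name to the list of its commands found on PATH."""
    found = {}
    for scheduler, commands in SCHEDULER_COMMANDS.items():
        found[scheduler] = [cmd for cmd in commands if which(cmd) is not None]
    return found


if __name__ == "__main__":
    for scheduler, commands in available_scheduler_commands().items():
        status = ", ".join(commands) if commands else "no commands found"
        print("{}: {}".format(scheduler, status))
```

If only the SLURM commands show up, a `my_qadapter.yaml` that still selects a PBS queue type would be expected to fail in the way shown above.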

You received this message because you are subscribed to the Google Groups “fireworkflows” group.

To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].

To post to this group, send email to [email protected].

Visit this group at https://groups.google.com/group/fireworkflows.

To view this discussion on the web visit https://groups.google.com/d/msgid/fireworkflows/c09f0917-fc2b-42dd-b6be-3898533fdbe5%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Yes, it was the SLURM error. I fixed it, along with several error messages that came up afterwards, and I am now able to submit jobs through qlaunch. Thank you.

Ziqin

A quick follow-up question: what is the keyword in the Firework _queueadapter for the number of nodes in SLURM? I tried nnodes from the documentation, but it is not working.

fw = Firework([task1, task2, task3],
              spec={"_queueadapter": {"nnodes": 2,
                                      "walltime": "15:00:00",
                                      "queue": "regular"}})

Also, the generated submission script is below. Is it necessary to change

#PBS -v DB_LOC,FW_CONFIG_FILE,VENV_LOC into #SBATCH -v DB_LOC,FW_CONFIG_FILE,VENV_LOC as well?

Thank you

#!/bin/bash -l

#SBATCH --time=15:00:00
#SBATCH --partition=regular
#SBATCH --account=jcesr
#SBATCH --job-name=Unnamed_FW
#SBATCH --output=Unnamed_FW-%j.out
#SBATCH --error=Unnamed_FW-%j.error
#PBS -v DB_LOC,FW_CONFIG_FILE,VENV_LOC

module load python/2.7.3
module load vasp/5.3.5
source $VENV_LOC

cd /global/project/projectdirs/jcesr/rongzq/HT_ApproxNEB/test/block_2016-03-22-08-56-57-399842/launcher_2016-03-22-08-56-57-732170
rlaunch -c /global/u1/r/rongzq/sr_neb/config/config_Edison singleshot --fw_id 1

CommonAdapter (SLURM) completed writing Template

All parameters for SLURM (and other queue adapters) are listed clearly in their templates, e.g.:

fireworks/fireworks/user_objects/queue_adapters/SLURM_template.txt
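
As a rough illustration of reading the accepted keys off a template: FireWorks queue templates mark each parameter with a `$${name}` placeholder, so the names can be pulled out with a regular expression and compared against the `_queueadapter` keys. The template excerpt below is a shortened, illustrative stand-in (an assumption, not a copy of `SLURM_template.txt`), and `template_keys` and `unknown_keys` are hypothetical helper names. Check the real file for the authoritative list.

```python
import re

# Illustrative excerpt only; the real SLURM_template.txt may differ.
SAMPLE_TEMPLATE = """\
#!/bin/bash -l
#SBATCH --nodes=$${nodes}
#SBATCH --time=$${walltime}
#SBATCH --partition=$${queue}
#SBATCH --account=$${account}
#SBATCH --job-name=$${job_name}
"""

# Matches placeholders of the form $${name} and captures the name.
PLACEHOLDER = re.compile(r"\$\$\{(\w+)\}")


def template_keys(template_text):
    """Return the set of parameter names that appear as $${name}."""
    return set(PLACEHOLDER.findall(template_text))


def unknown_keys(queueadapter, template_text):
    """Return the _queueadapter keys the template has no placeholder for."""
    return sorted(set(queueadapter) - template_keys(template_text))


if __name__ == "__main__":
    overrides = {"nnodes": 2, "walltime": "15:00:00", "queue": "regular"}
    print(unknown_keys(overrides, SAMPLE_TEMPLATE))
```

Against this stand-in template, `nnodes` is reported as unmatched while `nodes` would be accepted, which is one plausible reason for a node count being ignored.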

For #PBS -v, you might want to contact the MP team. That command would certainly be pointless for SLURM. Whether you need to replace it with an alternative depends on the workflows MP is maintaining. Note that MPEnv documents many of the changes for SLURM. Please note that I am no longer responsible for any of those codebases. This list and the support I provide are about FireWorks only.
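
In the generated script quoted earlier in the thread, the leftover `#PBS -v` line is an ordinary comment to bash, and `sbatch` only acts on `#SBATCH` directives. Since `sbatch` by default passes the submitting shell's environment to the job (`--export=ALL`, unless the site configures otherwise), one thing worth checking is simply whether the variables named on that line are set in the shell where `qlaunch` runs. A minimal sketch, with hypothetical helper names `pbs_exported_vars` and `unset_vars`:

```python
import os


def pbs_exported_vars(script_text):
    """Collect variable names listed on '#PBS -v A,B,C' lines."""
    names = []
    for line in script_text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "#PBS" and parts[1] == "-v":
            # Entries may be NAME or NAME=value; keep only the name.
            for entry in parts[2].split(","):
                name = entry.split("=", 1)[0]
                if name:
                    names.append(name)
    return names


def unset_vars(names, environ=None):
    """Return the names that are missing from the given environment."""
    environ = os.environ if environ is None else environ
    return [name for name in names if name not in environ]


if __name__ == "__main__":
    script = "#!/bin/bash -l\n#PBS -v DB_LOC,FW_CONFIG_FILE,VENV_LOC\n"
    print(unset_vars(pbs_exported_vars(script)))
```

Any name this prints is not defined in the current shell, so a line such as `source $VENV_LOC` in the job script would likely expand to nothing.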

Understood AJ. Thank you very much!

Rong
