Hi,
I am trying to get FireWorks qlaunch running on a remote server and am having issues. The remote server uses Slurm to manage jobs. I am working through the queue tutorial ("Launch Rockets through a queue", FireWorks 1.9.7 documentation).
I have set up the directory as specified in the tutorial, but when I try to run qlaunch singleshot I get the following error:
$ qlaunch singleshot
Found many potential paths for LAUNCHPAD_LOC: ['/home/dda/fw_things/fun/queue/my_launchpad.yaml', '/home/dda/fw_things/fireworks/my_launchpad.yaml']
Choosing as default: /home/dda/fw_things/fun/queue/my_launchpad.yaml
Traceback (most recent call last):
  File "/home/dda/miniconda3/envs/fw37/bin/qlaunch", line 33, in <module>
    sys.exit(load_entry_point('FireWorks', 'console_scripts', 'qlaunch')())
  File "/home/dda/fw_things/fireworks/fireworks/scripts/qlaunch_run.py", line 224, in qlaunch
    do_launch(args)
  File "/home/dda/fw_things/fireworks/fireworks/scripts/qlaunch_run.py", line 62, in do_launch
    queueadapter = load_object_from_file(args.queueadapter_file)
  File "/home/dda/fw_things/fireworks/fireworks/utilities/fw_serializers.py", line 391, in load_object_from_file
    f_format = filename.split('.')[-1]
AttributeError: 'NoneType' object has no attribute 'split'
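Reading the traceback, the last frame splits a filename that is None, and that value was passed in as args.queueadapter_file. Below is a minimal sketch that reproduces the failing line in isolation. It is not the FireWorks code itself: guess_format and find_qadapter are hypothetical helpers written for illustration, and the file name my_qadapter.yaml is an assumption about what the tutorial directory should contain.

```python
import os


def guess_format(filename):
    """Mirror the failing line in the traceback: take the text after the last dot."""
    return filename.split('.')[-1]


def find_qadapter(directory, name="my_qadapter.yaml"):
    """Hypothetical helper: return the path to `name` inside `directory`, or None if missing."""
    path = os.path.join(directory, name)
    return path if os.path.isfile(path) else None


# A real filename works as expected.
print(guess_format("my_qadapter.yaml"))

# None reproduces the AttributeError from the traceback.
try:
    guess_format(None)
except AttributeError as exc:
    print("AttributeError:", exc)

# Check whether the assumed file name exists in the current directory (assumption:
# the queue adapter file is expected next to my_launchpad.yaml).
print(find_qadapter(os.getcwd()))
```

If the last line prints None when run from the directory where qlaunch was called, that would be consistent with no queue adapter file being found there, though I have not confirmed that this is how qlaunch locates the file.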
I have previously had success running simple, single-core jobs on the remote server by connecting to a MongoDB Atlas database. I have verified that my configuration is valid, and I am able to add and launch fireworks using lpad and the launch commands.
For example:
(fw37) dda at hpc → [~/fw_things/fun/queue]
$ lpad get_fws
Found many potential paths for LAUNCHPAD_LOC: ['/home/dda/fw_things/fun/queue/my_launchpad.yaml', '/home/dda/fw_things/fireworks/my_launchpad.yaml']
Choosing as default: /home/dda/fw_things/fun/queue/my_launchpad.yaml
{
    "fw_id": 1,
    "created_on": "2021-04-16T21:37:31.502195",
    "updated_on": "2021-04-16T21:37:31.502442",
    "state": "READY",
    "name": "Unnamed FW"
}
I am currently running Python 3.7 because 3.8 and 3.9 had some issues with FireWorks.
$ python
Python 3.7.9 (default, Aug 31 2020, 12:42:55)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
I have also run a simple FireWorks command through Slurm. It produces no error and connects to the remote database.
$ cat slurm_test.sh
#!/bin/bash
#SBATCH -p high,med,low
#SBATCH --job-name=XX
#SBATCH --output=job.out
#SBATCH --error=job.err
#SBATCH --time=00:01:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
lpad get_fws
(fw37) dda at hpc1 → [~/fw_things/fun/queue]
$ cat job.out
==========================================
SLURM_JOB_ID = 5273485
SLURM_NODELIST = c7-31
CUDA_VISIBLE_DEVICES =
==========================================
Found many potential paths for LAUNCHPAD_LOC: ['/home/dda/fw_things/fun/queue/my_launchpad.yaml', '/home/dda/fw_things/fireworks/my_launchpad.yaml']
Choosing as default: /home/dda/fw_things/fun/queue/my_launchpad.yaml
{
    "fw_id": 1,
    "created_on": "2021-04-16T21:24:53.824024",
    "updated_on": "2021-04-16T21:24:53.824383",
    "state": "READY",
    "name": "Unnamed FW"
}
===========================================================================
Job Finished
Name : XX
User : dda
Partition : high
Nodes : c7-31
Cores : 1
State : COMPLETED
Submit : 2021-04-16T14:33:02
Start : 2021-04-16T14:33:02
End : 2021-04-16T14:33:09
Reserved walltime : 00:01:00
Used walltime : 00:00:07
Used CPU time : 00:00:01
% User (Computation): 73.21%
% System (I/O) : 26.72%
Mem reserved : 0/node
Max Mem used : 0.00 (c7-31)
Max Disk Write : 0.00 (c7-31)
Max Disk Read : 0.00 (c7-31)
I believe the issue is related to qlaunch not recognizing that I have a remote database, but I am not sure how to correct the error. Does anyone have any idea how to fix this issue?