Hi,
I am trying to use MongoDB on my local PC (in WSL) while the jobs run on a supercomputer. I installed MongoDB in WSL and configured all the necessary FireWorks files on both the local PC and the supercomputer.
I open an SSH tunnel between the PC and the supercomputer with: ssh -R 27017:127.0.0.1:27017 user@ipsupercomputer
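To see where the tunnelled port is actually reachable, a check along these lines could be run on the supercomputer, both from the login shell and from inside a submitted job. It is only a sketch: the function name port_is_open is made up for illustration, and it only tests whether something accepts TCP connections on localhost:27017, not whether that something is MongoDB.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and name-resolution errors.
        return False


if __name__ == "__main__":
    # gethostname() shows which machine the check actually ran on.
    print(socket.gethostname(), port_is_open("localhost", 27017))
```

Printing the hostname next to the result makes it easy to compare the output from the login shell with the output from a queue job.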
When I create a workflow on the local PC, I can access it without problems from the supercomputer with lpad get_wflows:
(base) [f-gimbert@atlas ATOMATE_TESTS]$ lpad get_wflows
{
"state": "READY",
"name": "Co–1",
"created_on": "2021-08-20T01:00:06.479000",
"states_list": "W-W-W-REA"
}
The problem occurs when I try to run the job with qlaunch rapidfire (I am not sure this is the best command):
(base) [f-gimbert@atlas ATOMATE_TESTS]$ qlaunch rapidfire
2021-08-20 10:01:39,204 INFO getting queue adapter
2021-08-20 10:01:39,209 INFO Created new dir /home/f-gimbert/ATOMATE_TESTS/block_2021-08-20-01-01-39-204692
2021-08-20 10:01:39,219 INFO The number of jobs currently in the queue is: 0
2021-08-20 10:01:39,219 INFO 0 jobs in queue. Maximum allowed by user: 0
2021-08-20 10:01:40,946 INFO Launching a rocket!
2021-08-20 10:01:40,957 INFO Created new dir /home/f-gimbert/ATOMATE_TESTS/block_2021-08-20-01-01-39-204692/launcher_2021-08-20-01-01-40-952356
2021-08-20 10:01:40,957 INFO moving to launch_dir /home/f-gimbert/ATOMATE_TESTS/block_2021-08-20-01-01-39-204692/launcher_2021-08-20-01-01-40-952356
2021-08-20 10:01:40,958 INFO submitting queue script
2021-08-20 10:01:40,971 INFO Job submission was successful and job_id is 51108
2021-08-20 10:01:40,971 INFO Sleeping for 5 seconds…zzz…
2021-08-20 10:01:45,984 INFO Launching a rocket!
2021-08-20 10:01:45,995 INFO Created new dir /home/f-gimbert/ATOMATE_TESTS/block_2021-08-20-01-01-39-204692/launcher_2021-08-20-01-01-45-991402
2021-08-20 10:01:45,996 INFO moving to launch_dir /home/f-gimbert/ATOMATE_TESTS/block_2021-08-20-01-01-39-204692/launcher_2021-08-20-01-01-45-991402
2021-08-20 10:01:45,997 INFO submitting queue script
2021-08-20 10:01:46,009 INFO Job submission was successful and job_id is 51109
2021-08-20 10:01:46,010 INFO Sleeping for 5 seconds…zzz…
(I killed the process here.)
When I checked the error file for one of the launches, I found:
(base) [f-gimbert@atlas launcher_2021-08-20-01-01-40-952356]$ more FeXbo_4.e51108
Traceback (most recent call last):
  File "/home/f-gimbert/miniconda3/bin/rlaunch", line 8, in <module>
    sys.exit(rlaunch())
  File "/home/f-gimbert/miniconda3/lib/python3.7/site-packages/fireworks/scripts/rlaunch_run.py", line 141, in rlaunch
    timeout=args.timeout, local_redirect=args.local_redirect)
  File "/home/f-gimbert/miniconda3/lib/python3.7/site-packages/fireworks/core/rocket_launcher.py", line 98, in rapidfire
    while (skip_check or launchpad.run_exists(fworker)) and time_ok():
  File "/home/f-gimbert/miniconda3/lib/python3.7/site-packages/fireworks/core/launchpad.py", line 781, in run_exists
    return bool(self._get_a_fw_to_run(query=q, checkout=False))
  File "/home/f-gimbert/miniconda3/lib/python3.7/site-packages/fireworks/core/launchpad.py", line 1074, in _get_a_fw_to_run
    sort=sortby)
  File "/home/f-gimbert/miniconda3/lib/python3.7/site-packages/pymongo/collection.py", line 1328, in find_one
    for result in cursor.limit(-1):
  File "/home/f-gimbert/miniconda3/lib/python3.7/site-packages/pymongo/cursor.py", line 1238, in next
    if len(self.__data) or self._refresh():
  File "/home/f-gimbert/miniconda3/lib/python3.7/site-packages/pymongo/cursor.py", line 1130, in _refresh
    self.__session = self.__collection.database.client._ensure_session()
  File "/home/f-gimbert/miniconda3/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1935, in _ensure_session
    return self.__start_session(True, causal_consistency=False)
  File "/home/f-gimbert/miniconda3/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1883, in __start_session
    server_session = self._get_server_session()
  File "/home/f-gimbert/miniconda3/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1921, in _get_server_session
    return self._topology.get_server_session()
  File "/home/f-gimbert/miniconda3/lib/python3.7/site-packages/pymongo/topology.py", line 520, in get_server_session
    session_timeout = self._check_session_support()
  File "/home/f-gimbert/miniconda3/lib/python3.7/site-packages/pymongo/topology.py", line 502, in _check_session_support
    None)
  File "/home/f-gimbert/miniconda3/lib/python3.7/site-packages/pymongo/topology.py", line 220, in _select_servers_loop
    (self._error_message(selector), timeout, self.description))
pymongo.errors.ServerSelectionTimeoutError: localhost:27017: [Errno 111] Connection refused, Timeout: 30s, Topology Description: <TopologyDescription id: 611efef6c203b3222b57a3ec, topology_type: Single, servers: [<ServerDescription ('localhost', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27017: [Errno 111] Connection refused')>]>
It looks like the job could not connect to MongoDB, even though I can still read the workflows with lpad get_wflows:
(base) [f-gimbert@atlas ATOMATE_TESTS]$ lpad get_wflows
{
"state": "READY",
"name": "Co–1",
"created_on": "2021-08-20T01:00:06.479000",
"states_list": "W-W-W-REA"
}
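The same connection can be probed without FireWorks in between. The sketch below assumes pymongo is importable in the environment the job uses and that the server answers a ping without credentials. The helper name ping_mongo and the 2-second timeout are illustrative choices, not anything taken from FireWorks.

```python
def ping_mongo(host="localhost", port=27017, timeout_ms=2000):
    """Return (ok, detail) for a MongoDB ping against host:port."""
    try:
        from pymongo import MongoClient
        from pymongo.errors import PyMongoError
    except ImportError:
        return False, "pymongo is not installed in this environment"

    # A short server-selection timeout avoids the 30 s wait seen in the traceback.
    client = MongoClient(host, port, serverSelectionTimeoutMS=timeout_ms)
    try:
        client.admin.command("ping")
        return True, "ping succeeded"
    except PyMongoError as exc:
        # ServerSelectionTimeoutError is a subclass of PyMongoError.
        return False, f"{type(exc).__name__}: {exc}"
    finally:
        client.close()


if __name__ == "__main__":
    import socket

    ok, detail = ping_mongo()
    print(socket.gethostname(), ok, detail)
```

Running it once from the shell where lpad get_wflows works and once from a queue script would show whether both see the same MongoDB endpoint.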
I tried the -c option for qlaunch, with the same result. I also used a Python script to launch the job, without success, and qlaunch singleshot gives the same error.
I also tried modifying the admin user in MongoDB, but nothing changed.
I am lost, so any help is welcome!
Best regards
Florian
My db.json and my_launchpad.yaml files on the local PC and the supercomputer:
db.json
{"host": "localhost", "port": 27017, "database": "Fireworks", "collection": "tasks", "admin_user": "admin", "admin_password": "password", "readonly_user": "user", "readonly_password": "password", "aliases": {}}
my_launchpad.yaml
host: localhost
logdir: null
mongoclient_kwargs: {}
name: Fireworks
password: password
port: 27017
ssl_ca_file: null
strm_lvl: INFO
user_indices: []
username: admin
wf_user_indices: []
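Both files point at localhost, which always refers to the machine the reading process runs on. As a small illustration, the sketch below loads a db.json-style string and reports the target address, flagging loopback hosts. The function name describe_db_target and the LOOPBACK_HOSTS set are hypothetical helpers, not part of atomate or FireWorks.

```python
import json

# Addresses that refer to the machine the process itself runs on.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def describe_db_target(db_json_text: str) -> str:
    """Summarise the host:port a db.json-style config points to."""
    cfg = json.loads(db_json_text)
    host = cfg.get("host", "localhost")
    port = cfg.get("port", 27017)
    if host in LOOPBACK_HOSTS:
        return f"{host}:{port} (loopback: resolved on the machine running the process)"
    return f"{host}:{port}"


if __name__ == "__main__":
    example = '{"host": "localhost", "port": 27017, "database": "Fireworks"}'
    print(describe_db_target(example))
```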