CalcDB class, when loading from file, does not load SSL configuration (Identified fix)

Hi friends,

This issue is about the CalcDB from_file method, but I have identified a fix. I am interested in using the built-in Firetasks which write to a given database based on a db.json file. I recently realized that if I wanted to connect to a MongoDB that involves an SSL file, the Firetask which opens a connection to the MongoDB by reading a db.json file cannot parse these options.

For reference, the snippet I’m thinking of is here: https://github.com/hackingmaterials/atomate/blob/master/atomate/utils/database.py in the from_db_file method.

I have reproduced the snippet below and bolded the key part, where only one kwarg is let through to the cls method:

def from_db_file(cls, db_file, admin=True):
“”"
Create MMDB from database file. File requires host, port, database,
collection, and optionally admin_user/readonly_user and
admin_password/readonly_password
Args:
db_file (str): path to the file containing the credentials
admin (bool): whether to use the admin user
Returns:
MMDb object
“”"
creds = loadfn(db_file)
… Omitted some stuff here …
** kwargs = {}**
if “authsource” in creds:
kwargs[“authsource”] = creds[“authsource”]
else:
kwargs[“authsource”] = creds[“database”]

    return cls(creds["host"], int(creds["port"]), creds["database"], creds["collection"],
               user, password, **kwargs)

``

Since the only information which is allowed past from the db_file are the authsource, host, port, database, collection, username, and password, no information about the SSL can be passed.

The below fix solves this issue, by

def from_db_file(cls, db_file, admin=True):
“”"
Create MMDB from database file. File requires host, port, database,
collection, and optionally admin_user/readonly_user and
admin_password/readonly_password
Args:
db_file (str): path to the file containing the credentials
admin (bool): whether to use the admin user
Returns:
MMDb object
“”"
creds = loadfn(db_file)
… Omitted some stuff here …
** **kwargs = dict(creds)
if “authsource” in creds:
kwargs[“authsource”] = creds[“authsource”]
else:
kwargs[“authsource”] = creds[“database”]
if ‘ssl’ in creds:
kwargs[‘ssl’] = bool(creds[‘ssl’])

    # Remove keys which may cause collision when calling MongoClient
    for key in ['host','port','database','collection','admin_password','admin_user',
           'readonly_user','readonly_password','password','username']:
        if key in kwargs:
            del kwargs[key]

    return cls(creds["host"], int(creds["port"]), creds["database"], creds["collection"],
               user, password, **kwargs)

``

This solves the issue because it allows ssl-related variables past as a kwarg. It also removes kwargs which, if present, would cause
problems when the cls method tries to open a MongoClient instance with keyword arguments that are redundant with e.g. the hostname and port number,
which are passed in as arguments already. I also copy the dictionary into kwargs in the beginning because otherwise we delete the ‘host’, ‘port’, etc keys
before we pass them to the cls method.
The above patch solved the problem on my end.
I know that for pymatgen I’d open an issue on Github and then make a pull request, but with this Google groups format, I don’t know if such an ‘unsolicited’ pull request
is welcome. Please feel free to take/use the above snippet if you see no issues with it; I can also open a pull request if that is more convenient for you-- let me know.
Thanks!

Update: I created a pull request here. I ended up implemented a ‘safer’ version of the above fix which only allows SSL commands through instead of other input, which can cause an error. I think this would be less likely to break people’s current implementations.

I hope this helps, and please let me know if you spot any issues that I overlooked.

https://github.com/hackingmaterials/atomate/pull/298

···

On Tuesday, June 18, 2019 at 3:04:55 PM UTC-7, Steven Torrisi wrote:

Hi friends,

This issue is about the CalcDB from_file method, but I have identified a fix. I am interested in using the built-in Firetasks which write to a given database based on a db.json file. I recently realized that if I wanted to connect to a MongoDB that involves an SSL file, the Firetask which opens a connection to the MongoDB by reading a db.json file cannot parse these options.

For reference, the snippet I’m thinking of is here: https://github.com/hackingmaterials/atomate/blob/master/atomate/utils/database.py in the from_db_file method.

I have reproduced the snippet below and bolded the key part, where only one kwarg is let through to the cls method:

def from_db_file(cls, db_file, admin=True):
“”"
Create MMDB from database file. File requires host, port, database,
collection, and optionally admin_user/readonly_user and
admin_password/readonly_password
Args:
db_file (str): path to the file containing the credentials
admin (bool): whether to use the admin user
Returns:
MMDb object
“”"
creds = loadfn(db_file)
… Omitted some stuff here …
** kwargs = {}**
if “authsource” in creds:
kwargs[“authsource”] = creds[“authsource”]
else:
kwargs[“authsource”] = creds[“database”]

    return cls(creds["host"], int(creds["port"]), creds["database"], creds["collection"],
               user, password, **kwargs)

``

Since the only information which is allowed past from the db_file are the authsource, host, port, database, collection, username, and password, no information about the SSL can be passed.

The below fix solves this issue, by

def from_db_file(cls, db_file, admin=True):
“”"
Create MMDB from database file. File requires host, port, database,
collection, and optionally admin_user/readonly_user and
admin_password/readonly_password
Args:
db_file (str): path to the file containing the credentials
admin (bool): whether to use the admin user
Returns:
MMDb object
“”"
creds = loadfn(db_file)
… Omitted some stuff here …
** **kwargs = dict(creds)
if “authsource” in creds:
kwargs[“authsource”] = creds[“authsource”]
else:
kwargs[“authsource”] = creds[“database”]
if ‘ssl’ in creds:
kwargs[‘ssl’] = bool(creds[‘ssl’])

    # Remove keys which may cause collision when calling MongoClient
    for key in ['host','port','database','collection','admin_password','admin_user',
           'readonly_user','readonly_password','password','username']:
        if key in kwargs:
            del kwargs[key]

    return cls(creds["host"], int(creds["port"]), creds["database"], creds["collection"],
               user, password, **kwargs)

``

This solves the issue because it allows ssl-related variables past as a kwarg. It also removes kwargs which, if present, would cause
problems when the cls method tries to open a MongoClient instance with keyword arguments that are redundant with e.g. the hostname and port number,
which are passed in as arguments already. I also copy the dictionary into kwargs in the beginning because otherwise we delete the ‘host’, ‘port’, etc keys
before we pass them to the cls method.
The above patch solved the problem on my end.
I know that for pymatgen I’d open an issue on Github and then make a pull request, but with this Google groups format, I don’t know if such an ‘unsolicited’ pull request
is welcome. Please feel free to take/use the above snippet if you see no issues with it; I can also open a pull request if that is more convenient for you-- let me know.
Thanks!

(this issue should be resolved, see discussion on the PR that Steven posted)

···

On Tuesday, June 18, 2019 at 5:22:10 PM UTC-7, Steven Torrisi wrote:

Update: I created a pull request here. I ended up implemented a ‘safer’ version of the above fix which only allows SSL commands through instead of other input, which can cause an error. I think this would be less likely to break people’s current implementations.

I hope this helps, and please let me know if you spot any issues that I overlooked.

https://github.com/hackingmaterials/atomate/pull/298

On Tuesday, June 18, 2019 at 3:04:55 PM UTC-7, Steven Torrisi wrote:

Hi friends,

This issue is about the CalcDB from_file method, but I have identified a fix. I am interested in using the built-in Firetasks which write to a given database based on a db.json file. I recently realized that if I wanted to connect to a MongoDB that involves an SSL file, the Firetask which opens a connection to the MongoDB by reading a db.json file cannot parse these options.

For reference, the snippet I’m thinking of is here: https://github.com/hackingmaterials/atomate/blob/master/atomate/utils/database.py in the from_db_file method.

I have reproduced the snippet below and bolded the key part, where only one kwarg is let through to the cls method:

def from_db_file(cls, db_file, admin=True):
“”"
Create MMDB from database file. File requires host, port, database,
collection, and optionally admin_user/readonly_user and
admin_password/readonly_password
Args:
db_file (str): path to the file containing the credentials
admin (bool): whether to use the admin user
Returns:
MMDb object
“”"
creds = loadfn(db_file)
… Omitted some stuff here …
** kwargs = {}**
if “authsource” in creds:
kwargs[“authsource”] = creds[“authsource”]
else:
kwargs[“authsource”] = creds[“database”]

    return cls(creds["host"], int(creds["port"]), creds["database"], creds["collection"],
               user, password, **kwargs)

``

Since the only information which is allowed past from the db_file are the authsource, host, port, database, collection, username, and password, no information about the SSL can be passed.

The below fix solves this issue, by

def from_db_file(cls, db_file, admin=True):
“”"
Create MMDB from database file. File requires host, port, database,
collection, and optionally admin_user/readonly_user and
admin_password/readonly_password
Args:
db_file (str): path to the file containing the credentials
admin (bool): whether to use the admin user
Returns:
MMDb object
“”"
creds = loadfn(db_file)
… Omitted some stuff here …
** **kwargs = dict(creds)
if “authsource” in creds:
kwargs[“authsource”] = creds[“authsource”]
else:
kwargs[“authsource”] = creds[“database”]
if ‘ssl’ in creds:
kwargs[‘ssl’] = bool(creds[‘ssl’])

    # Remove keys which may cause collision when calling MongoClient
    for key in ['host','port','database','collection','admin_password','admin_user',
           'readonly_user','readonly_password','password','username']:
        if key in kwargs:
            del kwargs[key]

    return cls(creds["host"], int(creds["port"]), creds["database"], creds["collection"],
               user, password, **kwargs)

``

This solves the issue because it allows ssl-related variables past as a kwarg. It also removes kwargs which, if present, would cause
problems when the cls method tries to open a MongoClient instance with keyword arguments that are redundant with e.g. the hostname and port number,
which are passed in as arguments already. I also copy the dictionary into kwargs in the beginning because otherwise we delete the ‘host’, ‘port’, etc keys
before we pass them to the cls method.
The above patch solved the problem on my end.
I know that for pymatgen I’d open an issue on Github and then make a pull request, but with this Google groups format, I don’t know if such an ‘unsolicited’ pull request
is welcome. Please feel free to take/use the above snippet if you see no issues with it; I can also open a pull request if that is more convenient for you-- let me know.
Thanks!