Dear all,
Having given a dozen tutorials on FireWorks, I have found that one thing that slows down the quick start is installing and setting up a MongoDB server. The official quick-start guide says it takes five minutes. I agree, but only once the MongoDB database has been set up and FireWorks has been configured (the launchpad file).
What do you think about using an embedded database, a surrogate for MongoDB, for quick-start guides and tutorials? IMHO such a configuration option would be sufficient for most tutorials (those without a queuing system or clusters). Examples of such embedded databases are tinymongo, Mongita, and MontyDB.
Best regards,
Ivan
Hi Ivan,
This would be great! When we first started FireWorks, it really did take five minutes, as MongoDB was much easier to get started with. Nowadays it is much more complicated since the arrival of MongoDB Atlas, which tries to enforce scalable behavior from the get-go but raises the activation energy for getting started.
I had always anticipated that someone would make an SQLite-like equivalent of MongoDB that could be used with FireWorks for offline work or for testing. However, when I last looked, around 2018, there didn’t seem to be much available.
It’s great to hear there may now be offline alternatives to MongoDB. I think this is a great idea, but I regrettably don’t have the bandwidth to work on it. However, if you’d like to take a crack at it and need support from my group (including rapid review of PRs, etc.), just let me know, as I’d very much like to see this happen as well.
Hi Anubhav,
thanks a lot for your reply! It is great that you support this idea and offer to prioritize PR reviews! We are currently evaluating several embedded databases for possible use with FireWorks. One caveat: we do not expect any embedded database package to implement the full range of pymongo functionality that FireWorks needs. We might end up with a “FireWorks light” for this use case, or alternatively commit ourselves to extending the most suitable embedded database package. But let’s see.
We are working on it, and I will post updates on our progress here.
Best regards,
Ivan
Hi Ivan,
Great to hear you are working on it. Should you need any support from our side, please do not hesitate to reach out. As mentioned, we really want to see this happen and can commit some resources to push things along as needed.
Hi Anubhav,
thanks to a colleague in my group who carried out a thorough evaluation, we have selected mongomock as the best candidate over tinymongo, Mongita, and MontyDB. However, mongomock needed to be extended with persistent storage (it only had volatile, in-memory storage that is destroyed when the process using it exits). I have made all the necessary changes and created this pull request.
Our interactive tests work very well, both with the Python API in Jupyter and with the CLI (the lpad, rlaunch, etc. tools). Of all the CI tests, 12 failed: two script task tests and several GridFS/FilePad tests. In principle, all tests should pass. I would therefore appreciate some help in deciding:
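To illustrate what "adding persistence" means here, below is a minimal sketch, not the actual mongomock extension: a toy in-memory document store that writes its collections to a JSON file on flush() and reloads them when a new instance opens the same file, so the data outlives the process. The class PersistentStore, its methods, and the file name are hypothetical and only approximate the pymongo-style insert/find calls.

```python
import json
import os
import tempfile


class PersistentStore:
    """Hypothetical toy document store: collections live in memory and are
    dumped to a JSON file on flush(); a new instance reloads that file."""

    def __init__(self, path):
        self.path = path
        self.collections = {}
        # Reload previously persisted data, if any, so state survives restarts
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                self.collections = json.load(fh)

    def insert_one(self, collection, doc):
        # Store a copy so later changes to the caller's dict do not leak in
        self.collections.setdefault(collection, []).append(dict(doc))

    def find(self, collection, query):
        # Equality-only matching: every key/value in the query must match
        return [
            d for d in self.collections.get(collection, [])
            if all(d.get(k) == v for k, v in query.items())
        ]

    def flush(self):
        # Write to a temporary file first, then atomically replace the target
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(self.collections, fh)
        os.replace(tmp_path, self.path)


# Usage: one "process" writes and flushes, a second one reopens the file
path = os.path.join(tempfile.mkdtemp(), "launchpad_db.json")
store = PersistentStore(path)
store.insert_one("fireworks", {"fw_id": 1, "state": "READY"})
store.flush()

reopened = PersistentStore(path)
print(reopened.find("fireworks", {"state": "READY"}))  # -> [{'fw_id': 1, 'state': 'READY'}]
```

The write-then-rename step is one common way to avoid leaving a half-written file behind if the process dies mid-flush; a real pymongo surrogate also has to cover query operators, indexes, GridFS, and so on, which is exactly where mongomock does the heavy lifting.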
- how to manage the mongomock extension:
  - wait for mongomock to merge and release my changes, or
  - create a custom package based on mongomock, something like persistent-mongomock
- how to proceed with the failing tests:
  - leave them as they are and not test this feature, or
  - fix the tests (I need help with this); then we have to think about how to configure a separate CI job with mongomock
Best regards,
Ivan
Hi Ivan,
Thanks for your effort on this!
Regarding the mongomock extension, I think Plan A should be to have the changes officially integrated into mongomock and released. This avoids diverging codebases, which create additional complications. Of course, if it seems the changes will not be merged and released in a timely manner, we can explore other plans.
Regarding the FireWorks tests, could you clarify a few things?
- Is GridFS supported by mongomock at all? If not, then I guess the GridFS and FilePad tests will not pass. Before deciding what to do, it is important to clarify whether mongomock supports GridFS functionality.
- What is the issue with the script task tests? I would have assumed that one to be straightforward.
Dear Anubhav,
GridFS is supported by mongomock, and I have added the necessary imports and settings. Unfortunately, so far I have not had time to dig into this part of FireWorks (I mean GridFS and FilePad). The script task tests assume a database state that cannot be reproduced with mongomock. I will first have to understand the tests in order to make them work. Independently, we have tested ScriptTask and many other tasks and features here, and all of them work with mongomock.
Anyway, if we want to test FireWorks thoroughly with mongomock, we will have to create a separate CI job; currently there are two jobs, pytest and pytest_pymongo4. IMO it is not possible to use mongomock and MongoDB in the same CI environment: if I enable mongomock, I have to disable testing with MongoDB. The default setting is not to use mongomock, which is why the pipelines have passed (see the PR status on GitHub). If we want another CI job, I can try to set it up and disable the few failing tests with a pytest marker. Do we want this? If so, we can incrementally fix the 12 disabled tests in another pull request. Otherwise, IMO we do not need to fix the failing tests.
Regarding mongomock: no response there so far. As a last resort, I am inclined to create a package based on mongomock that adds the persistence functionality.
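A minimal sketch of how a single switch could separate the two kinds of CI jobs and keep the currently failing tests out of the mongomock job. It assumes a hypothetical environment variable HYPOTHETICAL_USE_MONGOMOCK (not an actual FireWorks setting) and uses the standard library's unittest.skipIf as a stand-in for the pytest marker mentioned above; the test names are illustrative placeholders.

```python
import os
import unittest

# HYPOTHETICAL_USE_MONGOMOCK is an illustrative name, not a real FireWorks
# setting: a dedicated mongomock CI job would set it to "1", while the
# default jobs leave it unset and keep testing against a real MongoDB server.
USE_MONGOMOCK = os.environ.get("HYPOTHETICAL_USE_MONGOMOCK", "0") == "1"


def backend_module_name():
    """Name of the module that would provide MongoClient in this job."""
    return "mongomock" if USE_MONGOMOCK else "pymongo"


class BackendSelectionTest(unittest.TestCase):
    def test_backend_is_known(self):
        # Runs in every job, whichever backend is active
        self.assertIn(backend_module_name(), {"mongomock", "pymongo"})

    # Placeholder for one of the currently failing tests (e.g. GridFS/FilePad):
    # skipped in the mongomock job, still executed in the MongoDB jobs.
    @unittest.skipIf(USE_MONGOMOCK, "known to fail with mongomock; fix in a follow-up PR")
    def test_requires_real_mongodb(self):
        self.assertEqual(backend_module_name(), "pymongo")
```

Running it with python -m unittest (pytest also collects unittest.TestCase classes) reports the second test as skipped when the variable is set to "1", so the mongomock job stays green while the disabled tests remain visible in the output until they are fixed.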
Best regards,
Ivan
Hi Ivan,
OK, too bad about the lack of response from mongomock. That really would have been the best solution. I guess having your own fork is the next best option, so let’s go with that.
Regarding the CI, I think it is best if we indeed set up another CI for mongomock. Otherwise, people may unknowingly push changes that break the mongomock functionality, so I think it’s worth the up-front investment to set up CI.
Is there anything you need from me to set up the CI? If so, just let me know. If you need support from my group I can arrange that as well.
Hi Anubhav,
thank you very much for your support! I will then do the following: 1) set up a custom mongomock package, and 2) set up a CI job named pytest_mongomock next to pytest and pytest_pymongo4. In case of difficulties, I will ask you for help. Otherwise, I will be back here as soon as the branch is ready for merge.
Best regards,
Ivan
OK! If you need anything in the meantime, let me know.
Hi Anubhav,
sorry for the delay. Everything we planned is now complete:
- A new mongomock package with the ability to store the database in a local file: mongomock-persistence (on PyPI)
- A CI job for mongomock called pytest_mongomock (I renamed the existing job to pytest_mongodb to make the difference between them clearer). I have added two comments regarding the tests.
I would very much appreciate it if you could review the PR “Optionally use mongomock instead of pymongo/mongodb” (materialsproject/fireworks pull request #520 by ikondov on GitHub). Thanks!
Best regards,
Ivan