How to extract the chemical formulas of all OPTIMADE entries?

I’m surprised that I didn’t respond to this already. Sorry about that! I appreciate your comments. The idea is composition-based (i.e. chemical formula-based) materials discovery, where a validation dataset (hopefully containing hundreds of thousands of potential formulations, even theoretical ones) gets ranked/sorted based on the criteria for the materials discovery campaign (e.g. using mat_discover). In other words, what’s every composition that anyone has ever thought of or put into a database? The alternative is to generate and rank compositions “from scratch”, but that would require encoding chemistry rules and could produce even more outlandish suggestions than some of the theoretical materials in the databases. So starting off with the “sum of every composition that scientists have ever spent time on” seemed like an interesting way to go. If you know of any from-scratch generative models for composition, I’d be interested to hear about them, but that’s a bit off-topic for this post.
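To make the "pool everything, then rank" idea a bit more concrete, here is a rough sketch in plain Python. It assumes the formulas have already been pulled out of each database as strings; the source names, the `reduce_formula`/`pool_formulas`/`rank_candidates` helpers, and the `n_elements` score are all hypothetical placeholders of mine, not part of OPTIMADE or mat_discover. In a real campaign the score would come from whatever model or criteria you are using.

```python
import re
from functools import reduce
from math import gcd

# Matches an element symbol followed by an optional integer count.
# Assumption: flat formulas only (no parentheses, no fractional counts).
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def reduce_formula(formula: str) -> str:
    """Return a canonical reduced formula with elements in alphabetical order."""
    counts = {}
    for element, count in _TOKEN.findall(formula):
        counts[element] = counts.get(element, 0) + int(count or 1)
    if not counts:
        raise ValueError(f"could not parse formula: {formula!r}")
    divisor = reduce(gcd, counts.values())
    parts = []
    for element, n in sorted(counts.items()):
        n //= divisor
        parts.append(f"{element}{n if n > 1 else ''}")
    return "".join(parts)


def pool_formulas(sources: dict) -> dict:
    """Merge formulas from several sources, keyed by reduced formula.

    `sources` maps a (hypothetical) database name to a list of formula strings.
    The result maps each unique reduced formula to the set of sources it came from.
    """
    pooled = {}
    for name, formulas in sources.items():
        for formula in formulas:
            pooled.setdefault(reduce_formula(formula), set()).add(name)
    return pooled


def rank_candidates(formulas, score_fn) -> list:
    """Sort formulas from highest to lowest score, breaking ties alphabetically."""
    return sorted(formulas, key=lambda f: (-score_fn(f), f))


def n_elements(formula: str) -> int:
    """Placeholder score: number of distinct elements in a reduced formula."""
    return len(_TOKEN.findall(formula))


if __name__ == "__main__":
    sources = {
        "db_a": ["Fe2O3", "NaCl"],
        "db_b": ["Fe4O6", "BaTiO3"],
    }
    pooled = pool_formulas(sources)
    for formula in rank_candidates(pooled, n_elements):
        print(formula, sorted(pooled[formula]))
```

The reduction step matters because the same composition can show up as `Fe2O3` in one database and `Fe4O6` in another, and you only want to rank it once. Keeping track of which sources each formula came from is optional, but it makes it easy to go back to the original entries afterwards.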