Can't decode datasets

Hi, I’m having teething troubles using MatMiner and I hope someone can tell me what the solution is. When I try to import an existing dataset I get an error because the data is not being decoded correctly. It is trying to use a “cp1252” decoder: is this right? If it should be using a different decoder, how can I change it to the correct one? Any advice would be gratefully appreciated.


from matminer.datasets.convenience_loaders import load_elastic_tensor

df = load_elastic_tensor()


UnicodeDecodeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_18160/1984697021.py in <module>
1 from matminer.datasets.convenience_loaders import load_elastic_tensor
2
----> 3 df = load_elastic_tensor()

~\anaconda3\lib\site-packages\matminer\datasets\convenience_loaders.py in load_elastic_tensor(version, include_metadata, data_home, download_if_missing)
26 Returns: (pd.DataFrame)
27 """
---> 28 df = load_dataset("elastic_tensor" + "_" + version, data_home, download_if_missing)
29
30 if not include_metadata:

~\anaconda3\lib\site-packages\matminer\datasets\dataset_retrieval.py in load_dataset(name, data_home, download_if_missing)
41
42 if _dataset_dict is None:
---> 43 _dataset_dict = _load_dataset_dict()
44
45 if name not in _dataset_dict:

~\anaconda3\lib\site-packages\matminer\datasets\utils.py in _load_dataset_dict()
17 """
18 with open(os.path.join(os.path.dirname(os.path.abspath(__file__)), "dataset_metadata.json")) as infile:
---> 19 dataset_dict = json.load(infile)
20
21 return dataset_dict

~\anaconda3\lib\json\__init__.py in load(fp, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
291 kwarg; otherwise JSONDecoder is used.
292 """
--> 293 return loads(fp.read(),
294 cls=cls, object_hook=object_hook,
295 parse_float=parse_float, parse_int=parse_int,

~\anaconda3\lib\encodings\cp1252.py in decode(self, input, final)
21 class IncrementalDecoder(codecs.IncrementalDecoder):
22 def decode(self, input, final=False):
---> 23 return codecs.charmap_decode(input,self.errors,decoding_table)[0]
24
25 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 61517: character maps to <undefined>

In case I am not the only person to run into this problem: I managed to solve it by setting the encoding to utf-8 on line 18 of matminer\datasets\utils.py. The line now reads:

with open(os.path.join(os.path.dirname(os.path.abspath(__file__)), "dataset_metadata.json"), encoding="utf-8") as infile:
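
For anyone wondering why this helps: when open() is called without an encoding argument, Python falls back to the locale's preferred encoding, which on many Windows installations is cp1252. The byte 0x81 has no mapping in cp1252, so any UTF-8 file containing it fails to decode. Below is a minimal sketch that reproduces the error and the fix outside of matminer. The file name example_metadata.json and its contents are made up for illustration and are not the real dataset_metadata.json.

```python
import json
import os
import tempfile

# Hypothetical stand-in for the metadata file. "Á" (U+00C1) is stored in
# UTF-8 as the bytes 0xC3 0x81, and 0x81 is undefined in cp1252.
metadata = {"description": "Á sample entry"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_metadata.json")

    # Write the file as UTF-8, keeping the non-ASCII character as-is.
    with open(path, "w", encoding="utf-8") as outfile:
        json.dump(metadata, outfile, ensure_ascii=False)

    # Reading it back as cp1252 reproduces the UnicodeDecodeError.
    cp1252_failed = False
    try:
        with open(path, encoding="cp1252") as infile:
            json.load(infile)
    except UnicodeDecodeError as err:
        cp1252_failed = True
        print("cp1252 failed:", err.reason)

    # Reading it back with an explicit UTF-8 encoding works.
    with open(path, encoding="utf-8") as infile:
        loaded = json.load(infile)

print("utf-8 round trip ok:", loaded == metadata)
```

To see which encoding your own installation uses by default, locale.getpreferredencoding(False) reports it.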

Hey @DaveL!

Thanks for posting the solution here as well. This has been added to the codebase and will be included in the next release:

See Pull Request #674 in hackingmaterials/matminer on GitHub: "fix problem brought up in forum for decoding dataset metadata error" by ardunn.

Thanks
Alex