ICSD CIF files do not work with pymatgen CifParser


I am trying to use pymatgen's CifParser to parse CIF files downloaded from ICSD and export the BibTeX entry from the parser, but none of the ICSD CIFs I have tried work.

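from pymatgen.io.cif import CifParser

# CIF file downloaded from ICSD (collection code 2356)
icsd_fn = "YourCustomFileName1_CollCode2356.cif"
parser = CifParser(icsd_fn)

# This call raises the TypeError shown in the traceback
parser.get_bibtex_string()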

The `get_bibtex_string()` function will give the following error:
TypeError                                 Traceback (most recent call last)
~/anaconda/envs/strumining/lib/python3.6/site-packages/latexcodec/codec.py in encode(self, unicode_, errors)
    804         return (
--> 805             encoder.encode(unicode_, final=True),
    806             len(unicode_),

~/anaconda/envs/strumining/lib/python3.6/site-packages/latexcodec/lexer.py in encode(self, unicode_, final)
    478             return self.emptychar.join(
--> 479                 self.get_latex_bytes(unicode_, final=final))
    480         except UnicodeEncodeError as e:

~/anaconda/envs/strumining/lib/python3.6/site-packages/latexcodec/codec.py in get_latex_bytes(self, unicode_, final)
    725                 "expected unicode for encode input, but got {0} instead"
--> 726                 .format(unicode_.__class__.__name__))
    727         # convert character by character

TypeError: expected unicode for encode input, but got list instead

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
<ipython-input-53-e8cf025fac04> in <module>()
----> 1 parser.get_bibtex_string()

~/anaconda/envs/strumining/lib/python3.6/site-packages/monty/dev.py in decorated(*args, **kwargs)
     90             if not self.condition:
     91                 raise RuntimeError(self.message)
---> 92             return _callable(*args, **kwargs)
     94         return decorated

~/anaconda/envs/strumining/lib/python3.6/site-packages/pymatgen/io/cif.py in get_bibtex_string(self)
   1123             entries['cif-reference-{}'.format(idx)] = Entry('article', list(bibtex_entry.items()))
-> 1125         return BibliographyData(entries).to_string(bib_format='bibtex')
   1127     def as_dict(self):

~/anaconda/envs/strumining/lib/python3.6/site-packages/pybtex/database/__init__.py in to_string(self, bib_format, **kwargs)
    284         """
    285         writer = find_plugin('pybtex.database.output', bib_format)(**kwargs)
--> 286         return writer.to_string(self)
    288     def to_bytes(self, bib_format, **kwargs):

~/anaconda/envs/strumining/lib/python3.6/site-packages/pybtex/database/output/__init__.py in to_string(self, bib_data)
     52     def to_string(self, bib_data):
---> 53         result = self._to_string_or_bytes(bib_data)
     54         return result if self.unicode_io else result.decode(self.encoding)

~/anaconda/envs/strumining/lib/python3.6/site-packages/pybtex/database/output/__init__.py in _to_string_or_bytes(self, bib_data)
     47     def _to_string_or_bytes(self, bib_data):
     48         stream = io.StringIO() if self.unicode_io else io.BytesIO()
---> 49         self.write_stream(bib_data, stream)
     50         return stream.getvalue()

~/anaconda/envs/strumining/lib/python3.6/site-packages/pybtex/database/output/bibtex.py in write_stream(self, bib_data, stream)
    167                 self._write_persons(stream, persons, role)
    168             for type, value in entry.fields.items():
--> 169                 self._write_field(stream, type, value)
    170             stream.write(u'\n}\n')

~/anaconda/envs/strumining/lib/python3.6/site-packages/pybtex/database/output/bibtex.py in _write_field(self, stream, type, value)
    122     def _write_field(self, stream, type, value):
--> 123         stream.write(u',\n    %s = %s' % (type, self.quote(self._encode(value))))
    125     def _format_name(self, stream, person):

~/anaconda/envs/strumining/lib/python3.6/site-packages/pybtex/database/output/bibtex.py in _encode(self, text)
    105         import latexcodec  # NOQA
--> 107         return codecs.encode(text, 'ulatex+{}'.format(self.encoding))
    109     def _encode_with_comments(self, text):

TypeError: encoding with 'ulatex+UTF-8' codec failed (TypeError: expected unicode for encode input, but got list instead)

Can anyone help with this? I tested CIFs from other databases, such as COD (Crystallography Open Database), and the function works fine for those. But none of the CIFs I got from ICSD work.

Hi @roadtripper, welcome!

Can you share the header for the CIF file you’re looking at? It’s difficult to diagnose problems like this without being able to run an example. As a guess, it’s likely related to some special characters in either an author name or a title.
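One more thing worth checking while diagnosing: the last line of the traceback says the encoder "got list instead", which suggests that at least one value handed to the BibTeX writer is a Python list rather than a string. Below is a minimal sketch for spotting such values. It assumes the parsed CIF tags for one data block are available as a plain dict. The helper name `list_valued_tags` and the example dict are hypothetical and only loosely modelled on the ICSD header in this thread; the citation and author tag names are assumptions about what the file contains.

```python
from typing import Any, Dict, List


def list_valued_tags(block: Dict[str, Any]) -> Dict[str, List[Any]]:
    """Return the tags whose parsed value is a list rather than a string."""
    return {tag: value for tag, value in block.items() if isinstance(value, list)}


# Hypothetical parsed block; real values would come from the parser.
example_block = {
    "_database_code_ICSD": "2356",
    "_publ_section_title": "Refinement of barium dititanate",
    # Assumed: tags read from a CIF loop_ may come back as lists.
    "_citation_journal_full": ["Acta Crystallographica, Section B"],
    "_publ_author_name": ["Tillmanns, E."],
}

suspects = list_valued_tags(example_block)
for tag, value in suspects.items():
    print(f"{tag}: {value!r}")
```

With pymatgen, the dict for each block could probably be taken from `parser.as_dict()` (the method is visible in the traceback, but the exact shape of its return value is an assumption here). Any tag this reports that also feeds the BibTeX entry would be a likely culprit.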

Hi @mkhorton,

Actually, I was initially planning to attach the CIF files to the post, but I couldn't find a way to attach files. I uploaded some CIFs to a Google Drive folder instead.

I also show the header of one CIF below, though I'm not sure the formatting survived the copy. I think this problem exists for all the ICSD CIFs I have tested so far.

#(C) 2019 by FIZ Karlsruhe - Leibniz Institute for Information Infrastructure.  All rights reserved.
_database_code_ICSD 2356
_audit_creation_date 1980-01-01
_audit_update_record 2012-08-01
_chemical_name_systematic 'Barium pentaoxodititanate'
_chemical_formula_structural 'Ba Ti2 O5'
_chemical_formula_sum 'Ba1 O5 Ti2'
_chemical_name_structure_type V2GaO5
_exptl_crystal_density_diffrn 5.13
_publ_section_title 'Refinement of barium dititanate'

Acta Crystallographica, Section B: Structural Crystallography and Crystal
; 1974 30 2894 2896 ACBCAR
'Tillmanns, E.'

Ok, thanks! I’ll look into it.

@roadtripper Try deleting the first line of every cif file (it contains the copyright symbol, which I think causes the issue).
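To try this without editing each file by hand, here is a minimal sketch that drops the leading comment lines (in CIF, comment lines start with `#`) from the file text. The helper name `strip_leading_comments` is hypothetical, and the sample text reuses the first lines of the header pasted above. Whether the copyright line is actually the cause is still only a guess.

```python
from typing import List


def strip_leading_comments(cif_text: str) -> str:
    """Drop comment lines ('#...') and blank lines before the first content line."""
    lines: List[str] = cif_text.splitlines(keepends=True)
    start = 0
    # Skip over lines that are blank or start with '#'.
    while start < len(lines) and (
        not lines[start].strip() or lines[start].lstrip().startswith("#")
    ):
        start += 1
    return "".join(lines[start:])


sample = (
    "#(C) 2019 by FIZ Karlsruhe - Leibniz Institute for Information "
    "Infrastructure.  All rights reserved.\n"
    "_database_code_ICSD 2356\n"
    "_audit_creation_date 1980-01-01\n"
)

cleaned = strip_leading_comments(sample)
print(cleaned)
```

The cleaned text can be written to a new file and passed to `CifParser`, which leaves the original download untouched. Only the comment lines at the top are removed; everything from the first non-comment line onward is kept as-is.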

I tried that, but it still doesn't work.

If I delete the first line (below), I get the same error log:

#(C) 2019 by FIZ Karlsruhe - Leibniz Institute for Information Infrastructure. All rights reserved.

If I go on and delete the first two lines, parser.get_bibtex_string() returns empty output.

#(C) 2019 by FIZ Karlsruhe - Leibniz Institute for Information Infrastructure.  All rights reserved.