Sep-05-2017, 02:12 PM
As of Unicode 10, the black diamond question mark (U+FFFD, the replacement character) is used "to replace an unknown, unrecognized or unrepresentable character". Once it shows up in your text, decoding again as utf-8 will not bring the original character back, because the decoder has already said it doesn't know what it is and the original bytes are gone. One possible cause is that the characters do not lie in the Basic Multilingual Plane of 65,536 code points (U+0000 through U+FFFF). May we ask what language the original file is written in?
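To illustrate where the replacement character comes from, here is a minimal sketch. It assumes the bytes were really written in cp1252 rather than utf-8; that encoding is only a guess for demonstration, not something we know about your file.

```python
# Assumption for illustration: the data was saved as cp1252, not utf-8.
raw = "café".encode("cp1252")        # b'caf\xe9', which is not valid utf-8

# Decoding with errors="replace" swaps the undecodable byte for U+FFFD.
text = raw.decode("utf-8", errors="replace")
has_replacement = "\ufffd" in text   # True

# Encoding back gives the utf-8 bytes of U+FFFD itself (EF BF BD),
# not the original 0xE9, so the original byte cannot be recovered.
round_trip = text.encode("utf-8")
```

If the original bytes are still available, decoding them with the right codec is the only way to get the real character back.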
It seems you should be able to test for those characters. If they exist, split them out; if they don't, proceed as normal.
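A rough sketch of that test-and-split idea, assuming the text has already been read into a list of lines. The name `split_damaged` is made up for this example and is not from any library.

```python
REPLACEMENT = "\ufffd"  # the black diamond question mark

def split_damaged(lines):
    """Separate lines containing U+FFFD from lines that look clean."""
    clean, damaged = [], []
    for number, line in enumerate(lines, start=1):
        if REPLACEMENT in line:
            # Keep the line number so the spot can be found in the source file.
            damaged.append((number, line))
        else:
            clean.append(line)
    return clean, damaged

sample = ["plain ascii", "caf\ufffd", "na\u00efve"]
clean, damaged = split_damaged(sample)
```

The clean lines can go through your normal processing, while the damaged ones are set aside for a closer look at the original bytes.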
If it ain't broke, I just haven't gotten to it yet.
OS: Windows 10, openSUSE 42.3, FreeBSD 11, Raspbian "Stretch"
Python 3.6.5, IDE: PyCharm 2018 Community Edition