Python Forum

Full Version: Special Characters read-write
I have a directory filled with .gz text archives. To scan these archives, I use the following Python code:

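    import gzip
    import os
    # Binary mode yields one line of the archive per iteration, as bytes.
    with gzip.open(os.path.join(logDir, fileName), mode="rb") as archive:
        for filename in archive:
            print(filename.decode().strip())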
This used to work. However, the new system adds lines similar to this:

:§f Press [§bJ§f]

Python gives me this error:

File "C:\Users\Me\Documents\Python\ConvertLog.py", line 16, in readZIP
    print(filename.decode().strip())
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa7 in position 49: invalid start byte
Does anyone know a way to deal with unexpected characters like these? I can't just ignore the line, because it is one of the few lines I need to extract and write to a condensed report.

I tried other modes besides "rb", and I really have no idea what else to try.
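Try using the chardet module to detect the encoding of the bytes you read. Note that the loop variable named filename actually holds one line of the archive, not a file name: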
>>> import chardet
>>> b = 'bépoç%$'.encode('latin1')
>>> b
b'b\xe9po\xe7%$'
>>> b.decode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 1: invalid continuation byte
>>> chardet.detect(b)
{'encoding': 'ISO-8859-1', 'confidence': 0.73, 'language': ''}
>>> enc = chardet.detect(b)['encoding']
>>> b.decode(enc)
'bépoç%$'
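Applied to the gzip loop from the question, the same idea could look roughly like the sketch below. It rests on an assumption: byte 0xa7 is the section sign § in both Latin-1 and Windows-1252, so the new lines are probably written in one of those encodings rather than UTF-8. The helper names decode_line and read_archive and the cp1252 fallback are made up for illustration; if chardet reports something else for your files, pass that encoding instead.

```python
import gzip
import os
import tempfile


def decode_line(raw, fallback="cp1252"):
    """Decode bytes as UTF-8 first, then fall back to an assumed single-byte encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # errors="replace" covers the few byte values cp1252 leaves undefined,
        # so this call never raises and no line is skipped.
        return raw.decode(fallback, errors="replace")


def read_archive(path):
    """Return the decoded, stripped lines of one .gz text archive."""
    with gzip.open(path, mode="rb") as archive:
        return [decode_line(raw).strip() for raw in archive]


# Self-check: write a tiny archive containing byte 0xa7, then read it back.
sample = ":§f Press [§bJ§f]\n".encode("cp1252")
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.log.gz")
    with gzip.open(path, "wb") as f:
        f.write(sample)
    lines = read_archive(path)
```

If every archive turns out to use the same encoding, opening in text mode with gzip.open(path, "rt", encoding="cp1252") is simpler. The per-line fallback is only worth it when old UTF-8 lines and new non-UTF-8 lines may be mixed in the same file.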