Python Forum

UTF-8 decoder reports bad byte that is not there
i am getting the error:

Output:
Traceback (most recent call last):
  File "/home/pdh/aptdata.py", line 233, in <module>
    result=main(argv)
  File "/home/pdh/aptdata.py", line 206, in main
    loadinfo(f,data)
  File "/home/pdh/aptdata.py", line 123, in loadinfo
    for line in file:
  File "/usr/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
but there is no byte 0x8b anywhere in the whole file. what should i do about this?
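
for anyone who wants to double check the "no byte 0x8b" claim on their own copy, here is a minimal sketch of a raw scan. it opens the file in binary mode so no codec is involved. the helper name find_byte and the chunk size are made up for illustration, they are not part of aptdata.py.

```python
def find_byte(path, value, chunk_size=1 << 16):
    """Return the offsets of every byte equal to `value` in the file at `path`.

    The file is read in binary mode, so no text decoding takes place.
    """
    needle = bytes([value])
    offsets = []
    base = 0  # offset of the current chunk within the file
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            start = 0
            while True:
                i = chunk.find(needle, start)
                if i < 0:
                    break
                offsets.append(base + i)
                start = i + 1
            base += len(chunk)
    return offsets

# hypothetical usage: an empty list means the byte really is absent
# print(find_byte("/path/to/Packages", 0x8b))
```

an empty result confirms the byte is not in that particular file, which would suggest the decoder was fed bytes from somewhere else.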

the file is one of the files Ubuntu's package management works with. the name of the file is /var/lib/apt/lists/security.ubuntu.com_ubuntu_dists_xenial-security_main_binary-amd64_Packages. i have added diagnostic output to the code and it appears that the problem is happening just before EOF. the last 2 bytes of the file are 0x0a and 0x0a (an empty line at the end). the code is iterating over the lines in the file with for line in file: and every line of the file has been iterated. it's like the 'utf-8' codec is making this up. maybe some lower level i/o code is leaving garbage in a buffer.
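
one thing that may be worth ruling out: gzip streams start with the two bytes 0x1f 0x8b, and decoding those as UTF-8 fails on byte 0x8b at position 1, the same byte and position as in the traceback. so it is possible (an assumption, nothing above confirms it) that the error is raised for a different, compressed file that gets opened right after this one finishes. the sketch below checks a file for the gzip magic number and reports the first byte that does not decode. the names looks_gzipped and first_bad_utf8 are hypothetical helpers, not part of aptdata.py.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def first_bad_utf8(path):
    """Return (offset, byte) for the first byte that fails UTF-8 decoding,
    or None if the whole file decodes cleanly."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as e:
        return e.start, data[e.start]
    return None

# hypothetical usage inside the loop that opens each file:
# print(name, looks_gzipped(name), first_bad_utf8(name))
```

printing the file name next to these two results for every file the program opens would show which file the decoder is actually complaining about.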