Python Forum
UTF-8 decoder reports bad byte that is not there

UTF-8 decoder reports bad byte that is not there - Skaperen - Oct-11-2018

I am getting this error:

Output:
Traceback (most recent call last):
  File "/home/pdh/aptdata.py", line 233, in <module>
    result=main(argv)
  File "/home/pdh/aptdata.py", line 206, in main
    loadinfo(f,data)
  File "/home/pdh/aptdata.py", line 123, in loadinfo
    for line in file:
  File "/usr/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
But there is no byte 0x8b anywhere in the whole file. What should I do about this?
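The "position 1" in the message may not mean what it seems. The traceback ends in codecs.py's incremental decoder, which, as far as I can tell, is fed the file one chunk at a time, so the position is counted from the start of the chunk being decoded (plus any incomplete bytes carried over), not from the start of the file. Below is a minimal sketch that shows the difference, and that finds the true file offset by decoding everything at once. The helper names chunk_relative_position and first_bad_utf8 are made up for illustration.

```python
import codecs

def chunk_relative_position(chunks):
    """Feed `chunks` to an incremental UTF-8 decoder, the way text-mode
    reading does, and return (chunk_index, position) of the first error,
    or None. The position is relative to the chunk, not the whole input."""
    decoder = codecs.getincrementaldecoder('utf-8')()
    for i, chunk in enumerate(chunks):
        try:
            # final=True only on the last chunk, so a multi-byte character
            # split across two chunks is buffered instead of rejected.
            decoder.decode(chunk, final=(i == len(chunks) - 1))
        except UnicodeDecodeError as e:
            return i, e.start
    return None

def first_bad_utf8(data):
    """Return (absolute_offset, byte_value) of the first invalid UTF-8
    byte in `data` (e.g. a whole file read in 'rb' mode), or None."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError as e:
        return e.start, data[e.start]
    return None

if __name__ == '__main__':
    # Same bytes, two answers: offset 1 within the second chunk,
    # but offset 7 from the start of the data.
    print(chunk_relative_position([b'hello\n', b'\x1f\x8b\x08']))  # (1, 1)
    print(first_bad_utf8(b'hello\n\x1f\x8b\x08'))                  # (7, 139)
```

Running first_bad_utf8 on open(path, 'rb').read() for each file involved would give an absolute offset that can be checked with a hex dump.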

The file is one of the files Ubuntu's package management works with: /var/lib/apt/lists/security.ubuntu.com_ubuntu_dists_xenial-security_main_binary-amd64_Packages. I have added diagnostic output to the code, and it appears that the problem happens just before EOF. The last 2 bytes of the file are 0x0a and 0x0a (an empty line at the end). The code iterates over the lines in the file with for line in file:, and every line of the file has been iterated. It's as if the 'utf-8' codec is making this up. Maybe some lower-level I/O code is leaving garbage in a buffer.
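A sketch of a more informative read loop, under two assumptions that the post does not confirm. First, main calls loadinfo(f,data) for possibly several files, so bytes that fail "just before EOF" might really be the start of the next file. Second, some of those files might be gzip-compressed: 0x1f 0x8b are the first two bytes of every gzip stream, which would put 0x8b at position 1 of the first chunk. open_text and count_lines are hypothetical helpers, not part of aptdata.py.

```python
import gzip

# The first two bytes of every gzip stream (RFC 1952).
GZIP_MAGIC = b'\x1f\x8b'

def open_text(path):
    """Open `path` for reading as UTF-8 text (hypothetical helper).

    Sniffs the first two bytes and, if they are the gzip magic number,
    decompresses on the fly regardless of the file name.
    """
    with open(path, 'rb') as f:
        head = f.read(2)
    if head == GZIP_MAGIC:
        return gzip.open(path, 'rt', encoding='utf-8')
    return open(path, 'r', encoding='utf-8')

def count_lines(path):
    """Iterate over every line of `path` and return the line count.

    On a decode error, re-raise with the file name and the number of lines
    already yielded. Text-mode reading decodes a whole chunk before handing
    out any of its lines, so the bad bytes lie somewhere after line `n`,
    not necessarily right after it.
    """
    n = 0
    try:
        with open_text(path) as f:
            for n, _line in enumerate(f, 1):
                pass
    except UnicodeDecodeError as e:
        raise RuntimeError(
            '{}: decode failed after line {}: {}'.format(path, n, e)) from e
    return n
```

Calling count_lines on each file in turn, in place of the inner loop in loadinfo, would at least show which file the undecodable bytes actually belong to.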