The \xNN sequences you see are escape sequences that give a character's (or byte's) value in hexadecimal. This is the representation (repr()) of the string, not its printed text.
This representation is used by str, bytes and bytearray. In a str, characters that cannot be displayed, such as control characters, are shown in this escaped format; in bytes and bytearray, every byte outside printable ASCII is.
If you print a str, you don't see this escaped representation: print() writes the characters themselves.
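A small sketch of the difference between repr() and printed text, using a made-up two-character value (not from your data). This is also why printing a whole list shows the escapes: a list displays its items with repr().

```python
# A str with one printable Latin-1 letter (U+00CE, "Î") and one
# control character (U+0085, "next line").
text = "\xce\x85"
# The same two values as raw bytes.
raw = b"\xce\x85"

# repr() of a str escapes only characters it cannot display ...
text_repr = repr(text)
# ... while repr() of bytes escapes every byte outside printable ASCII.
bytes_repr = repr(raw)

print(text_repr)   # 'Î\x85'
print(bytes_repr)  # b'\xce\x85'
```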
With your data:
items = [
    '',
    'Alexander Lepsveridze',
    'John Comeau',
    '\xce\x86\xce\xba\xce\xb7\xcf\x82 \xce\xa4\xcf\x83\xce\xb9\xce\xac\xce\xbc\xce\xb7\xcf\x82',
    '\xce\x8c\xce\xbc\xce\xb9\xce\xbb\xce\xbf\xcf\x82 \xce\xa4\xcf\x83\xce\xbf\xcf\x84\xcf\x85\xce\xbb\xce\xaf\xce\xbf\xcf\x85',
]

for item in items:
    print(item)
Output:
Alexander Lepsveridze
John Comeau
ÎÎºÎ·Ï Î¤ÏιάμηÏ
ÎÎ¼Î¹Î»Î¿Ï Î¤ÏοÏ
Ï
λίοÏ
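The garbled lines can be reproduced by encoding correct text as UTF-8 and decoding the bytes as Latin-1, which maps every byte to exactly one character (U+0000 to U+00FF). This is a sketch of how such data typically arises, assuming that is what happened upstream; the original data source isn't known.

```python
# Correct Greek text, as it should look.
original = "Άκης Τσιάμης"

# Encode to UTF-8: each Greek letter becomes two bytes.
utf8_bytes = original.encode("utf-8")

# Decoding those bytes as Latin-1 never fails, because every byte value
# 0x00-0xFF maps to one character; the result is mojibake.
mojibake = utf8_bytes.decode("latin-1")

# Compare with the fourth item of the list above.
print(mojibake == "\xce\x86\xce\xba\xce\xb7\xcf\x82 \xce\xa4\xcf\x83\xce\xb9\xce\xac\xce\xbc\xce\xb7\xcf\x82")  # True
```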
Now with ftfy, a third-party module (pip install ftfy) that can fix broken encodings:

import ftfy

items = [
    '',
    'Alexander Lepsveridze',
    'John Comeau',
    '\xce\x86\xce\xba\xce\xb7\xcf\x82 \xce\xa4\xcf\x83\xce\xb9\xce\xac\xce\xbc\xce\xb7\xcf\x82',
    '\xce\x8c\xce\xbc\xce\xb9\xce\xbb\xce\xbf\xcf\x82 \xce\xa4\xcf\x83\xce\xbf\xcf\x84\xcf\x85\xce\xbb\xce\xaf\xce\xbf\xcf\x85',
]

for item in items:
    print(ftfy.fix_encoding(item))
Output:
Alexander Lepsveridze
John Comeau
Άκης Τσιάμης
Όμιλος Τσοτυλίου
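The text was originally UTF-8, but its bytes were decoded as Latin-1. Encoding back to Latin-1 recovers the original bytes, and decoding those as UTF-8 gives the correct text:

print(items[-1].encode('latin1').decode('utf8'))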
Output:
Όμιλος Τσοτυλίου
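Without ftfy, the same Latin-1/UTF-8 round trip can be applied to the whole list, leaving strings alone when the round trip fails. A minimal sketch: fix_mojibake is a hypothetical helper, a plain round trip rather than ftfy's heuristics, and it only undoes this one specific kind of damage.

```python
def fix_mojibake(text: str) -> str:
    """Undo one round of 'UTF-8 bytes decoded as Latin-1' (hypothetical helper)."""
    try:
        # Latin-1 maps U+0000..U+00FF back to the original byte values.
        raw = text.encode("latin-1")
    except UnicodeEncodeError:
        # Characters above U+00FF: not this kind of mojibake, keep as-is.
        return text
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so the text was probably genuine Latin-1.
        return text


items = [
    '',
    'Alexander Lepsveridze',
    'John Comeau',
    '\xce\x86\xce\xba\xce\xb7\xcf\x82 \xce\xa4\xcf\x83\xce\xb9\xce\xac\xce\xbc\xce\xb7\xcf\x82',
    '\xce\x8c\xce\xbc\xce\xb9\xce\xbb\xce\xbf\xcf\x82 \xce\xa4\xcf\x83\xce\xbf\xcf\x84\xcf\x85\xce\xbb\xce\xaf\xce\xbf\xcf\x85',
]

# ASCII-only strings survive the round trip unchanged.
fixed = [fix_mojibake(item) for item in items]
for item in fixed:
    print(item)
```

Because already-correct Greek text cannot be encoded as Latin-1, running the helper twice is harmless; text that really is Latin-1 (for example "café") fails the UTF-8 decode and is returned unchanged.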
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!