Jan-21-2021, 12:24 AM
I am getting some large strings (or bytes) with many kinds of backslash sequences, including unicode sequences. Is there a codec that can decode all of these to make a better string (str)?
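(For illustration only, a made-up sample, not data from the post: if those backslash sequences are Python-style escapes, the unicode_escape codec can turn them back into readable text.)

>>> raw = b'caf\\xe9 \\u00bc cup of flour'  # literal backslash escapes inside the data
>>> raw.decode('unicode_escape')  # note: raw non-ASCII bytes are treated as latin-1, so real utf-8 text can get mangled
'café ¼ cup of flour'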
It depends on the source encoding; if it's utf-8 it should be straightforward.

>>> s = b'hello \xf0\x9f\xa4\xa8'
>>> s.decode() # Same as s.decode('utf-8')
'hello 🤨'
>>> s = 'hello 🤨'
>>> s.encode() # Same as s.encode('utf-8')
b'hello \xf0\x9f\xa4\xa8'

From file:
with open('uni_hello.txt', encoding='utf-8') as f:
    print(f.read())
Output:
hello 🤨
Detect the encoding, or fix it if it's all messed up 🧬

open("content.txt", encoding='utf-8', errors='replace')
open("content.txt", encoding='latin-1', errors='ignore')

chardet | python-ftfy | Unidecode
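A minimal sketch (assuming chardet and python-ftfy are installed, and reusing the content.txt name from above) of how detection and repair could be combined:

# guess the encoding with chardet, then let ftfy repair common mojibake
import chardet
import ftfy

with open('content.txt', 'rb') as f:
    raw = f.read()

guess = chardet.detect(raw)              # e.g. {'encoding': 'utf-8', 'confidence': 0.99, ...}
text = raw.decode(guess['encoding'] or 'utf-8', errors='replace')
print(ftfy.fix_text(text))               # repairs mojibake such as 'Ã©' -> 'é'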
(Jan-21-2021, 10:12 AM)snippsat Wrote: It depends on the source encoding; if it's utf-8 it should be straightforward.
(Jan-21-2021, 12:49 PM)Aspire2Inspire Wrote: Hey! I know you've answered this one. When I need decoding like this I just use latin; what is the difference between latin and latin-1? If you know, thank you!
See Standard Encodings. latin-1 is the top name for this type of Western European encoding; latin is just one of its aliases (as are iso-8859-1 and L1), so they all give the same result.

>>> s = '¼ cup of flour'
>>> s.encode('latin-1')
b'\xbc cup of flour'
>>> s.encode('latin')
b'\xbc cup of flour'
>>> s.encode('iso-8859-1')
b'\xbc cup of flour'
>>> s.encode('L1')
b'\xbc cup of flour'
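A quick check (not from the original reply, just a sketch) that those names really are aliases: codecs.lookup resolves them all to the same canonical codec.

>>> import codecs
>>> codecs.lookup('latin').name, codecs.lookup('latin-1').name, codecs.lookup('L1').name
('iso8859-1', 'iso8859-1', 'iso8859-1')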