Python Forum
unicode to utf-8
#1
If I have a list of Unicode code point values as ints and want to convert it to a list of UTF-8 byte values as ints, how could I achieve this in Python? The reverse would be nice, too.
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
Please post your current code, output, and errors.
You are using Python 3, right?

Are you getting this error?
Output:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
You can pass "replace" or "ignore" (or another error handler) as the second argument when decoding:

>>> b'\x80abc'.decode("utf-8", "strict")  
Traceback (most recent call last):
    ...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0:
  invalid start byte
>>> b'\x80abc'.decode("utf-8", "replace")
'\ufffdabc'
>>> b'\x80abc'.decode("utf-8", "backslashreplace")
'\\x80abc'
>>> b'\x80abc'.decode("utf-8", "ignore")
'abc'
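
For the original question (lists of ints in both directions), a minimal sketch using only the built-in codec might look like the following. It assumes Python 3 and valid code points; the function names are made up for illustration.

```python
def codepoints_to_utf8(codepoints):
    """Return the UTF-8 encoding of a list of code points as a list of ints."""
    # chr() raises ValueError for values outside 0..0x10FFFF, and the strict
    # encoder raises UnicodeEncodeError for surrogates (0xD800..0xDFFF).
    text = "".join(chr(cp) for cp in codepoints)
    return list(text.encode("utf-8"))


def utf8_to_codepoints(byte_values):
    """Return the code points decoded from a list of UTF-8 byte values (ints)."""
    # bytes() raises ValueError for ints outside 0..255, and the strict
    # decoder raises UnicodeDecodeError for malformed or overlong input.
    text = bytes(byte_values).decode("utf-8")
    return [ord(ch) for ch in text]
```

Both functions deliberately keep the default "strict" behaviour, so bad input raises an exception instead of silently changing the data.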
#3
No current code; I'm still planning. But I want to avoid all these error messages and get results even for many "invalid" cases, such as overlong encodings (when going from UTF-8 to Unicode) and out-of-range code points (when going from Unicode to UTF-8).

According to some past reading (I no longer have the URLs) about some AWS software that was developed in Python and runs on their servers (I can't get the source), Python's UTF-8 code does not handle some conversion cases correctly. To see whether some errors I am getting at AWS are related, I need a conversion that handles all cases, so it can't use what Python already has. So I am writing my own. I still need a means to check that my own is correct.

Mine will need to run in Python 2.7 and Python 3.5 (hopefully all of Python 3).
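
A rough sketch of what such a hand-written, permissive converter could look like is below. It works on plain lists of ints, so it should not depend on the str/bytes differences between Python 2.7 and Python 3. It follows the original UTF-8 bit layout (up to 6 bytes, values up to 0x7FFFFFFF) rather than the current limit of U+10FFFF, and it accepts overlong forms and surrogates when decoding. The names `encode_codepoints`, `decode_utf8` and `cross_check` are hypothetical, and raising ValueError for truncated or stray bytes is just one possible policy.

```python
# Lead-byte marker bits, indexed by the number of continuation bytes.
_LEAD = {1: 0xC0, 2: 0xE0, 3: 0xF0, 4: 0xF8, 5: 0xFC}


def encode_codepoints(codepoints):
    """Encode ints 0..0x7FFFFFFF to a list of byte values (shortest form)."""
    out = []
    for cp in codepoints:
        if cp < 0 or cp > 0x7FFFFFFF:
            raise ValueError("value not encodable: %r" % (cp,))
        if cp < 0x80:
            out.append(cp)
            continue
        # Pick the number of continuation bytes from the value's magnitude.
        if cp < 0x800:
            n = 1
        elif cp < 0x10000:
            n = 2
        elif cp < 0x200000:
            n = 3
        elif cp < 0x4000000:
            n = 4
        else:
            n = 5
        out.append(_LEAD[n] | (cp >> (6 * n)))
        for shift in range(6 * (n - 1), -1, -6):
            out.append(0x80 | ((cp >> shift) & 0x3F))
    return out


def decode_utf8(byte_values):
    """Decode byte values to ints, accepting overlong and out-of-range forms."""
    out = []
    i = 0
    while i < len(byte_values):
        b = byte_values[i]
        if b < 0 or b > 0xFF:
            raise ValueError("not a byte value at index %d" % i)
        if b < 0x80:
            n, cp = 0, b
        elif b < 0xC0:
            raise ValueError("stray continuation byte at index %d" % i)
        elif b < 0xE0:
            n, cp = 1, b & 0x1F
        elif b < 0xF0:
            n, cp = 2, b & 0x0F
        elif b < 0xF8:
            n, cp = 3, b & 0x07
        elif b < 0xFC:
            n, cp = 4, b & 0x03
        elif b < 0xFE:
            n, cp = 5, b & 0x01
        else:
            raise ValueError("0xFE/0xFF cannot start a sequence (index %d)" % i)
        tail = byte_values[i + 1:i + 1 + n]
        if len(tail) != n:
            raise ValueError("truncated sequence at index %d" % i)
        for c in tail:
            if (c & 0xC0) != 0x80:
                raise ValueError("bad continuation byte after index %d" % i)
            cp = (cp << 6) | (c & 0x3F)
        out.append(cp)
        i += 1 + n
    return out


def cross_check(codepoints):
    """Python 3 only: compare against the built-in codec for valid input."""
    for cp in codepoints:
        builtin = list(chr(cp).encode("utf-8"))
        assert encode_codepoints([cp]) == builtin, hex(cp)
        assert decode_utf8(builtin) == [cp], hex(cp)
```

One way to check such code is to run `cross_check` over the valid scalar values (0..0xD7FF and 0xE000..0x10FFFF), where the built-in codec is the reference, and then add hand-computed cases for the inputs the built-in codec rejects, for example the overlong pair `[0xC0, 0x80]` decoding to `[0]`.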