Python Forum
tool wanted: to convert utf8 <-> unicode in hex
#11
i can see Japanese in your post. i can't read it, though.

i was hoping for something easier. maybe i can script around iconv and xxd. some unicode code points are beyond 16 bits (above U+FFFF). python3 handles them in strings as single characters: since Python 3.3 (PEP 393) a str stores every code point in 1, 2, or 4 bytes as needed, so nothing gets split into 16-bit pieces. the "\U" escape sequence takes 8 hexadecimal digits (e.g. "\U0001F600"), while "\u" takes exactly 4.
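
A minimal sketch of the wanted tool in plain python3 (3.8+ for the separator argument of `bytes.hex`), as an alternative to scripting around iconv and xxd. The function names `codepoints_to_utf8_hex` and `utf8_hex_to_codepoints` and the space-separated hex format are assumptions for illustration, not an existing tool.

```python
def codepoints_to_utf8_hex(codepoints_hex: str) -> str:
    """Convert space-separated code points in hex, e.g. "65E5 672C", to UTF-8 bytes in hex."""
    text = "".join(chr(int(cp, 16)) for cp in codepoints_hex.split())
    return text.encode("utf-8").hex(" ")  # lowercase hex, bytes separated by spaces


def utf8_hex_to_codepoints(bytes_hex: str) -> str:
    """Convert UTF-8 bytes in hex, e.g. "e6 97 a5", to space-separated code points in hex."""
    text = bytes.fromhex(bytes_hex).decode("utf-8")  # fromhex skips the spaces
    return " ".join(f"{ord(ch):04X}" for ch in text)  # 4 digits minimum, 5-6 above U+FFFF


# "\U" takes 8 hex digits and "\u" takes 4; each gives a single str character.
assert "\U0001F600" == chr(0x1F600)
assert "\u65E5" == chr(0x65E5)

print(codepoints_to_utf8_hex("65E5 672C 1F600"))  # → e6 97 a5 e6 9c ac f0 9f 98 80
print(utf8_hex_to_codepoints("e6 97 a5 f0 9f 98 80"))  # → 65E5 1F600
```

Invalid UTF-8 input makes `decode` raise `UnicodeDecodeError`, which is probably what you want from a checking tool.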

utf-16 represents these with a pair of 16-bit code units taken from a reserved range (U+D800 to U+DFFF, called surrogates) that is never assigned to characters. utf-8 does not allow encoding the surrogate values, but the encoding logic will produce bytes for them if forced (python's "surrogatepass" error handler does exactly that). the utf-8 bit layout can carry up to 21 bits in 4 octets/bytes, 26 bits in 5, or 31 bits in 6. the 5- and 6-byte forms come from the original design (RFC 2279); since RFC 3629, utf-8 is limited to U+10FFFF, so only sequences of up to 4 bytes are valid. i've done that in python but didn't have the tool(s) on hand to verify if it was right.
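
One way to verify a hand-written encoder without outside tools is to compare it against python's own codec over the range the codec accepts. The sketch below (the name `utf8_encode_extended` is hypothetical) follows the original RFC 2279 bit layout up to 31 bits; its 5- and 6-byte outputs are not valid utf-8 today, and python's decoder rejects them.

```python
# Lengths 2..6: (length, exclusive upper bound, lead-byte marker).
# Payload bits per length: 11, 16, 21, 26, 31.
_LAYOUT = (
    (2, 0x800, 0xC0),
    (3, 0x10000, 0xE0),
    (4, 0x200000, 0xF0),
    (5, 0x4000000, 0xF8),   # RFC 2279 only, invalid under RFC 3629
    (6, 0x80000000, 0xFC),  # RFC 2279 only, invalid under RFC 3629
)


def utf8_encode_extended(cp: int) -> bytes:
    """Encode one value with the original UTF-8 layout (1 to 6 bytes, up to 31 bits)."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("value needs more than 31 bits")
    if cp < 0x80:
        return bytes([cp])  # plain ASCII, 1 byte
    for length, limit, lead in _LAYOUT:
        if cp < limit:
            tail = []
            for _ in range(length - 1):
                tail.append(0x80 | (cp & 0x3F))  # continuation byte 10xxxxxx, low 6 bits
                cp >>= 6
            return bytes([lead | cp] + tail[::-1])  # leftover high bits go in the lead byte
    raise AssertionError("unreachable: every 31-bit value fits in 6 bytes")


# Compare with python's codec for the 1..4 byte forms. Strict utf-8 refuses
# surrogates (D800..DFFF), so "surrogatepass" forces them through the same logic.
for cp in (0x41, 0x7FF, 0x800, 0xD800, 0xDFFF, 0xFFFF, 0x10000, 0x10FFFF):
    assert utf8_encode_extended(cp) == chr(cp).encode("utf-8", "surrogatepass")

print(utf8_encode_extended(0x7FFFFFFF).hex(" "))  # → fd bf bf bf bf bf
```

The 5- and 6-byte cases can't be cross-checked this way, since `chr` only accepts values up to 0x10FFFF.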

utf-16 is, imho, an ugly hack. i'm glad python3's str doesn't go there: it stores whole code points, not utf-16 units.
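
For comparison, a sketch of the utf-16 surrogate arithmetic (the helper names `to_surrogate_pair` and `from_surrogate_pair` are hypothetical), shown next to python's str, which keeps U+1F600 as one character.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into UTF-16 high and low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need a surrogate pair")
    v = cp - 0x10000  # leaves a 20-bit value
    return 0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)  # top 10 bits, bottom 10 bits


def from_surrogate_pair(hi: int, lo: int) -> int:
    """Reverse of to_surrogate_pair."""
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)


s = "\U0001F600"
hi, lo = to_surrogate_pair(ord(s))
print(len(s))  # → 1 (one str character in python3)
print(f"{hi:04X} {lo:04X}")  # → D83D DE00
print(s.encode("utf-16-be").hex(" "))  # → d8 3d de 00 (the same two units as bytes)
assert from_surrogate_pair(hi, lo) == ord(s)
```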