Python Forum
my un_utf8() function
#1
I am writing a function named un_utf8() that converts UTF-8 back to Unicode code points. I have already written to_utf8(), which converts Unicode to UTF-8. I know Python 3 already has a working way to do these conversions, but mine does more: it supports Modified UTF-8 (MUTF-8), eXtended UTF-8 (XUTF-8), and Ultra UTF-8 (UUTF-8), as well as plain old UTF-8. UTF-8, in its original form with sequences of up to 6 octets, supports up to 31 bits. XUTF-8 supports up to 36 bits, and UUTF-8 supports up to 42 bits. The latter two are for special-case needs, such as advanced display device operations, and they generate octet sequences that use the highest values of an 8-bit stream (254 and 255).
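To make the bit counts concrete, here is a minimal sketch of how one code point could be encoded. It is not the actual to_utf8(); the names payload_bits() and encode_one() are made up for illustration. It assumes XUTF-8 uses a lead octet of 254 (0xFE) followed by six continuation octets, and UUTF-8 a lead octet of 255 (0xFF) followed by seven, which is what the 36-bit and 42-bit figures suggest. MUTF-8 is not covered here.

```python
def payload_bits(n_octets):
    """Payload bits carried by a sequence of n_octets octets (1..8)."""
    if not 1 <= n_octets <= 8:
        raise ValueError("sequence length must be 1..8")
    if n_octets == 1:
        return 7
    # The lead octet keeps 7 - n bits (never fewer than 0);
    # each continuation octet carries 6 bits.
    return max(0, 7 - n_octets) + 6 * (n_octets - 1)


def encode_one(cp):
    """Encode one code point (0 <= cp < 2**42) as bytes."""
    if not 0 <= cp < 1 << 42:
        raise ValueError("code point out of range for this sketch")
    if cp < 0x80:
        return bytes([cp])
    # Pick the shortest sequence length whose payload can hold cp.
    n = 2
    while cp >= 1 << payload_bits(n):
        n += 1
    # Lead octet prefix: n one bits from the top
    # (2 -> 0xC0, 6 -> 0xFC, 7 -> 0xFE, 8 -> 0xFF).
    lead_prefix = (0xFF << (8 - n)) & 0xFF
    out = [0] * n
    # Fill continuation octets from the last one backwards, 6 bits each.
    for i in range(n - 1, 0, -1):
        out[i] = 0x80 | (cp & 0x3F)
        cp >>= 6
    out[0] = lead_prefix | cp
    return bytes(out)


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, "octets:", payload_bits(n), "bits")
    print(encode_one(0x20AC).hex())
    print(encode_one(1 << 31).hex())
    print(encode_one(1 << 36).hex())
```

With this layout, encode_one(0x20AC) gives the same three octets as "€".encode("utf-8"), encode_one(1 << 31) is a 7-octet sequence starting with 254, and encode_one(1 << 36) is an 8-octet sequence starting with 255.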

My functions accept sequences of codes in a variety of types and return the resulting codes in a like type.

Conversion from UTF-8 back to code points can be problematic. There can be bad codes that cannot be converted to Unicode or to larger code points. The caller may also only want code points below a certain limit (such as 1114112, or 0x110000, for Unicode, or much higher with XUTF-8 or UUTF-8). un_utf8() will need to handle error conditions like these in a reasonable way, so that calling programs can report errors to users with enough detail to make the problems reasonably easy to correct.
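As a rough illustration of what "sufficient detail" could look like, here is a sketch that raises an exception carrying the reason, the offset, and the offending octets. Everything named here (Utf8DecodeError, seq_len(), decode_all(), the limit parameter) is hypothetical and not the real un_utf8() interface. It assumes the same 7-octet and 8-octet layout for XUTF-8 and UUTF-8 (lead octets 254 and 255), and it does not reject overlong forms.

```python
class Utf8DecodeError(ValueError):
    """Hypothetical error type carrying detail for a useful message."""

    def __init__(self, reason, offset, octets):
        octets = bytes(octets)
        super().__init__(f"{reason} at offset {offset}: {octets.hex()}")
        self.reason = reason
        self.offset = offset
        self.octets = octets


def seq_len(lead):
    """Sequence length implied by a lead octet, or 0 for a continuation octet."""
    if lead < 0x80:
        return 1
    n = 0
    while n < 8 and lead & (0x80 >> n):  # count leading one bits
        n += 1
    return 0 if n == 1 else n


def decode_all(data, limit=0x110000):
    """Decode octets to a list of code points, raising on the first problem."""
    data = bytes(data)
    out, i = [], 0
    while i < len(data):
        n = seq_len(data[i])
        if n == 0:
            raise Utf8DecodeError("unexpected continuation octet", i, data[i:i + 1])
        seq = data[i:i + n]
        if len(seq) < n:
            raise Utf8DecodeError("truncated sequence", i, seq)
        # Payload bits of the lead octet (none at all for 0xFE and 0xFF).
        cp = seq[0] & (0xFF >> (n + 1)) if n > 1 else seq[0]
        for k, b in enumerate(seq[1:], 1):
            if b & 0xC0 != 0x80:
                raise Utf8DecodeError("bad continuation octet", i + k, seq)
            cp = (cp << 6) | (b & 0x3F)
        if cp >= limit:
            raise Utf8DecodeError(
                f"code point {cp:#x} is not below limit {limit:#x}", i, seq)
        out.append(cp)
        i += n
    return out


if __name__ == "__main__":
    print(decode_all("a\u20ac".encode("utf-8")))
    try:
        decode_all(b"ab\xe2\x82")
    except Utf8DecodeError as e:
        print("error:", e)
```

A caller that wants the wider ranges would pass a larger limit, for example decode_all(data, limit=1 << 36) under the XUTF-8 assumption above.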

One way I am looking at doing this is by modifying mutable containers: when the input holds more than one sequence, or an incomplete sequence, the sequence in error and everything that follows it are left in the container, and everything that was successfully converted is removed. A caller that wants this behavior provides the input in a mutable form.
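The mutable-container idea could look something like the sketch below, using a bytearray as the container. The name decode_prefix() and the limit parameter are placeholders, not the real un_utf8() signature, and the 7-octet and 8-octet forms again assume lead octets 254 and 255. The function stops at the first sequence it cannot convert, removes the converted octets from the front of the container, and leaves the rest for the caller to inspect.

```python
def decode_prefix(buf, limit=0x110000):
    """Decode as many leading sequences of bytearray `buf` as possible.

    Converted octets are deleted from the front of `buf`, so whatever
    remains starts with the sequence that could not be converted.
    Returns the list of code points that were converted.
    """
    out, i = [], 0
    while i < len(buf):
        lead = buf[i]
        # Count leading one bits: 0 means ASCII, 1 means a stray
        # continuation octet, 2..8 is the sequence length.
        n = 0
        while n < 8 and lead & (0x80 >> n):
            n += 1
        if n == 1 or i + max(n, 1) > len(buf):
            break  # stray continuation octet, or incomplete sequence
        if n == 0:
            cp, n = lead, 1
        else:
            cp = lead & (0xFF >> (n + 1))
            tail = buf[i + 1:i + n]
            if any(b & 0xC0 != 0x80 for b in tail):
                break  # a continuation octet is malformed
            for b in tail:
                cp = (cp << 6) | (b & 0x3F)
        if cp >= limit:
            break  # beyond what the caller asked for
        out.append(cp)
        i += n
    del buf[:i]  # keep only the unconverted remainder in the container
    return out


if __name__ == "__main__":
    buf = bytearray("h\u00e9llo".encode("utf-8") + b"\xe2\x82")
    print(decode_prefix(buf), bytes(buf))
    buf += b"\xac"  # the rest of the sequence arrives later
    print(decode_prefix(buf), bytes(buf))
```

One nice side effect of this design is that it handles streamed input: an incomplete sequence at the end simply stays in the container until the caller appends the missing octets and calls again. The caller can tell an error from a clean finish by checking whether the container is empty afterwards.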