Python Forum
utf-8
#2
I made a start on it, but there is still a bug in it: the last character takes up two columns.
Code is for Python 3.x

from binascii import hexlify
import unicodedata

def str_to_hex_str_with_space(char):
   # UTF-8 encode the character and return its bytes as spaced hex pairs, e.g. 'C3 B6'
   hex_bytes = hexlify(char.encode('utf-8'))
   hex_str = hex_bytes.decode('ascii')
   hex_str = ' '.join(hex_str[n:n+2] for n in range(0, len(hex_str), 2))
   return hex_str.upper()


def get_table(code_points):
   # Yield one (char, name, code point, UTF-8 bytes) row per code point
   for code in code_points:
       char = chr(code)
       name = unicodedata.name(char)
       unic = 'U+{:05X}'.format(code)
       encoded = str_to_hex_str_with_space(char)
       yield char, name, unic, encoded


def print_table(code_points):
   # Header and rows share one format string with four left-aligned columns;
   # the former fifth 'Decoded' entry had no field in fmt_str and was never printed
   header = ['Char', 'Name', 'Unicode', 'UTF-8']
   fmt_str = '{:<10s}{:<45s}{:<10s}{:<10s}'
   print(fmt_str.format(*header))
   for row in get_table(code_points):
       row = fmt_str.format(*row)
       print(row)


if __name__ == '__main__':
   code_points = (0x0041, 0x00F6, 0x0416, 0x20AC, 0x1D11E)
   print_table(code_points)
Output:
Char      Name                                         Unicode   UTF-8
A         LATIN CAPITAL LETTER A                       U+00041   41
ö         LATIN SMALL LETTER O WITH DIAERESIS          U+000F6   C3 B6
Ж         CYRILLIC CAPITAL LETTER ZHE                  U+00416   D0 96
€         EURO SIGN                                    U+020AC   E2 82 AC
?         MUSICAL SYMBOL G CLEF                        U+1D11E   F0 9D 84 9E
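A side note on the last row: the UTF-8 column is formatted with a width of 10, but a four-byte sequence written as spaced hex pairs is 11 characters long, so that cell overflows its field. Below is a minimal sketch of sizing the column from the data instead. The list `cells` is copied from the output above and `width` is a name made up for this example. Whether the G clef itself is drawn one or two cells wide depends on the terminal and font, which this sketch does not address.

```python
# UTF-8 cells as they appear in the table
cells = ['41', 'C3 B6', 'D0 96', 'E2 82 AC', 'F0 9D 84 9E']

# With a fixed width of 10, the longest cell no longer fits its field.
fixed = '{:<10s}|'
print(fixed.format(cells[3]))  # padded up to the bar
print(fixed.format(cells[4]))  # 11 characters, the bar is pushed right

# Assumption: derive the width from the widest entry plus two spaces of padding.
width = max(len(cell) for cell in ['UTF-8'] + cells) + 2
for cell in cells:
    print('{:<{w}s}|'.format(cell, w=width))
```

With the computed width every row ends at the same column, no matter how many bytes the encoding needs.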
Someone else should improve it.
Don't post this code as it is. What I wish for: a good replacement for str_to_hex_str_with_space.
And a bug fix for the last character: it takes up two columns.
Once that is fixed, get_table and print_table should be merged into one function.
It's also possible to pass the Unicode characters directly instead of code points.
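One possible replacement for str_to_hex_str_with_space, as a sketch rather than the final word: iterating over a bytes object in Python 3 yields integers, so each byte can be formatted as two hex digits directly, without hexlify and without slicing the string afterwards. The explicit 'utf-8' argument is an addition here; it matches the default of str.encode.

```python
def str_to_hex_str_with_space(char):
    # Each item of the encoded bytes is an int in the range 0..255;
    # '{:02X}' renders it as two upper-case hex digits.
    return ' '.join('{:02X}'.format(byte) for byte in char.encode('utf-8'))


for char in ('A', '\u00f6', '\u20ac'):
    print(repr(char), str_to_hex_str_with_space(char))
```

On Python 3.8 and later, char.encode('utf-8').hex(' ').upper() should give the same result, since bytes.hex accepts a separator from that version on.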
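A rough sketch of how the merged version could look when it takes the characters themselves. The name `format_table`, the width of 13 for the UTF-8 column, and the '<unnamed>' fallback are assumptions for this example. The characters are written as escapes so that the listing stays postable.

```python
import unicodedata


def format_table(chars):
    # Build header and rows in one place; returns the lines instead of printing
    # them, so the caller decides what to do with the table.
    fmt_str = '{:<10s}{:<45s}{:<10s}{:<13s}'
    lines = [fmt_str.format('Char', 'Name', 'Unicode', 'UTF-8')]
    for char in chars:
        # The second argument is returned for code points without a name.
        name = unicodedata.name(char, '<unnamed>')
        unic = 'U+{:05X}'.format(ord(char))
        encoded = ' '.join('{:02X}'.format(b) for b in char.encode('utf-8'))
        lines.append(fmt_str.format(char, name, unic, encoded))
    return lines


if __name__ == '__main__':
    # A, o with diaeresis, Cyrillic zhe, euro sign, musical G clef
    print('\n'.join(format_table('A\u00f6\u0416\u20ac\U0001D11E')))
```

Iterating over a str yields one character per code point in Python 3, so ord(char) gives back the code point that the original version started from.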

Edit: I'm not able to post the last character. The forum doesn't accept it :-(
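For context on why this one character is special: U+1D11E lies outside the Basic Multilingual Plane, so it needs four bytes in UTF-8 and a surrogate pair in UTF-16. Software limited to three-byte UTF-8 or to single UTF-16 units can mishandle such characters. Whether that is what happens in this forum is only a guess. A small check, with the character written as an escape:

```python
clef = '\U0001D11E'  # MUSICAL SYMBOL G CLEF

code_points = len(clef)                           # characters as Python 3 counts them
utf8_bytes = len(clef.encode('utf-8'))            # bytes in UTF-8
utf16_units = len(clef.encode('utf-16-le')) // 2  # 16-bit units in UTF-16
outside_bmp = ord(clef) > 0xFFFF                  # beyond the Basic Multilingual Plane?

print(code_points, utf8_bytes, utf16_units, outside_bmp)
```

The other four characters in the table are all at or below U+FFFF and need at most three bytes in UTF-8.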


Messages In This Thread
utf-8 - by Skaperen - Jun-25-2017, 04:23 AM
RE: utf-8 - by DeaD_EyE - Jun-25-2017, 03:07 PM
RE: utf-8 - by Skaperen - Jun-26-2017, 03:04 AM
RE: utf-8 - by DeaD_EyE - Jun-26-2017, 03:50 AM
RE: utf-8 - by Skaperen - Jun-27-2017, 02:40 AM
RE: utf-8 - by snippsat - Jun-27-2017, 11:16 AM
RE: utf-8 - by DeaD_EyE - Jun-27-2017, 12:05 PM
RE: utf-8 - by Skaperen - Jun-29-2017, 04:19 AM
RE: utf-8 - by snippsat - Jun-29-2017, 05:01 AM
