Unicode character widths

Skaperen · Apr-15-2019, 02:30 AM

While trying to print a display map of various Unicode characters paired with their UTF-8 bytes in hex, I'm finding that the space a character takes up on screen is unpredictable, so I can't tell how many spaces need to follow it. It looks like I'll need some form of absolute positioning around each character whose width I don't know (which is most of them).

Is there a database of this info somewhere in Python?

Do the tools that manage text screen displays support the full Unicode set? They'd need to know each character's width to format the screen correctly.

DeaD_EyE · Apr-15-2019, 08:21 AM

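import wcwidth  # third-party package: pip install wcwidth


# A few space characters that occupy different numbers of terminal columns.
symbols = [
    '\N{ZERO WIDTH SPACE}',
    '\N{NARROW NO-BREAK SPACE}',
    '\N{MEDIUM MATHEMATICAL SPACE}',
    '\N{IDEOGRAPHIC SPACE}',
]
for symbol in symbols:
    # wcwidth() returns the number of columns the character is expected to use.
    print(symbol, wcwidth.wcwidth(symbol))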

Output:
​ 0
  1
  1
　 2
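
If installing wcwidth isn't an option, the standard library's unicodedata module can get you part of the way. This is only a rough sketch: approx_width and chart_row are made-up names, the width rules are a simplification (wide/fullwidth counts as 2, combining marks and format characters as 0, everything else as 1), and it will disagree with wcwidth on some characters, control characters among them. It shows how a known width lets you pad each chart cell with the right number of spaces.

```python
import unicodedata


def approx_width(ch: str) -> int:
    """Rough column count for one character (a simplification, not wcwidth's tables)."""
    # Combining marks and format characters normally take no column of their own.
    if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # East Asian Wide and Fullwidth characters normally take two columns.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def chart_row(ch: str, cell: int = 4) -> str:
    """One chart line: code point, the character padded to `cell` columns, UTF-8 hex."""
    hex_bytes = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
    pad = " " * max(cell - approx_width(ch), 0)
    return f"U+{ord(ch):04X} {ch}{pad}{hex_bytes}"


if __name__ == "__main__":
    for ch in ("A", "\N{ZERO WIDTH SPACE}", "\N{IDEOGRAPHIC SPACE}"):
        print(chart_row(ch))
```

Whether the columns really line up still depends on the terminal and font agreeing with these widths.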

Skaperen · Apr-16-2019, 05:11 AM

It looks like Unicode has lots of oddities that will make a character code chart very hard to make. But at least my program to show UTF-8 byte codes for Unicode and beyond works (codes all the way up to 2**42 can be encoded if you don't mind having FE and FF in the results).
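
For anyone wondering how codes that large fit: the program itself isn't posted here, so the following is only a sketch of one way to do it, assuming the usual UTF-8 lead-byte pattern is simply continued past its standard 4-byte limit. Under that assumption a lead byte of FE is followed by 6 continuation bytes (36 payload bits) and FF by 7 (42 payload bits). The name encode_extended_utf8 is hypothetical, and anything longer than 4 bytes is not valid UTF-8.

```python
def encode_extended_utf8(code: int) -> bytes:
    """Encode an integer below 2**42 with the UTF-8 bit pattern extended to 8 bytes.

    Assumption: lead bytes continue 110xxxxx, 1110xxxx, ... up to FE and FF.
    Results longer than 4 bytes are NOT standard UTF-8.
    """
    if not 0 <= code < 1 << 42:
        raise ValueError("code must be in range(2**42)")
    if code < 0x80:
        return bytes([code])  # plain ASCII, one byte
    # Find the shortest length n whose payload capacity holds the code.
    for n in range(2, 9):
        lead_bits = max(7 - n, 0)            # payload bits left in the lead byte
        if code < 1 << (lead_bits + 6 * (n - 1)):
            break
    lead_marker = (0xFF << (8 - n)) & 0xFF   # n leading 1 bits: C0, E0, ..., FE, FF
    tail = []
    for _ in range(n - 1):
        tail.append(0x80 | (code & 0x3F))    # continuation byte 10xxxxxx
        code >>= 6
    return bytes([lead_marker | code] + tail[::-1])


if __name__ == "__main__":
    for value in (0x41, 0x20AC, 0x1F600, 1 << 36):
        print(hex(value), encode_extended_utf8(value).hex(" ").upper())
```

For real code points this gives the same bytes as str.encode("utf-8"), except that it does not reject surrogates.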

DeaD_EyE · Apr-16-2019, 08:09 AM

Yes, Unicode is complicated.
Ever heard of ligatures? https://github.com/tonsky/FiraCode
