Python Forum
Unicode character widths
#1
While trying to print a display map of various Unicode characters paired with their UTF-8 bytes in hex, I am finding that the amount of space a character occupies is unpredictable, so I can't tell how many spaces need to follow it. It looks like what I need is some form of absolute positioning around each character whose width I don't know (most of them).

Is there a database of this info somewhere in Python?

Do the tools that manage text screen displays support the full Unicode set? They would need to know each character's width to format the screen correctly.
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
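import wcwidth  # third-party package: pip install wcwidth


# Space characters that occupy different numbers of terminal columns
symbols = [
    '\N{ZERO WIDTH SPACE}',
    '\N{NARROW NO-BREAK SPACE}',
    '\N{MEDIUM MATHEMATICAL SPACE}',
    '\N{IDEOGRAPHIC SPACE}',
]
for symbol in symbols:
    # repr() keeps the invisible characters identifiable in the output;
    # wcwidth() returns the number of columns the character occupies
    print(repr(symbol), wcwidth.wcwidth(symbol))
Output:
'\u200b' 0
'\u202f' 1
'\u205f' 1
'\u3000' 2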
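For the chart described in the first post, the width can be used to pad each character out to a fixed cell. The sketch below is only a rough approximation that uses the standard library's `unicodedata` module in place of `wcwidth`; the helper names `approx_width` and `chart_row` and the cell width of 4 are made up for illustration, and control characters and multi-code-point sequences (emoji with modifiers, combining sequences) are not handled.

```python
import unicodedata


def approx_width(ch):
    """Rough column width of a single character: 0, 1, or 2 (an approximation)."""
    # Non-spacing marks, enclosing marks, and format characters take no column.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # East Asian Wide and Fullwidth characters take two columns.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def chart_row(ch, cell_width=4):
    """One chart line: code point, the character padded to cell_width, UTF-8 hex."""
    pad = " " * max(0, cell_width - approx_width(ch))
    hex_bytes = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
    return f"U+{ord(ch):04X} {ch}{pad}{hex_bytes}"


for ch in "A\u200b\u202f\u3000\u4e2d":
    print(chart_row(ch))
```

Whether the columns actually line up still depends on the terminal and font agreeing with these widths, which is not guaranteed.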
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
#3
It looks like Unicode has lots of oddities that will make a character code chart very hard to make. But at least my program to show UTF-8 byte codes for Unicode and beyond works (codes all the way up to 2**42 can be encoded if you don't mind having FE and FF in the results).
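The 2**42 figure follows from continuing the UTF-8 bit pattern past its standard 4-byte limit: a lead byte of FE followed by six continuation bytes carries 36 bits, and a lead byte of FF followed by seven carries 42. The poster's program is not shown, so the following is only a guess at how such an encoder could look; `encode_extended_utf8` is a made-up name, and anything longer than 4 bytes is not valid UTF-8 under the current standard.

```python
def encode_extended_utf8(code):
    """Encode an integer below 2**42 using the UTF-8 bit pattern (non-standard)."""
    if not 0 <= code < 2**42:
        raise ValueError("code must be in range(2**42)")
    if code < 0x80:
        return bytes([code])  # single byte, same as ASCII
    # Find the shortest length n (2..8) whose payload can hold the value.
    # The lead byte contributes 7 - n bits (none for n = 7 or 8) and each
    # continuation byte contributes 6 bits.
    for n in range(2, 9):
        payload_bits = max(0, 7 - n) + 6 * (n - 1)
        if code < (1 << payload_bits):
            break
    # Peel off 6 bits at a time for the continuation bytes (10xxxxxx).
    tail = []
    for _ in range(n - 1):
        tail.append(0x80 | (code & 0x3F))
        code >>= 6
    # The lead byte starts with n one-bits: C0, E0, F0, F8, FC, FE, FF.
    lead_prefix = (0xFF << (8 - n)) & 0xFF
    return bytes([lead_prefix | code] + tail[::-1])


print(encode_extended_utf8(0x20AC).hex(" "))
print(encode_extended_utf8(2**31).hex(" "))
print(encode_extended_utf8(2**42 - 1).hex(" "))
```

For valid code points outside the surrogate range, the result matches `chr(code).encode("utf-8")`; beyond U+10FFFF the bytes only follow the same pattern and will be rejected by standard decoders.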
#4
Yes, Unicode is complicated.
Ever heard of ligatures? https://github.com/tonsky/FiraCode