Python Forum
width of Unicode character
#1
i have been print()ing Unicode characters (to the screen, not on paper) from a script that wraps each one in double quotes. some glyphs overlapped the quotes, so i added an extra space after the 1st quote and before the 2nd quote. some characters make the quotes show closer together, as if the character occupies no space (without my added spaces the 2 quotes would be jammed together as if nothing was between them). yet these odd characters still have a glyph that gets shown. in a few cases, the character is so wide that it still overlaps the 2nd quote even with the added space (i might need to add more).

i know the displayed result is not controlled by Python. but is there any data available in Python that can tell how a character will be displayed, including right-to-left ones such as Hebrew and Arabic? knowing that would help the script format its output (to make a nice dump of all printable characters).
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
Can you give an example?
#3
I think it depends on the operating system or at least the Python library you are using.
#4
(Sep-26-2021, 02:47 AM)bowlofred Wrote: Can you give an example?

http://ipal.net/python-forum/20210926131...532892.png

this output shows the Unicode code point, its decimal value in parentheses, the UTF-8 octets in hexadecimal, and, if the character is printable, an ' = ' followed by the raw Unicode character between '" ' and ' "'. note how U+0483 .. U+0489 are shifted left and reduce the total space between the double quotes. this output was rendered by xfce4-terminal version 4.12 in Xubuntu 18.04.5.
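
for anyone who wants to reproduce a line like the ones in the screenshot without fetching the scripts, here is a minimal sketch of that line format. `dump_line` is a hypothetical helper name and the exact spacing is a guess based on the description above; it is not taken from listutf8.py.

```python
def dump_line(cp: int) -> str:
    """Format one code point roughly as described above (hypothetical helper)."""
    ch = chr(cp)
    # "surrogatepass" only matters for U+D800..U+DFFF, which plain
    # UTF-8 encoding refuses to encode.
    octets = " ".join(f"{b:02X}" for b in ch.encode("utf-8", "surrogatepass"))
    line = f"U+{cp:04X} ({cp}) {octets}"
    if ch.isprintable():
        # Raw character between '" ' and ' "', as in the post.
        line += f' = " {ch} "'
    return line


if __name__ == "__main__":
    for cp in range(0x0480, 0x048B):
        print(dump_line(cp))
```

running it for U+0480 .. U+048A should show the same left shift for U+0483 .. U+0489 on a terminal that treats those characters as zero width.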
#5
(Sep-26-2021, 03:28 AM)SamHobbs Wrote: I think it depends on the operating system or at least the Python library you are using.

i think the OS (Xubuntu) and Python are just passing the bytes along (the UTF-8 produced once Python does the encoding). i think it is the terminal emulator that renders it that way, and i suspect some part of the Unicode standard says to do it that way. what i am hoping for is some kind of data that describes how a character can be expected to be rendered (by the terminal emulator).

the script, in this case, wrote the output to files. it wrote different files based on how long each character's UTF-8 encoding is. this image shows file "2" because these are 2-byte UTF-8 codes.
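
as a rough illustration of that split by encoded length, the sketch below computes how many UTF-8 bytes a code point needs. `utf8_len` is a hypothetical name, not a function from the linked scripts; the range boundaries are the standard UTF-8 ones.

```python
def utf8_len(cp: int) -> int:
    """Number of bytes UTF-8 uses for code point cp (hypothetical helper)."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4


# The 2-byte group is U+0080 .. U+07FF.
two_byte = range(0x80, 0x800)

if __name__ == "__main__":
    print(len(two_byte), "code points encode to 2 bytes")
    print(utf8_len(0x0483))
```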

sources can be accessed at:
http://ipal.net/python-forum/listutf8.py
http://ipal.net/python-forum/to_utf8.py
http://ipal.net/python-forum/un_utf8.py
#6
Python has unicodedata.east_asian_width(), but the information there doesn't seem to correspond to the different ways the characters are displayed.
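
A minimal sketch of what the standard library does expose, assuming the terminal follows the common convention that nonspacing and enclosing marks take no column and East Asian wide or fullwidth characters take two. `estimated_width` is a hypothetical helper, and the result is only a guess: the terminal and font have the final say. The third-party wcwidth package covers more cases than this.

```python
import unicodedata


def estimated_width(ch: str) -> int:
    """Guess how many terminal columns one character occupies (assumption-based)."""
    # Mn = nonspacing mark, Me = enclosing mark: drawn over the previous character.
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    # W = wide, F = fullwidth: usually two columns in a terminal.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


if __name__ == "__main__":
    for ch in ("A", "\u0483", "\u0488", "\u4e2d", "\u05d0", "\u0627"):
        print(
            f"U+{ord(ch):04X}",
            unicodedata.category(ch),
            unicodedata.east_asian_width(ch),
            unicodedata.bidirectional(ch),  # 'R' or 'AL' means right-to-left
            estimated_width(ch),
        )
```

For the right-to-left part of the question, `unicodedata.bidirectional()` returns the character's bidirectional class, for example `'R'` for Hebrew letters and `'AL'` for Arabic letters.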
#7
it seems some characters are intended to go back and overstrike the previous character, and so have a positional width of zero. i don't know how that should work with wider characters. i have seen at least one that looks to be triple wide while having a positional width of just one, and i have seen a few double-wide ones that act differently depending on whether they are followed by a space or not. i think i am going to have to dig into this terminal program's code and see how it decides what to do. in the meantime my challenge will be to output a grid of at least 2048 Unicode characters in a way that makes the code value easy to see.
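
one possible shape for such a grid, as a sketch only: 16 characters per row, the row's starting code value on the left, and the low hex digit across the top. it assumes combining marks take zero columns and East Asian wide characters take two, and it puts each combining mark on a dotted circle (U+25CC) so it has something to overstrike. `cell` and `grid` are hypothetical names, and the odd cases mentioned above (such as triple-wide glyphs) will still break the alignment.

```python
import unicodedata

DOTTED_CIRCLE = "\u25cc"  # carrier for combining marks


def cell(cp: int) -> str:
    """Return text intended to fill two terminal columns for one code point."""
    ch = chr(cp)
    if not ch.isprintable():
        return "  "  # controls, unassigned, surrogates: leave blank
    if unicodedata.category(ch) in ("Mn", "Me"):
        return DOTTED_CIRCLE + ch + " "  # base (1) + mark (0) + pad (1)
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return ch  # assumed to fill both columns already
    return ch + " "


def grid(start: int, stop: int, cols: int = 16) -> str:
    """Build a grid of code points start..stop-1 with row and column labels."""
    lines = [" " * 8 + " ".join(f"{c:<2X}" for c in range(cols))]
    for row in range(start, stop, cols):
        cells = " ".join(cell(cp) for cp in range(row, min(row + cols, stop)))
        lines.append(f"U+{row:04X}  {cells}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(grid(0x0400, 0x0500))
```

the code value of any cell is the row label plus the column digit, so the row `U+0480` under column `3` is U+0483.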

