width of Unicode character

Skaperen · Sep-26-2021, 02:16 AM

i have been print()ing Unicode characters (not on paper) from a script that wraps them in double quotes. some glyphs overlapped the quotes, so i added an extra space after the 1st quote and before the 2nd quote. some characters cause the quotes to show closer together, as if the character occupies no space (without my added spaces the 2 quotes would be jammed together as if nothing was between them). yet these odd characters still have a glyph that gets shown. in a few cases, the character is so wide it still overlaps the 2nd quote even with the added space (i might need to add more).

i know the displayed result is not controlled by Python. but, is there any data available in Python that can tell how the character will be printed, including right-to-left ones such as Hebrew and Arabic? knowing can help the script format the output (to make a nice dump of all printable characters).

bowlofred · Sep-26-2021, 02:47 AM

Can you give an example?

SamHobbs · Sep-26-2021, 03:28 AM

I think it depends on the operating system or at least the Python library you are using.

Skaperen · Sep-26-2021, 05:33 PM

(Sep-26-2021, 02:47 AM)bowlofred Wrote: Can you give an example?

http://ipal.net/python-forum/20210926131...532892.png

this output shows the Unicode code point, its decimal value in parentheses, the UTF-8 octets in hexadecimal, and, if the character is printable, an ' = ' followed by the raw Unicode character between '" ' and ' "'. note how U+0483 .. U+0489 are shifted left and reduce the total space between the double quotes. this output was rendered by xfce4-terminal version 4.12 in Xubuntu 18.04.5.
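
For readers without the linked sources, a line in that format could be built roughly as below. This is only a sketch of the layout described in the post, not the actual listutf8.py code; the function name `dump_line`, the lowercase hex, and the exact spacing are assumptions.

```python
def dump_line(cp):
    """Format one code point the way the post describes (layout assumed)."""
    ch = chr(cp)
    # UTF-8 octets as space-separated hex pairs; surrogates (U+D800..U+DFFF)
    # cannot be encoded and would raise UnicodeEncodeError here.
    octets = " ".join(f"{b:02x}" for b in ch.encode("utf-8"))
    line = f"U+{cp:04X} ({cp}) {octets}"
    # str.isprintable() is Python's own notion of "printable"; it says nothing
    # about how wide the terminal will draw the character.
    if ch.isprintable():
        line += f' = " {ch} "'
    return line


if __name__ == "__main__":
    # The range called out in the post, plus one neighbour on each side.
    for cp in range(0x0482, 0x048B):
        print(dump_line(cp))
```

Note that the combining marks in U+0483 .. U+0489 count as printable by this test, which is why they still appear between the quotes.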

Skaperen · Sep-26-2021, 06:09 PM

(Sep-26-2021, 03:28 AM)SamHobbs Wrote: I think it depends on the operating system or at least the Python library you are using.

i think the OS (Xubuntu) and Python are just passing the bytes along (the UTF-8 after the Python library does the encoding). i think it is the terminal emulator rendering it that way. i suspect some kind of Unicode standard says to do it that way. what i am hoping for is some kind of data that can describe how to expect it to be rendered (by the terminal emulator).

the script, in this case, wrote the output to a file. it wrote different files based on how long their UTF-8 string would be. this image shows file "2" because these are 2 byte UTF-8 codes.

sources can be accessed at:
http://ipal.net/python-forum/listutf8.py
http://ipal.net/python-forum/to_utf8.py
http://ipal.net/python-forum/un_utf8.py

bowlofred · Sep-26-2021, 08:55 PM

Python has unicodedata.east_asian_width(), but the information there doesn't seem to correspond to the different ways the characters are displayed.
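
A rough sketch of what the standard `unicodedata` module can offer here: `east_asian_width()` alone only separates wide from narrow characters, but combining it with `category()` catches the nonspacing and enclosing marks (such as U+0483 .. U+0489), and `bidirectional()` reports the right-to-left classes. The helper names `estimated_width` and `is_rtl` are made up for this example, and the result is only an estimate: the terminal emulator and font still have the final say.

```python
import unicodedata


def estimated_width(ch):
    """Guess how many terminal columns one character advances (a heuristic)."""
    # Mn = nonspacing mark, Me = enclosing mark: drawn over the previous
    # character, so they usually advance the cursor by zero columns.
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    # W = wide, F = fullwidth: usually drawn two columns wide.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def is_rtl(ch):
    """True for strongly right-to-left characters (Hebrew, Arabic letters)."""
    return unicodedata.bidirectional(ch) in ("R", "AL")


if __name__ == "__main__":
    for ch in ("A", "\u0483", "\u0488", "\u4e2d", "\u05d0", "\u0627"):
        print(f"U+{ord(ch):04X}", unicodedata.category(ch),
              unicodedata.east_asian_width(ch), estimated_width(ch), is_rtl(ch))
```

This does not cover every case (control characters, ambiguous-width characters, or glyphs that a font simply draws wider than one cell), so it should be treated as a starting point.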

Skaperen · (This post was last modified: Sep-27-2021, 12:42 AM by Skaperen.)

it seems some characters are intended to go back and overstrike the previous character and have a positional width of zero. i don't know how that should work with wider characters. and i have seen at least one that looks to be triple wide while having a positional width of just one. i have seen a few double wide ones that act differently depending on whether they are followed by a space or not. i think i am going to have to dig into this terminal program's code and see how it decides what to do. in the meantime my challenge will be to output a grid of at least 2048 Unicode characters in a way that makes the code value easy to see.
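
One possible way to keep such a grid aligned, sketched under assumptions: give each overstriking mark a visible base to sit on (U+25CC DOTTED CIRCLE is a common choice) and pad every cell to a fixed number of columns using an estimated width. The names `grid_cell` and `estimated_width` and the default of 3 columns are invented for this example, and the width guess is a heuristic that will not fix glyphs the font draws wider than their cell.

```python
import unicodedata

DOTTED_CIRCLE = "\u25cc"  # placeholder base for combining marks


def estimated_width(ch):
    """Heuristic column count: 0 for Mn/Me marks, 2 for wide, else 1."""
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def grid_cell(ch, columns=3):
    """Return ch padded with spaces to `columns` estimated terminal columns."""
    # A zero-width mark gets a base character so it does not overstrike
    # whatever happens to precede it in the grid.
    text = DOTTED_CIRCLE + ch if estimated_width(ch) == 0 else ch
    used = sum(estimated_width(c) for c in text)
    return text + " " * max(columns - used, 0)


if __name__ == "__main__":
    # One row of 16 cells starting at U+0480, labelled with the row's code.
    start = 0x0480
    row = "".join(grid_cell(chr(cp)) for cp in range(start, start + 16))
    print(f"U+{start:04X}  {row}")
```

Printing the code point once per row, as in the usage lines, keeps the code value readable without labelling every cell.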
