Python Forum

Full Version: Unicode string index problem
Hey, guys. I ran into a problem when indexing a Unicode string.
Here is the code:
>>> string = "ábcdefg"
>>> string[0]
'a'
The result is a, but I expected á. It seems string indexing is based on Unicode scalar values rather than extended grapheme clusters.
Is there any way to get the real character á? In other words, can I walk through the string one human-readable character at a time?
string[0] is a, not á
Indeed, in this string, string[0] is a and string[1] is Unicode U+0301 COMBINING ACUTE ACCENT. This string is different from
>>> t = "\u00E1bcdefg"
>>> t
'ábcdefg'
>>> t[0]
'á'
>>> ord(t[0])
225
where t[0] is U+00E1, LATIN SMALL LETTER A WITH ACUTE
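If the goal is for indexing to return the accented letter, one option is to normalize the string to NFC first, which merges a base letter and a combining mark into a single code point when such a precomposed code point exists. A minimal sketch using the standard unicodedata module (the names s and nfc are only for illustration):

```python
import unicodedata

# 'a' followed by U+0301 COMBINING ACUTE ACCENT, written with an escape
# so the two code points are visible in the source.
s = "a\u0301bcdefg"

# NFC composes base letter + combining mark where a precomposed form exists.
nfc = unicodedata.normalize("NFC", s)

print(len(s), len(nfc))          # 8 7
print(ascii(nfc[0]))             # '\xe1'
print(unicodedata.name(nfc[0]))  # LATIN SMALL LETTER A WITH ACUTE
```

This only helps for combinations that have a precomposed code point; other base-plus-mark sequences stay as several code points after normalization.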
(Nov-23-2019, 09:42 AM) Gribouillis Wrote: Indeed, in this string string[0] is a and string[1] is unicode U+0301 COMBINING ACUTE ACCENT. This string is different from
>>> t = "\u00E1bcdefg"
>>> t
'ábcdefg'
>>> t[0]
'á'
>>> ord(t[0])
225
where t[0] is U+00E1, LATIN SMALL LETTER A WITH ACUTE
Here is another try with your example:
string2 = "ábcdefg"  # this is mine
string = "ábcdefg"   # this is yours
print(string[0], string2[0])
# output
á a
This is really strange: they look the same, but they are not equal. How can I distinguish these two strings?
How do you say they look the same?
they don't
luoheng Wrote: How to distinguish these two strings?
You cannot distinguish them visually, but the two arrays of Unicode characters are different. Your string has 8 characters instead of 7, and its first two characters (a followed by U+0301) take the place of the single á at the start of mine.
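A quick way to see the difference is to print the length and the Unicode name of each code point. The sketch below assumes the two strings are as described in this thread; describe, mine and yours are hypothetical names used only for this example:

```python
import unicodedata

def describe(s):
    """Return one 'U+XXXX NAME' entry per code point in s."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in s]

mine = "a\u0301bcdefg"   # decomposed: 'a' + COMBINING ACUTE ACCENT
yours = "\u00E1bcdefg"   # precomposed: LATIN SMALL LETTER A WITH ACUTE

print(len(mine), len(yours))  # 8 7
print(describe(mine)[:2])     # the two code points that render as one letter
print(describe(yours)[:1])    # the single precomposed code point
print(mine == yours)          # False
```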
(Nov-23-2019, 11:06 AM) Larz60+ Wrote: How do you say they look the same? they don't
I can't find any difference...
They are equal in Swift, because Swift compares strings by canonical equivalence, so strings that look the same compare equal.
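Something close to that behavior can be approximated in Python with the standard unicodedata module. The sketch below is an assumption-laden illustration, not a full implementation: canonically_equal and clusters are hypothetical helper names, and clusters only attaches combining marks to the preceding character, which is a rough stand-in for real extended grapheme cluster segmentation (it does not handle emoji sequences, Hangul jamo, and similar cases).

```python
import unicodedata

def canonically_equal(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def clusters(s):
    """Group each character with the combining marks that follow it.

    Approximation only: this is not the full Unicode segmentation algorithm.
    """
    out = []
    for ch in s:
        # combining() returns a non-zero class for combining marks.
        if out and unicodedata.combining(ch):
            out[-1] += ch
        else:
            out.append(ch)
    return out

mine = "a\u0301bcdefg"   # decomposed form
yours = "\u00E1bcdefg"   # precomposed form

print(mine == yours)                   # False
print(canonically_equal(mine, yours))  # True
print(len(clusters(mine)))             # 7
print(ascii(clusters(mine)[0]))        # 'a\u0301'
```

With this, clusters(string)[0] gives the whole accented letter even when it is stored as two code points. For complete grapheme cluster support, the third-party regex package's \X pattern is one option worth checking.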