Python Forum
Unicode string index problem




Unicode string index problem - luoheng - Nov-23-2019

Hey, guys. I ran into a problem when indexing a Unicode string.
Here is the code:
>>> string = "ábcdefg"
>>> string[0]
'a'
The result is a, but I expected á. It seems that string indexing is based on Unicode scalar values rather than extended grapheme clusters.
Is there any way for me to get the real character á, or in other words, can I index a string by human-readable characters?


RE: Unicode string index problem - Larz60+ - Nov-23-2019

string[0] is a, not á


RE: Unicode string index problem - Gribouillis - Nov-23-2019

Indeed, in this string, string[0] is a and string[1] is Unicode U+0301 COMBINING ACUTE ACCENT. This string is different from
>>> t = "\u00E1bcdefg"
>>> t
'ábcdefg'
>>> t[0]
'á'
>>> ord(t[0])
225
where t[0] is U+00E1, LATIN SMALL LETTER A WITH ACUTE
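
To make the difference visible, here is a minimal sketch that writes both strings with explicit escapes and then composes the first one with the standard library's unicodedata.normalize. The variable names are made up for illustration, and it assumes the original string really was 'a' followed by U+0301.

```python
import unicodedata

# Assumed reconstruction of the two strings from the thread.
decomposed = "a\u0301bcdefg"   # 'a' followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00E1bcdefg"   # single code point U+00E1

# NFC composes a base letter and its combining mark into one code point
# whenever a precomposed form exists.
normalized = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(precomposed), len(normalized))  # 8 7 7
print(normalized[0] == precomposed[0])                     # True
print(ord(normalized[0]))                                  # 225
```

After NFC normalization, indexing position 0 gives the accented letter for this string. This only helps when a precomposed code point exists; other combining sequences stay longer than one code point.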


RE: Unicode string index problem - luoheng - Nov-23-2019

(Nov-23-2019, 09:42 AM)Gribouillis Wrote: Indeed, in this string, string[0] is a and string[1] is Unicode U+0301 COMBINING ACUTE ACCENT. This string is different from
>>> t = "\u00E1bcdefg"
>>> t
'ábcdefg'
>>> t[0]
'á'
>>> ord(t[0])
225
where t[0] is U+00E1, LATIN SMALL LETTER A WITH ACUTE
Here is another try with your example:
string2 = "ábcdefg"  # this is mine
string = "ábcdefg"   # this is yours
print(string[0], string2[0])
# output
á a
This is really strange: they look the same, but they are not equal. How can I distinguish these two strings?


RE: Unicode string index problem - Larz60+ - Nov-23-2019

How can you say they look the same?
They don't.


RE: Unicode string index problem - Gribouillis - Nov-23-2019

luoheng Wrote: How can I distinguish these two strings?
You cannot distinguish them visually, but the two sequences of Unicode characters are different. Your string has 8 characters instead of 7, and the first two characters differ.
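
One way to see the difference in code is to list every code point with its official name. This is only a sketch: describe is a hypothetical helper, and the two literals are assumed reconstructions of the strings in this thread.

```python
import unicodedata

def describe(text):
    """Return one 'U+XXXX NAME' entry per code point in text (hypothetical helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]

mine = "a\u0301bcdefg"    # assumed: base letter plus combining accent
yours = "\u00E1bcdefg"    # assumed: precomposed letter

print(describe(mine)[:2])
# ['U+0061 LATIN SMALL LETTER A', 'U+0301 COMBINING ACUTE ACCENT']
print(describe(yours)[:2])
# ['U+00E1 LATIN SMALL LETTER A WITH ACUTE', 'U+0062 LATIN SMALL LETTER B']
```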


RE: Unicode string index problem - luoheng - Nov-23-2019

(Nov-23-2019, 11:06 AM)Larz60+ Wrote: How can you say they look the same? They don't.
I can't find any difference...
They are equal in Swift, because Swift treats strings that look the same as equal.
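
For comparison, a similar effect can be approximated in Python by normalizing before comparing, and by grouping combining marks with their base character before indexing. The sketch below uses only the standard library. Both function names are hypothetical, and rough_clusters is a simplification that relies on unicodedata.combining, so it is not a full implementation of extended grapheme clusters (it will not handle emoji sequences or Hangul jamo, for example).

```python
import unicodedata

def canonically_equal(a, b):
    """Compare two strings after NFC normalization (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def rough_clusters(text):
    """Attach each combining mark to the character before it.

    Simplified approximation of user-perceived characters, not the full
    Unicode text segmentation algorithm.
    """
    clusters = []
    for ch in text:
        # combining() returns a nonzero class for marks such as U+0301.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

mine = "a\u0301bcdefg"    # assumed: base letter plus combining accent
yours = "\u00E1bcdefg"    # assumed: precomposed letter

print(mine == yours)                   # False
print(canonically_equal(mine, yours))  # True
print(len(rough_clusters(mine)))       # 7
print(rough_clusters(mine)[0] == "a\u0301")  # True
```

With this, rough_clusters(mine)[0] yields the letter together with its accent, which is closer to what indexing does in Swift for this particular string.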