Hey, guys. I ran into a problem when indexing a Unicode string.
Here is the code:
>>> string = "a\u0301bcdefg"  # displays as ábcdefg: 'a' + U+0301 COMBINING ACUTE ACCENT
>>> string[0]
'a'
The result is 'a', but I expected 'á'. It seems that string indexing is based on Unicode code points rather than extended grapheme clusters.
Is there any way to get the real character 'á'? In other words, can I access a string by human-readable (user-perceived) characters?
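One stdlib-only way to approach this is sketched below, assuming it is enough to group each base character with the combining marks that follow it (Unicode general categories Mn, Mc, Me). The helper name split_base_plus_marks is hypothetical, and this is only an approximation: it is not the full UAX #29 extended grapheme cluster algorithm, so emoji ZWJ sequences, flag pairs and Hangul jamo are not handled.

```python
import unicodedata


def split_base_plus_marks(text: str) -> list[str]:
    """Hypothetical helper: group each character with the marks that follow it.

    Approximation only: a new cluster starts at every code point whose
    general category is not a mark (Mn, Mc, Me). This is NOT the full
    UAX #29 extended grapheme cluster algorithm.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            # Combining mark: attach it to the preceding cluster.
            clusters[-1] += ch
        else:
            # Anything else starts a new user-perceived character.
            clusters.append(ch)
    return clusters


string = "a\u0301bcdefg"  # 'a' followed by U+0301 COMBINING ACUTE ACCENT
clusters = split_base_plus_marks(string)
print(clusters[0])                 # á (still two code points internally)
print(len(string), len(clusters))  # 8 7
```

For this particular string, normalizing to NFC first with unicodedata.normalize("NFC", string) also makes string[0] == 'á', because 'a' + U+0301 has a precomposed form; that does not help for combinations without one. For proper UAX #29 segmentation, the third-party regex module (not the stdlib re) supports \X, which matches one extended grapheme cluster, e.g. regex.findall(r"\X", string).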
Indeed, in this string string[0] is 'a' and string[1] is U+0301 COMBINING ACUTE ACCENT. This string is different from
>>> t = "\u00E1bcdefg"
>>> t
'ábcdefg'
>>> t[0]
'á'
>>> ord(t[0])
225
where t[0] is U+00E1 LATIN SMALL LETTER A WITH ACUTE.
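To see which code points a string really contains, one option is to print each one with its Unicode name using the standard unicodedata module. A short sketch; the helper name describe is hypothetical:

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Hypothetical helper: one 'U+XXXX NAME' line per code point in text."""
    # unicodedata.name raises ValueError for unnamed code points
    # unless a default is supplied, hence '<unnamed>'.
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


for line in describe("a\u0301b"):  # decomposed: 'a' + combining accent
    print(line)
# U+0061 LATIN SMALL LETTER A
# U+0301 COMBINING ACUTE ACCENT
# U+0062 LATIN SMALL LETTER B

for line in describe("\u00e1b"):  # precomposed
    print(line)
# U+00E1 LATIN SMALL LETTER A WITH ACUTE
# U+0062 LATIN SMALL LETTER B
```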
(Nov-23-2019, 09:42 AM) Gribouillis wrote: Indeed, in this string string[0] is 'a' and string[1] is U+0301 COMBINING ACUTE ACCENT. This string is different from t = "\u00E1bcdefg", where t[0] is U+00E1 LATIN SMALL LETTER A WITH ACUTE.
Here is another try with your example:
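string2 = "a\u0301bcdefg"  # this is mine: 'a' + U+0301 COMBINING ACUTE ACCENT (displays as ábcdefg)
string = "\u00e1bcdefg"    # this is yours: precomposed U+00E1 (also displays as ábcdefg)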
print(string[0], string2[0])
# output
á a
This is really strange: they look the same, but they are not equal. How can I distinguish these two strings?
How do you say they look the same? They don't.
luoheng wrote: How can I distinguish these two strings?
You cannot distinguish them visually, but the two sequences of Unicode code points are different. Your string has 8 characters (code points) instead of 7, and the first two characters are different.
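A quick way to make the difference visible with built-ins only: compare the lengths and print ascii(), which escapes every non-ASCII code point. The variable names mine and yours are just for illustration.

```python
mine = "a\u0301bcdefg"  # 'a' + U+0301 COMBINING ACUTE ACCENT, displays as ábcdefg
yours = "\u00e1bcdefg"  # precomposed U+00E1, also displays as ábcdefg

print(mine == yours)          # False: the code point sequences differ
print(len(mine), len(yours))  # 8 7
print(ascii(mine))            # 'a\u0301bcdefg'
print(ascii(yours))           # '\xe1bcdefg'
```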
(Nov-23-2019, 11:06 AM) Larz60+ wrote: How do you say they look the same? They don't.
I can't find any difference...
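They are equal in Swift, because Swift compares strings by Unicode canonical equivalence: strings that render the same, such as precomposed 'á' and 'a' + a combining acute accent, compare equal.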
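Python's == compares code points exactly, so it has no built-in counterpart to Swift's canonical-equivalence comparison. A rough equivalent, assuming NFC is an acceptable normalization form, is to normalize both sides with unicodedata.normalize before comparing (two strings are canonically equivalent exactly when their NFC forms are identical). The helper name canonically_equal is hypothetical.

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings up to canonical equivalence.

    Both strings are converted to NFC (composed form) first, so
    'a' + U+0301 and the precomposed U+00E1 become the same code point.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


mine = "a\u0301bcdefg"
yours = "\u00e1bcdefg"
print(mine == yours)                   # False
print(canonically_equal(mine, yours))  # True
```

Note that NFC only covers canonical equivalence; compatibility variants such as the "ﬁ" ligature versus "fi" stay different unless NFKC is used, which is a deliberately looser comparison.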