Thread: Unicode string index problem (/thread-22693.html), Python Forum (https://python-forum.io), General Coding Help
Unicode string index problem - luoheng - Nov-23-2019

Hey, guys. I ran into a problem when indexing a Unicode string. Here is the code:

>>> string = "ábcdefg"
>>> string[0]
'a'

The index result is a, but I hoped for á. It seems that string indexing is based on Unicode scalar values rather than extended grapheme clusters. Is there any way for me to get the real character á? In other words, can I index the string by human-readable characters?
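One possible way to index by visible character with only the standard library is to attach each combining mark to the base character before it. The sketch below assumes the string in the question is `a` followed by U+0301 COMBINING ACUTE ACCENT. `clusters` is a hypothetical helper name, and the grouping only approximates extended grapheme clusters as defined in UAX #29: it does not handle emoji sequences, Hangul jamo, and similar cases.

```python
import unicodedata


def clusters(text):
    """Group each base character with the combining marks that follow it.

    Approximation only: this is not a full UAX #29 grapheme segmenter.
    """
    groups = []
    for ch in text:
        # unicodedata.combining() returns a non-zero class for combining marks.
        if groups and unicodedata.combining(ch):
            groups[-1] += ch
        else:
            groups.append(ch)
    return groups


# Assumed content of the string from the question: 'a' + COMBINING ACUTE ACCENT.
string = "a\u0301bcdefg"
parts = clusters(string)

print(len(string), len(parts))   # 8 7
print(parts[0] == "a\u0301")     # True
```

With this grouping, `parts[0]` holds both code points that render as á, while `string[0]` holds only the bare `a`.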
RE: Unicode string index problem - Larz60+ - Nov-23-2019

string[0] is a, not á

RE: Unicode string index problem - Gribouillis - Nov-23-2019

Indeed, in this string string[0] is a and string[1] is Unicode U+0301 COMBINING ACUTE ACCENT. This string is different from

>>> t = "\u00E1bcdefg"
>>> t
'ábcdefg'
>>> t[0]
'á'
>>> ord(t[0])
225

where t[0] is U+00E1, LATIN SMALL LETTER A WITH ACUTE.
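To see which code points a string really contains, each character can be listed with its code point and official Unicode name. This is a minimal sketch: `describe` is a hypothetical helper name, and the two literals are written with escapes so that the decomposed and precomposed forms cannot be confused in an editor.

```python
import unicodedata


def describe(text):
    """Return 'U+XXXX NAME' for every code point in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


decomposed = "a\u0301"    # 'a' followed by a combining accent
precomposed = "\u00E1"    # single precomposed character

print(describe(decomposed))
# ['U+0061 LATIN SMALL LETTER A', 'U+0301 COMBINING ACUTE ACCENT']
print(describe(precomposed))
# ['U+00E1 LATIN SMALL LETTER A WITH ACUTE']
```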
RE: Unicode string index problem - luoheng - Nov-23-2019

(Nov-23-2019, 09:42 AM)Gribouillis Wrote: Indeed, in this string

Here is another try with your example:

string2 = "ábcdefg"  # this is mine
string = "ábcdefg"  # this is yours
print(string[0], string2[0])
# output: á a

This is really strange: they look the same, but they are not equal. How can I distinguish these two strings?

RE: Unicode string index problem - Larz60+ - Nov-23-2019

How do you say they look the same? They don't.

RE: Unicode string index problem - Gribouillis - Nov-23-2019

luoheng Wrote: How to distinguish these two strings?

You cannot distinguish them visually, but the two arrays of Unicode characters are different. Your string has 8 characters instead of 7 and the two first characters are different.

RE: Unicode string index problem - luoheng - Nov-23-2019

(Nov-23-2019, 11:06 AM)Larz60+ Wrote: How do you say they look the same? they don't

I can't find any difference... They are equal in Swift, because Swift treats strings that look the same as equal.
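Python compares strings code point by code point, so these two strings are unequal even though they render identically. Swift's string comparison is based on Unicode canonical equivalence, which is the likely reason they compare equal there. A similar comparison can be sketched in Python by normalizing both strings first. `canonically_equal` is a hypothetical helper name, and the two literals are assumed to match the strings in the posts above.

```python
import unicodedata


def canonically_equal(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


mine = "a\u0301bcdefg"    # decomposed: 'a' + COMBINING ACUTE ACCENT
yours = "\u00E1bcdefg"    # precomposed: LATIN SMALL LETTER A WITH ACUTE

print(mine == yours)                    # False
print(canonically_equal(mine, yours))   # True

# After NFC normalization, plain indexing returns the accented letter.
nfc_mine = unicodedata.normalize("NFC", mine)
print(len(mine), len(nfc_mine))         # 8 7
print(nfc_mine[0] == "\u00E1")          # True
```

Normalizing to NFC only helps when a precomposed form exists for the sequence; for other combinations, indexing still returns individual code points.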