Python Forum
Unicode string index problem
#1
Hey, guys. I ran into a problem when indexing a Unicode string.
Here is the code:
>>> string = "ábcdefg"
>>> string[0]
'a'
The result is a, but I expected á. It seems that string indexing is based on Unicode scalar values (code points) rather than extended grapheme clusters.
Is there any way to get the real character á? In other words, can I index the string the way a human reader sees it?
Reply
#2
string[0] is a, not á
Reply
#3
Indeed, in this string string[0] is a and string[1] is unicode U+0301 COMBINING ACUTE ACCENT. This string is different from
>>> t = "\u00E1bcdefg"
>>> t
'ábcdefg'
>>> t[0]
'á'
>>> ord(t[0])
225
where t[0] is U+00E1, LATIN SMALL LETTER A WITH ACUTE
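
To make the difference concrete, here is a minimal sketch using the standard library's unicodedata module. It assumes the original string was typed in decomposed form (a followed by U+0301); the variable names are only illustrative. Normalizing to NFC composes the pair into the single code point U+00E1, after which indexing returns the accented letter.

```python
import unicodedata

# Assumed inputs: the decomposed form from the first post and the
# precomposed form used for t above.
decomposed = "a\u0301bcdefg"   # 'a' followed by U+0301 COMBINING ACUTE ACCENT
composed = "\u00E1bcdefg"      # U+00E1 LATIN SMALL LETTER A WITH ACUTE

# NFC composes a base character and its combining mark into one code point
# whenever a precomposed code point exists.
nfc = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed), len(nfc))  # 8 7 7
print(nfc == composed)                           # True
print(hex(ord(nfc[0])))                          # 0xe1
```

This only helps when a precomposed code point exists; a base letter with an unusual combination of marks stays as several code points even after NFC.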
Reply
#4
(Nov-23-2019, 09:42 AM)Gribouillis Wrote: Indeed, in this string string[0] is a and string[1] is unicode U+0301 COMBINING ACUTE ACCENT. This string is different from
>>> t = "\u00E1bcdefg"
>>> t
'ábcdefg'
>>> t[0]
'á'
>>> ord(t[0])
225
where t[0] is U+00E1, LATIN SMALL LETTER A WITH ACUTE
Here is another try with your example:
string2 = "ábcdefg"  # this is mine
string = "ábcdefg"   # this is yours
print(string[0], string2[0])
# output
á a
This is really strange: they look the same, but they are not equal. How can I distinguish these two strings?
Reply
#5
How do you say they look the same?
they don't
Reply
#6
luoheng Wrote: How can I distinguish these two strings?
You cannot distinguish them visually, but the two sequences of Unicode characters are different. Your string has 8 characters instead of 7, and the first two characters are different.
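
One way to see the difference from code is to list each code point with its Unicode name. This is a small sketch; describe is a hypothetical helper, not a standard function, and the two strings are assumed to be the decomposed and precomposed forms discussed above.

```python
import unicodedata

def describe(s):
    """Return one 'U+XXXX NAME' entry per code point in s (hypothetical helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in s]

decomposed = "a\u0301bcdefg"   # assumed: the 8-character string
composed = "\u00E1bcdefg"      # assumed: the 7-character string

print(len(decomposed), describe(decomposed)[:2])
print(len(composed), describe(composed)[:1])
```

The built-in ascii() is another quick check, since it prints non-ASCII characters as escapes and so makes a combining mark visible.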
Reply
#7
(Nov-23-2019, 11:06 AM)Larz60+ Wrote: How do you say they look the same? they don't
I can't find any difference...
They are equal in Swift, because Swift treats strings that look the same as equal.
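
For comparison, something close to that behavior can be sketched in Python with the standard library. Both function names below are hypothetical. canonically_equal compares strings after NFC normalization. rough_clusters is only an approximation of grapheme clusters: it attaches characters with a non-zero combining class to the preceding character and does not implement the full Unicode segmentation rules (emoji sequences and Hangul jamo, for example, are not handled). The third-party regex module's \X pattern is an option when full extended grapheme clusters are needed.

```python
import unicodedata

def canonically_equal(a, b):
    """Compare two strings after NFC normalization (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def rough_clusters(s):
    """Group each combining mark with the character before it.

    Simplified sketch: not a full extended-grapheme-cluster segmenter.
    """
    clusters = []
    for ch in s:
        # unicodedata.combining() returns a non-zero class for combining marks.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

decomposed = "a\u0301bcdefg"   # assumed decomposed form
composed = "\u00E1bcdefg"      # assumed precomposed form

print(decomposed == composed)                   # False
print(canonically_equal(decomposed, composed))  # True
print(len(rough_clusters(decomposed)))          # 7
print(rough_clusters(decomposed)[0] == "a\u0301")  # True
```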
Reply

