Python Forum
Unicode character search
#1
I rarely have to worry about Unicode, especially at the code point (character) level.
I'm finding out that I really don't know how to work with it, and I can't find much help
by searching Google (perhaps because I don't know how to formulate my question).

I need to replace certain UTF-8 code points in my file because Microsoft does not include them
in their UTF-8 definition. The self.ms_no_points dictionary in the code below causes the error:


# from Kebap: May I suggest A I I D Y instead of Á Í Ï Ð Ý


class Utf8stuff:
    def __init__(self, infile_name=None, outfile_name=None):
        self.infile_name = infile_name
        self.outfile_name = outfile_name
        self.ms_no_points = {'\u+081': 'A', '\u+08d': 'I', '\u+08f': 'I', '\u+090':'D', '\u+09d': 'Y'}

        with open(self.infile_name) as f:
            self.inbuff = f.readlines()
        self.process_input()

    def process_input(self):
        linecount = 1
        for line in self.inbuff:
            for key, value in self.ms_no_points.items():
                if key in line:
                    pos = line.index(key)
                    print('found {} at pos: {} in line {}'.format(key, pos, linecount))
            linecount += 1

if __name__ == '__main__':
    ifile = 'er.sql'
    ofile = 'erNew.sql'
    Utf8stuff(infile_name=ifile, outfile_name=ofile)
traceback:
Error:
  File " .../myconv.py", line 9
    self.ms_no_points = {'\u+081': 'A', '\u+08d': 'I', '\u+08f': 'I', '\u+090':'D', '\u+09d': 'Y'}
                        ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape
And so how's it done?
#2
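\u starts a Unicode escape in Python 3, so it must be followed by exactly four hex digits (\uXXXX).
To keep the two characters literally, either use a forward slash (/) instead or use a raw string.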
>>> s = '\u'
Traceback (most recent call last):
 File "python", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape 

>>> s = '/u'
>>> s
'/u'

>>> s = r'\u'
>>> s
'\\u'  
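
Note that neither workaround above produces the actual character: '/u' and r'\u' are just two-character strings. If the goal is to match real code points in the file, the escape has to be written as \u followed by four hex digits with no plus sign, e.g. '\u0081' rather than '\u+081'. What follows is only a rough sketch, not the original poster's final code. It assumes the characters to replace are the ones named in the comment in post #1 (Á Í Ï Ð Ý), and the names REPLACEMENTS, replace_points and find_points are made up for illustration.

```python
# Hypothetical mapping, taken from the comment in post #1:
# Á Í Ï Ð Ý  ->  A I I D Y
REPLACEMENTS = {
    '\u00c1': 'A',  # Á
    '\u00cd': 'I',  # Í
    '\u00cf': 'I',  # Ï
    '\u00d0': 'D',  # Ð
    '\u00dd': 'Y',  # Ý
}

# str.maketrans turns the single-character keys into a translation table
TABLE = str.maketrans(REPLACEMENTS)


def replace_points(text):
    """Return text with every mapped character replaced."""
    return text.translate(TABLE)


def find_points(lines):
    """Yield (line_number, position, character) for each mapped character."""
    for lineno, line in enumerate(lines, start=1):
        for pos, ch in enumerate(line):
            if ch in REPLACEMENTS:
                yield lineno, pos, ch
```

To use it on a file, the text would need to be read with an explicit encoding, for example open(infile_name, encoding='utf-8'), so the characters are decoded the same way on every platform. Unlike line.index(key), find_points reports every occurrence on a line, not only the first.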
#3
Thanks snippsat

