 Unicode character search
#1
I rarely have to worry about Unicode, especially at the code point (character) level.
I'm finding out that I really don't know how to handle it, and I can't find much help
by searching Google (perhaps because I don't know how to formulate my question).

I need to replace certain UTF-8 code points in my file because Microsoft does not include them
in its UTF-8 definition. The self.ms_no_points dictionary causes the error below.


# from Kebap: May I suggest A I I D Y instead of Á Í Ï Ð Ý


class Utf8stuff:
    def __init__(self, infile_name=None, outfile_name=None):
        self.infile_name = infile_name
        self.outfile_name = outfile_name
        self.ms_no_points = {'\u+081': 'A', '\u+08d': 'I', '\u+08f': 'I', '\u+090':'D', '\u+09d': 'Y'}

        with open(self.infile_name) as f:
            self.inbuff = f.readlines()
        self.process_input()

    def process_input(self):
        linecount = 1
        for line in self.inbuff:
            for key, value in self.ms_no_points.items():
                if key in line:
                    pos = line.index(key)
                    print('found {} at pos: {} in line {}'.format(key, pos, linecount))
            linecount += 1

if __name__ == '__main__':
    ifile = 'er.sql'
    ofile = 'erNew.sql'
    Utf8stuff(infile_name=ifile, outfile_name=ofile)
traceback:
Error:
  File " .../myconv.py", line 9
    self.ms_no_points = {'\u+081': 'A', '\u+08d': 'I', '\u+08f': 'I', '\u+090':'D', '\u+09d': 'Y'}
                                 ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape
And so how's it done?
#2
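\u starts a Unicode escape in Python 3 string literals, so it must be followed by exactly four hex digits.
To keep the text literal instead, turn the slash around (/) or use a raw string.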
>>> s = '\u'
Traceback (most recent call last):
 File "python", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape 

>>> s = '/u'
>>> s
'/u'

>>> s = r'\u'
>>> s
'\\u'  
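
Neither workaround above matches the actual characters; both only match the literal text. If the keys are meant to be the code points U+0081, U+008D, U+008F, U+0090 and U+009D (an assumption based on the dictionary in post #1), one option is to write each as \u followed by four hex digits and let str.translate do the replacing. A minimal sketch; the names MS_NO_POINTS, replace_points and find_points are made up for illustration:

```python
# Assumed mapping: the keys from post #1 rewritten as valid \uXXXX escapes.
MS_NO_POINTS = {'\u0081': 'A', '\u008d': 'I', '\u008f': 'I',
                '\u0090': 'D', '\u009d': 'Y'}

# str.maketrans accepts a dict whose keys are single characters.
TABLE = str.maketrans(MS_NO_POINTS)


def replace_points(line):
    """Return line with every mapped code point replaced."""
    return line.translate(TABLE)


def find_points(line):
    """Return (position, character) pairs for mapped code points in line."""
    return [(pos, ch) for pos, ch in enumerate(line) if ch in MS_NO_POINTS]


if __name__ == '__main__':
    sample = 'abc\u0081def\u009d'
    for pos, ch in find_points(sample):
        print('found {} at pos: {}'.format(ascii(ch), pos))
    print(replace_points(sample))
```

When reading the real file, it is probably worth passing encoding='utf-8' to open() so the result does not depend on the platform default.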
#3
Thanks snippsat