Python Forum
Unicode character search - Printable Version


Unicode character search - Larz60+ - Apr-25-2017

I rarely have to worry about Unicode, especially at the code point (character) level.
I'm finding out that I really don't know how to handle it, and I can't find much help
by searching Google (perhaps because I don't know how to formulate my question).

I need to replace certain UTF-8 code points in my file because Microsoft does not include them
in their UTF-8 definition. The self.ms_no_points dictionary causes the error:


# from Kebap: May I suggest A I I D Y instead of Á Í Ï Ð Ý


class Utf8stuff:
    def __init__(self, infile_name=None, outfile_name=None):
        self.infile_name = infile_name
        self.outfile_name = outfile_name
        self.ms_no_points = {'\u+081': 'A', '\u+08d': 'I', '\u+08f': 'I', '\u+090':'D', '\u+09d': 'Y'}

        with open(self.infile_name) as f:
            self.inbuff = f.readlines()
        self.process_input()

    def process_input(self):
        linecount = 1
        for line in self.inbuff:
            for key, value in self.ms_no_points.items():
                if key in line:
                    pos = line.index(key)
                    print('found {} at pos: {} in line {}'.format(key, pos, linecount))
            linecount += 1

if __name__ == '__main__':
    ifile = 'er.sql'
    ofile = 'erNew.sql'
    Utf8stuff(infile_name=ifile, outfile_name=ofile)

Traceback:
  File " .../myconv.py", line 9
    self.ms_no_points = {'\u+081': 'A', '\u+08d': 'I', '\u+08f': 'I', '\u+090':'D', '\u+09d': 'Y'}
                                 ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape
And so how's it done?


RE: Unicode character search - snippsat - Apr-25-2017

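\u starts a Unicode escape in Python 3, so it must be followed by exactly four hex digits.
If you want a literal backslash-u instead, turn the slash around (/) or use a raw string.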
>>> s = '\u'
Traceback (most recent call last):
 File "python", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape 

>>> s = '/u'
>>> s
'/u'

>>> s = r'\u'
>>> s
'\\u'  
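
For the original goal of finding and replacing specific code points, here is a minimal sketch. It assumes the intended code points are U+0081, U+008D, U+008F, U+0090 and U+009D, written as \u followed by exactly four hex digits and no plus sign. The names MS_NO_POINTS, find_points and replace_points are made up for illustration and are not from the thread.

```python
# Hypothetical corrected mapping: "\u" takes exactly four hex digits, no "+".
# The replacement letters are the ones suggested in the original post.
MS_NO_POINTS = {
    '\u0081': 'A',
    '\u008d': 'I',
    '\u008f': 'I',
    '\u0090': 'D',
    '\u009d': 'Y',
}

# Three spellings of the same single character.
assert '\u0081' == '\x81' == chr(0x81)
# A raw string keeps the backslash, so this is six characters, not one.
assert len(r'\u0081') == 6


def find_points(lines, table=MS_NO_POINTS):
    """Yield (line_number, position, character) for every listed code point."""
    for lineno, line in enumerate(lines, start=1):
        for pos, ch in enumerate(line):
            if ch in table:
                yield lineno, pos, ch


def replace_points(text, table=MS_NO_POINTS):
    """Return text with every listed code point swapped for its substitute."""
    return text.translate(str.maketrans(table))


# Made-up sample data standing in for the lines of the input file.
sample = ['abc\u0081def\n', 'plain\n', '\u009dx\u0090\n']
hits = list(find_points(sample))
fixed = [replace_points(line) for line in sample]
```

Unlike line.index(key), which reports only the first match per key, the loop over characters reports every occurrence in a line. When reading a real file, it is probably worth passing an explicit encoding to open() so the text is decoded the way you expect.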



RE: Unicode character search - Larz60+ - Apr-25-2017

Thanks, snippsat.