Unicode character search
Python Forum (https://python-forum.io), Python Coding, General Coding Help, thread /thread-3023.html
Unicode character search - Larz60+ - Apr-25-2017

I rarely have to worry about Unicode, especially at the code point (character) level. I'm finding out that I really don't know how to handle it, and I can't find much help by searching Google (perhaps because I don't know how to formulate my question).

I need to replace certain UTF-8 code points in my file because Microsoft does not include them in their UTF-8 definition. The self.ms_no_points dictionary causes the error:

    # from Kebap: May I suggest A I I D Y instead of Á Í Ï Ð Ý
    class Utf8stuff:
        def __init__(self, infile_name=None, outfile_name=None):
            self.infile_name = infile_name
            self.outfile_name = outfile_name
            self.ms_no_points = {'\u+081': 'A', '\u+08d': 'I', '\u+08f': 'I', '\u+090':'D', '\u+09d': 'Y'}
            with open(self.infile_name) as f:
                self.inbuff = f.readlines()
            self.process_input()

        def process_input(self):
            linecount = 1
            for line in self.inbuff:
                for key, value in self.ms_no_points.items():
                    if key in line:
                        pos = line.index(key)
                        print('found {} at pos: {} in line {}'.format(key, pos, linecount))
                linecount += 1

    if __name__ == '__main__':
        ifile = 'er.sql'
        ofile = 'erNew.sql'
        Utf8stuff(infile_name=ifile, outfile_name=ofile)

traceback:

And so how's it done?
RE: Unicode character search - snippsat - Apr-25-2017

\u starts a Unicode escape in Python 3. Turn the slash around (/) or use a raw string.

    >>> s = '\u'
    Traceback (most recent call last):
      File "python", line 1
    SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape
    >>> s = '/u'
    >>> s
    '/u'
    >>> s = r'\u'
    >>> s
    '\\u'

RE: Unicode character search - Larz60+ - Apr-25-2017

Thanks snippsat
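
For anyone landing on this thread later: a \u escape must be followed by exactly four hex digits, so U+0081 is written '\u0081', not '\u+081'. Below is a minimal sketch of how the search and replacement could look with valid escapes. It assumes the five keys in the first post were meant to be the code points U+0081, U+008D, U+008F, U+0090 and U+009D; the names MS_NO_POINTS, find_points and replace_points and the sample lines are made up for illustration and are not from the thread.

```python
# Assumption: these are the five code points the original dictionary was
# aiming at, written as valid four-digit \uXXXX escapes.
MS_NO_POINTS = {
    '\u0081': 'A',
    '\u008d': 'I',
    '\u008f': 'I',
    '\u0090': 'D',
    '\u009d': 'Y',
}

# str.maketrans accepts a dict whose keys are single characters.
TRANSLATION_TABLE = str.maketrans(MS_NO_POINTS)


def find_points(lines):
    """Yield (line_number, position, character) for every mapped character."""
    for line_number, line in enumerate(lines, start=1):
        for position, char in enumerate(line):
            if char in MS_NO_POINTS:
                yield line_number, position, char


def replace_points(text):
    """Return text with every mapped character swapped for its replacement."""
    return text.translate(TRANSLATION_TABLE)


# Hypothetical sample data standing in for the lines of er.sql.
sample = ['abc\u0081def\n', 'no match here\n', '\u009dx\u0090\n']

for line_number, position, char in find_points(sample):
    print('found U+{:04X} at pos: {} in line {}'.format(
        ord(char), position, line_number))

print(replace_points(''.join(sample)), end='')
```

Unlike line.index(key), which reports only the first hit, the loop above reports every occurrence in a line. To run this against a real file, the lines would come from open() instead of the sample list; which encoding argument to pass depends on how er.sql was actually written, which the thread does not say.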