The number of occurrences of statistical characters
Python Forum (https://python-forum.io), Forum: General Coding Help (https://python-forum.io/forum-8.html), Thread: /thread-6843.html

The number of occurrences of statistical characters - RobertW - Dec-10-2017

Can anyone help me resolve this problem? Thanks.

    # -*- coding:utf-8 -*-
    # python 3.x
    import re

    patter = [chr(i) for i in range(33,126)]
    with open("a.txt","r") as file:
        content = file.read()
    for i in patter:
        result = len(re.findall(r"[%s]" % i,content))
        if result != 0:
            print("%s:%d" % (i, result))

RE: The number of occurrences of statistical characters - buran - Dec-10-2017

There are special characters that need to be escaped if you want to use them literally. The error comes at the \, which is the regex escape character itself. Left unescaped, it produces an invalid pattern.

    import re

    patter = [chr(i) for i in range(33,126)]
    with open("a.txt","r") as file:
        content = file.read()
    for i in patter:
        try:
            result = len(re.findall(r"[%s]" % i,content))
        except re.error:  # public name for sre_constants.error
            print('error with {}'.format(i))

Change the for body like this:

        try:
            result = len(re.findall(r"[%s]" % i,content))
        except re.error:
            # retry with the character escaped by a backslash
            result = len(re.findall(r"[\%s]" % i,content))
        if result != 0:
            print("%s:%d" % (i, result))

and it works. That said, note that chars like *, ? or . also need to be escaped when you search for them literally outside a [...] set. I will leave this to you.

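To see exactly which characters break the r"[%s]" pattern, a quick check along the following lines may help. This is a minimal sketch rather than code from the thread: the function name failing_set_chars is hypothetical, and it assumes the same character range (33 to 125) as the original script. Warnings are silenced because newer Python versions emit a FutureWarning for "[[]" without raising an error.

```python
import re
import warnings


def failing_set_chars(start=33, stop=126):
    """Return the characters for which "[<char>]" is an invalid pattern.

    Hypothetical helper for illustration; not part of the original script.
    """
    bad = []
    for code in range(start, stop):
        ch = chr(code)
        try:
            with warnings.catch_warnings():
                # "[[]" only triggers a FutureWarning on newer Pythons
                warnings.simplefilter("ignore")
                re.compile("[%s]" % ch)
        except re.error:
            bad.append(ch)
    return bad


print(failing_set_chars())  # ['\\', '^']
```

Inside a set, only the backslash and the caret fail here: "[\]" escapes the closing bracket, and "[^]" starts a negated set that is never closed. Both compile once a backslash is put in front of them, which is what the except branch above does.
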
RE: The number of occurrences of statistical characters - RobertW - Dec-10-2017

Thanks a lot! You are right, I'll be careful to escape chars like *, ? or .

RE: The number of occurrences of statistical characters - nilamo - Jan-18-2018

You can also use the re.escape function to escape each string before building a regex with it.

Before:

    >>> import re
    >>> [re.compile(r"[{0}]".format(chr(i))) for i in range(33, 126)]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 1, in <listcomp>
      File "C:\Users\_\AppData\Local\Programs\Python\Python35-32\lib\re.py", line 224, in compile
        return _compile(pattern, flags)
      File "C:\Users\_\AppData\Local\Programs\Python\Python35-32\lib\re.py", line 293, in _compile
        p = sre_compile.compile(pattern, flags)
      File "C:\Users\_\AppData\Local\Programs\Python\Python35-32\lib\sre_compile.py", line 536, in compile
        p = sre_parse.parse(p, flags)
      File "C:\Users\_\AppData\Local\Programs\Python\Python35-32\lib\sre_parse.py", line 829, in parse
        p = _parse_sub(source, pattern, 0)
      File "C:\Users\_\AppData\Local\Programs\Python\Python35-32\lib\sre_parse.py", line 437, in _parse_sub
        itemsappend(_parse(source, state))
      File "C:\Users\_\AppData\Local\Programs\Python\Python35-32\lib\sre_parse.py", line 545, in _parse
        source.tell() - here)
    sre_constants.error: unterminated character set at position 0

After:

    >>> [re.compile(r"[{0}]".format(re.escape(chr(i)))) for i in range(33, 126)]
    [re.compile('[\\!]'), re.compile('[\\"]'), re.compile('[\\#]'), re.compile('[\\$]'), re.compile('[\\%]'), re.compile('[\\&]'), re.compile("[\\']"), re.compile('[\\(]'), re.compile('[\\)]'), re.compile('[\\*]'), re.compile('[\\+]'), re.compile('[\\,]'), #snipped

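Putting re.escape back into the original counting loop could look roughly like the sketch below. The function names and the sample string are made up for illustration, and the text is passed in as a string instead of being read from a.txt so the example runs on its own. The second function is an alternative that is not suggested in the thread: it skips regular expressions entirely and uses collections.Counter, which avoids the escaping problem altogether.

```python
import re
from collections import Counter


def count_with_regex(content, start=33, stop=126):
    """Count each character in the range, escaping it before matching."""
    counts = {}
    for code in range(start, stop):
        ch = chr(code)
        n = len(re.findall(re.escape(ch), content))
        if n:
            counts[ch] = n
    return counts


def count_with_counter(content, start=33, stop=126):
    """Same result without regex: count everything once, then filter."""
    wanted = {chr(code) for code in range(start, stop)}
    return {ch: n for ch, n in Counter(content).items() if ch in wanted}


# Hypothetical sample text containing the characters that caused trouble
sample = r"a\b^c [x] a*?"
for ch, n in sorted(count_with_regex(sample).items()):
    print("%s:%d" % (ch, n))
```

Both functions should return the same dictionary for any input. The Counter version reads the text once instead of once per character, which may matter for large files.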