Python Forum

The number of occurrences of statistical characters
Who can help me resolve this question? Thanks.

# -*- coding:utf-8 -*-
# python 3.x
import re

patter = [chr(i) for i in range(33,126)]

with open("a.txt","r") as file:
    content = file.read()
    for i in patter:
        result = len(re.findall(r"[%s]" % i,content))
        if result != 0:
            print("%s:%d" % (i, result))
Error:
Traceback (most recent call last):
  File "D:\robot\desk\Script\login_Discuz.py", line 13, in <module>
    result = len(re.findall(r"[%s]" % i,content))
  File "D:\Python\Python36-32\lib\re.py", line 222, in findall
    return _compile(pattern, flags).findall(string)
  File "D:\Python\Python36-32\lib\re.py", line 301, in _compile
    p = sre_compile.compile(pattern, flags)
  File "D:\Python\Python36-32\lib\sre_compile.py", line 562, in compile
    p = sre_parse.parse(p, flags)
  File "D:\Python\Python36-32\lib\sre_parse.py", line 855, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "D:\Python\Python36-32\lib\sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "D:\Python\Python36-32\lib\sre_parse.py", line 523, in _parse
    source.tell() - here)
sre_constants.error: unterminated character set at position 0
There are special characters that need to be escaped if you want to use them literally. The error comes from \, which is the regex escape character: in [\] it escapes the closing bracket, so the character set is never terminated and the pattern is invalid.

import re, sre_constants
 
patter = [chr(i) for i in range(33,126)]
 
with open("a.txt","r") as file:
    content = file.read()
    for i in patter:
        try:
            result = len(re.findall(r"[%s]" % i,content))
        except sre_constants.error:
            print('error with {}'.format(i))
Output:
error with \
error with ^
Change the body of the for loop like this:
        try:
            result = len(re.findall(r"[%s]" % i,content))
        except sre_constants.error:
            # retry with the character backslash-escaped inside the set
            result = len(re.findall(r"[\%s]" % i,content))
        if result != 0:
            print("%s:%d" % (i, result))  
and it works.

That said, note that outside a character class you also need to escape chars like *, ? or . in order to search for them literally (inside [...] they are already literal). I will leave this to you.
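A quick way to check which characters actually need escaping is to try compiling them. The sketch below is only an illustration (the helper name breaks_char_class is made up for this post): placed alone inside square brackets, \ and ^ give an invalid pattern, while *, ? and . are taken literally there and only act as metacharacters outside brackets.

```python
import re

def breaks_char_class(ch):
    """Return True if putting ch alone inside [...] gives an invalid pattern."""
    try:
        re.compile("[%s]" % ch)
    except re.error:  # re.error is the public name of sre_constants.error
        return True
    return False

# These two fail when placed alone in a character class.
print(breaks_char_class("\\"))  # True: "[\]" escapes the closing bracket
print(breaks_char_class("^"))   # True: "[^]" is a negated set that never ends

# Quantifiers and the dot are plain literals inside [...]
print([breaks_char_class(c) for c in "*?."])  # [False, False, False]

# Inside brackets the dot is literal; outside it matches any character.
print(re.findall(r"[.]", "a.b.c"))  # ['.', '.']
print(re.findall(r".", "a.b"))      # ['a', '.', 'b']
print(re.findall(r"\.", "a.b"))     # ['.']
```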
Thanks a lot! You are right, I'll be careful about escaping chars like *, ? or .
You can also use the re.escape function to escape the strings before building a regex with them.

Before:
>>> import re
>>> [re.compile(r"[{0}]".format(chr(i))) for i in range(33, 126)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
  File "C:\Users\_\AppData\Local\Programs\Python\Python35-32\lib\re.py", line 224, in compile
    return _compile(pattern, flags)
  File "C:\Users\_\AppData\Local\Programs\Python\Python35-32\lib\re.py", line 293, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Users\_\AppData\Local\Programs\Python\Python35-32\lib\sre_compile.py", line 536, in compile
    p = sre_parse.parse(p, flags)
  File "C:\Users\_\AppData\Local\Programs\Python\Python35-32\lib\sre_parse.py", line 829, in parse
    p = _parse_sub(source, pattern, 0)
  File "C:\Users\_\AppData\Local\Programs\Python\Python35-32\lib\sre_parse.py", line 437, in _parse_sub
    itemsappend(_parse(source, state))
  File "C:\Users\_\AppData\Local\Programs\Python\Python35-32\lib\sre_parse.py", line 545, in _parse
    source.tell() - here)
sre_constants.error: unterminated character set at position 0
After:
>>> [re.compile(r"[{0}]".format(re.escape(chr(i)))) for i in range(33, 126)]
[re.compile('[\\!]'), re.compile('[\\"]'), re.compile('[\\#]'), re.compile('[\\$]'), re.compile('[\\%]'), re.compile('[\\&]'), re.compile("[\\']"), re.compile('[\\(]'), re.compile('[\\)]'), re.compile('[\\*]'), re.compile('[\\+]'), re.compile('[\\,]'), 
#snipped
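Putting re.escape into the original counting loop could look roughly like the sketch below. It is a sketch under a few assumptions: the function name count_printable and the sample string are made up for illustration (the string stands in for the contents of a.txt), and the range goes up to 127 on the guess that "~" (code 126) should be counted too, whereas the original range(33,126) stops at "}".

```python
import re

def count_printable(content):
    """Count each printable ASCII character (codes 33..126) in content."""
    counts = {}
    for code in range(33, 127):  # assumption: 127 so that "~" is included
        ch = chr(code)
        # re.escape makes the character safe to use as a pattern,
        # so no character class and no try/except are needed
        n = len(re.findall(re.escape(ch), content))
        if n:
            counts[ch] = n
    return counts

sample = r"a\b^c ^ *.* ~"  # stand-in for the text read from a.txt
for ch, n in count_printable(sample).items():
    print("%s:%d" % (ch, n))
```

For single characters, content.count(ch) gives the same numbers without any regex at all.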