May-31-2021, 03:19 PM
(May-28-2021, 06:58 PM)nilamo Wrote: Something still seems off, as that regex won't match the string.
>>> import re
>>> test = 'ChrX 74226540 T t 50 .'
>>> test
'ChrX\t74226540\tT\tt\t50\t.'
>>> print(test)
ChrX 74226540 T t 50 .
>>> raw_regex = r"^[Cc]hr(?:0?[1-9]|[1-9][0-9]|[MXY])\t0*[1-9][0-9]*\t[^\t]*\t[ATGC]{2}"
>>> regex = re.compile(raw_regex)
>>> regex.match(test)
>>> regex
re.compile('^[Cc]hr(?:0?[1-9]|[1-9][0-9]|[MXY])\\t0*[1-9][0-9]*\\t[^\\t]*\\t[ATGC]{2}')
I kind of figured it out. For some reason, when I use the {2} it seems to be case insensitive, so I just separated it to match the base twice:
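import re

def isVCF(file):
    # Chromosome, position, one free-form column, then two single-base columns.
    num_format = re.compile(r"^[Cc]hr(?:0?[1-9]|[1-9][0-9]|[MXY])\t0*[1-9][0-9]*\t[^\t]*\t[ATGC]\t[ATGC]")
    with open(file, "r") as my_file:
        for line in my_file:
            # Skip header lines.
            if line.startswith("#"):
                continue
            # Decide based on the first non-header line.
            return bool(num_format.match(line))
    # The file had no data lines at all.
    return False

I used the line.startswith("#") check to skip the header lines.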
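For anyone comparing the two patterns: as far as I can tell, {2} does not change case handling in Python's re module. The difference is that [ATGC]{2} wants two bases next to each other in one column, while [ATGC]\t[ATGC] wants one base in each of two tab-separated columns. Here is a rough sketch of that. The sample lines and the matches() helper are made up for illustration and are not from a real VCF file.

```python
import re

# Shared prefix from the post: chromosome, position, and one free-form column.
PREFIX = r"^[Cc]hr(?:0?[1-9]|[1-9][0-9]|[MXY])\t0*[1-9][0-9]*\t[^\t]*\t"

adjacent = re.compile(PREFIX + r"[ATGC]{2}")       # two bases in ONE column
separate = re.compile(PREFIX + r"[ATGC]\t[ATGC]")  # one base in each of TWO columns
# Case-insensitive matching has to be requested explicitly with a flag.
separate_ci = re.compile(PREFIX + r"[ATGC]\t[ATGC]", re.IGNORECASE)

# Hypothetical sample lines, invented for this example.
two_columns = "chr1\t12345\t.\tA\tG\t50\t."
one_column = "chr1\t12345\t.\tAG\t50\t."
lowercase = "chr1\t12345\t.\ta\tg\t50\t."


def matches(pattern, line):
    """Hypothetical helper: True if the pattern matches at the start of the line."""
    return pattern.match(line) is not None


print(matches(adjacent, two_columns))   # False: a tab sits between A and G
print(matches(separate, two_columns))   # True
print(matches(adjacent, one_column))    # True: "AG" is two adjacent bases
print(matches(separate, one_column))    # False: no tab between the bases
print(matches(separate, lowercase))     # False: the character class is case sensitive
print(matches(separate_ci, lowercase))  # True: re.IGNORECASE was passed
```

If lowercase bases should count as valid, passing re.IGNORECASE (or adding the lowercase letters to the character class) is one way to allow them. Note that the flag applies to the whole pattern, including the chromosome part.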