Python Forum
Using re to find only uppercase letters
#1
Hi,
I'm trying to solve a problem using the re module, and one of the requirements is to find a string with the letters ATGC only in uppercase.
This is my code:
def isVCF(file):
    num_format = re.compile(r"^chr(?:0?[1-9]|[1-9][0-9]|[MXY])\t0*[1-9][0-9]*\t[^\t]*(?:\t[ATCG]){2}\t")
    with open(file, "r+") as my_file:
        for line in my_file:
            if not num_format.match(line):
                return False
        return True
And this is an example of a line (tab-separated in the real file):
ChrX 74226540 T t 50 .
The problem is that it's matching the lowercase "t" as well, and I only want it to match uppercase letters.
I've tried several things but none worked.
Appreciate any kind of help!
#2
What is the exact meaning of 'find a string with the letters ATGC only in uppercase'? Does it mean 'determine whether a line contains a word constructed only from the letters ATGC in any combination'? Or the same applied to the whole file? And finally, do you have to use re? Line #2 in your code reminds me of an old programming joke: regex in plural is regrets.
#3
Is there something that isn't shown? This shouldn't have matched: ChrX (your regex only looks for lowercase "chr")
I'm assuming that's the same reason the lowercase "t" was matched.
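A quick way to check this, as a minimal sketch: the pattern is copied from post #1, and the sample line is rebuilt with tab separators, which is an assumption about the real file. With a lowercase "chr" in the pattern and no flag such as re.IGNORECASE, match() should return None for a line starting with "ChrX". The second line, with the prefix lowercased by hand, is hypothetical and shows that the lowercase "t" is rejected too.

```python
import re

# Pattern copied from post #1 (lowercase "chr" prefix, no flags).
pattern = re.compile(
    r"^chr(?:0?[1-9]|[1-9][0-9]|[MXY])\t0*[1-9][0-9]*\t[^\t]*(?:\t[ATCG]){2}\t"
)

# Sample line from post #1, assumed to be tab-separated.
original_line = "ChrX\t74226540\tT\tt\t50\t."
# Hypothetical variant with a lowercase prefix, to isolate the "t" column.
lowercase_prefix_line = "chrX\t74226540\tT\tt\t50\t."

# The capital "C" already fails against the literal "chr".
original_result = pattern.match(original_line)
# Here the prefix matches, but "t" is not in [ATCG], so this fails as well.
lowercase_prefix_result = pattern.match(lowercase_prefix_line)

print(original_result)
print(lowercase_prefix_result)
```

If both results are None, then isVCF() from post #1 returns False for a file containing this line, so the lowercase "t" is not being accepted by this pattern.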
#4
(May-27-2021, 06:53 PM)perfringo Wrote: What is the exact meaning of 'find a string with the letters ATGC only in uppercase'? Does it mean 'determine whether a line contains a word constructed only from the letters ATGC in any combination'? Or the same applied to the whole file? And finally, do you have to use re? Line #2 in your code reminds me of an old programming joke: regex in plural is regrets.

It means that one of the letters (A, T, G or C, just one) appears in each of columns 4 and 5.
And yes, sadly, I have to use regex.
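One way to read that requirement, sketched with named groups so each column is visible. The column layout (chromosome, position, ID, then the two single-letter columns) is inferred from the regex in post #1; the group names and both sample lines are made up for illustration.

```python
import re

# Hypothetical layout: chrom, pos, id, then two single-letter columns (4 and 5).
vcf_line = re.compile(
    r"^(?P<chrom>chr(?:0?[1-9]|[1-9][0-9]|[MXY]))\t"
    r"(?P<pos>0*[1-9][0-9]*)\t"
    r"(?P<id>[^\t]*)\t"
    r"(?P<ref>[ATGC])\t"   # column 4: exactly one uppercase letter
    r"(?P<alt>[ATGC])\t"   # column 5: exactly one uppercase letter
)

# Made-up lines: the first has uppercase letters in both columns,
# the second has a lowercase "t" in column 5.
good = vcf_line.match("chrX\t74226540\t.\tT\tC\t50\t.")
bad = vcf_line.match("chrX\t74226540\t.\tT\tt\t50\t.")

print(good.group("ref"), good.group("alt"))
print(bad)
```

The trailing tab after each letter is what keeps a column such as "TT" or "Tc" from being accepted.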
#5
(May-27-2021, 09:21 PM)nilamo Wrote: Is there something that isn't shown? This shouldn't have matched: ChrX (your regex only looks for lowercase "chr")
I'm assuming that's the same reason the lowercase "t" was matched.

You are right, it's actually like this:
re.compile(r"^[Cc]hr(?:0?[1-9]|[1-9][0-9]|[MXY])\t0*[1-9][0-9]*\t[^\t]*\t[ATGC]{2}")
#6
Something still seems off, as that regex won't match the string.
>>> import re
>>> test = 'ChrX\t74226540\tT\tt\t50\t.'
>>> test
'ChrX\t74226540\tT\tt\t50\t.'
>>> print(test)
ChrX    74226540        T       t       50      .
>>> raw_regex = r"^[Cc]hr(?:0?[1-9]|[1-9][0-9]|[MXY])\t0*[1-9][0-9]*\t[^\t]*\t[ATGC]{2}"
>>> regex = re.compile(raw_regex)
>>> regex.match(test)
>>> regex
re.compile('^[Cc]hr(?:0?[1-9]|[1-9][0-9]|[MXY])\\t0*[1-9][0-9]*\\t[^\\t]*\\t[ATGC]{2}')
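As far as I can tell from the pattern, the match fails for a reason unrelated to case: `[ATGC]{2}` asks for two letters directly next to each other, with no tab between them. In the test string, `[^\t]*` takes the "T" as the third column, so `[ATGC]{2}` is then tried against "t" followed by a tab. A small sketch with made-up strings shows the difference between repeating the letter and repeating tab plus letter:

```python
import re

# Tab, then two letters in a row (what [ATGC]{2} asks for).
adjacent = re.compile(r"\t[ATGC]{2}")
# Tab plus letter, repeated twice (the grouped form from post #1).
separated = re.compile(r"(?:\t[ATGC]){2}")

# Made-up inputs: letters side by side vs. letters in separate columns.
side_by_side = "x\tTC"
two_columns = "x\tT\tC"

adjacent_hits = (bool(adjacent.search(side_by_side)), bool(adjacent.search(two_columns)))
separated_hits = (bool(separated.search(side_by_side)), bool(separated.search(two_columns)))

print(adjacent_hits)
print(separated_hits)
```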
#7
(May-28-2021, 06:58 PM)nilamo Wrote: Something still seems off, as that regex won't match the string.

I kind of figured it out. For some reason, when I used {2} it seemed to be case-insensitive, so I just separated it and wrote the character class twice:
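import re

def isVCF(file):
    num_format = re.compile(r"^[Cc]hr(?:0?[1-9]|[1-9][0-9]|[MXY])\t0*[1-9][0-9]*\t[^\t]*\t[ATGC]\t[ATGC]")
    with open(file, "r") as my_file:
        for line in my_file:
            # Skip header lines, which start with "#".
            if line.startswith("#"):
                continue
            # Reject the file as soon as one data line does not match.
            if not num_format.match(line):
                return False
    # Every data line matched (or there were no data lines).
    return True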
I used line.startswith("#") to skip the header lines.
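For what it's worth, a quantifier such as {2} does not change case sensitivity in Python's re module; only a flag like re.IGNORECASE (or an inline (?i)) does. The sketch below uses made-up tab-separated lines with a "." in the ID column, which is an assumption about the file layout. It suggests that the grouped form from post #1 and the spelled-out form behave the same, and that only the flagged version accepts the lowercase "t":

```python
import re

# Shared start of the pattern: chromosome, position, ID column.
prefix = r"^[Cc]hr(?:0?[1-9]|[1-9][0-9]|[MXY])\t0*[1-9][0-9]*\t[^\t]*"

grouped = re.compile(prefix + r"(?:\t[ATGC]){2}")        # form used in post #1
spelled_out = re.compile(prefix + r"\t[ATGC]\t[ATGC]")   # form used above
ignore_case = re.compile(prefix + r"(?:\t[ATGC]){2}", re.IGNORECASE)

# Made-up sample lines (assumed layout: chrom, pos, id, ref, alt, ...).
upper = "ChrX\t74226540\t.\tT\tC\t50\t."
mixed = "ChrX\t74226540\t.\tT\tt\t50\t."

# For each pattern: (matches the uppercase line, matches the mixed-case line).
results = {
    name: (bool(rx.match(upper)), bool(rx.match(mixed)))
    for name, rx in [
        ("grouped", grouped),
        ("spelled_out", spelled_out),
        ("ignore_case", ignore_case),
    ]
}

for name, outcome in results.items():
    print(name, outcome)
```

So if a lowercase letter was getting through earlier, the cause is more likely the sample data or the column alignment than the {2} itself.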