Checking if string starts the same but end differently using re module

ranbarr · May-20-2021, 02:21 PM

Hi!
I'm learning the re module and having difficulty writing patterns, so I'd be happy to get some explanations.
I have a file with a lot of columns, and I want to check whether the first column starts the same way in every row but ends with a value from a given range.
Example of the first column:

chr10
chr10
chr10
chr11
chr11
chr12
chr15
chr17
chr18
chrX
chrX
chr10
chr10
chr10
chr11
chr11
chr12
chr15
chr17
chr18
chrX
chrX
chr10
chr10
chr10
chr11
chr11
chr12
chr15
chr17
chr18
chrX
chrX
chr10
chr10
chr10
chr11
chr11
chr12
chr15
chr17
chr18
chrX
chrX

What I'm trying to do is check whether all the strings start with "chr" and end with a number in the range 1-99, or with "X", "Y" or "M".

So I tried something like this:

import re

with open("vcf1.vcf", "r") as my_file:
    for line in my_file:
        # The first tab-separated column holds the chromosome name.
        columns = line.split("\t")
        if re.match(r"^chr([1-9][0-9]|[X]|[Y]|[M]?)$", columns[0]):
            print(True)

which returns this output:

True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True

That's the output I should get, but I don't feel like I did it the right way; it seems to check only the "chr" part and not the rest.
I'd appreciate any kind of help!

**Gribouillis** · (This post was last modified: May-20-2021, 06:23 PM by Gribouillis.)

I don't see why M is optional. I would use

r"^chr(?:[1-9][0-9]|[XYM])$"

Note that this matches only the range 10-99, not 1-99. If you want 1-99, you can use

r"^chr(?:[1-9][0-9]?|[XYM])$"
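
To see how the three patterns differ, here is a minimal sketch that runs each one against a few sample strings. The sample strings are made up for illustration and are not taken from the file in the question.

```python
import re

# Pattern from the question: the trailing "?" makes the M optional,
# so the group can also match an empty string.
original = re.compile(r"^chr([1-9][0-9]|[X]|[Y]|[M]?)$")
# "chr" followed by a two-digit number (10-99) or X, Y, M.
two_digit = re.compile(r"^chr(?:[1-9][0-9]|[XYM])$")
# "chr" followed by 1-99 or X, Y, M.
one_to_99 = re.compile(r"^chr(?:[1-9][0-9]?|[XYM])$")

# Illustrative inputs only (assumed, not from the original data).
samples = ["chr", "chr1", "chr10", "chr99", "chr100", "chrX", "chrM", "chr0"]

# For each sample: (matches original, matches two_digit, matches one_to_99)
results = {
    s: (
        bool(original.match(s)),
        bool(two_digit.match(s)),
        bool(one_to_99.match(s)),
    )
    for s in samples
}

for name, flags in results.items():
    print(name, flags)
```

With these inputs, a bare "chr" is accepted only by the pattern from the question, and "chr1" is accepted only by the last pattern.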

You could write a few unit tests for the regex.
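
For instance, a small test module along these lines might do. It is only a sketch: the helper name is_valid_chrom, the class name and the test inputs are assumptions chosen for illustration, and it uses the 1-99 variant of the pattern.

```python
import re
import unittest

# The 1-99 variant: "chr" followed by 1-99, X, Y or M.
CHROM_RE = re.compile(r"^chr(?:[1-9][0-9]?|[XYM])$")


def is_valid_chrom(name):
    """Return True if name looks like chr1..chr99, chrX, chrY or chrM."""
    # Hypothetical helper, introduced only for this example.
    return CHROM_RE.match(name) is not None


class ChromRegexTest(unittest.TestCase):
    def test_accepts_expected_names(self):
        for name in ["chr1", "chr9", "chr10", "chr99", "chrX", "chrY", "chrM"]:
            with self.subTest(name=name):
                self.assertTrue(is_valid_chrom(name))

    def test_rejects_other_names(self):
        for name in ["chr", "chr0", "chr01", "chr100", "chrZ", "chrXY", "10", "xchr1"]:
            with self.subTest(name=name):
                self.assertFalse(is_valid_chrom(name))
```

Saved in a file of its own, this can be run with python -m unittest followed by the file name. The rejected names cover the cases that are easy to miss: no suffix at all, a leading zero, three digits, and extra characters before or after.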
