Python Forum
Counting the number of CGG in microsatellites
#1
So I have this task: find the number of repeats of CGG in a sequence that is stored as a value in a dictionary (named "dict" in the example below). The number of repeats in a row should be 5 or higher, for example "CGGCGGCGGCGGCGG" and longer. Let's call such a repeat a "tandem". Once I find a tandem, I have to count how many "CGG"s there are in that particular tandem. Here is a dictionary for that example (the keys are meant to be the strings "ind_1" and "ind_10"):

dict={ind_1:"ACGGCGAGCGCGGGCGGCGGCGGTGACGGAGGCGCCCGTGCCAGGGGGCGTGCGGCAGCGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGGGCCTCGAGCGCCCGCAGCCCACCTCTCGGGGGCGGGCTCCCGGCGCTAGCAGGGCTGAAGAGAAGATGGAGGAGCTGGTGGTGGAAGTGCGGGGCTCCAATGGCGCTTTCTACAAGGTACTTGGCTCTAGGGCAGGCCCCATCTTCGCCCT", ind_10:"ACGGCGAGCGCGGGCGGCGGCGGTGACGGAGGCGCCCGTGCCAGGGGGCGTGCGGCAGCGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGAGCGCCCGCAGCCCACCTCTCGGGGGCGGGCTCCCGGCGCTAGCAGGGCTGAAGAGAAGATGGAGGAGCTGGTGGTGGAAGTGCGGGGCTCCAATGGCGCTTTCTACAAGGTACTTGGCTCTAGGGCAGGCCCCATCTTCGCCCT"}

For example, in the value of the first key (ind_1) there is only one tandem (the long CGG run), because that is the only place where CGG repeats 5 or more times in a row. This tandem should contain 47 "CGG"s. In other words, once I find a tandem of 5 or more CGGs in a row, I need to count the number of CGGs in that particular tandem.
I tried this code:
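To make the definition of a "tandem" concrete, here is a minimal sketch without regular expressions, assuming a hypothetical helper `cgg_tandems` and a made-up toy sample (not the thread's data). It walks the sequence, and whenever a CGG starts it steps forward three characters at a time for as long as CGG keeps repeating, keeping only runs of at least `min_repeats` (5 here).

```python
UNIT = "CGG"


def cgg_tandems(seq, min_repeats=5):
    """Hypothetical helper: number of CGGs in each run of >= min_repeats CGGs in a row."""
    runs = []
    i = 0
    while i < len(seq):
        if seq.startswith(UNIT, i):
            # Step forward one CGG at a time while the unit keeps repeating.
            j = i
            while seq.startswith(UNIT, j):
                j += len(UNIT)
            repeats = (j - i) // len(UNIT)
            if repeats >= min_repeats:
                runs.append(repeats)
            # Resume right after this run: inside a CGG run, the positions that are
            # not multiples of 3 hold a G, so no CGG can start there and nothing is skipped.
            i = j
        else:
            i += 1
    return runs


# Toy sample (assumption): one run of 7 CGGs, then a run of only 2.
sample = "AT" + "CGG" * 7 + "TTCGGCGGA"
print(cgg_tandems(sample))  # [7]
```

The result is a list because a sequence could in principle contain more than one tandem.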

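dict_results = {}
for key, value in dict.items():
    tandem = 0
    # checks whether six CGGs in a row appear anywhere in the sequence
    if value.count("CGGCGGCGGCGGCGGCGG"):
        # counts every CGG in the whole sequence, not only those in the tandem
        tandem = value.count("CGG")
    dict_results[key] = tandem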
But for the first value (ind_1) it reports 58 repeats. It counted all of the CGGs in the sequence, not only the ones in that particular tandem (of which there are 47).
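The difference can be seen on a small made-up sequence (an assumption, not the thread's data): `str.count` always scans the whole string, so restricting it to the tandem means calling it on a slice that covers only that run. This sketch locates the first run of five CGGs with `str.find` and then extends it to its end.

```python
# Toy sequence (assumption): one 6-repeat tandem plus two stray CGGs.
seq = "CGGAT" + "CGG" * 6 + "TACGGT"

# str.count looks at the whole string, so the stray CGGs are included.
total = seq.count("CGG")

# Locate the first run of at least five CGGs, then walk to its end.
in_tandem = 0
start = seq.find("CGG" * 5)
if start != -1:
    end = start
    while seq.startswith("CGG", end):
        end += 3
    # Counting on the slice restricts str.count to the tandem only.
    in_tandem = seq[start:end].count("CGG")

print(total, in_tandem)  # 8 6
```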

My goal is to get 47 repeats for ind_1 once the loop is done.

Any idea how I can do this? Thanks!

P.S. I also tried this code with a threshold:

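fragile_x_test_results = {}
for key, value in fragile_x_test.items():
    tandem = 0
    # true if more than 5 CGGs occur anywhere, not necessarily in a row
    if value.count("CGG") > 5:
        # again counts every CGG in the whole sequence
        tandem = value.count("CGG")
    fragile_x_test_results[key] = tandem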
But still no luck: I got 58 instead of 47 repeats of CGG.
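The threshold version tests the total number of CGGs, which is not the same as "5 in a row". A minimal sketch on a made-up sequence (an assumption) shows the two checks disagreeing; `"CGG" * 5 in value` is one way to test for five consecutive CGGs, whereas the first attempt's pattern "CGGCGGCGGCGGCGGCGG" actually spells out six.

```python
# Toy sequence (assumption): six CGGs in total, but never more than two in a row.
scattered = "CGGCGGA" * 3

print(scattered.count("CGG") > 5)  # True: six CGGs anywhere in the string
print("CGG" * 5 in scattered)      # False: no run of five CGGs in a row
```

Either check only answers "is there a tandem?"; the number of CGGs in the tandem still has to be counted on the run itself.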

#2
I'd use a regex to match the target. Something like:

import re
m = re.search(r"(?:CGG){5,}", datastring)  # datastring: one sequence, e.g. a value from the dict
If a match is found (re.search returns None otherwise), m.span() gives its start and end. Subtract them to get the length of the whole match, then divide that length by the length of the pattern (3 for "CGG") to get the number of repeats.
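Applied to a whole dictionary, the span arithmetic above might look like the sketch below. The helper name `tandem_repeats`, the keys `ind_a`/`ind_b` and their sequences are hypothetical toy data. It uses `re.finditer` rather than `re.search`, so every tandem in a sequence is reported, not only the first one.

```python
import re

# Five or more CGGs in a row; (?:...) is a non-capturing group.
TANDEM = re.compile(r"(?:CGG){5,}")


def tandem_repeats(seq):
    """Hypothetical helper: CGG count of every tandem (5+ CGGs in a row), left to right."""
    counts = []
    for m in TANDEM.finditer(seq):
        start, end = m.span()
        counts.append((end - start) // len("CGG"))
    return counts


# Toy data (assumption) standing in for the real sequences.
sequences = {
    "ind_a": "ACGGT" + "CGG" * 8 + "TTCGGA",
    "ind_b": "CGGCGGACGGCGGA",
}

results = {key: tandem_repeats(value) for key, value in sequences.items()}
print(results)  # {'ind_a': [8], 'ind_b': []}
```

If a single number per key is wanted, something like `max(counts, default=0)` would pick the longest tandem and report 0 when there is none.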
#3
58 for "ind_1" is right.

str.count counts better than you do.

