So I have a task: find runs of repeated CGG in sequences that are stored as values in a dictionary (named "dict" below as an example). The number of consecutive repeats should be 5 or more, for example "CGGCGGCGGCGGCGG" or longer. Let's call such a run a "tandem". Once I find a tandem, I have to count how many "CGG"s that particular tandem contains. Here is a dictionary for that example:
dict={ind_1:"ACGGCGAGCGCGGGCGGCGGCGGTGACGGAGGCGCCCGTGCCAGGGGGCGTGCGGCAGCGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGGGCCTCGAGCGCCCGCAGCCCACCTCTCGGGGGCGGGCTCCCGGCGCTAGCAGGGCTGAAGAGAAGATGGAGGAGCTGGTGGTGGAAGTGCGGGGCTCCAATGGCGCTTTCTACAAGGTACTTGGCTCTAGGGCAGGCCCCATCTTCGCCCT", ind_10:"ACGGCGAGCGCGGGCGGCGGCGGTGACGGAGGCGCCCGTGCCAGGGGGCGTGCGGCAGCGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGGCGAGCGCCCGCAGCCCACCTCTCGGGGGCGGGCTCCCGGCGCTAGCAGGGCTGAAGAGAAGATGGAGGAGCTGGTGGTGGAAGTGCGGGGCTCCAATGGCGCTTTCTACAAGGTACTTGGCTCTAGGGCAGGCCCCATCTTCGCCCT"}
For example, the value of the first key (ind_1) contains only one tandem: the long uninterrupted stretch of CGGs, which is the only place where CGG repeats 5 or more times in a row. That tandem should contain 47 "CGG"s. In other words, once I find a tandem of at least 5 consecutive CGGs, I need to count the CGGs in that particular tandem.
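A minimal sketch of the "tandem" definition above, assuming a tandem means 5 or more back-to-back copies of CGG with nothing in between: the regular expression `(?:CGG){5,}` matches such a run, and `re.finditer` walks through every non-overlapping run from left to right. The name `find_tandems` and the toy sequence are illustrative only, not part of the question's code.

```python
import re

# Assumed definition: 5 or more consecutive copies of "CGG".
TANDEM = re.compile(r"(?:CGG){5,}")

def find_tandems(seq):
    """Return the number of CGG repeats in each tandem of seq, left to right."""
    # Each match is one whole run (the quantifier is greedy);
    # its length divided by 3 is the number of CGGs in it.
    return [len(m.group()) // 3 for m in TANDEM.finditer(seq)]

# Toy sequence: a run of 3 (too short) and a run of 7.
example = "AT" + "CGG" * 3 + "TT" + "CGG" * 7 + "GA"
print(find_tandems(example))  # [7]
```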
I tried this code:

    dict_results = {}
    for key, value in dict.items():
        tandem = 0
        if value.count("CGGCGGCGGCGGCGGCGG"):
            tandem = value.count("CGG")
        dict_results[key] = tandem

But for the first value (ind_1), it reported 58 repeats. It counted all of the CGGs in the sequence, not only the ones in that particular tandem (of which there are 47).
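A small illustration of why this first attempt reports the total: `if value.count(...)` only checks whether the longer pattern occurs somewhere, and `value.count("CGG")` then counts every non-overlapping CGG in the whole sequence, including those outside the tandem. Note also that the string used in that check is six CGGs, not five. The toy sequence below is made up for the demonstration.

```python
# Toy sequence: a tandem of 5 CGGs, then 2 stray CGGs elsewhere.
seq = "CGG" * 5 + "TTT" + "CGG" * 2

# str.count scans the whole string, so the stray CGGs are included too.
print(seq.count("CGG"))  # 7

# The check string in the question is six CGGs long,
# so a run of exactly five would not pass that check.
print(("CGG" * 5).count("CGGCGGCGGCGGCGGCGG"))  # 0
```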
My goal is to get 47 repeats for ind_1 once the iteration is done.
Any idea how I can do it? Thanks!
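One possible way to get per-tandem counts for every key, sketched under the same assumption (a tandem is `min_repeats` or more consecutive CGGs): count only the characters inside each matched run instead of calling `str.count` on the whole sequence. `count_tandems`, `min_repeats` and the toy dictionary are hypothetical names for illustration. With the question's `dict`, the keys would also need to be strings (`"ind_1"`) unless `ind_1` is a variable defined elsewhere.

```python
import re

def count_tandems(sequences, min_repeats=5):
    """Map each key to the CGG repeat count of every tandem in its sequence.

    Assumes a tandem is min_repeats or more back-to-back copies of "CGG".
    """
    # Builds the pattern (?:CGG){5,} for the default min_repeats.
    pattern = re.compile(rf"(?:CGG){{{min_repeats},}}")
    results = {}
    for key, seq in sequences.items():
        # len(run) // 3 turns the matched run length into a number of CGGs.
        results[key] = [len(m.group()) // 3 for m in pattern.finditer(seq)]
    return results

# Toy data (not the sequences from the question).
toy = {
    "ind_a": "ACGGCGAG" + "CGG" * 47 + "GGCCT" + "CGG" * 2,
    "ind_b": "TT" + "CGG" * 4 + "AA",
}
results = count_tandems(toy)
print(results)  # {'ind_a': [47], 'ind_b': []}
```

Each value is a list because a sequence could contain more than one tandem; `max(counts, default=0)` gives a single number per key if that is what is needed.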
P.S. I have also tried code with a threshold:

    fragile_x_test_results = {}
    for key, value in fragile_x_test.items():
        tandem = 0
        if value.count("CGG") > 5:
            tandem = value.count("CGG")
        fragile_x_test_results[key] = tandem

But still no luck: I got 58 instead of 47 repeats of CGG.
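The threshold idea can work if it is applied to the length of each run rather than to the total number of CGGs in the sequence. The sketch below does that with a plain left-to-right scan (no regular expressions); `tandem_lengths`, `unit` and `threshold` are illustrative names. It uses `>= threshold`, because `> 5` would only accept runs of 6 or more, while the question asks for 5 or higher.

```python
def tandem_lengths(seq, unit="CGG", threshold=5):
    """Return the repeat count of each run of `unit` in seq that is
    at least `threshold` copies long, scanning left to right."""
    runs = []
    step = len(unit)
    i = 0
    while i <= len(seq) - step:
        if seq.startswith(unit, i):
            # Count how many copies of unit follow back-to-back from i.
            count = 0
            while seq.startswith(unit, i):
                count += 1
                i += step
            # ">=" so a run of exactly `threshold` copies is kept.
            if count >= threshold:
                runs.append(count)
        else:
            i += 1
    return runs

print(tandem_lengths("GG" + "CGG" * 5 + "T" + "CGG" * 6))  # [5, 6]
```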
Another P.S.: I could not figure out how to indent in this format, so I've added [indent] separately.
