Sep-20-2019, 09:25 AM
(Sep-20-2019, 08:47 AM)perfringo Wrote: It's unclear where III comes from.
But maybe this can help:
(1) Create a data structure to hold the sequences:
amino_acids = {
    **dict.fromkeys(['ATT', 'ATC', 'ATA'], 'Isoleucine'),
    **dict.fromkeys(['CTT', 'CTC', 'CTA', 'CTG', 'TTA', 'TTG'], 'Leucine'),
    **dict.fromkeys(['GTT', 'GTC', 'GTA', 'GTG'], 'Valine'),
    **dict.fromkeys(['TTT', 'TTC'], 'Phenylalanine'),
    **dict.fromkeys(['ATG'], 'Methionine'),
}
amino_acids is a dictionary where each DNA codon is a key and the value is the corresponding amino acid:
# amino_acids
{'ATT': 'Isoleucine', 'ATC': 'Isoleucine', 'ATA': 'Isoleucine', 'CTT': 'Leucine', 'CTC': 'Leucine', 'CTA': 'Leucine', 'CTG': 'Leucine', 'TTA': 'Leucine', 'TTG': 'Leucine', 'GTT': 'Valine', 'GTC': 'Valine', 'GTA': 'Valine', 'GTG': 'Valine', 'TTT': 'Phenylalanine', 'TTC': 'Phenylalanine', 'ATG': 'Methionine'}
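In case the dict.fromkeys plus ** pattern is new to you, here is a small sketch of what it does. dict.fromkeys(keys, value) builds a dictionary in which every key maps to the same value, and {**a, **b} unpacks several dictionaries into one new dictionary. The names valine, methionine, merged and by_hand are made up for this illustration only.

```python
# dict.fromkeys maps every key in the list to the same value
valine = dict.fromkeys(['GTT', 'GTC', 'GTA', 'GTG'], 'Valine')
methionine = dict.fromkeys(['ATG'], 'Methionine')

# ** unpacks each dictionary into a new dict literal, merging them
merged = {**valine, **methionine}
print(merged)

# The same dictionary built by hand, for comparison (illustrative only)
by_hand = {}
for codon in ['GTT', 'GTC', 'GTA', 'GTG']:
    by_hand[codon] = 'Valine'
by_hand['ATG'] = 'Methionine'

print(merged == by_hand)
```

The one-liner is just a compact way of avoiding the repeated amino acid names; a plain handwritten dictionary works equally well.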
(2) Chunk DNA sequence:
x = 3
dna = 'ATTCTTTTCATGCTCCTGTTACTAAA'
chunks = [dna[y-x:y] for y in range(x, len(dna)+x, x) if len(dna[y-x:y]) == 3]
Which nicely chops off the trailing AA:
['ATT', 'CTT', 'TTC', 'ATG', 'CTC', 'CTG', 'TTA', 'CTA']
(3) Iterate over the chunks and replace each sequence with its amino acid:
>>> [amino_acids[sequence] for sequence in chunks]
['Isoleucine', 'Leucine', 'Phenylalanine', 'Methionine', 'Leucine', 'Leucine', 'Leucine', 'Leucine']
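The three steps above could be wrapped into one function, roughly like this. It is only a sketch: the function name translate, the size parameter and the 'Unknown' placeholder are my own assumptions, not part of the original answer. The placeholder is there because the table covers only a handful of codons, so a plain amino_acids[sequence] lookup would raise a KeyError for anything else.

```python
def translate(dna, table, size=3):
    """Split dna into codons of `size` bases and look each one up in `table`."""
    # Stop before any incomplete trailing chunk
    usable = len(dna) - len(dna) % size
    codons = [dna[i:i + size] for i in range(0, usable, size)]
    # .get avoids a KeyError for codons missing from a partial table;
    # the 'Unknown' placeholder is an assumption made for this sketch
    return [table.get(codon, 'Unknown') for codon in codons]


amino_acids = {
    **dict.fromkeys(['ATT', 'ATC', 'ATA'], 'Isoleucine'),
    **dict.fromkeys(['CTT', 'CTC', 'CTA', 'CTG', 'TTA', 'TTG'], 'Leucine'),
    **dict.fromkeys(['GTT', 'GTC', 'GTA', 'GTG'], 'Valine'),
    **dict.fromkeys(['TTT', 'TTC'], 'Phenylalanine'),
    **dict.fromkeys(['ATG'], 'Methionine'),
}

result = translate('ATTCTTTTCATGCTCCTGTTACTAAA', amino_acids)
print(result)
```

Stepping through range(0, usable, size) gives the same chunks as the list comprehension in step (2), without needing the length check on every slice.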
My goodness perfringo, your skills astound me, you sorted this in 2 ticks. The only problem is that I will not be able to take credit for this work; I would not be able to sleep at night if I told my mentor this was my creation, haha. Amazing solution. I only recently learned about dictionaries, and I have never seen one built the way you do in the very top section of the code, so I will have to read up on it. The basic dictionary below that makes sense to me though.
Thanks for your superb support, you are golden.