Python Forum
List/String seperation issue - Printable Version



RE: List/String seperation issue - YoungGrassHopper - Sep-20-2019

(Sep-20-2019, 08:47 AM)perfringo Wrote: It's unclear where III comes from.

But maybe this can help:

(1) Create a data structure to hold the sequences:

amino_acids = {**dict.fromkeys(['ATT', 'ATC', 'ATA'], 'Isoleucine'), 
               **dict.fromkeys(['CTT', 'CTC', 'CTA', 'CTG', 'TTA', 'TTG'], 'Leucine'), 
               **dict.fromkeys(['GTT', 'GTC', 'GTA', 'GTG'], 'Valine'),
               **dict.fromkeys(['TTT', 'TTC'], 'Phenylalanine'), 
               **dict.fromkeys(['ATG'], 'Methionine')}
amino_acids is a dictionary in which each DNA codon is a key and the corresponding amino acid is its value:

# amino_acids
{'ATT': 'Isoleucine',
 'ATC': 'Isoleucine',
 'ATA': 'Isoleucine',
 'CTT': 'Leucine',
 'CTC': 'Leucine',
 'CTA': 'Leucine',
 'CTG': 'Leucine',
 'TTA': 'Leucine',
 'TTG': 'Leucine',
 'GTT': 'Valine',
 'GTC': 'Valine',
 'GTA': 'Valine',
 'GTG': 'Valine',
 'TTT': 'Phenylalanine',
 'TTC': 'Phenylalanine',
 'ATG': 'Methionine'}


(2) Chunk DNA sequence:

x = 3
dna = 'ATTCTTTTCATGCTCCTGTTACTAAA'
chunks = [dna[y-x:y]  
         for y in range(x, len(dna)+x, x)  
         if len(dna[y-x:y]) == 3] 
This nicely chops off the incomplete trailing 'AA':

['ATT', 'CTT', 'TTC', 'ATG', 'CTC', 'CTG', 'TTA', 'CTA']
(3) Iterate over the chunks and replace each sequence with its amino acid:

>>> [amino_acids[sequence] for sequence in chunks]
['Isoleucine',
 'Leucine',
 'Phenylalanine',
 'Methionine',
 'Leucine',
 'Leucine',
 'Leucine',
 'Leucine']

My goodness perfringo, your skills astound me; you sorted this in two ticks. The only problem is that I can't take credit for this work. I wouldn't be able to sleep at night if I told my mentor this was my creation, haha. Amazing solution. I only recently learned about dictionaries, and I have never seen one built the way it is in the very top section of the code, so I will have to read up on it. The basic dictionary below that makes sense to me, though.
Thanks for your superb support, you are golden.
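
For anyone else reading up on the construct at the top of the quoted code, here is a minimal sketch of what it does. The variable names (isoleucine, methionine, merged) are made up for illustration; only dict.fromkeys and ** unpacking come from the quoted post.

```python
# dict.fromkeys(keys, value) builds a dict in which every key maps to the same value.
isoleucine = dict.fromkeys(['ATT', 'ATC', 'ATA'], 'Isoleucine')
print(isoleucine)
# {'ATT': 'Isoleucine', 'ATC': 'Isoleucine', 'ATA': 'Isoleucine'}

methionine = dict.fromkeys(['ATG'], 'Methionine')

# Inside a dict literal, ** unpacks the key/value pairs of another dict,
# so several small dicts can be merged into a single one.
merged = {**isoleucine, **methionine}
print(merged)
# {'ATT': 'Isoleucine', 'ATC': 'Isoleucine', 'ATA': 'Isoleucine', 'ATG': 'Methionine'}
```

One thing to keep in mind: dict.fromkeys gives every key the very same value object. That is harmless with strings, but it can surprise you if the shared value is a mutable object such as a list.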


RE: List/String seperation issue - perfringo - Sep-20-2019

If you look at this code, you will probably notice that, in order to explain the solution, I created an unnecessary list of chunks. We don't actually need that list; we need the names of the amino acids. So we can do the lookup right away and get a one-liner:

[amino_acids[dna[y-x:y]] for y in range(x, len(dna)+x, x) if len(dna[y-x:y]) == 3] 
It's not pretty or easy to read, but it avoids creating the intermediate list and so saves some memory.
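
If readability matters more than brevity, one possible alternative (a sketch, not the only way) is to move the chunking into a small generator. The name chunk_codons and its size parameter are hypothetical; the DNA string is the one used earlier in the thread.

```python
def chunk_codons(dna, size=3):
    """Yield consecutive, non-overlapping chunks of `size` letters.

    An incomplete chunk at the end of the string is dropped.
    """
    # Stopping at len(dna) - size + 1 guarantees that every start index
    # still has a full chunk of `size` letters after it.
    for start in range(0, len(dna) - size + 1, size):
        yield dna[start:start + size]


dna = 'ATTCTTTTCATGCTCCTGTTACTAAA'
codons = list(chunk_codons(dna))
print(codons)
# ['ATT', 'CTT', 'TTC', 'ATG', 'CTC', 'CTG', 'TTA', 'CTA']
```

Because chunk_codons is a generator, a lookup such as [amino_acids[codon] for codon in chunk_codons(dna)] still avoids building a separate list of chunks; the list(...) call above is only there to show the result.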

Regarding finding solutions to coding problems: clear your mind, separate the what from the how, and work out which technique suits you best.

One way to approach this problem:

"I have a string and I need to get amino acid names from it. I know that I need three-letter chunks from that string. OK, let's suppose that I already have these chunks; what would I do? I would look up somewhere which amino acid corresponds to each chunk. Bingo! I know that to do a lookup one needs a dictionary. Let's build one where we can look up amino acids by chunk. Now let's solve the problems of chunking the string and doing the lookup...."


RE: List/String seperation issue - YoungGrassHopper - Sep-20-2019

Just WOW! I like the one-liner, it's very, very elegant. I totally agree with your advice on how to go about finding a solution to a problem; very sensible. Just remember that my toolbox is still very sparse, so even when I have a sensible idea of how to approach something, more often than not I have to research how to execute the steps. I am still a baby coder with only 2 weeks of experience. I will do my best to progress and expand my toolbox as quickly and as much as I can. Thanks for all the help, explanations and support; it means the world to me.

amino_acids = {'ATT': 'Isoleucine',
               'ATC': 'Isoleucine',
               'ATA': 'Isoleucine',
               'CTT': 'Leucine',
               'CTC': 'Leucine',
               'CTA': 'Leucine',
               'CTG': 'Leucine',
               'TTA': 'Leucine',
               'TTG': 'Leucine',
               'GTT': 'Valine',
               'GTC': 'Valine',
               'GTA': 'Valine',
               'GTG': 'Valine',
               'TTT': 'Phenylalanine',
               'TTC': 'Phenylalanine',
               'ATG': 'Methionine'}



def dnaTranslator(dna):
    x = 3
    aminos = [amino_acids[dna[y-x:y]] for y in range(x, len(dna)+x, x) if len(dna[y-x:y]) == 3]             
    return print("Input DNA sequence represents the following amino acids:", aminos)
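
One caveat worth noting: the dictionary in this thread covers only 16 codons, so any other triplet in the input raises a KeyError during the lookup. A hedged sketch of one way to handle that is dict.get with a fallback. The 'Unknown' placeholder and the tiny two-entry table below are assumptions for illustration, not part of the original code.

```python
# A deliberately tiny subset of the thread's table, just for illustration.
amino_acids = {'ATT': 'Isoleucine', 'ATG': 'Methionine'}

# 'GGG' is not a key in the subset above (nor in the thread's full table).
codons = ['ATT', 'GGG', 'ATG']

# amino_acids['GGG'] would raise KeyError; dict.get returns the fallback instead.
names = [amino_acids.get(codon, 'Unknown') for codon in codons]
print(names)
# ['Isoleucine', 'Unknown', 'Methionine']
```

Whether a placeholder or an error is the better behaviour depends on the assignment; silently replacing bad codons can hide typos in the input.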



RE: List/String seperation issue - perfringo - Sep-20-2019

If you find yourself repeating the same thing many times (as with typing 'Leucine' into the amino acids dictionary), you should always think: Python is not a typewriter, there must be a better way!

It's good practice to keep data and the display of data separate. This way you can use the function in places where you don't need to print anything. The function should return only the list of amino acids.
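
A minimal sketch of that separation, assuming a shortened three-entry table and a hypothetical snake_case name dna_translator in place of dnaTranslator:

```python
# Shortened, illustrative version of the thread's table.
amino_acids = {'ATT': 'Isoleucine', 'CTT': 'Leucine', 'ATG': 'Methionine'}


def dna_translator(dna, size=3):
    """Return the list of amino acid names; printing is left to the caller."""
    return [amino_acids[dna[i:i + size]]
            for i in range(0, len(dna) - size + 1, size)]


# The function only produces data ...
aminos = dna_translator('ATTCTTATGAA')

# ... and the caller decides whether and how to display it.
print("Input DNA sequence represents the following amino acids:", aminos)
```

Note that return print(...) always returns None, because print itself returns None. Returning the list instead means the result can be stored, tested, or passed on to other code.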

For printing, you can unpack the list:

>>> my_list = ['spam', 'ham', 'eggs']    
>>> print(*my_list)                                                         
spam ham eggs
>>> print(*my_list, sep=', ')                                               
spam, ham, eggs
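
Applied to a list of amino acid names, the same idea might look like the sketch below. The sample list is made up for illustration, and str.join is shown only as a related alternative, not as something the thread used.

```python
aminos = ['Isoleucine', 'Leucine', 'Phenylalanine']

# The * passes each name to print as a separate argument,
# and sep controls what is printed between them.
print(*aminos, sep=', ')

# str.join builds the same text as a string, which helps when the
# result is needed as a value rather than printed immediately.
line = ', '.join(aminos)
print(line)
```

Both calls print the names separated by a comma and a space. The difference is that str.join only accepts strings, while print(*items) converts any objects to text for you.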