Python Forum
List/String separation issue
#11
(Sep-20-2019, 08:47 AM)perfringo Wrote: It's unclear where III comes from.

But maybe this can help:

(1) Create a data structure to hold the sequences:

amino_acids = {**dict.fromkeys(['ATT', 'ATC', 'ATA'], 'Isoleucine'), 
               **dict.fromkeys(['CTT', 'CTC', 'CTA', 'CTG', 'TTA', 'TTG'], 'Leucine'), 
               **dict.fromkeys(['GTT', 'GTC', 'GTA', 'GTG'], 'Valine'),
               **dict.fromkeys(['TTT', 'TTC'], 'Phenylalanine'), 
               **dict.fromkeys(['ATG'], 'Methionine')}
amino_acids is a dictionary in which each DNA codon is a key and the value is the corresponding amino acid:

# amino_acids
{'ATT': 'Isoleucine',
 'ATC': 'Isoleucine',
 'ATA': 'Isoleucine',
 'CTT': 'Leucine',
 'CTC': 'Leucine',
 'CTA': 'Leucine',
 'CTG': 'Leucine',
 'TTA': 'Leucine',
 'TTG': 'Leucine',
 'GTT': 'Valine',
 'GTC': 'Valine',
 'GTA': 'Valine',
 'GTG': 'Valine',
 'TTT': 'Phenylalanine',
 'TTC': 'Phenylalanine',
 'ATG': 'Methionine'}
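
For readers who have not seen a dictionary built this way, here is a minimal sketch of the two pieces involved, using only two of the amino acids above. The names isoleucine, methionine and merged are made up for illustration.

```python
# dict.fromkeys builds a dict in which every listed key gets the same value.
isoleucine = dict.fromkeys(['ATT', 'ATC', 'ATA'], 'Isoleucine')
methionine = dict.fromkeys(['ATG'], 'Methionine')

# ** unpacks the key/value pairs of each dict into the new dict literal,
# so the result contains the entries of both (a later duplicate key would win).
merged = {**isoleucine, **methionine}
print(merged)
# {'ATT': 'Isoleucine', 'ATC': 'Isoleucine', 'ATA': 'Isoleucine', 'ATG': 'Methionine'}
```

The ** form inside a dict literal needs Python 3.5 or newer.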


(2) Chunk DNA sequence:

x = 3
dna = 'ATTCTTTTCATGCTCCTGTTACTAAA'
chunks = [dna[y-x:y]  
         for y in range(x, len(dna)+x, x)  
         if len(dna[y-x:y]) == 3] 
This neatly drops the trailing 'AA', which is too short to form a codon:

['ATT', 'CTT', 'TTC', 'ATG', 'CTC', 'CTG', 'TTA', 'CTA']
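
The same chunks can probably be produced without the length filter by letting range stop before an incomplete chunk could start. This is an alternative sketch, not the code from the post; the name size is an assumption standing in for x.

```python
dna = 'ATTCTTTTCATGCTCCTGTTACTAAA'
size = 3  # codon length (called x in the post)

# Start indices are 0, 3, 6, ... Stopping at len(dna) - size + 1 guarantees
# that every slice is exactly `size` characters long, so no filter is needed.
chunks = [dna[start:start + size]
          for start in range(0, len(dna) - size + 1, size)]
print(chunks)
# ['ATT', 'CTT', 'TTC', 'ATG', 'CTC', 'CTG', 'TTA', 'CTA']
```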
(3) Iterate over the chunks and replace each sequence with its amino acid:

>>> [amino_acids[sequence] for sequence in chunks]
['Isoleucine',
 'Leucine',
 'Phenylalanine',
 'Methionine',
 'Leucine',
 'Leucine',
 'Leucine',
 'Leucine']
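
One caveat: amino_acids[sequence] raises KeyError for any codon that is not in the dictionary, and the dictionary in this thread covers only 16 codons. A hedged sketch of one way to handle that uses dict.get with a default. The placeholder 'Unknown' and the shortened table are assumptions for illustration only.

```python
# Shortened table for the example; the real one has more entries.
amino_acids = {**dict.fromkeys(['ATT', 'ATC', 'ATA'], 'Isoleucine'),
               **dict.fromkeys(['ATG'], 'Methionine')}

chunks = ['ATT', 'ATG', 'GGG']  # 'GGG' is deliberately missing from the table

# dict.get returns the default instead of raising KeyError for a missing key.
names = [amino_acids.get(codon, 'Unknown') for codon in chunks]
print(names)
# ['Isoleucine', 'Methionine', 'Unknown']
```

Whether a silent placeholder or a loud KeyError is better depends on the task; for real data, an error may be the safer choice.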

My goodness, perfringo, your skills astound me; you sorted this in two ticks. The only problem is that I won't be able to take credit for this work: I wouldn't be able to sleep at night if I told my mentor this was my creation, haha. Amazing solution. I only recently learned about dictionaries, and I have never seen one built the way you do in the very top section of the code, so I will have to read up on it. The basic dictionary below it makes sense to me, though.
Thanks for your superb support, you are golden.
#12
If you look at this code, you will probably notice that, in order to explain the solution, I created an unnecessary list of chunks. We don't actually need that list; we only need the names of the amino acids. So we can do the lookup right away, which gives a one-liner:

[amino_acids[dna[y-x:y]] for y in range(x, len(dna)+x, x) if len(dna[y-x:y]) == 3] 
It's not exactly pretty or easy to read, but we avoid creating the intermediate list of chunks and save some memory.
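
If memory really matters, a generator goes one step further by producing names one at a time instead of building any list. This is only a sketch; the function name iter_amino_acids, its parameters and the shortened table are hypothetical.

```python
def iter_amino_acids(dna, table, size=3):
    """Yield the amino acid name for each complete codon in dna, lazily."""
    for start in range(0, len(dna) - size + 1, size):
        yield table[dna[start:start + size]]


# Shortened table for the example.
table = {'ATT': 'Isoleucine', 'ATG': 'Methionine'}

# The trailing 'AA' is ignored because it is not a complete codon.
for name in iter_amino_acids('ATTATGAA', table):
    print(name)
# Isoleucine
# Methionine
```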

Regarding finding solutions to coding problems: clear your mind, separate the what from the how, and work out which technique suits you best.

One way to approach this problem:

"I have a string and I need to get amino acid names from it. I know that I need three-letter chunks of that string. OK, let's suppose I already have these chunks; what would I do? I would look up somewhere which amino acid corresponds to each chunk. Bingo! To do a lookup, I need a dictionary. Let's build one in which we can look up amino acids by chunk. Now let's solve the problems of chunking the string and doing the lookup...."
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
#13
(Sep-20-2019, 11:17 AM)perfringo Wrote: If you look at this code, you will probably notice that, in order to explain the solution, I created an unnecessary list of chunks. We don't actually need that list; we only need the names of the amino acids. So we can do the lookup right away, which gives a one-liner:

[amino_acids[dna[y-x:y]] for y in range(x, len(dna)+x, x) if len(dna[y-x:y]) == 3] 

Just WOW! I like the one-liner; it's very, very elegant. I totally agree with your advice on how to go about finding a solution to a problem; it's very sensible. Just remember that my toolbox is still very sparse, so even when I have a sensible idea of how to approach something, more often than not I have to research how to execute the steps. I am still a baby coder with only 2 weeks of experience. I will do my best to progress and expand my toolbox as quickly and as much as I can. Thanks for all the help, explanations and support; it means the world to me.

amino_acids = {'ATT': 'Isoleucine',
               'ATC': 'Isoleucine',
               'ATA': 'Isoleucine',
               'CTT': 'Leucine',
               'CTC': 'Leucine',
               'CTA': 'Leucine',
               'CTG': 'Leucine',
               'TTA': 'Leucine',
               'TTG': 'Leucine',
               'GTT': 'Valine',
               'GTC': 'Valine',
               'GTA': 'Valine',
               'GTG': 'Valine',
               'TTT': 'Phenylalanine',
               'TTC': 'Phenylalanine',
               'ATG': 'Methionine'}



def dnaTranslator(dna):
    x = 3
    aminos = [amino_acids[dna[y-x:y]] for y in range(x, len(dna)+x, x) if len(dna[y-x:y]) == 3]             
    return print("Input DNA sequence represents the following amino acids:", aminos)
#14
If you find yourself repeating something many times (as with typing 'Leucine' into the amino acids dictionary), you should always think: Python is not a typewriter, there must be a better way!
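
Besides dict.fromkeys, one possible "better way" is to write each amino acid once, with its codons, and let a comprehension invert the mapping. This is a sketch; the name codons_by_amino_acid is made up for illustration.

```python
# Each amino acid name is typed exactly once.
codons_by_amino_acid = {
    'Isoleucine': ['ATT', 'ATC', 'ATA'],
    'Leucine': ['CTT', 'CTC', 'CTA', 'CTG', 'TTA', 'TTG'],
    'Valine': ['GTT', 'GTC', 'GTA', 'GTG'],
    'Phenylalanine': ['TTT', 'TTC'],
    'Methionine': ['ATG'],
}

# Invert it: one codon -> name entry for every codon in every list.
amino_acids = {codon: name
               for name, codons in codons_by_amino_acid.items()
               for codon in codons}

print(len(amino_acids))    # 16
print(amino_acids['TTA'])  # Leucine
```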

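It's good practice to keep data separate from the display of data. That way you can use the function in places where you don't need to print anything. The function should return only the list of amino acids; as written, return print(...) hands back None, because print always returns None.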
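
A minimal sketch of that separation, assuming a shortened table and a hypothetical function name translate_dna (neither is from the posts above): the function only returns data, and the caller decides whether to print it.

```python
# Shortened table for the example.
AMINO_ACIDS = {'ATT': 'Isoleucine', 'TTC': 'Phenylalanine', 'ATG': 'Methionine'}


def translate_dna(dna, table=AMINO_ACIDS, size=3):
    """Return the list of amino acid names for the complete codons in dna."""
    return [table[dna[start:start + size]]
            for start in range(0, len(dna) - size + 1, size)]


# Display is now a separate step that the caller controls.
aminos = translate_dna('ATTTTCATGAA')
print("Input DNA sequence represents the following amino acids:", aminos)
```

Because the result is a plain list, it can also be counted, filtered or written to a file without touching the function.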

For printing, you can unpack the list with *:

>>> my_list = ['spam', 'ham', 'eggs']    
>>> print(*my_list)                                                         
spam ham eggs
>>> print(*my_list, sep=', ')                                               
spam, ham, eggs
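
Applied to this thread's problem, the same idea might look like the sketch below. The list aminos is sample data; str.join is shown as an alternative for when the text is needed as a value rather than printed directly.

```python
aminos = ['Isoleucine', 'Leucine', 'Phenylalanine']

# Unpacking passes each name to print as a separate argument.
print(*aminos, sep=', ')
# Isoleucine, Leucine, Phenylalanine

# str.join builds the same text as a string that can be stored or returned.
line = ', '.join(aminos)
print(line)
# Isoleucine, Leucine, Phenylalanine
```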