Python Forum
List/String separation issue
#1
Hey guys,

Hope you can shed some light on this matter.
In this task I need to create a function that displays the amino acid type that corresponds to a codon, for example when the user inputs a DNA sequence such as AAABBBCCCDD.
The function needs to be able to deal with a length not divisible by 3, in other words any arbitrary input.

AAA is one codon and will represent one type of amino acid
BBB "" ""
CCC "" ""

DD is not a full codon and needs to be removed from the sequence

What I am trying to do now is split the input (the argument passed to the function) into workable blocks of 3. For example, if the input is AAABBBCC

I want to get it to:

DNA1 = "AAA"
DNA2 = "BBB"

CC needs to get cut off.

Below is my idea for cutting off the odd tail that's not workable. It seems to work if the input is AAABBBC but not if the input is AAABBBCC, so I will need to fix that too.
But how do I code it so that it splits a sequence of any size into workable chunks and joins them again from list to string?

I also struggled to join it again from ['A','A','A'] to "AAA".
My idea as it stands obviously cannot deal with an arbitrary-size input, which is a problem.


dna = "AAABBBCCCDD"

dna = list(dna)
DNA = len(dna)
print(DNA)

if DNA % 3 != 0:
    del dna[-1]
    print(dna)

DNA1 = dna[0:3]
DNA2 = dna[3:6]
print(DNA1)
print(DNA2)
    
''.join(DNA2)
print(DNA2)
Any help would be appreciated stacks.
#2
You are not accounting for the case where DNA % 3 == 2. In that case you need to delete two items. The modulus operator gives you how many items you need to delete, so you can just do dna = dna[:len(dna) - len(dna) % 3]
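For anyone following along, here is a minimal sketch of trimming the incomplete tail with the remainder, assuming the sequence is a plain string. The name trim_to_codons is hypothetical. Subtracting the remainder from the length keeps the slice correct when the remainder is 0, whereas a negative-index slice such as dna[:-0] would return an empty sequence.

```python
def trim_to_codons(dna):
    """Drop the trailing 1 or 2 characters that do not form a full codon."""
    remainder = len(dna) % 3          # 0, 1 or 2 leftover characters
    return dna[:len(dna) - remainder]  # safe even when remainder is 0

for sample in ("AAABBB", "AAABBBC", "AAABBBCC"):
    print(sample, "->", trim_to_codons(sample))
```

All three samples should come out as AAABBB.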
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
Recommended Tutorials: BBCode, functions, classes, text adventures
#3
(Sep-19-2019, 01:58 PM)ichabod801 Wrote: You are not accounting for the case where DNA % 3 == 2. In that case you need to delete two items. The modulus operator gives you how many items you need to delete, so you can just do dna = dna[:len(dna) - len(dna) % 3]

if (DNA % 3 == 1 ): 
    del dna[-1]
    print(dna)
elif (DNA % 3 == 2):
    del dna[-2::]
    print(dna)



Yes, I just fixed that, but I like your idea much more. It seems way more efficient.
#4
But getting the tail cut off is the easy part. What I cannot seem to figure out is how to get AAABBBCCC into workable chunks of 3, like:
dna1 = AAA
dna2 = BBB
when you don't know the length of the sequence that will be passed to the function.
#5
I found a solution, so I will set this thread to solved. Thanks, ichabod801, for your input.

dna = "AAABBBCCCDDD"
x = 3  # codon length
# Slice out consecutive blocks of x characters
chunks = [dna[y-x:y] for y in range(x, len(dna)+x, x)]
print(chunks)
#6
(Sep-19-2019, 01:56 PM)YoungGrassHopper Wrote: DD is not a full codon and needs to be removed from the sequence

This code will not meet the requirement set above, because a trailing partial codon still ends up in the list. You should add a check on the chunk length:

chunks = [dna[y-x:y] 
          for y in range(x, len(dna)+x, x) 
          if len(dna[y-x:y]) == 3]  
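
An alternative sketch, not the code from this thread: instead of filtering out short chunks afterwards, the loop can simply stop before the incomplete tail. The function name split_codons and its size parameter are hypothetical, and the input is assumed to be a string.

```python
def split_codons(dna, size=3):
    """Return the full codons of dna, ignoring any incomplete tail."""
    usable = len(dna) - len(dna) % size   # length that divides evenly by size
    return [dna[i:i + size] for i in range(0, usable, size)]

print(split_codons("AAABBBCCCDD"))   # ['AAA', 'BBB', 'CCC']
print(split_codons("AA"))            # []
```

Because the slices are taken from the string itself, every chunk is already a string and no join is needed.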
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
#7
   
(Sep-20-2019, 06:33 AM)perfringo Wrote:
(Sep-19-2019, 01:56 PM)YoungGrassHopper Wrote: DD is not a full codon and needs to be removed from the sequence

This code will not meet the requirement set above, because a trailing partial codon still ends up in the list. You should add a check on the chunk length:

chunks = [dna[y-x:y] 
          for y in range(x, len(dna)+x, x) 
          if len(dna[y-x:y]) == 3]  


Hi perfringo, yes, sorry, I did not include that part in the post; that part is sorted though. I am struggling with getting the chunks joined in list form after they get divided into workable chunks of [['A','A','A'],['B','B','B'],['C','C','C']]

I need to get them like this: [['AAA'],['BBB'],['CCC']]
I tried ''.join(chunks) but no luck.

And then I have a for loop issue. Basically, I want the loop to run through the chunks above and check whether they match any of the amino acids I am checking for. My whole approach may be wrong, I realise that. I need to work it all into a function: the argument passed to the function is the DNA sequence of arbitrary length, and the return value needs to be the amino acids that the codons (AAA ; BBB) represent. My code below has the actual codons in it, so it might look confusing, but I attach a pic of the amino acids I need to cover. Maybe, if you feel like helping out a YoungGrassHopper, you can tell me where I am going wrong with this?


dna = input("Enter DNA sequence: ")
dna = list(dna)
DNA = len(dna)

if (DNA % 3 == 1 ): 
    del dna[-1]
    
elif (DNA % 3 == 2):
    del dna[-2::]

print(dna)
    
C=3 
chunks = [dna[y-C:y] for y in range(C, len(dna)+C,C)]
                                                         
print(chunks)                        

Isoleucine = 0
Leucine = 0
Valine = 0
Phenylalanine = 0
Methionine = 0    
                    
for i in chunks:                                                    
    if (i == "ATT")or(i == "ATC")or(i == "ATA"):
        Isoleucine += 1

    elif (i == "CTT")or(i == "CTC")or(i == "CTA")or(i == "CTG")or(i == "TTA")or(i == "TTG"):
        Leucine += 1

    elif (i == "GTT")or(i == "GTC")or(i == "GTA")or(i == "GTG"):
        Valine =+ 1

    elif (i == "TGT")or(i == "TGC"):
        Phenylalanine += 1

    elif (i == "ATG"):
        Methionine += 1

                                                    
if Isoleucine == 1:
    print("ATT ; ATC ; ATA - represents: Isoleucine")

elif Leucine == 1:
        print("CTT ; CTC ; CTA ; CTG ; TTA ; TTG - represents: Leucine")
        
elif Valine == 1:
        print("GTT ; GTC ; GTA ; GTG - represents: Valine")

elif Phenylalanine == 1:
        print("TGT ; TGC - represents: Phenylalanine")

elif Methionine == 1:
        print("ATG - represents: Methionine")

else: print("Codon represents: Amino Acid X")         
In this task I only need to cover the 5 amino acids above; for any other codon I can say it represents amino acid X.
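
A hedged note for readers trying the code above: three things seem worth checking. The chunks are lists of characters, and a list never compares equal to a string such as "ATT". The line with =+ assigns positive one rather than incrementing. The phenylalanine codons also differ from the standard genetic code, where TTT and TTC code for phenylalanine while TGT and TGC code for cysteine. The sketch below illustrates these points. The helper name classify is hypothetical, and using TTT/TTC is an assumption about what the task's table contains.

```python
# Hypothetical helper for illustration; codon groups follow the post above,
# except phenylalanine, which uses TTT/TTC (assumed from the standard code).
def classify(codon):
    if codon in ("ATT", "ATC", "ATA"):
        return "Isoleucine"
    if codon in ("CTT", "CTC", "CTA", "CTG", "TTA", "TTG"):
        return "Leucine"
    if codon in ("GTT", "GTC", "GTA", "GTG"):
        return "Valine"
    if codon in ("TTT", "TTC"):
        return "Phenylalanine"
    if codon == "ATG":
        return "Methionine"
    return "Amino Acid X"

# A list of characters never compares equal to a string
as_list = ['A', 'T', 'T']
print(as_list == "ATT")        # False

# Slicing the string itself yields string chunks that can be compared
dna = "ATTGTGCCC"
for start in range(0, len(dna) - len(dna) % 3, 3):
    codon = dna[start:start + 3]
    print(codon, "represents:", classify(codon))

# "=+ 1" assigns positive one each time; "+= 1" would increment
count = 0
count =+ 1
count =+ 1
print(count)                   # 1
```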
#8
(Sep-20-2019, 07:57 AM)YoungGrassHopper Wrote: I am struggling with getting the chunks joined in list form after they get divided into workable chunks of [['A','A','A'],['B','B','B'],['C','C','C']]

I need to get them like this: [['AAA'],['BBB'],['CCC']]
I tried ''.join(chunks) but no luck.

My personal feeling is that you should clearly articulate (in spoken language) what you want to achieve and then start implementing the steps to reach your goal. Do you really need this structure? Not that it is particularly hard to implement (you need to add a pair of square brackets), but why do you need it?

chunks = [[dna[y-x:y]] 
          for y in range(x, len(dna)+x, x) 
          if len(dna[y-x:y]) == 3]
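
On the ''.join(chunks) attempt quoted above, a small sketch under the assumption that chunks is a list of lists of single characters: str.join expects an iterable of strings, so calling it on the outer list raises a TypeError. It also returns a new string rather than changing anything in place, so the result has to be stored. The sample data is made up for illustration.

```python
# Hypothetical sample data: chunks as lists of single characters
chunks = [['A', 'A', 'A'], ['B', 'B', 'B'], ['C', 'C', 'C']]

# join has to be applied to each inner list, and its result kept
codons = [''.join(chunk) for chunk in chunks]
print(codons)    # ['AAA', 'BBB', 'CCC']

# Wrapping each string in its own list gives the nested shape asked for
nested = [[codon] for codon in codons]
print(nested)    # [['AAA'], ['BBB'], ['CCC']]
```

For looking codons up later, the flat list of strings is probably the more convenient of the two shapes.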
#9
(Sep-20-2019, 08:05 AM)perfringo Wrote:
(Sep-20-2019, 07:57 AM)YoungGrassHopper Wrote: I am struggling with getting the chunks joined in list form after they get divided into workable chunks of [['A','A','A'],['B','B','B'],['C','C','C']]

I need to get them like this: [['AAA'],['BBB'],['CCC']]
I tried ''.join(chunks) but no luck.

My personal feeling is that you should clearly articulate (in spoken language) what you want to achieve and then start implementing the steps to reach your goal. Do you really need this structure? Not that it is particularly hard to implement (you need to add a pair of square brackets), but why do you need it?

chunks = [[dna[y-x:y]] 
          for y in range(x, len(dna)+x, x) 
          if len(dna[y-x:y]) == 3]

Well, I know there is probably an easier way to achieve it, but this is the best way I could come up with. I've only been coding for 2 weeks now, so that's quite likely the reason I am not finding the most efficient way of going about it.

What I need to achieve:

DNA Input: ATTATTATT  
Output: III (representing: Isoleucine, Isoleucine, Isoleucine ) 


I need to take in an arbitrary-length DNA sequence from the user, so I can get AAAB or AAABBBCC etc.

I need to chop it into workable chunks of 3.

I need to check whether the codons in the sequence given by the user match any of the amino acids I need to check for; that is why I am trying a for loop.

The elif statements below the for loop then "check" the counters and display whether a codon matched any of the aminos I am checking for.

But like I said, I know my approach may be very inefficient and flawed.
#10
It's unclear where III comes from.

But maybe this can help:

(1) Create a data structure to hold the sequences:

amino_acids = {**dict.fromkeys(['ATT', 'ATC', 'ATA'], 'Isoleucine'), 
               **dict.fromkeys(['CTT', 'CTC', 'CTA', 'CTG', 'TTA', 'TTG'], 'Leucine'), 
               **dict.fromkeys(['GTT', 'GTC', 'GTA', 'GTG'], 'Valine'),
               **dict.fromkeys(['TTT', 'TTC'], 'Phenylalanine'), 
               **dict.fromkeys(['ATG'], 'Methionine')}
amino_acids is a dictionary where each DNA codon is a key and the value is the corresponding amino acid:

# amino_acids
{'ATT': 'Isoleucine',
 'ATC': 'Isoleucine',
 'ATA': 'Isoleucine',
 'CTT': 'Leucine',
 'CTC': 'Leucine',
 'CTA': 'Leucine',
 'CTG': 'Leucine',
 'TTA': 'Leucine',
 'TTG': 'Leucine',
 'GTT': 'Valine',
 'GTC': 'Valine',
 'GTA': 'Valine',
 'GTG': 'Valine',
 'TTT': 'Phenylalanine',
 'TTC': 'Phenylalanine',
 'ATG': 'Methionine'}


(2) Chunk DNA sequence:

x = 3
dna = 'ATTCTTTTCATGCTCCTGTTACTAAA'
chunks = [dna[y-x:y]  
         for y in range(x, len(dna)+x, x)  
         if len(dna[y-x:y]) == 3] 
Which nicely chops off the trailing AA:

['ATT', 'CTT', 'TTC', 'ATG', 'CTC', 'CTG', 'TTA', 'CTA']
(3) Iterate over the chunks and replace each sequence with its amino acid:

>>> [amino_acids[sequence] for sequence in chunks]
['Isoleucine',
 'Leucine',
 'Phenylalanine',
 'Methionine',
 'Leucine',
 'Leucine',
 'Leucine',
 'Leucine']
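
A possible way to tie the three steps into one function, offered as a sketch rather than a definitive solution. Indexing the dictionary directly raises a KeyError for any codon outside the table, so this version uses dict.get with 'X' as the fallback, following the "amino acid X" rule stated earlier in the thread. The requested output III looks like one-letter amino acid codes, so the table maps codons to the standard letters I, L, V, F and M. That reading, and the names CODON_TABLE and translate, are assumptions.

```python
# Codon -> one-letter amino acid code (standard abbreviations).
# Only the five amino acids from the task are listed.
CODON_TABLE = {
    **dict.fromkeys(['ATT', 'ATC', 'ATA'], 'I'),                       # Isoleucine
    **dict.fromkeys(['CTT', 'CTC', 'CTA', 'CTG', 'TTA', 'TTG'], 'L'),  # Leucine
    **dict.fromkeys(['GTT', 'GTC', 'GTA', 'GTG'], 'V'),                # Valine
    **dict.fromkeys(['TTT', 'TTC'], 'F'),                              # Phenylalanine
    **dict.fromkeys(['ATG'], 'M'),                                     # Methionine
}

def translate(dna):
    """Translate full codons to one-letter codes; unknown codons become 'X'."""
    usable = len(dna) - len(dna) % 3   # ignore an incomplete trailing codon
    return ''.join(CODON_TABLE.get(dna[i:i + 3], 'X')
                   for i in range(0, usable, 3))

print(translate("ATTATTATT"))                    # III
print(translate("ATTCTTTTCATGCTCCTGTTACTAAA"))   # ILFMLLLL
print(translate("ATTGGGAT"))                     # IX
```

Returning a string keeps the function easy to test; full names could be produced the same way by swapping the dictionary values.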