List/String seperation issue - Printable Version

List/String seperation issue - YoungGrassHopper - Sep-19-2019

Hey guys,

Hope you can shed some light on this matter.
In this task I need to create a function that displays the amino acid type that corresponds to a codon. For example, the user might input a DNA sequence such as AAABBBCCCDD.
The function needs to be able to deal with a length that is not divisible by 3, in other words any arbitrary input.

AAA is one codon and will represent one type of amino acid
BBB "" ""
CCC "" ""

DD is not a full codon and needs to be removed from the sequence

What I am trying to do now is to get the input (or the argument given to the called function) into workable blocks of 3. For example, if the input is AAABBBCC

I want to get it to:

DNA1 = "AAA"
DNA2 = "BBB"

CC needs to get cut off.

Below is my idea for cutting off the odd tail that isn't workable. It seems to work if the input is AAABBBC but not if the input is AAABBBCC, so I will need to fix that too.
But how do I code it so that it splits a sequence of any size into workable chunks and joins them again from list to string?

I also struggled to join it again from ['A','A','A'] to "AAA".
My idea as it stands obviously cannot deal with an input of arbitrary size, which is a problem.

dna = "AAABBBCCCDD"

dna = list(dna)
DNA = len(dna)
print(DNA)

if DNA % 3 != 0:
    del dna[-1]
    print(dna)

DNA1 = dna[0:3]
DNA2 = dna[3:6]
print(DNA1)
print(DNA2)
    
''.join(DNA2)
print(DNA2)

Any help would be appreciated stacks.

RE: List/String seperation issue - ichabod801 - Sep-19-2019

You are not accounting for the case where DNA % 3 == 2. In that case you need to delete two items. The modulus operator gives you how many items you need to delete, so you can just do dna = dna[:len(dna) - len(dna) % 3]
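
A minimal sketch of the remainder idea, assuming the sequence is kept as a plain string. The helper name trim_to_codons is made up for illustration. The slice keeps everything before the leftover tail, and it also behaves when the remainder is 0, which a slice like dna[:-0] would not (that one returns an empty result).

```python
def trim_to_codons(dna):
    """Drop trailing bases that do not form a complete codon."""
    remainder = len(dna) % 3            # 0, 1 or 2 leftover characters
    return dna[:len(dna) - remainder]   # keep everything before the leftover tail


print(trim_to_codons("AAABBBC"))      # AAABBB
print(trim_to_codons("AAABBBCC"))     # AAABBB
print(trim_to_codons("AAABBBCCC"))    # AAABBBCCC
```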

RE: List/String seperation issue - YoungGrassHopper - Sep-19-2019

(Sep-19-2019, 01:58 PM)ichabod801 Wrote: You are not accounting for the case where DNA % 3 == 2. In that case you need to delete two items. The modulus operator gives you how many items you need to delete, so you can just do dna = dna[:len(dna) - len(dna) % 3]

if (DNA % 3 == 1 ): 
    del dna[-1]
    print(dna)
elif (DNA % 3 == 2):
    del dna[-2::]
    print(dna)

Yes, I just fixed that, but I like your idea much more; it seems way more efficient.

RE: List/String seperation issue - YoungGrassHopper - Sep-19-2019

But getting the tail cut off is the easy part. What I cannot seem to figure out is how to get AAABBBCCC into workable chunks of 3, like:
dna1 = AAA
dna2 = BBB
if you don't know the length of the sequence that will be passed to the function.

RE: List/String seperation issue - YoungGrassHopper - Sep-19-2019

I found a solution, I will set this thread to solved, thanks ichabod801 for your input

dna = "AAABBBCCCDDD"
x = 3
chunks = [dna[y-x:y] for y in range(x, len(dna)+x, x)]
print(chunks)

RE: List/String seperation issue - perfringo - Sep-20-2019

(Sep-19-2019, 01:56 PM)YoungGrassHopper Wrote: DD is not a full codon and needs to be removed from the sequence

This code will not meet the requirement set above. You should add a check on the length:

chunks = [dna[y-x:y] 
          for y in range(x, len(dna)+x, x) 
          if len(dna[y-x:y]) == 3]
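
An alternative sketch of the same chunking, under the assumption that dna is a string. Instead of filtering out short chunks afterwards, it stops the range before the incomplete tail. The name split_codons and the size parameter are hypothetical, not from the thread.

```python
def split_codons(dna, size=3):
    """Split dna into consecutive chunks of `size`, ignoring an incomplete tail."""
    usable = len(dna) - len(dna) % size      # length covered by whole chunks
    return [dna[i:i + size] for i in range(0, usable, size)]


print(split_codons("AAABBBCCCDD"))   # ['AAA', 'BBB', 'CCC']
print(split_codons("AAABBBCCCDDD"))  # ['AAA', 'BBB', 'CCC', 'DDD']
```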

RE: List/String seperation issue - YoungGrassHopper - Sep-20-2019

[attachment=714]

(Sep-20-2019, 06:33 AM)perfringo Wrote:
(Sep-19-2019, 01:56 PM)YoungGrassHopper Wrote: DD is not a full codon and needs to be removed from the sequence

This code will not meet the requirement set above. You should add a check on the length:
chunks = [dna[y-x:y] 
          for y in range(x, len(dna)+x, x) 
          if len(dna[y-x:y]) == 3]  

Hi perfringo, yes, sorry, I did not include that part in the post; that part is sorted though. I am struggling with getting the chunks joined in list form after they get divided into workable chunks of [['A','A','A'],['B','B','B'],['C','C','C']]

I need to get them like this: [['AAA'],['BBB'],['CCC']]
I tried ''.join(chunks) but no luck.

And then I have a for loop issue. Basically I want the loop to run through the chunks above and check whether they match any of the amino acids I am checking for. My whole approach could be wrong, I realise that. I need to work it all into a function: the argument passed to the function is a DNA sequence of arbitrary length, and the return value needs to say which amino acids the codons (AAA, BBB, ...) represent. My code below has the actual codons in it, so it might look confusing, but I attach a pic of the amino acids I need to cover. Maybe, if you feel like helping out a YoungGrassHopper, you can tell me where I am going wrong with this?

dna = input("Enter DNA sequence: ")
dna = list(dna)
DNA = len(dna)

if (DNA % 3 == 1 ): 
    del dna[-1]
    
elif (DNA % 3 == 2):
    del dna[-2::]

print(dna)
    
C=3 
chunks = [dna[y-C:y] for y in range(C, len(dna)+C,C)]
                                                         
print(chunks)                        

Isoleucine = 0
Leucine = 0
Valine = 0
Phenylalanine = 0
Methionine = 0    
                    
for i in chunks:                                                    
    if (i == "ATT")or(i == "ATC")or(i == "ATA"):
        Isoleucine += 1

    elif (i == "CTT")or(i == "CTC")or(i == "CTA")or(i == "CTG")or(i == "TTA")or(i == "TTG"):
        Leucine += 1

    elif (i == "GTT")or(i == "GTC")or(i == "GTA")or(i == "GTG"):
        Valine += 1

    elif (i == "TTT")or(i == "TTC"):
        Phenylalanine += 1

    elif (i == "ATG"):
        Methionine += 1


if Isoleucine == 1:
    print("ATT ; ATC ; ATA - represents: Isoleucine")

elif Leucine == 1:
    print("CTT ; CTC ; CTA ; CTG ; TTA ; TTG - represents: Leucine")

elif Valine == 1:
    print("GTT ; GTC ; GTA ; GTG - represents: Valine")

elif Phenylalanine == 1:
    print("TTT ; TTC - represents: Phenylalanine")

elif Methionine == 1:
    print("ATG - represents: Methionine")

else:
    print("Codon represents: Amino Acid X")

In this task I only need to cover the 5 amino acids above; for any other codon I can say it represents amino acid X.
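
One likely reason the loop above never counts anything: dna is converted with list(dna), so every chunk is a list of single characters, and a list never compares equal to a string. A small illustration, with variable names chosen only for this sketch:

```python
dna = "ATTCTG"
as_list = list(dna)                 # ['A', 'T', 'T', 'C', 'T', 'G']

list_chunk = as_list[0:3]           # slicing a list gives a list
str_chunk = dna[0:3]                # slicing a string gives a string

print(list_chunk == "ATT")          # False: a list is never equal to a str
print(str_chunk == "ATT")           # True
```

Slicing the original string directly, without the list() conversion, sidesteps the problem.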

RE: List/String seperation issue - perfringo - Sep-20-2019

(Sep-20-2019, 07:57 AM)YoungGrassHopper Wrote: I am struggling with getting the chunks joined in list form after they get divided in workable chunks of [['A','A','A'],['B','B','B'],['C','C','C']]

I need to get them like this : [['AAA'],['BBB'],['CCC']]
I tried ''.join(chunks) but no luck.

My personal feeling is that you should clearly articulate (in spoken language) what you want to achieve and then start implementing steps to reach your goal. Do you really need this structure? Not that it is particularly hard to implement (you need to add a pair of square brackets), but why do you need it?

chunks = [[dna[y-x:y]] 
          for y in range(x, len(dna)+x, x) 
          if len(dna[y-x:y]) == 3]
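
If the chunks really are lists of characters, a sketch of joining them could look like the following. ''.join(chunks) fails on the outer list because its items are lists rather than strings, and str.join returns a new string instead of changing its argument, so the result has to be stored. The variable names are illustrative only.

```python
chunks = [['A', 'A', 'A'], ['B', 'B', 'B'], ['C', 'C', 'C']]

# Join each inner list separately and keep the results.
codons = [''.join(chunk) for chunk in chunks]
print(codons)            # ['AAA', 'BBB', 'CCC']

# Joining the outer list directly raises TypeError (items must be str).
try:
    ''.join(chunks)
except TypeError as error:
    print("join failed:", error)
```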

RE: List/String seperation issue - YoungGrassHopper - Sep-20-2019

(Sep-20-2019, 08:05 AM)perfringo Wrote:
(Sep-20-2019, 07:57 AM)YoungGrassHopper Wrote: I am struggling with getting the chunks joined in list form after they get divided in workable chunks of [['A','A','A'],['B','B','B'],['C','C','C']]

I need to get them like this : [['AAA'],['BBB'],['CCC']]
I tried ''.join(chunks) but no luck.

My personal feeling is that you should clearly articulate (in spoken language) what you want to achieve and then start implementing steps to reach your goal. Do you really need this structure? Not that it is particularly hard to implement (you need to add a pair of square brackets), but why do you need it?
chunks = [[dna[y-x:y]] 
          for y in range(x, len(dna)+x, x) 
          if len(dna[y-x:y]) == 3]

Well, I know there is probably an easier way to achieve it, but this is the best way I could come up with. I've only been coding for 2 weeks now, so that's quite likely the reason I am not finding the most efficient way of going about it.

What I need to achieve:

DNA Input: ATTATTATT
Output: III (representing: Isoleucine, Isoleucine, Isoleucine )

I need to take in an arbitrary-length DNA sequence from the user, so I can get AAAB or AAABBBCC etc.

I need to chop it into workable chunks of 3.

I need to check whether the codons in the sequence given by the user match any of the amino acids I need to check for; that is why I am trying a for loop.

The elif statements below the for loop then "check" the counters and display whether a codon matched any of the aminos I am checking for.

But like I said, I know my approach can be very inefficient and flawed.

RE: List/String seperation issue - perfringo - Sep-20-2019

It's unclear where III comes from.

But maybe this can help:

(1) Create a data structure to hold the sequences:

amino_acids = {**dict.fromkeys(['ATT', 'ATC', 'ATA'], 'Isoleucine'), 
               **dict.fromkeys(['CTT', 'CTC', 'CTA', 'CTG', 'TTA', 'TTG'], 'Leucine'), 
               **dict.fromkeys(['GTT', 'GTC', 'GTA', 'GTG'], 'Valine'),
               **dict.fromkeys(['TTT', 'TTC'], 'Phenylalanine'), 
               **dict.fromkeys(['ATG'], 'Methionine')}

amino_acids is a dictionary where the DNA codon is the key and the value is the corresponding amino acid:

# amino_acids
{'ATT': 'Isoleucine',
 'ATC': 'Isoleucine',
 'ATA': 'Isoleucine',
 'CTT': 'Leucine',
 'CTC': 'Leucine',
 'CTA': 'Leucine',
 'CTG': 'Leucine',
 'TTA': 'Leucine',
 'TTG': 'Leucine',
 'GTT': 'Valine',
 'GTC': 'Valine',
 'GTA': 'Valine',
 'GTG': 'Valine',
 'TTT': 'Phenylalanine',
 'TTC': 'Phenylalanine',
 'ATG': 'Methionine'}

(2) Chunk DNA sequence:

x = 3
dna = 'ATTCTTTTCATGCTCCTGTTACTAAA'
chunks = [dna[y-x:y]  
         for y in range(x, len(dna)+x, x)  
         if len(dna[y-x:y]) == 3]

Which nicely chops off trailing AA-s:

['ATT', 'CTT', 'TTC', 'ATG', 'CTC', 'CTG', 'TTA', 'CTA']

(3) Iterate over chunks and replace sequence with amino acid:

>>> [amino_acids[sequence] for sequence in chunks]
['Isoleucine',
 'Leucine',
 'Phenylalanine',
 'Methionine',
 'Leucine',
 'Leucine',
 'Leucine',
 'Leucine']
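
Putting the three steps into one function might look like the sketch below. Two assumptions that are not confirmed in the thread: the "III" in the desired output is read as standard one-letter amino acid codes (I, L, V, F, M), and any codon outside the five covered amino acids is reported as "X". Note that amino_acids[sequence] raises KeyError for an unknown codon, so this version uses dict.get with a default. The names CODON_TABLE and translate are hypothetical.

```python
# Codon table limited to the five amino acids covered in the thread.
CODON_TABLE = {
    **dict.fromkeys(['ATT', 'ATC', 'ATA'], 'I'),                       # Isoleucine
    **dict.fromkeys(['CTT', 'CTC', 'CTA', 'CTG', 'TTA', 'TTG'], 'L'),  # Leucine
    **dict.fromkeys(['GTT', 'GTC', 'GTA', 'GTG'], 'V'),                # Valine
    **dict.fromkeys(['TTT', 'TTC'], 'F'),                              # Phenylalanine
    **dict.fromkeys(['ATG'], 'M'),                                     # Methionine
}


def translate(dna):
    """Return one letter per complete codon; unknown codons become 'X'."""
    usable = len(dna) - len(dna) % 3                   # ignore an incomplete tail
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    return ''.join(CODON_TABLE.get(codon, 'X') for codon in codons)


print(translate("ATTATTATT"))     # III
print(translate("ATTCTGGGGAT"))   # ILX (GGG is not in the table, AT is dropped)
```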