
List/String separation issue
#1
Hey guys,

Hope you can shed some light on this matter.
In this task I need to create a function that displays the amino acid type that corresponds to each codon. For example, the user may input a DNA sequence such as AAABBBCCCDD.
The function needs to be able to deal with a length that is not divisible by 3, in other words any arbitrary input.

AAA is one codon and will represent one type of amino acid.
BBB is likewise one codon (one amino acid).
CCC is likewise one codon (one amino acid).

DD is not a full codon and needs to be removed from the sequence

What I am trying to do now is to get the input (the argument given to the called function) into workable blocks of 3. For example, if the input is AAABBBCC

I want to get it to:

DNA1 = "AAA"
DNA2 = "BBB"

CC needs to get cut off.
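A minimal sketch of that trimming step, assuming the sequence is kept as a plain string (no `list()` conversion). `trim_to_codons` is a hypothetical helper name: `len(seq) % 3` is exactly the number of leftover bases (0, 1 or 2), so slicing up to `len(seq) - len(seq) % 3` keeps only full codons.

```python
def trim_to_codons(seq: str) -> str:
    """Drop the 0, 1 or 2 trailing bases that do not form a full codon."""
    # len(seq) % 3 is how many bases are left over after the last full codon
    return seq[:len(seq) - len(seq) % 3]


print(trim_to_codons("AAABBBCC"))   # AAABBB
print(trim_to_codons("AAABBBC"))    # AAABBB
print(trim_to_codons("AAABBBCCC"))  # AAABBBCCC
```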

Below is my idea for cutting off the unworkable tail. It seems to work if the input is AAABBBC but not if the input is AAABBBCC; I will need to fix that too.
But how do I code it so it splits a sequence of any size into workable chunks and joins them again from a list to a string?

I also struggled to join it back from ['A','A','A'] to "AAA".
My idea as it stands obviously cannot deal with input of arbitrary size, which is a problem.


dna = "AAABBBCCCDD"

dna = list(dna)
DNA = len(dna)
print(DNA)

if DNA % 3 != 0:
    del dna[-1]
    print(dna)

DNA1 = dna[0:3]
DNA2 = dna[3:6]
print(DNA1)
print(DNA2)
    
''.join(DNA2)
print(DNA2)
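The last two lines of that attempt print the list again because `''.join(DNA2)` returns a new string rather than changing `DNA2`; the result has to be assigned to a name. A small sketch (the name `joined` is just illustrative):

```python
dna = list("AAABBBCCCDD")
DNA2 = dna[3:6]  # ['B', 'B', 'B']

# str.join returns a NEW string; it never modifies the list it is given
joined = ''.join(DNA2)

print(DNA2)    # ['B', 'B', 'B'] (still a list)
print(joined)  # BBB
```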


Any help would be appreciated, thanks stacks!
#2
You are not accounting for the case where DNA % 3 == 2. In that case you need to delete two items. The modulus operator gives you how many items you need to delete, so you can just do dna = dna[:len(dna) - len(dna) % 3]
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
Recommended Tutorials: BBCode, functions, classes, text adventures

#3
(Sep-19-2019, 01:58 PM)ichabod801 Wrote: You are not accounting for the case where DNA % 3 == 2. In that case you need to delete two items. The modulus operator gives you how many items you need to delete, so you can just do dna = dna[:len(dna) - len(dna) % 3]


if DNA % 3 == 1:
    del dna[-1]
    print(dna)
elif DNA % 3 == 2:
    del dna[-2:]
    print(dna)

Yes, I just fixed that, but I like your idea much more; it seems way more efficient.
#4
But getting the tail cut off is the easy part. What I cannot seem to figure out is how to get AAABBBCCC into workable chunks of 3, like:
dna1 = AAA
dna2 = BBB
when you don't know the length of the sequence that will be input to the function.
#5
I found a solution; I will mark this thread as solved. Thanks, ichabod801, for your input.


dna = "AAABBBCCCDDD"
x = 3
chunks = [dna[y-x:y] for y in range(x, len(dna)+x,x)]
print(chunks)
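To see why a length check is still needed with this comprehension: `range(x, len(dna)+x, x)` yields the END index of each chunk, and when the length is not a multiple of 3 the last end index runs past the string, which produces a short final slice. A quick demonstration on the 11-character example from the first post:

```python
dna = "AAABBBCCCDD"
x = 3

# y is the END index of each chunk; for an 11-character string these are 3, 6, 9, 12
ends = list(range(x, len(dna) + x, x))
print(ends)  # [3, 6, 9, 12]

# Slicing past the end of a string is allowed, so the last slice is just shorter
chunks = [dna[y - x:y] for y in ends]
print(chunks)  # ['AAA', 'BBB', 'CCC', 'DD']
```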
#6
(Sep-19-2019, 01:56 PM)YoungGrassHopper Wrote: DD is not a full codon and needs to be removed from the sequence

This code will not meet the requirement stated above. You should add a length check:

chunks = [dna[y-x:y] 
          for y in range(x, len(dna)+x, x) 
          if len(dna[y-x:y]) == 3]  
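An alternative to filtering afterwards, offered as a sketch rather than what was posted: compute the START index of each full chunk and stop before the incomplete tail, so no short slice is ever produced. `split_codons` is a hypothetical helper name.

```python
from typing import List


def split_codons(dna: str, size: int = 3) -> List[str]:
    """Split dna into full chunks of `size`, ignoring any incomplete tail."""
    usable = len(dna) - len(dna) % size  # length covered by full chunks only
    return [dna[i:i + size] for i in range(0, usable, size)]


print(split_codons("AAABBBCCCDD"))  # ['AAA', 'BBB', 'CCC']
```

The result is the same as the filtered comprehension, but the range itself already excludes the partial codon.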
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
#7
   
(Sep-20-2019, 06:33 AM)perfringo Wrote:
(Sep-19-2019, 01:56 PM)YoungGrassHopper Wrote: DD is not a full codon and needs to be removed from the sequence

This code will not meet the requirement stated above. You should add a length check:

chunks = [dna[y-x:y] 
          for y in range(x, len(dna)+x, x) 
          if len(dna[y-x:y]) == 3]  


Hi perfringo, yes, sorry, I did not include that part in the post; that part is sorted, though. I am struggling with getting the chunks joined in list form after they get divided in workable chunks of [['A','A','A'],['B','B','B'],['C','C','C']]

I need to get them like this : [['AAA'],['BBB'],['CCC']]
I tried ''.join(chunks) but no luck

And then I have a for loop issue. I want the loop to run through the chunks above and check whether they match any of the amino acids I am checking for. My whole approach may be wrong, I realise that. Basically, I need to work it all into a function: the argument passed to the function is a DNA sequence of arbitrary length, and the return value needs to say which amino acids the codons (AAA, BBB, ...) represent. My code below has the actual codons in it, so it might look confusing, but I attached a picture of the amino acids I need to cover. Maybe, if you feel like helping out a YoungGrassHopper, you can tell me where I am going wrong with this?



dna = input("Enter DNA sequence: ")
dna = list(dna)
DNA = len(dna)

# Drop the incomplete trailing codon (1 or 2 leftover bases)
if DNA % 3 == 1:
    del dna[-1]
elif DNA % 3 == 2:
    del dna[-2:]

print(dna)

C = 3
# Each chunk is a list such as ['A', 'T', 'T'], so it never equals a string like "ATT"
chunks = [dna[y-C:y] for y in range(C, len(dna)+C, C)]
print(chunks)

Isoleucine = 0
Leucine = 0
Valine = 0
Phenylalanine = 0
Methionine = 0

for i in chunks:
    if i in ("ATT", "ATC", "ATA"):
        Isoleucine += 1
    elif i in ("CTT", "CTC", "CTA", "CTG", "TTA", "TTG"):
        Leucine += 1
    elif i in ("GTT", "GTC", "GTA", "GTG"):
        Valine += 1
    elif i in ("TTT", "TTC"):  # Phenylalanine (TGT/TGC code for cysteine)
        Phenylalanine += 1
    elif i == "ATG":
        Methionine += 1

if Isoleucine == 1:
    print("ATT ; ATC ; ATA - represents: Isoleucine")
elif Leucine == 1:
    print("CTT ; CTC ; CTA ; CTG ; TTA ; TTG - represents: Leucine")
elif Valine == 1:
    print("GTT ; GTC ; GTA ; GTG - represents: Valine")
elif Phenylalanine == 1:
    print("TTT ; TTC - represents: Phenylalanine")
elif Methionine == 1:
    print("ATG - represents: Methionine")
else:
    print("Codon represents: Amino Acid X")


In this task I only need to cover the 5 amino acids above; for any other codon I can say it represents amino acid X.
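A sketch of the same counting with a single `collections.Counter` instead of five separate variables, assuming codons are plain strings such as "ATT" (not lists) and using TTT/TTC for Phenylalanine. `classify` and `count_acids` are hypothetical names. Because every acid with a non-zero count is reported, the output is not limited to whichever branch of an elif chain happens to match first.

```python
from collections import Counter


def classify(codon):
    """Return the amino acid name for one codon string such as "ATT"."""
    if codon in ("ATT", "ATC", "ATA"):
        return "Isoleucine"
    if codon in ("CTT", "CTC", "CTA", "CTG", "TTA", "TTG"):
        return "Leucine"
    if codon in ("GTT", "GTC", "GTA", "GTG"):
        return "Valine"
    if codon in ("TTT", "TTC"):
        return "Phenylalanine"
    if codon == "ATG":
        return "Methionine"
    return "Amino Acid X"  # every other codon, as the task allows


def count_acids(dna):
    """Tally amino acids over the full codons of dna."""
    usable = len(dna) - len(dna) % 3  # ignore the incomplete tail
    return Counter(classify(dna[i:i + 3]) for i in range(0, usable, 3))


counts = count_acids("ATTATCGTGAAACC")
for acid, n in counts.items():
    print(f"{acid}: {n}")
```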
#8
(Sep-20-2019, 07:57 AM)YoungGrassHopper Wrote: I am struggling with getting the chunks joined in list form after they get divided in workable chunks of [['A','A','A'],['B','B','B'],['C','C','C']]

I need to get them like this : [['AAA'],['BBB'],['CCC']]
I tried ''.join(chunks) but no luck

My personal feeling is that you should clearly articulate (in spoken language) what you want to achieve and then start implementing steps to reach your goal. Do you really need this structure? Not that it is particularly hard to implement (you need to add a pair of square brackets), but why do you need it?

chunks = [[dna[y-x:y]] 
          for y in range(x, len(dna)+x, x) 
          if len(dna[y-x:y]) == 3]
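For the list-of-lists case from post #7, `''.join(chunks)` fails with a `TypeError` because `str.join` expects an iterable of strings while `chunks` holds lists. A minimal sketch that joins each inner list separately (chunk data copied from that post):

```python
chunks = [['A', 'A', 'A'], ['B', 'B', 'B'], ['C', 'C', 'C']]

# Join each inner list on its own; the comprehension collects one string per codon
codons = [''.join(chunk) for chunk in chunks]
print(codons)  # ['AAA', 'BBB', 'CCC']

# The nested form asked for, one single-item list per codon
nested = [[''.join(chunk)] for chunk in chunks]
print(nested)  # [['AAA'], ['BBB'], ['CCC']]
```

The flat form is the one that works with comparisons like `i == "ATT"` in a loop.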
#9
(Sep-20-2019, 08:05 AM)perfringo Wrote:
(Sep-20-2019, 07:57 AM)YoungGrassHopper Wrote: I am struggling with getting the chunks joined in list form after they get divided in workable chunks of [['A','A','A'],['B','B','B'],['C','C','C']]

I need to get them like this : [['AAA'],['BBB'],['CCC']]
I tried ''.join(chunks) but no luck

My personal feeling is that you should clearly articulate (in spoken language) what you want to achieve and then start implementing steps to reach your goal. Do you really need this structure? Not that it is particularly hard to implement (you need to add a pair of square brackets), but why do you need it?

chunks = [[dna[y-x:y]] 
          for y in range(x, len(dna)+x, x) 
          if len(dna[y-x:y]) == 3]

Well, I know there is probably an easier way to achieve it, but this is the best way I could come up with. I've only been coding for 2 weeks now, so that's quite likely the reason I am not finding the most efficient way of going about it.

What I need to achieve:

DNA Input: ATTATTATT  
Output: III (representing: Isoleucine, Isoleucine, Isoleucine ) 


I need to take in a DNA sequence of arbitrary length from the user, so I can get AAAB, AAABBBCC, etc.

I need to chop it into workable chunks of 3.

I need to check whether the codons in the sequence given by the user match any of the amino acids I am checking for; that is why I am trying a for loop.

The elif statements below the for loop then "check" the counters and display whether a codon matched any of the aminos I am checking for.

But like I said, I know my approach can be very inefficient and flawed
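The expected output III looks like the standard one-letter amino acid codes (I = Isoleucine, L = Leucine, V = Valine, F = Phenylalanine, M = Methionine), with X for anything else. Assuming that is what is wanted, here is a sketch of the whole pipeline as one function; `ONE_LETTER` and `translate` are hypothetical names, and only the five acids from this task are mapped.

```python
# Assumed mapping from codon to one-letter amino acid code (only the 5 acids in this task)
ONE_LETTER = {
    "ATT": "I", "ATC": "I", "ATA": "I",
    "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L", "TTA": "L", "TTG": "L",
    "GTT": "V", "GTC": "V", "GTA": "V", "GTG": "V",
    "TTT": "F", "TTC": "F",
    "ATG": "M",
}


def translate(dna):
    """Turn a DNA string of any length into one-letter codes, one per full codon."""
    dna = dna.strip().upper()          # tolerate stray spaces or lowercase input
    usable = len(dna) - len(dna) % 3   # drop the incomplete trailing codon
    # Any codon not in the table becomes "X" (amino acid X)
    return "".join(ONE_LETTER.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


print(translate("ATTATTATT"))    # III
print(translate("ATGCTTAAAGG"))  # MLX
```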
#10
It's unclear where III comes from.

But maybe this can help:

(1) Create a data structure to hold the sequences:

amino_acids = {**dict.fromkeys(['ATT', 'ATC', 'ATA'], 'Isoleucine'), 
               **dict.fromkeys(['CTT', 'CTC', 'CTA', 'CTG', 'TTA', 'TTG'], 'Leucine'), 
               **dict.fromkeys(['GTT', 'GTC', 'GTA', 'GTG'], 'Valine'),
               **dict.fromkeys(['TTT', 'TTC'], 'Phenylalanine'), 
               **dict.fromkeys(['ATG'], 'Methionine')}
amino_acids is a dictionary where each DNA codon is a key and the value is the corresponding amino acid:

# amino_acids
{'ATT': 'Isoleucine',
 'ATC': 'Isoleucine',
 'ATA': 'Isoleucine',
 'CTT': 'Leucine',
 'CTC': 'Leucine',
 'CTA': 'Leucine',
 'CTG': 'Leucine',
 'TTA': 'Leucine',
 'TTG': 'Leucine',
 'GTT': 'Valine',
 'GTC': 'Valine',
 'GTA': 'Valine',
 'GTG': 'Valine',
 'TTT': 'Phenylalanine',
 'TTC': 'Phenylalanine',
 'ATG': 'Methionine'}


(2) Chunk DNA sequence:

x = 3
dna = 'ATTCTTTTCATGCTCCTGTTACTAAA'
chunks = [dna[y-x:y]  
         for y in range(x, len(dna)+x, x)  
         if len(dna[y-x:y]) == 3] 
This neatly chops off the trailing AA:

['ATT', 'CTT', 'TTC', 'ATG', 'CTC', 'CTG', 'TTA', 'CTA']
(3) Iterate over chunks and replace sequence with amino acid:

>>> [amino_acids[sequence] for sequence in chunks]
['Isoleucine',
 'Leucine',
 'Phenylalanine',
 'Methionine',
 'Leucine',
 'Leucine',
 'Leucine',
 'Leucine']
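The lookup `amino_acids[sequence]` raises `KeyError` for any codon outside the five acids (for example AAA). Since the task allows reporting every other codon as amino acid X, `dict.get` with a default avoids that. A sketch that rebuilds the same dictionary so it runs on its own; the sample `chunks` list is made up for illustration:

```python
amino_acids = {**dict.fromkeys(['ATT', 'ATC', 'ATA'], 'Isoleucine'),
               **dict.fromkeys(['CTT', 'CTC', 'CTA', 'CTG', 'TTA', 'TTG'], 'Leucine'),
               **dict.fromkeys(['GTT', 'GTC', 'GTA', 'GTG'], 'Valine'),
               **dict.fromkeys(['TTT', 'TTC'], 'Phenylalanine'),
               **dict.fromkeys(['ATG'], 'Methionine')}

chunks = ['ATG', 'AAA', 'TTC']

# dict.get returns the default instead of raising KeyError for unknown codons
names = [amino_acids.get(sequence, 'Amino Acid X') for sequence in chunks]
print(names)  # ['Methionine', 'Amino Acid X', 'Phenylalanine']
```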