Posts: 44
Threads: 7
Joined: Sep 2019
Sep-19-2019, 01:56 PM
(This post was last modified: Sep-19-2019, 01:56 PM by YoungGrassHopper.)
Hey guys,
Hope you can shed some light on this matter.
In this task I need to create a function that displays the amino acid that corresponds to each codon, for example when the user inputs a DNA sequence of AAABBBCCCDD.
The function needs to be able to deal with a length not divisible by 3, in other words any arbitrary input.
AAA is one codon and will represent one type of amino acid
BBB "" ""
CCC "" ""
DD is not a full codon and needs to be removed from the sequence
What I am trying to do now is to get the input (the argument given to the called function) into workable blocks of 3. For example, if the input is AAABBBCC
I want to get it to:
DNA1 = "AAA"
DNA2 = "BBB"
CC needs to get cut off.
Below is my idea to get the odd tail that's not workable cut off. It seems to work if the input is AAABBBC but not if the input is AAABBBCC, so I will need to fix that too.
But how do I code it so it splits a sequence of any size into workable chunks and joins them again from list to string?
I also struggled to join it again from ['A','A','A'] to "AAA".
My idea as it stands obviously cannot deal with an input of arbitrary size, which is a problem.
dna = "AAABBBCCCDD"
dna = list(dna)
DNA = len(dna)
print(DNA)
if DNA % 3 != 0:
    del dna[-1]
print(dna)
DNA1 = dna[0:3]
DNA2 = dna[3:6]
print(DNA1)
print(DNA2)
''.join(DNA2)
print(DNA2)
Any help would be appreciated stacks.
Posts: 4,220
Threads: 97
Joined: Sep 2016
You are not accounting for the case where DNA % 3 == 2. In that case you need to delete two items. The modulus operator gives you how many items you need to delete, so you can just do dna = dna[:len(dna) - len(dna) % 3]
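A minimal sketch of trimming by the remainder, assuming the sequence is a plain string (the same slice also works on a list). The function name trim_to_codons is made up for illustration. Slicing up to the last full codon avoids the special case where the remainder is 0, which a negative index would get wrong.

```python
def trim_to_codons(dna, size=3):
    """Return dna without the trailing characters that do not fill a codon."""
    # len(dna) % size is the number of leftover characters at the end.
    usable = len(dna) - len(dna) % size
    return dna[:usable]


print(trim_to_codons("AAABBBC"))   # AAABBB
print(trim_to_codons("AAABBBCC"))  # AAABBB
print(trim_to_codons("AAABBB"))    # AAABBB
```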
Posts: 44
Threads: 7
Joined: Sep 2019
if DNA % 3 == 1:
    del dna[-1]
    print(dna)
elif DNA % 3 == 2:
    del dna[-2:]
    print(dna)
Yes, I just fixed that, but I like your idea much more. It seems way more efficient.
Posts: 44
Threads: 7
Joined: Sep 2019
But getting the tail cut off is the easy part. What I cannot seem to figure out is how to get AAABBBCCC into workable chunks of 3, like:
dna1 = AAA
dna2 = BBB
when you don't know the length of the sequence that will be passed into the function.
Posts: 44
Threads: 7
Joined: Sep 2019
Sep-19-2019, 05:21 PM
(This post was last modified: Sep-19-2019, 05:24 PM by YoungGrassHopper.)
I found a solution, I will set this thread to solved, thanks ichabod801 for your input
dna = "AAABBBCCCDDD"
x = 3
chunks = [dna[y-x:y] for y in range(x, len(dna)+x, x)]
print(chunks)
Posts: 1,950
Threads: 8
Joined: Jun 2018
(Sep-19-2019, 01:56 PM)YoungGrassHopper Wrote: DD is not a full codon and needs to be removed from the sequence
This code will not meet the requirement quoted above: when the length is not divisible by 3, the trailing partial codon still ends up in chunks. You should add a length check:
chunks = [dna[y-x:y]
          for y in range(x, len(dna)+x, x)
          if len(dna[y-x:y]) == 3]
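An alternative sketch, not the approach posted above: step over start indices and stop before the incomplete tail, so no per-chunk length check is needed. The name split_codons is hypothetical, and the input is assumed to be a string.

```python
def split_codons(dna, size=3):
    """Split dna into full chunks of `size`, ignoring any partial tail."""
    # The last valid start index is len(dna) - size, so the range stops there.
    last_start = len(dna) - size + 1
    return [dna[i:i + size] for i in range(0, last_start, size)]


print(split_codons("AAABBBCCCDD"))  # ['AAA', 'BBB', 'CCC']
```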
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy
Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Posts: 44
Threads: 7
Joined: Sep 2019
Hi perfringo, yes, sorry I did not include that part in the post. That part is sorted though. I am struggling with getting the chunks joined back into strings after they get divided into workable chunks of [['A','A','A'],['B','B','B'],['C','C','C']]
I need to get them like this: [['AAA'],['BBB'],['CCC']]
I tried ''.join(chunks) but no luck.
And then I have a for loop issue. Basically I want the loop to run through the chunks above and check whether they match any of the amino acids I am checking for. My whole approach may be wrong, I realise that. Basically I need to work it all into a function. The argument passed to the function is a DNA sequence of arbitrary length, and the return value needs to say which amino acids the codons (AAA, BBB) represent. My code below has the actual codons in it so it might look confusing, but I attach a pic of the amino acids I need to cover. Maybe if you feel like helping out a YoungGrassHopper you can tell me where I am going wrong with this?
dna = input("Enter DNA sequence: ")
dna = list(dna)
DNA = len(dna)
if DNA % 3 == 1:
    del dna[-1]
elif DNA % 3 == 2:
    del dna[-2:]
print(dna)
C = 3
chunks = [dna[y-C:y] for y in range(C, len(dna)+C, C)]
print(chunks)
Isoleucine = 0
Leucine = 0
Valine = 0
Phenylalanine = 0
Methionine = 0
for i in chunks:
    if (i == "ATT") or (i == "ATC") or (i == "ATA"):
        Isoleucine += 1
    elif (i == "CTT") or (i == "CTC") or (i == "CTA") or (i == "CTG") or (i == "TTA") or (i == "TTG"):
        Leucine += 1
    elif (i == "GTT") or (i == "GTC") or (i == "GTA") or (i == "GTG"):
        Valine += 1
    elif (i == "TTT") or (i == "TTC"):
        Phenylalanine += 1
    elif i == "ATG":
        Methionine += 1
if Isoleucine == 1:
    print("ATT ; ATC ; ATA - represents: Isoleucine")
elif Leucine == 1:
    print("CTT ; CTC ; CTA ; CTG ; TTA ; TTG - represents: Leucine")
elif Valine == 1:
    print("GTT ; GTC ; GTA ; GTG - represents: Valine")
elif Phenylalanine == 1:
    print("TTT ; TTC - represents: Phenylalanine")
elif Methionine == 1:
    print("ATG - represents: Methionine")
else:
    print("Codon represents: Amino Acid X")
In this task I only need to cover the 5 amino acids above; for any other codon I can say it represents amino acid X.
Posts: 1,950
Threads: 8
Joined: Jun 2018
(Sep-20-2019, 07:57 AM)YoungGrassHopper Wrote: I am struggling with getting the chunks joined in list form after they get divided in workable chunks of [['A','A','A'],['B','B','B'],['C','C','C']]
I need to get them like this : [['AAA'],['BBB'],['CCC']]
I tried ''.join(chunks) but no luck
My personal feeling is that you should clearly articulate (in spoken language) what you want to achieve and then start implementing the steps to reach your goal. Do you really need this structure? Not that it is particularly hard to implement (you need to add a pair of square brackets), but why do you need it?
chunks = [[dna[y-x:y]]
          for y in range(x, len(dna)+x, x)
          if len(dna[y-x:y]) == 3]
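On the join question: ''.join(chunks) fails because each element of chunks is itself a list, and str.join only accepts strings. A minimal sketch, assuming the list-of-character-lists shape quoted above, joins each inner list separately. The variable name codons is made up for illustration.

```python
chunks = [['A', 'A', 'A'], ['B', 'B', 'B'], ['C', 'C', 'C']]

# Join the characters of each inner list into one string per codon.
codons = [''.join(chunk) for chunk in chunks]
print(codons)  # ['AAA', 'BBB', 'CCC']
```

If dna is kept as a string instead of being converted with list(dna), slicing already yields strings and this step is not needed.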
Posts: 44
Threads: 7
Joined: Sep 2019
Well, I know there is probably an easier way to achieve it, but this is the best way I could come up with. I've only been coding for 2 weeks now, so that's quite likely the reason I am not finding the most efficient way of going about it.
What I need to achieve:
DNA Input: ATTATTATT
Output: III (representing: Isoleucine, Isoleucine, Isoleucine )
I need to take in a DNA sequence of arbitrary length from the user, so I can get AAAB or AAABBBCC etc.
I need to chop it into workable chunks of 3.
I need to check whether the codons in the sequence given by the user match any of the amino acids I need to check for; that is why I am trying a for loop.
The if/elif statements below the for loop then "check" the counters and display whether a codon matched any of the aminos I am checking for.
But like I said, I know my approach can be very inefficient and flawed.
Posts: 1,950
Threads: 8
Joined: Jun 2018
Sep-20-2019, 08:47 AM
(This post was last modified: Sep-20-2019, 08:47 AM by perfringo.)
It's unclear where III comes from.
But maybe this can help:
(1) Create a data structure to hold the sequences:
amino_acids = {**dict.fromkeys(['ATT', 'ATC', 'ATA'], 'Isoleucine'),
               **dict.fromkeys(['CTT', 'CTC', 'CTA', 'CTG', 'TTA', 'TTG'], 'Leucine'),
               **dict.fromkeys(['GTT', 'GTC', 'GTA', 'GTG'], 'Valine'),
               **dict.fromkeys(['TTT', 'TTC'], 'Phenylalanine'),
               **dict.fromkeys(['ATG'], 'Methionine')}
amino_acids is a dictionary where the DNA codon is the key and the value is the corresponding amino acid:
# amino_acids
{'ATT': 'Isoleucine',
'ATC': 'Isoleucine',
'ATA': 'Isoleucine',
'CTT': 'Leucine',
'CTC': 'Leucine',
'CTA': 'Leucine',
'CTG': 'Leucine',
'TTA': 'Leucine',
'TTG': 'Leucine',
'GTT': 'Valine',
'GTC': 'Valine',
'GTA': 'Valine',
'GTG': 'Valine',
'TTT': 'Phenylalanine',
'TTC': 'Phenylalanine',
'ATG': 'Methionine'}
(2) Chunk DNA sequence:
x = 3
dna = 'ATTCTTTTCATGCTCCTGTTACTAAA'
chunks = [dna[y-x:y]
          for y in range(x, len(dna)+x, x)
          if len(dna[y-x:y]) == 3]
Which nicely chops off trailing AA-s:
['ATT', 'CTT', 'TTC', 'ATG', 'CTC', 'CTG', 'TTA', 'CTA']
(3) Iterate over chunks and replace sequence with amino acid:
>>> [amino_acids[sequence] for sequence in chunks]
['Isoleucine',
'Leucine',
'Phenylalanine',
'Methionine',
'Leucine',
'Leucine',
'Leucine',
'Leucine']
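The three steps above could be wrapped into one function along these lines. This is only a sketch: the names CODON_TABLE and translate, the upper-casing of the input, and the 'Amino acid X' default are assumptions for illustration, the last one based on the requirement stated earlier in the thread that any other codon counts as amino acid X. Using dict.get avoids the KeyError that amino_acids[sequence] would raise for a codon missing from the table.

```python
CODON_TABLE = {
    **dict.fromkeys(['ATT', 'ATC', 'ATA'], 'Isoleucine'),
    **dict.fromkeys(['CTT', 'CTC', 'CTA', 'CTG', 'TTA', 'TTG'], 'Leucine'),
    **dict.fromkeys(['GTT', 'GTC', 'GTA', 'GTG'], 'Valine'),
    **dict.fromkeys(['TTT', 'TTC'], 'Phenylalanine'),
    **dict.fromkeys(['ATG'], 'Methionine'),
}


def translate(dna, unknown='Amino acid X'):
    """Return the amino acid name for each full codon in dna."""
    dna = dna.upper()                  # assumption: accept lower-case input
    usable = len(dna) - len(dna) % 3   # drop an incomplete trailing codon
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    # Codons missing from the table fall back to the `unknown` label.
    return [CODON_TABLE.get(codon, unknown) for codon in codons]


print(translate('ATTATTATT'))  # ['Isoleucine', 'Isoleucine', 'Isoleucine']
print(translate('ATGAAAGT'))   # ['Methionine', 'Amino acid X']
```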