List/String seperation issue
Python Forum (https://python-forum.io), Forum: Python Coding (https://python-forum.io/forum-7.html), Forum: General Coding Help (https://python-forum.io/forum-8.html)
Thread: List/String seperation issue (/thread-21216.html)
List/String seperation issue - YoungGrassHopper - Sep-19-2019

Hey guys,

Hope you can shed some light on this matter. In this task I need to create a function that displays the amino acid type that corresponds to a codon. For example, say the user inputs a DNA sequence of AAABBBCCCDD. The function needs to be able to deal with a length not divisible by 3, in other words any arbitrary input:

AAA is one codon and will represent one type of amino acid
BBB "" ""
CCC "" ""
DD is not a full codon and needs to be removed from the sequence

What I am trying to do now is get the input (the argument given to the called function) into workable blocks of 3. For example, if the input is AAABBBCC I want to get:

DNA1 = "AAA"
DNA2 = "BBB"

CC needs to get cut off.

Below is my idea for cutting off the odd tail that's not workable. It seems to work if the input is AAABBBC but not if the input is AAABBBCC. I will need to fix that too, but how do I code it so it splits a sequence of any size into workable chunks and joins them again from list to string? I also struggled to join it again from ['A','A','A'] to "AAA". My idea as it stands obviously cannot deal with an input of arbitrary size, which is a problem.

    dna = "AAABBBCCCDD"
    dna = list(dna)
    DNA = len(dna)
    print(DNA)
    if DNA % 3 != 0:
        del dna[-1]
        print(dna)
    DNA1 = dna[0:3]
    DNA2 = dna[3:6]
    print(DNA1)
    print(DNA2)
    ''.join(DNA2)
    print(DNA2)

Any help would be appreciated stacks.

RE: List/String seperation issue - ichabod801 - Sep-19-2019

You are not accounting for the case where DNA % 3 == 2. In that case you need to delete two items. The modulus operator gives you how many items you need to delete, so you can just slice them off the end:

    dna = dna[:len(dna) - len(dna) % 3]
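
The trim-by-remainder idea, together with the list-to-string join asked about in the first post, can be sketched as follows. This is a minimal illustration rather than anyone's posted code: the function name trim_to_codons is made up for the example. Note that a negative slice such as dna[:-(len(dna) % 3)] would return an empty result when the remainder is 0, which is why the sketch computes the usable length explicitly.

```python
def trim_to_codons(dna):
    """Return dna without the 1 or 2 trailing items that do not form a full codon.

    Hypothetical helper for illustration; works on strings and lists alike.
    """
    usable = len(dna) - len(dna) % 3  # largest multiple of 3 that fits
    return dna[:usable]


trimmed = trim_to_codons("AAABBBCC")
print(trimmed)

# str.join returns a NEW string; it does not change the list in place,
# so the result has to be assigned (or printed) to be of any use.
letters = ['A', 'A', 'A']
joined = ''.join(letters)
print(joined)
```

Because strings can be sliced directly, converting the input to a list first is not needed for this task, which also removes the need to join the pieces back together.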
RE: List/String seperation issue - YoungGrassHopper - Sep-19-2019

(Sep-19-2019, 01:58 PM)ichabod801 Wrote: You are not accounting for the case where DNA % 3 == 2. In that case you need to delete two items. The modulus operator gives you how many items you need to delete, so you can just do

    if (DNA % 3 == 1):
        del dna[-1]
        print(dna)
    elif (DNA % 3 == 2):
        del dna[-2:]
        print(dna)

Yes, I just fixed that, but I like your idea much more; it seems way more efficient.

RE: List/String seperation issue - YoungGrassHopper - Sep-19-2019

But getting the tail cut off is the easy part. What I cannot seem to figure out is how to get AAABBBCCC into workable chunks of 3, like:

dna1 = AAA
dna2 = BBB

if you don't know the length of the sequence that will be input to the function.

RE: List/String seperation issue - YoungGrassHopper - Sep-19-2019

I found a solution, so I will set this thread to solved. Thanks ichabod801 for your input.

    dna = "AAABBBCCCDDD"
    x = 3
    chunks = [dna[y-x:y] for y in range(x, len(dna)+x, x)]
    print(chunks)

RE: List/String seperation issue - perfringo - Sep-20-2019

(Sep-19-2019, 01:56 PM)YoungGrassHopper Wrote: DD is not a full codon and needs to be removed from the sequence

This code will not work according to the requirement set above. You should add a check on the length:

    chunks = [dna[y-x:y] for y in range(x, len(dna)+x, x) if len(dna[y-x:y]) == 3]

RE: List/String seperation issue - YoungGrassHopper - Sep-20-2019

[attachment=714]

(Sep-20-2019, 06:33 AM)perfringo Wrote: (Sep-19-2019, 01:56 PM)YoungGrassHopper Wrote: DD is not a full codon and needs to be removed from the sequence

Hi perfringo, yes, sorry, I did not include that part in the post; that part is sorted though. I am struggling with getting the chunks joined in list form after they get divided into workable chunks of [['A','A','A'],['B','B','B'],['C','C','C']]. I need to get them like this: [['AAA'],['BBB'],['CCC']]. I tried ''.join(chunks) but no luck. And then I have a for loop issue.
Basically I want to have the loop run through the chunks above and check if they match any of the amino acids I am checking for. My whole approach could be wrong, I realise that. Basically I need to work it all into a function. The argument that gets passed to the function is a DNA sequence of arbitrary length, and the return needs to be which amino acids the codons (AAA; BBB) represent. My code below has the actual codons in it so it might look confusing, but I attach a pic of the amino acids I need to cover. Maybe if you feel like helping out a YoungGrassHopper you can tell me where I am going wrong with this?

    dna = input("Enter DNA sequence: ")
    dna = list(dna)
    DNA = len(dna)
    if (DNA % 3 == 1):
        del dna[-1]
    elif (DNA % 3 == 2):
        del dna[-2:]
    print(dna)
    C = 3
    chunks = [dna[y-C:y] for y in range(C, len(dna)+C, C)]
    print(chunks)
    Isoleucine = 0
    Leucine = 0
    Valine = 0
    Phenylalanine = 0
    Methionine = 0
    for i in chunks:
        if (i == "ATT") or (i == "ATC") or (i == "ATA"):
            Isoleucine += 1
        elif (i == "CTT") or (i == "CTC") or (i == "CTA") or (i == "CTG") or (i == "TTA") or (i == "TTG"):
            Leucine += 1
        elif (i == "GTT") or (i == "GTC") or (i == "GTA") or (i == "GTG"):
            Valine += 1
        elif (i == "TTT") or (i == "TTC"):
            Phenylalanine += 1
        elif (i == "ATG"):
            Methionine += 1
    if Isoleucine == 1:
        print("ATT ; ATC ; ATA - represents: Isoleucine")
    elif Leucine == 1:
        print("CTT ; CTC ; CTA ; CTG ; TTA ; TTG - represents: Leucine")
    elif Valine == 1:
        print("GTT ; GTC ; GTA ; GTG - represents: Valine")
    elif Phenylalanine == 1:
        print("TTT ; TTC - represents: Phenylalanine")
    elif Methionine == 1:
        print("ATG - represents: Methionine")
    else:
        print("Codon represents: Amino Acid X")

In this task I only need to cover the 5 amino acids above; for any other codon I can say it represents amino acid X.

RE: List/String seperation issue - perfringo - Sep-20-2019

(Sep-20-2019, 07:57 AM)YoungGrassHopper Wrote: I am struggling with getting the chunks joined in list form after they get divided in workable chunks of
[['A','A','A'],['B','B','B'],['C','C','C']]

My personal feeling is that you should clearly articulate (in spoken language) what you want to achieve and then start implementing steps to reach your goal. Do you really need this structure? Not that it is particularly hard to implement (you need to add a pair of square brackets), but why do you need it?

    chunks = [[dna[y-x:y]] for y in range(x, len(dna)+x, x) if len(dna[y-x:y]) == 3]

RE: List/String seperation issue - YoungGrassHopper - Sep-20-2019

(Sep-20-2019, 08:05 AM)perfringo Wrote: (Sep-20-2019, 07:57 AM)YoungGrassHopper Wrote: I am struggling with getting the chunks joined in list form after they get divided in workable chunks of [['A','A','A'],['B','B','B'],['C','C','C']]

Well, I know there is probably an easier way to achieve it, but this is the best way I could come up with. I've only been coding for 2 weeks now, so that's quite likely the reason I am not finding the most efficient way of going about it.

What I need to achieve:

DNA Input: ATTATTATT
Output: III (representing: Isoleucine, Isoleucine, Isoleucine)

I need to take in an arbitrary-length DNA sequence from the user, so I can get AAAB or AAABBBCC etc.
I need to chop it into workable chunks of 3.
I need to check if the codons in the sequence given by the user match any of the amino acids I need to check for.

That is why I am trying a for loop. The elif statements below the for loop then "check" the counter and display whether a codon matched any of the aminos I am checking for. But like I said, I know my approach could be very inefficient and flawed.

RE: List/String seperation issue - perfringo - Sep-20-2019

It's unclear where III comes from.
But maybe this can help:

(1) Create a data structure to hold the sequences:

    amino_acids = {**dict.fromkeys(['ATT', 'ATC', 'ATA'], 'Isoleucine'),
                   **dict.fromkeys(['CTT', 'CTC', 'CTA', 'CTG', 'TTA', 'TTG'], 'Leucine'),
                   **dict.fromkeys(['GTT', 'GTC', 'GTA', 'GTG'], 'Valine'),
                   **dict.fromkeys(['TTT', 'TTC'], 'Phenylalanine'),
                   **dict.fromkeys(['ATG'], 'Methionine')}

amino_acids is a dictionary where the DNA codon is the key and the value is the corresponding amino acid:

    # amino_acids
    {'ATT': 'Isoleucine', 'ATC': 'Isoleucine', 'ATA': 'Isoleucine',
     'CTT': 'Leucine', 'CTC': 'Leucine', 'CTA': 'Leucine', 'CTG': 'Leucine', 'TTA': 'Leucine', 'TTG': 'Leucine',
     'GTT': 'Valine', 'GTC': 'Valine', 'GTA': 'Valine', 'GTG': 'Valine',
     'TTT': 'Phenylalanine', 'TTC': 'Phenylalanine',
     'ATG': 'Methionine'}

(2) Chunk the DNA sequence:

    x = 3
    dna = 'ATTCTTTTCATGCTCCTGTTACTAAA'
    chunks = [dna[y-x:y] for y in range(x, len(dna)+x, x) if len(dna[y-x:y]) == 3]

which nicely chops off the trailing AA:

    ['ATT', 'CTT', 'TTC', 'ATG', 'CTC', 'CTG', 'TTA', 'CTA']

(3) Iterate over the chunks and replace each sequence with its amino acid:

    >>> [amino_acids[sequence] for sequence in chunks]
    ['Isoleucine', 'Leucine', 'Phenylalanine', 'Methionine', 'Leucine', 'Leucine', 'Leucine', 'Leucine']
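
The three steps above can be folded into a single function, which is what the original task asks for. The following is only a sketch under a few assumptions that are not in the thread: the names CODON_TABLE and translate are made up, the output uses the standard one-letter amino acid codes (I, L, V, F, M), which would explain the "III" expected for ATTATTATT, and any codon outside the five covered amino acids is reported as X, following the "amino acid X" rule from the task description. Using dict.get with a default avoids the KeyError that amino_acids[sequence] would raise for an unknown codon.

```python
# Hypothetical lookup table: codon -> one-letter amino acid code.
CODON_TABLE = {
    **dict.fromkeys(['ATT', 'ATC', 'ATA'], 'I'),                       # Isoleucine
    **dict.fromkeys(['CTT', 'CTC', 'CTA', 'CTG', 'TTA', 'TTG'], 'L'),  # Leucine
    **dict.fromkeys(['GTT', 'GTC', 'GTA', 'GTG'], 'V'),                # Valine
    **dict.fromkeys(['TTT', 'TTC'], 'F'),                              # Phenylalanine
    **dict.fromkeys(['ATG'], 'M'),                                     # Methionine
}


def translate(dna):
    """Translate a DNA string codon by codon, ignoring an incomplete tail."""
    usable = len(dna) - len(dna) % 3  # drop the 1 or 2 leftover characters
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    # Unknown codons fall back to 'X' instead of raising KeyError.
    return ''.join(CODON_TABLE.get(codon, 'X') for codon in codons)


print(translate('ATTATTATT'))
print(translate('ATTCTTTTCATGCTCCTGTTACTAAA'))
print(translate('AAABBBCC'))
```

Keeping the sequence as a string throughout means each chunk is already a string such as 'ATT', so it can be compared or looked up directly. In the earlier for loop the chunks were lists like ['A', 'T', 'T'], and a list never compares equal to the string "ATT", so none of the counters were incremented.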