Python Forum
Lists - Printable Version


Lists - Tink - Nov-27-2020

I think I am going about this all wrong. Can anyone help?

Write a Python function named 'filter_seq' in the following code cell that takes a list of DNA sequences as an argument and returns a list containing only those sequences that pass the following two criteria:

The sequence contains only the nucleotide letters A, C, G or T, or their lowercase equivalents, and no ambiguous nucleotides (N or n).
The sequence must be exactly 72 nucleotides long.

In addition:

Your function must accept DNA sequences in the argument to be in lowercase, UPPERCASE or a mixture of both. All sequences that meet the criteria must be returned in UPPERCASE.
Your function must have a valid function docstring (any text is acceptable).

def filter_seq(dna_seqs):
    #your code here.

#attempt 1

 def filter_seq(dna_seqs):
    dna_seqs = (A,C,G,T,a,c,g,t)
    Maxnumber = [:71]
    uppercase_filter_seq = filter_seq.upper()
    print(maxnumber(uppercase_filter_seq))



RE: Lists - deanhystad - Nov-27-2020

Go read about function arguments.


RE: Lists - perfringo - Nov-27-2020

You must be able to spell out a plan for what you want to do in order to achieve the desired result.

The function's input is a list of DNA sequences. This is not stated, but I assume these DNA sequences are strings. So your first task:

- how to iterate over a list and get items out of it for processing

Once you have solved that, you have the items (DNA sequences) and you need to process them. Processing means keeping only those items that meet the following criteria:

- length is exactly 72
- contains only A, C, G or T (in either lowercase or uppercase)

Once you have processed the DNA sequences according to those criteria, you must construct a new list from them and return it.

There is an additional condition: the input can be lowercase, uppercase or mixed case.

So, start from the beginning and solve your problem step by step.
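
As a rough illustration of that first task only (iterating over a list and building a new one), here is a minimal sketch. The sample list and the name uppercase_all are made up for this example and are not part of the assignment; the filtering itself is left out on purpose.

```python
def uppercase_all(dna_seqs):
    """Return a new list with every sequence converted to uppercase."""
    result = []                    # the new list to be returned
    for seq in dna_seqs:           # visit each item of the input list in turn
        result.append(seq.upper())
    return result


# Hypothetical sample data, far shorter than real 72-letter sequences
print(uppercase_all(["acgt", "GgCc", "ttaa"]))  # ['ACGT', 'GGCC', 'TTAA']
```

The input list is left unchanged; the function only reads from it and returns a fresh list.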


RE: Lists - Larz60+ - Nov-27-2020

The following is a list of some DNA slicing and searching code that I wrote some time back.
It uses sequences from fasta, but you can adjust for your purposes.

Sorry I can't give the specific code as it's been some time since I looked at this, but it should be simple enough to find:


RE: Lists - Tink - Nov-28-2020

(Nov-27-2020, 06:02 PM)perfringo Wrote: You must be able to spell out a plan for what you want to do in order to achieve the desired result.

Okay, so the first part should look like this?

dna_seqs = ('A','C','G','T','a','c','g','t')
    for dna_seqs in filter_seq:
        print(dna_seq)
Also sadly the links aren't opening for me.


RE: Lists - jefsummers - Nov-28-2020

Please use Python tags (the blue and yellow icon above) when posting. It preserves formatting.

Use upper(), the string method that returns an uppercase copy of the string, on your input DNA string.
You define the function with a parameter dna_seqs. You then reassign dna_seqs losing what was passed in. You then have a for loop that makes no sense - is filter_seq a function or is it a string?
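
To make the reassignment problem concrete, here is a small sketch. Both function names and the sample list are invented for illustration; neither function is a solution to the assignment.

```python
def overwrites_argument(dna_seqs):
    """Shows the bug: the parameter is reassigned, so the caller's list is lost."""
    dna_seqs = ("A", "C", "G", "T")   # whatever was passed in is gone from here on
    return list(dna_seqs)


def keeps_argument(dna_seqs):
    """Leaves the parameter alone and calls upper() on each string in it."""
    return [seq.upper() for seq in dna_seqs]


sequences = ["acgt", "GgCc"]          # hypothetical sample input
print(overwrites_argument(sequences)) # ['A', 'C', 'G', 'T'] no matter what was passed
print(keeps_argument(sequences))      # ['ACGT', 'GGCC']
```

If the valid letters need to be stored, giving them their own name (anything other than dna_seqs) avoids clobbering the argument.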


RE: Lists - Tink - Nov-28-2020

(Nov-28-2020, 02:07 PM)jefsummers Wrote: Please use Python tags (the blue and yellow icon above) when posting. It preserves formatting.

I don't really understand it enough. Back to the drawing board.

I thought dna_seqs = ('A','C','G','T','a','c','g','t') would take care of this part:
"The sequence contains only the nucleotide letters A, C, G or T, or their lowercase equivalents, and no ambiguous nucleotides". I don't really know what my thinking was with the loop.


RE: Lists - jefsummers - Nov-29-2020

Write out in English the series of steps you would use to solve the problem, as detailed as you can. Then write the code for each step.
To get you started -
1. Define a function with one parameter, the DNA sequence
2. Define a string with the valid nucleotide bases (ATGC)
3. Define a string that will be your output from the function
4. Start your loop through the DNA sequence, one letter (nucleotide) at a time
5. If the nucleotide is in the string of valid bases, add it to the output string
6. Once the loop has completed, return the output string
7. Test it on a test DNA sequence

Note that you do not want to modify the original DNA string when going through the loop. It leads to lots of hard-to-find errors. Bad mojo.
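
Written out literally, the seven steps above could look something like the sketch below. The function name keep_valid_bases and the test string are assumptions made for this example. Note that this handles a single DNA string one letter at a time, as the steps describe; it is not the full filter_seq from the assignment, which works on a list of sequences.

```python
VALID_BASES = "ATGC"                       # step 2: string of valid bases


def keep_valid_bases(dna):                 # step 1: one parameter
    """Return the letters of dna that are valid bases, in uppercase."""
    output = ""                            # step 3: the output string
    for nucleotide in dna:                 # step 4: one letter at a time
        if nucleotide.upper() in VALID_BASES:
            output += nucleotide.upper()   # step 5: add valid letters only
    return output                          # step 6: return after the loop


# step 7: try it on a made-up test sequence
print(keep_valid_bases("acgtNNxA"))  # ACGTA
```

The original string is never modified here; the loop only reads from dna and builds up output separately.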


RE: Lists - Tink - Dec-01-2020

(Nov-29-2020, 01:20 PM)jefsummers Wrote: Write out in English the series of steps you would use to solve the problem, as detailed as you can. Then write the code for each step

This sort of works but not really.

def filter_seq(dna_seqs):
    for dna in dna_seqs:
        if len(dna) == 72 and 'N, n' not in dna:
            print(filter_seq(test_seqs_N))



RE: Lists - deanhystad - Dec-01-2020

That sorta doesn't work at all: it crashes with a "maximum recursion depth exceeded" error, and even if it didn't, it would still print "None".
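
Two of the problems can be seen in isolation with a short sketch. The helper names print_only and return_list are invented for this illustration. The recursion error itself comes from the function calling filter_seq inside its own body and is not reproduced here.

```python
# 'N, n' is a single four-character string, so this asks whether the text
# "N, n" appears inside the sequence. It does not test for N or n separately.
print('N, n' not in "ACGN")        # True, even though the sequence contains N
print('N' not in "ACGN")           # False, the N is detected


def print_only(dna_seqs):
    """Prints each sequence but has no return statement."""
    for dna in dna_seqs:
        print(dna)


def return_list(dna_seqs):
    """Builds and returns a new list instead of printing."""
    kept = []
    for dna in dna_seqs:
        kept.append(dna)
    return kept


print(print_only(["ACGT"]))    # prints ACGT, then None
print(return_list(["ACGT"]))   # ['ACGT']
```

A function with no return statement hands back None to its caller, which is why printing its result shows "None".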

Your description of what you want to do needs work. It is too vague to identify a valid DNA sequence. According to your initial post a DNA sequence is valid if:
1. It is exactly 72 nucleotides long
2. It contains only the nucleotide letters A, C, G or T, or their lowercase equivalents
3. It contains no ambiguous nucleotides (N or n).
That is a clear and concise description and it can be made better. Constraint 2 makes constraint 3 superfluous (N and n are not among the allowed letters), so there are only two tests to determine if a DNA sequence is valid. Your code must contain these two tests.
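
One way to convince yourself that a separate N test is not needed is the sketch below. It uses a set comparison, which is just one possible way to express constraint 2; the names ALLOWED and only_allowed_letters are made up for this example.

```python
ALLOWED = set("ACGT")


def only_allowed_letters(sequence):
    """Return True if every letter of sequence is A, C, G or T (any case)."""
    # <= on sets means "is a subset of"
    return set(sequence.upper()) <= ALLOWED


print(only_allowed_letters("acGT"))  # True
print(only_allowed_letters("ACGN"))  # False, N is rejected without a special case
print(only_allowed_letters("acgn"))  # False
```

An empty string passes this test, which is harmless here because the length test rejects it.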

Since you are having so much trouble with this assignment you should start with a simpler assignment. I do this all the time, breaking difficult programming tasks into smaller, easier tasks. Instead of worrying about finding all valid DNA sequences, write a function that returns True if a sequence is valid, False if it is not. Here's a starting point:
def valid_dna_sequence(sequence):
    '''Return True if sequence is valid'''
    # Replace this with code that tests the sequence length
    if sequence is the wrong length:  # How to do this in Python?
        return False
    # Replace this with the code that tests the nucleotides
    for nucleotide in sequence:
        if nucleotide is not valid:  # How to do this in Python?
            return False
    return True # Passed all the tests
Test your code with valid and invalid sequences to verify it works. Once you are confident the code works you can move on to the next step where you loop through a list of DNA sequences, building a list of valid sequences.
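
For reference, one possible way to fill in the placeholders and then take the next step is sketched below. This is only an illustration under the assumptions stated in the thread (sequences are strings, exactly 72 letters, only A, C, G, T in any case); the constant names and the test data are invented here, and other approaches work just as well.

```python
VALID_NUCLEOTIDES = "ACGT"
REQUIRED_LENGTH = 72


def valid_dna_sequence(sequence):
    """Return True if sequence is 72 letters long and uses only A, C, G, T."""
    if len(sequence) != REQUIRED_LENGTH:
        return False
    for nucleotide in sequence.upper():      # compare in uppercase only
        if nucleotide not in VALID_NUCLEOTIDES:
            return False
    return True  # passed both tests


def filter_seq(dna_seqs):
    """Return the valid sequences from dna_seqs, converted to uppercase."""
    valid = []
    for sequence in dna_seqs:
        if valid_dna_sequence(sequence):
            valid.append(sequence.upper())
    return valid


# Made-up test data: one valid sequence, one too short, one with ambiguous bases
good = "acgt" * 18          # 72 letters, lowercase
too_short = "ACGT"
ambiguous = "n" * 72        # right length, wrong letters
print(filter_seq([good, too_short, ambiguous]) == ["ACGT" * 18])  # True
```

Keeping the validity test in its own function means it can be checked on single sequences before the list handling is added.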