Python Forum

Full Version: How to link two python scripts
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Pages: 1 2 3 4
Actually a few more hours, forgot I had a doctor's preventative maintenance appointment, and just got back.
Here is the final program. It Sorts and Selects in one module.

It is written so that you can run it from the command line using:
python SelsectAndSortFasta infilename outfilename min_seq_size
Or, you can import it into another program and call from within similar to:
# Add following import at top of your program
import SelectAndSortFasta

# In your initialization routine, add:
# If a class:
    self.sasf = SelectAndSortFasta.SelectAndSortFasta()
# If just a function:
    sasf = SelectAndSortFasta.SelectAndSortFasta()

# Then when you want to run a file
    sasf.sort_fasta_file(infile, outfile, minlen)
Here's the code:
from operator import itemgetter
import sys


class SelectAndSortFasta:
    """
    Sort in decreasing size order
    """
    def __init__(self):
        self.headers = []
        self.infile = None
        self.outfile = None
        self.minlen = 0

    def ExtractHeader(self):
        """
        Reads all headers, saves starting file position, header, and size in self.headers
        :return: None
        """
        seqlen = 0
        this_header = []
        firstseq = True

        f = open(self.infile, 'r')
        f.seek(0, 2)
        file_size = f.tell()
        f.seek(0)
        while True:
            fptr = f.tell()
            if fptr == file_size:
                break
            line = None
            line = f.readline()
            line = line.strip()

            # skip empty lines
            if len(line) > 0:
                # print(f'line len: {len(line)}')
                if line.startswith('>'):
                    if firstseq:
                        firstseq = False
                    else:
                        this_header.append(seqlen)
                        self.headers.append(this_header)

                    seqlen = 0
                    this_header = []
                    this_header.append(fptr)
                    this_header.append(line)
                else:
                    seqlen += len(line)
        this_header.append(seqlen)
        self.headers.append(this_header)
        f.close()
        # This is the sort routine, sorts on column 2 (size) of self.headers, in reverse order
        self.headers.sort(key=itemgetter(2), reverse=True)
        self.show_header()

    def show_header(self):
        """
        display self.headers list
        :return: None
        """
        for item in self.headers:
            print(f'File ptr: {item[0]}, Size: {item[2]}, Header: {item[1]}')

    def write_outfile(self):
        """
        Reads self.headers (which is sorted in reverse size order,
        seeks to that record in the inout file (from file pointer which i in column 0 of self.headers)
        and writes the output file
        :return:
        """
        with open(self.infile) as f, open(self.outfile, 'w') as fo:
            for item in self.headers:
                # Ignore entry if too small
                if item[2] < minlen:
                    continue
                f.seek(item[0], 0)
                fo.write(f.readline())
                while True:
                    buf = f.readline()
                    if buf.startswith('>'):
                        break
                    fo.write(buf)

    def sort_fasta_file(self, infile, outfile, minlen):
        """
        Launch pad for sort and select program
        :param infile: Name of input fasta file
        :param outfile: Name of output fasta file
        :param minlen: Minimun size of output record. (smaller inut records are ignored)
        :return: None
        """
        self.infile = infile
        self.outfile = outfile
        self.minlen = minlen
        self.ExtractHeader()
        self.write_outfile()


if __name__ == '__main__':
    # Test routine
    srtf = SelectAndSortFasta()
    numargs = len(sys.argv)
    if numargs > 1:
        infile = sys.argv[1]
        outfile = sys.argv[2]
        minlen = sys.argv[3]
    else:
        infile = 'data/fasta/AINZ01/AINZ01.1.fsa_nt'
        outfile = 'data/fasta/AINZ01/AINZ01sorted.1.fsa_nt'
        minlen = 1000

    srtf.sort_fasta_file(infile, outfile, minlen)
You may want to supress the printout on line 57 (self.show_header())
Thanks !

I get a similar error than before:

Error:
File "SelectAndSortFasta.py", line 65 print(f'File ptr: {item[0]}, Size: {item[2]}, Header: {item[1]}') ^ SyntaxError: invalid syntax
I am assuming it's again because of my version of python. I'll ask the admin from the cluster I work on if they can install python 3 instead of the 2.6 I am working with at the moment, because it looks like I am going to get lots of these errors.

In the meantime, can you help me correcting the print syntax so that I can check if the script is doing what I wanted?

Thanks again!
I'm sorry, I've got in the habit of using f-string because I really like it.
It was introduced in python 3.6
change to:
print('File ptr: {}, Size: {}, Header: {}'.format(item[0], item[2], item[1]))
or install python 3.6.4

Python 2.6 is so old!
maintenance for 2.7 and below is scheduled to stop in less than 2 years.
F-string won't work with anything less than 3.6, but 3.7 is coming out soon, maybe the admin can wait for that,
It's going to have some fantastic async additions.
I contacter the admin, to see if I can get access to a more recent version of python, but I don't know how fast they are to respond.

I fixed this line now, so it's running.

However my output file is created, but is completely empty. I have this on the screen:

Output:
$ python SelectAndSortFasta.py 2013_1056H.contigs.fa 2013_1056HTestOutput 1000 File ptr: 0, Size: 550068, Header: >NODE_1_length_550068_cov_488.448 File ptr: 559270, Size: 190085, Header: >NODE_2_length_190085_cov_532.242 File ptr: 752558, Size: 153922, Header: >NODE_3_length_153922_cov_529.57 File ptr: 909079, Size: 114954, Header: >NODE_4_length_114954_cov_494.41 File ptr: 1025982, Size: 106564, Header: >NODE_5_length_106564_cov_477.255 File ptr: 1134357, Size: 103116, Header: >NODE_6_length_103116_cov_541.733 File ptr: 1239226, Size: 102698, Header: >NODE_7_length_102698_cov_564.458
I didn't copy and paste the whole thing, but it goes till the end of my original fasta, all the contigs including those less than 1000bp.

I am not sure what goes wrong, as I don't have an error showing.
How are you running it?
I just ran it and it worked fine

Try to modify the input and output in the else part
of the code at the bottom:
if __name__ == '__main__':
    sf = SplitFasta()
    numargs = len(sys.argv)
    if numargs > 1:
        infile = sys.argv[1]
        outfile = sys.argv[2]
        minlen = sys.argv[3]
    else:
        infile = '2013_1056H.contigs.fa'
        outfile = '2013_1056HTestOutput.fa'
        minlen = 1000
    sf.split_file(infile, outfile, int(minlen))
    sf.show_seq_out()
And run like:
python SelectAndSortFasta.py
without arguments. Perhaps there's an issue with command line arguments
I'm going to have to call it quits soon. Getting pretty sleepy.
I'll wait a few minutes to see if you have success.
Okay I've done the modifications in the infile and outfile part, and run it without arguments. It does work now.

So I think it is safe to assume the issue comes from the command line arguments.
It worked like a charm with new file
Pages: 1 2 3 4