Posts: 12,030
Threads: 485
Joined: Sep 2016
Actually a few more hours, forgot I had a doctor's preventative maintenance appointment, and just got back.
Posts: 12,030
Threads: 485
Joined: Sep 2016
Jan-30-2018, 11:58 PM
(This post was last modified: Jan-30-2018, 11:59 PM by Larz60+.)
Here is the final program. It Sorts and Selects in one module.
It is written so that you can run it from the command line using:
1 |
python SelsectAndSortFasta infilename outfilename min_seq_size
|
Or, you can import it into another program and call from within similar to:
1 2 3 4 5 6 7 8 9 10 11 |
import SelectAndSortFasta
self .sasf = SelectAndSortFasta.SelectAndSortFasta()
sasf = SelectAndSortFasta.SelectAndSortFasta()
sasf.sort_fasta_file(infile, outfile, minlen)
|
Here's the code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 |
from operator import itemgetter
import sys
class SelectAndSortFasta:
def __init__( self ):
self .headers = []
self .infile = None
self .outfile = None
self .minlen = 0
def ExtractHeader( self ):
seqlen = 0
this_header = []
firstseq = True
f = open ( self .infile, 'r' )
f.seek( 0 , 2 )
file_size = f.tell()
f.seek( 0 )
while True :
fptr = f.tell()
if fptr = = file_size:
break
line = None
line = f.readline()
line = line.strip()
if len (line) > 0 :
if line.startswith( '>' ):
if firstseq:
firstseq = False
else :
this_header.append(seqlen)
self .headers.append(this_header)
seqlen = 0
this_header = []
this_header.append(fptr)
this_header.append(line)
else :
seqlen + = len (line)
this_header.append(seqlen)
self .headers.append(this_header)
f.close()
self .headers.sort(key = itemgetter( 2 ), reverse = True )
self .show_header()
def show_header( self ):
for item in self .headers:
print ( f 'File ptr: {item[0]}, Size: {item[2]}, Header: {item[1]}' )
def write_outfile( self ):
with open ( self .infile) as f, open ( self .outfile, 'w' ) as fo:
for item in self .headers:
if item[ 2 ] < minlen:
continue
f.seek(item[ 0 ], 0 )
fo.write(f.readline())
while True :
buf = f.readline()
if buf.startswith( '>' ):
break
fo.write(buf)
def sort_fasta_file( self , infile, outfile, minlen):
self .infile = infile
self .outfile = outfile
self .minlen = minlen
self .ExtractHeader()
self .write_outfile()
if __name__ = = '__main__' :
srtf = SelectAndSortFasta()
numargs = len (sys.argv)
if numargs > 1 :
infile = sys.argv[ 1 ]
outfile = sys.argv[ 2 ]
minlen = sys.argv[ 3 ]
else :
infile = 'data/fasta/AINZ01/AINZ01.1.fsa_nt'
outfile = 'data/fasta/AINZ01/AINZ01sorted.1.fsa_nt'
minlen = 1000
srtf.sort_fasta_file(infile, outfile, minlen)
|
You may want to supress the printout on line 57 (self.show_header())
Posts: 13
Threads: 1
Joined: Jan 2018
Feb-01-2018, 10:26 AM
(This post was last modified: Feb-01-2018, 10:27 AM by berthenet.)
Thanks !
I get a similar error than before:
Error: File "SelectAndSortFasta.py", line 65
print(f'File ptr: {item[0]}, Size: {item[2]}, Header: {item[1]}')
^
SyntaxError: invalid syntax
I am assuming it's again because of my version of python. I'll ask the admin from the cluster I work on if they can install python 3 instead of the 2.6 I am working with at the moment, because it looks like I am going to get lots of these errors.
In the meantime, can you help me correcting the print syntax so that I can check if the script is doing what I wanted?
Thanks again!
Posts: 12,030
Threads: 485
Joined: Sep 2016
Feb-01-2018, 01:02 PM
(This post was last modified: Feb-01-2018, 01:02 PM by Larz60+.)
I'm sorry, I've got in the habit of using f-string because I really like it.
It was introduced in python 3.6
change to:
1 |
print ( 'File ptr: {}, Size: {}, Header: {}' . format (item[ 0 ], item[ 2 ], item[ 1 ]))
|
or install python 3.6.4
Python 2.6 is so old!
maintenance for 2.7 and below is scheduled to stop in less than 2 years.
F-string won't work with anything less than 3.6, but 3.7 is coming out soon, maybe the admin can wait for that,
It's going to have some fantastic async additions.
Posts: 13
Threads: 1
Joined: Jan 2018
I contacter the admin, to see if I can get access to a more recent version of python, but I don't know how fast they are to respond.
I fixed this line now, so it's running.
However my output file is created, but is completely empty. I have this on the screen:
Output: $ python SelectAndSortFasta.py 2013_1056H.contigs.fa 2013_1056HTestOutput 1000
File ptr: 0, Size: 550068, Header: >NODE_1_length_550068_cov_488.448
File ptr: 559270, Size: 190085, Header: >NODE_2_length_190085_cov_532.242
File ptr: 752558, Size: 153922, Header: >NODE_3_length_153922_cov_529.57
File ptr: 909079, Size: 114954, Header: >NODE_4_length_114954_cov_494.41
File ptr: 1025982, Size: 106564, Header: >NODE_5_length_106564_cov_477.255
File ptr: 1134357, Size: 103116, Header: >NODE_6_length_103116_cov_541.733
File ptr: 1239226, Size: 102698, Header: >NODE_7_length_102698_cov_564.458
I didn't copy and paste the whole thing, but it goes till the end of my original fasta, all the contigs including those less than 1000bp.
I am not sure what goes wrong, as I don't have an error showing.
Posts: 12,030
Threads: 485
Joined: Sep 2016
Feb-01-2018, 01:26 PM
(This post was last modified: Feb-01-2018, 01:26 PM by Larz60+.)
How are you running it?
I just ran it and it worked fine
Try to modify the input and output in the else part
of the code at the bottom:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
if __name__ = = '__main__' :
sf = SplitFasta()
numargs = len (sys.argv)
if numargs > 1 :
infile = sys.argv[ 1 ]
outfile = sys.argv[ 2 ]
minlen = sys.argv[ 3 ]
else :
infile = '2013_1056H.contigs.fa'
outfile = '2013_1056HTestOutput.fa'
minlen = 1000
sf.split_file(infile, outfile, int (minlen))
sf.show_seq_out()
|
Posts: 12,030
Threads: 485
Joined: Sep 2016
And run like:
1 |
python SelectAndSortFasta.py
|
without arguments. Perhaps there's an issue with command line arguments
Posts: 12,030
Threads: 485
Joined: Sep 2016
I'm going to have to call it quits soon. Getting pretty sleepy.
I'll wait a few minutes to see if you have success.
Posts: 13
Threads: 1
Joined: Jan 2018
Okay I've done the modifications in the infile and outfile part, and run it without arguments. It does work now.
So I think it is safe to assume the issue comes from the command line arguments.
Posts: 12,030
Threads: 485
Joined: Sep 2016
It worked like a charm with new file
|