sorry, I gave you an ftp url
I was looking for something larger, 2 items is not a good test.
Nevertheless, take a look at the following and see if it does what you'd like
save the file as SplitFasta.py
#!/usr/bin/env python3
import sys


class SplitFasta:
    def __init__(self):
        self.lines_out = []   # headers of the sequences written to the output file
        self.header = None    # header of the record currently being read
        self.save_seq = ''    # sequence lines of the record currently being read

    def save_this(self, fo):
        # Write the current record and remember its header.
        fo.write('{}\n'.format(self.header))
        fo.write(self.save_seq)
        self.lines_out.append(self.header)

    def split_file(self, in_filename, out_filename, min_seq_len):
        # Copy every record longer than min_seq_len from in_filename to out_filename.
        self.header = None
        self.save_seq = ''
        self.lines_out = []
        seqlen = 0
        with open(in_filename, 'r') as fin, open(out_filename, 'w') as fo:
            for line in fin:
                line = line.strip()
                if line.startswith('>'):
                    # New header: write the previous record if it is long enough.
                    if self.header is not None and seqlen > min_seq_len:
                        self.save_this(fo)
                    # Start a fresh record, whether or not the previous one was kept.
                    self.header = line
                    self.save_seq = ''
                    seqlen = 0
                elif line:
                    self.save_seq = '{}{}\n'.format(self.save_seq, line)
                    seqlen += len(line)
            # The last record has no header after it, so handle it here.
            if self.header is not None and seqlen > min_seq_len:
                self.save_this(fo)

    def show_seq_out(self):
        print('The following sequences are in the output file:')
        plist = "\n".join(self.lines_out)
        print(plist)


if __name__ == '__main__':
    sf = SplitFasta()
    numargs = len(sys.argv)
    print(f'numargs: {numargs}')
    # Use the command line only when all three arguments are given.
    if numargs > 3:
        infile, outfile, minlen = sys.argv[1:4]
    else:
        infile = 'Example.fa'
        outfile = 'NewExample.fa'
        minlen = 1000
    sf.split_file(infile, outfile, int(minlen))
    sf.show_seq_out()

The bottom part ('if __name__ ...' etc.) is for testing only, and won't be used if the module is imported into another program.
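If it helps to see the idea without the class, here is a minimal sketch of the same length filter written with a generator. It is only an illustration and not part of SplitFasta.py: the names read_fasta and longer_than and the sample data are made up for this example, and it joins each sequence into a single string instead of keeping the original line wrapping.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, ''.join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    # The last record has no header after it, so emit it here.
    if header is not None:
        yield header, ''.join(chunks)


def longer_than(records, min_seq_len):
    """Keep only records whose sequence is longer than min_seq_len."""
    return [(h, s) for h, s in records if len(s) > min_seq_len]


# Made-up sample data: one short record and one longer record split over two lines.
sample = ['>short', 'ACGT', '>long', 'ACGTAC', 'GTACGT']
kept = longer_than(read_fasta(sample), 5)
print(kept)
```

Because read_fasta accepts any iterable of lines, an open file object can be passed in directly.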
You can import the SplitFasta module into any program like:
import SplitFasta
and then use it this way.
Just once at the start of the script, instantiate the class with:
sf = SplitFasta.SplitFasta()
Then to call, use:
sf.split_file(your_in_filename, your_out_filename, your_min_seq_len)
If you want to show what was written, call:
sf.show_seq_out()
or if you just want the list of output headers:
my_header_list = sf.lines_out
If this is what you are looking for, let me know. I'll be going to bed soon, but will be back in about 4 - 5 hours.
Here's a sample run (from command line):
λ python SplitFasta.py Example.fa NewExample.fa 1000
numargs: 4
The following sequences are in the output file:
>NODE_30_length_1090_cov_54656.2
and the output file (NewExample.fa):
>NODE_30_length_1090_cov_54656.2
TTATGGAATATCTATTAGAGCAAAAAAGAGATTTTACGCAATTAAAATTTAGCGATATAC
AGCAAATGAAATCAGCTTATAGCATAAGAATTTATAATATGCTACTTTGTGAATTAAAAC
AAAACAGACAAAATCTTAAAATAAATCTTTCAGTATTGCAAAATCTTTTAGAAGTTCCGA
AAAATTATGAAGAAAGATGGGCTGATTTTAATCGTTTTGTATTAAAACAAGCAGAAAAAG
ATATAAATAGCAAATCTAATTTAGTTTTATTAGATATTAAAACTTATAAAACAGGGCGTA
AAATAACAGACTTAGAGTTTATTTTTGATTATAAAAATAACGATAAGCGTATCGCACAGG
AAAAACTAAAAGAAGAAAATTTATTTAAAAAACTCAAAGAAATATTAAGTTCTTACATAG
GCAAATCAATTTATGATGATAGATTTGGCGAAATGATTATAAGTCATTACGAACATAATG
AAGAAAATAAAAAGATTTTAATTATCGCCCAGAGAAAAAGCGATGATAAATTTGTTTGCT
TTGGTGTTAAAAACTTCAAAGATATTAAAAGTTTAGAAAAGCTAAAAGATAAAGCAGAAG
AGTTGTTTTATTTAGATAAACAAAGAGTTTTAAAAGCAAAAGAAGCTCAAAAATATAGAA
ATCTTTTTAATTGATTGTATTTTAAAAATTATAAAAATAAAAGAGATATTAAAAGGCTTG
ATTGATAAAAATAATTCTTAAGCTCTAATATCTATGCTTTTTTGTGTAGAATTTAAAGAA
AGAATTTTATTAAATTCCCCTGTATTATCATCGCTAAATTTCATACCAAAAAGAATTTCT
AGCTCATCGCTTGTGCCAAATTTATTTTCCAGTAGCTTTTTTAAAAGCTCATTCATTTTA
TTATCATCTTTATAGGTTTCGCTTTTACTTTCTGCTTGTATAGGTTTAAAAGGCTTTTTT
TTGTCTTCTTCTGAAGTTTCTTTGTTATTTGTATTTTTTAAAGGATTGCTATAATCTACA
CCTTTTGCCTTTTCTGCTTCTTCTAGTGATTTTACAAACCCATCGTGTCTTTGTTTAAAA
TCAAGATATT
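
To double check a result like this, a small sketch along these lines sums the sequence length under each header. fasta_lengths is a hypothetical helper, not part of SplitFasta.py, and the demo file below is throwaway data. Pointed at NewExample.fa, it should report a length above the minimum for every header, assuming the file is laid out as shown above.

```python
import os
import tempfile


def fasta_lengths(path):
    """Return {header: total sequence length} for a FASTA file."""
    lengths = {}
    header = None
    with open(path) as fin:
        for line in fin:
            line = line.strip()
            if line.startswith('>'):
                header = line
                lengths[header] = 0
            elif header is not None:
                # Blank lines add 0, so they do not affect the total.
                lengths[header] += len(line)
    return lengths


# Throwaway demo file; replace demo_path with 'NewExample.fa' for a real check.
with tempfile.NamedTemporaryFile('w', suffix='.fa', delete=False) as tmp:
    tmp.write('>demo_1\nACGTACGT\nACGT\n\n>demo_2\nAC\n')
    demo_path = tmp.name

demo_lengths = fasta_lengths(demo_path)
os.remove(demo_path)
print(demo_lengths)
```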