Feb-01-2018, 01:48 PM
Great. Also, the printout of the headers is done on the entire list.
The smaller ones are eliminated in the write routine.
How to link two python scripts
Feb-01-2018, 01:52 PM
Well, I need some sleep.
Since I retired, I tend to work most of the night and sleep mornings. Bad habit, but I've spent my entire life driving myself this way. By the way, my cousin is Dr. Laurie Ozelius: http://www.massgeneral.org/XDP-center/Ab...elius.aspx She is credited with the discovery of the DYT1 (TOR1A), DYT6 (THAP1), DYT12 (ATP1A3), DYT25 (GNAL) and DYT4 (TUBB4A) dystonia genes.
Feb-02-2018, 09:16 AM
Ok thanks for the clarification about the printout.
I'm not sure why it's not working when I run it with arguments but works when I run it without. There's nothing wrong with working at night and sleeping in the morning. Maybe you're just better this way. We all have different rhythms. Cool for your cousin! I am more into bacterial genomics than human genes, so I hadn't heard of her before, but it looks like very cool and useful research she's into!
Feb-02-2018, 11:36 AM
There could be something that got messed up with arguments. I'll check that today and place a new version if necessary.
It was working, but I usually use the internal startup, so I might have broken it without knowing. My early testing included running with arguments, but the final testing did not, which may be my folly.
Feb-02-2018, 09:21 PM
Ok, first I'll admit I was lazy when reading args:
You'll get a new version soon. In the meantime, don't put filenames in quotes and it should work from the command line.
OK, this version uses argparse which I should have used in the first one.
Now you can use filenames with or without quotes. There is a new command line structure: it now uses flags, so arguments can be passed in any order as long as they have the proper flags. To get help, you can type:

python SelectAndSortFasta.py -h

So a proper command line would look like:

python SelectAndSortFasta.py -o data/fasta/AINZ01/AINZ01sorted.1.fsa_nt -s 1000 -i data/fasta/AINZ01/AINZ01.1.fsa_nt
# or
python SelectAndSortFasta.py -o 'data/fasta/AINZ01/AINZ01sorted.1.fsa_nt' -s 1000 -i 'data/fasta/AINZ01/AINZ01.1.fsa_nt'

This assumes your data is in the sub-directory 'data/fasta/AINZ01'.

You can still import the module into another program as explained previously. Just make sure you use argument names, like:

# At top of program
import SelectAndSortFasta

# In initialization:
ssf = SelectAndSortFasta.SelectAndSortFasta()
# or if in class:
self.ssf = SelectAndSortFasta.SelectAndSortFasta()

# Then when you want to convert a file:
ssf.split_file(in_filename='YourInfileName', out_filename='YourOutfileName', min_seq_len=1000)
# or if in class:
self.ssf.split_file(in_filename='YourInfileName', out_filename='YourOutfileName', min_seq_len=1000)

Here's the new (and final?) code.
import sys
import argparse


class SelectAndSortFasta:
    def __init__(self):
        self.lines_out = []
        self.header = None
        self.save_seq = ''
        self.infilename = None
        self.outfilename = None
        self.minsize = None

    def save_this(self, fo):
        # Write one record and remember its header for show_seq_out().
        # save_seq already ends with a newline, so none is added here.
        fo.write('{}\n'.format(self.header))
        fo.write(self.save_seq)
        self.lines_out.append(self.header)

    def split_file(self, from_args=False, in_filename=None,
                   out_filename=None, min_seq_len=None):
        if from_args:
            iname = self.infilename
            oname = self.outfilename
            mlen = self.minsize
        else:
            iname = in_filename
            oname = out_filename
            mlen = min_seq_len

        self.header = None
        self.save_seq = ''
        self.lines_out = []
        seqlen = 0

        with open(iname, 'r') as fin, open(oname, 'w') as fo:
            for line in fin:
                line = line.strip()
                if line.startswith('>'):
                    # A new header closes the previous record; keep that
                    # record only if it is longer than the minimum.
                    if self.header is not None and seqlen > mlen:
                        self.save_this(fo)
                    # Start the new record empty so a rejected sequence
                    # cannot leak into the next one.
                    self.header = line
                    self.save_seq = ''
                    seqlen = 0
                elif line:
                    self.save_seq = '{}{}\n'.format(self.save_seq, line)
                    seqlen += len(line)
            # Flush the last record in the file.
            if self.header is not None and seqlen > mlen:
                self.save_this(fo)

    def show_seq_out(self):
        print('The following sequences are in the output file:')
        plist = "\n".join(self.lines_out)
        print(plist)

    def parse_args(self):
        parser = argparse.ArgumentParser(description='SortFasta')
        parser.add_argument('-i', action='store', dest='infile_name',
                            required=True, help='Name of input file')
        parser.add_argument('-o', action='store', dest='outfile_name',
                            required=True, help='Name of output file')
        parser.add_argument('-s', action='store', dest='minsize', type=int,
                            required=True, help='Minimum sequence length')
        args = parser.parse_args()
        # Strip literal quote characters in case the shell passed them through.
        self.infilename = args.infile_name.strip('\'"')
        self.outfilename = args.outfile_name.strip('\'"')
        self.minsize = args.minsize


if __name__ == '__main__':
    sf = SelectAndSortFasta()
    if len(sys.argv) > 1:
        sf.parse_args()
        sf.split_file(from_args=True)
    else:
        infile = 'data/fasta/AINZ01/AINZ01.1.fsa_nt'
        outfile = 'data/fasta/AINZ01/AINZ01sorted.1.fsa_nt'
        minlen = 1000
        sf.split_file(from_args=False, in_filename=infile,
                      out_filename=outfile, min_seq_len=minlen)
    sf.show_seq_out()
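If you want to sanity-check the length filter before running it on a real assembly, something like the following might help. It is only a sketch, not the module above: read_fasta, select_long and demo are made-up names, the toy records are invented, and it assumes the same rule as the posted code (keep a record only when its sequence is strictly longer than the minimum).

```python
import os
import tempfile


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file (hypothetical helper)."""
    header, chunks = None, []
    with open(path) as fin:
        for line in fin:
            line = line.strip()
            if not line:
                continue
            if line.startswith('>'):
                # A new header closes the record collected so far.
                if header is not None:
                    yield header, ''.join(chunks)
                header, chunks = line, []
            elif header is not None:
                chunks.append(line)
    # Emit the last record in the file.
    if header is not None:
        yield header, ''.join(chunks)


def select_long(in_path, out_path, min_len):
    """Write records strictly longer than min_len; return the kept headers."""
    kept = []
    with open(out_path, 'w') as fo:
        for header, seq in read_fasta(in_path):
            if len(seq) > min_len:
                fo.write('{}\n{}\n'.format(header, seq))
                kept.append(header)
    return kept


def demo():
    """Run the filter on a three-record toy file in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, 'toy.fsa_nt')
        dst = os.path.join(tmp, 'toy_selected.fsa_nt')
        with open(src, 'w') as f:
            f.write('>short\nACGT\n>long\nACGTACGT\nACGT\n>tiny\nAC\n')
        kept = select_long(src, dst, min_len=5)
        with open(dst) as f:
            return kept, f.read()


if __name__ == '__main__':
    print(demo())
```

With a minimum of 5, only the 12-base record should survive. Note that this sketch joins a multi-line sequence onto one line, which may or may not be what you want in the output.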
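One way to check the -i / -o / -s flag handling without going through a shell is to hand argparse an explicit list. This is a rough sketch rather than part of the script: build_parser and parse are hypothetical helpers, the filenames are placeholders, and it assumes the same three flags and the same quote stripping as the posted parse_args.

```python
import argparse


def build_parser():
    """Hypothetical helper mirroring the -i / -o / -s flags from the post."""
    parser = argparse.ArgumentParser(description='SortFasta')
    parser.add_argument('-i', dest='infile_name', required=True,
                        help='Name of input file')
    parser.add_argument('-o', dest='outfile_name', required=True,
                        help='Name of output file')
    parser.add_argument('-s', dest='minsize', type=int, required=True,
                        help='Minimum sequence length')
    return parser


def parse(argv):
    """Parse an explicit argument list and drop literal quote characters."""
    args = build_parser().parse_args(argv)
    return (args.infile_name.strip('\'"'),
            args.outfile_name.strip('\'"'),
            args.minsize)


if __name__ == '__main__':
    # Same arguments in two different orders, the second with quotes
    # that reached the program unstripped.
    print(parse(['-o', 'out.fsa_nt', '-s', '1000', '-i', 'in.fsa_nt']))
    print(parse(['-i', "'in.fsa_nt'", '-s', '1000', '-o', '"out.fsa_nt"']))
```

Both calls should give the same tuple, which is the point of using flags: the order no longer matters, and quotes that a shell leaves in place are removed before the filenames are used.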