Dec-18-2017, 10:16 PM
Dec-18-2017, 10:34 PM
I'll take a quick look at making this work in python 2.7. No promises, and I can't spend much more time on it.
Dec-18-2017, 11:29 PM
OK, re-do everything in post 15.
This will work in python 2.7, but you have to run from the command line like:
c:\Python27\python.exe SwapHeaders.py -i 'File1.txt' -b 'File2.txt' -o 'Fileout.txt'
replacing the python command to point to your python 2.7 directory
and changing file names as appropriate.
Running:
c:\Python27\python.exe SwapHeaders.py -h
as an aid, will give you:
Output:
usage: SwapHeaders.py [-h] [-i ORIGINAL_FILENAME]
                      [-b REPLACE_ORIGINAL_FILENAME] [-o OUT_FILENAME]

optional arguments:
  -h, --help            show this help message and exit
  -i ORIGINAL_FILENAME, --ifile ORIGINAL_FILENAME
                        Filename where headers are to be replaced
  -b REPLACE_ORIGINAL_FILENAME, --bfile REPLACE_ORIGINAL_FILENAME
                        Filename containing body
  -o OUT_FILENAME, --ofile OUT_FILENAME
                        Output filename
Dec-19-2017, 02:47 PM
I was getting some "No such directory" errors and was able to figure out how to modify the code so it would work (I'm on macOS). Code below. Running it replaces only some of the targeted headers; specifically, it seems to replace only the ones that contain Clavigralla and Anoplocnemis. I think I see why and will play with the script some more.
#!/usr/bin/env python
# Replace header in original file header with header in header file, writing output to outputfile
# Larz60+
# from pathlib import Path
import os
import sys
import argparse


class SwapHeaders:
    def __init__(self, origfile=None, headerfile=None, outfile=None):
        # Note Modern pathlib objects removed because they won't work in
        # outdated python 2.7
        # self.home = Path('.')
        # self.data = self.home / 'data'
        # self.original_file = self.data / origfile
        # self.header_file = self.data / headerfile
        # self.out_file = self.data / outfi

        # with self.header_file.open() as fh:
        #     self.header_data = fh.readlines()

        # self.orig = self.original_file.open()
        # self.fo = self.out_file.open('w')

        self.home = os.getcwd()
        self.data = self.home + '/data/'
        self.original_file = self.data + origfile
        self.header_file = self.data + headerfile
        self.out_file = self.data + outfile

        with open(self.header_file, 'r') as fh:
            self.header_data = fh.readlines()

        self.orig = open(self.original_file, 'r')
        self.fo = None

    def close_files(self):
        self.orig.close()

    def get_replacement_header(self, match):
        retrec = None
        for line in self.header_data:
            if not line.startswith('>'):
                continue
            if match in line:
                retrec = line
                break
        return retrec

    def read_orig_record(self):
        """
        original file record read
        :return: data or False
        """
        while True:
            data = self.orig.readline()
            if not data:
                break
            yield data

    def make_new_file(self):
        # with self.out_file.open('w') as fo:
        with open(self.out_file, 'w') as fo:
            for orig in self.read_orig_record():
                match = None
                if orig.startswith('>'):
                    match = orig[1:]
                    x = match.rfind('.')
                    if x:
                        match = match[:x]
                    new = self.get_replacement_header(match)
                    if new is not None:
                        fo.write(new)
                    else:
                        fo.write(orig)
                else:
                    fo.write(orig)


def main():
    # Typical command line call python SwapHeaders.py -i 'File1.txt' -b 'File2.txt' -o 'Fileout.txt'
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--ifile", dest='original_filename',
                        help="Filename where headers are to be replaced",
                        action="store")
    parser.add_argument("-b", "--bfile", dest='replace_original_filename',
                        help="Filename containing body",
                        action="store")
    parser.add_argument("-o", "--ofile", dest='out_filename',
                        help="Output filename",
                        action="store")
    args = parser.parse_args()

    original_filename = args.original_filename
    replace_original_filename = args.replace_original_filename
    out_filename = args.out_filename

    sh = SwapHeaders(origfile=original_filename,
                     headerfile=replace_original_filename,
                     outfile=out_filename)
    sh.make_new_file()
    sh.close_files()


if __name__ == '__main__':
    main()
If I change line 70 'x = match.rfind('.')' to 'x = match.rfind('seq1')', that certainly will select the other targeted headers, but it will include headers that have, e.g., 'seq1_A_' and 'seq1_B_' which I do not want to include. Is there a way to get the match.rfind search term to exclude these instances or to just include seq1_[some numerical digits]?
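One way this might be done, as an untested sketch rather than a confirmed fix: match.rfind only looks for a fixed string, so "seq1_ followed by digits" probably needs a regular expression from the standard re module. The helper name header_key, the pattern name SEQ1_NUMERIC and the sample headers in the usage note are all made up for illustration, since the real header layout isn't shown in this thread. The sketch also assumes the key should be cut where 'seq1' starts, which is what match[:x] would do after rfind('seq1'). It should run on both python 2.7 and 3.

```python
import re

# Hypothetical pattern: 'seq1_' followed by at least one digit.
# 'seq1_A_' and 'seq1_B_' do not match because a letter follows the underscore.
SEQ1_NUMERIC = re.compile(r'seq1_\d+')


def header_key(header):
    """Return the text used to look up a replacement header, or None.

    `header` is one '>' line from the original file (an assumption about
    the data, based on the script above).
    """
    text = header.lstrip('>').strip()

    # Existing rule: keep everything before the last '.'.
    # str.rfind returns -1 when nothing is found, and -1 is truthy,
    # so the result is compared against -1 instead of tested with `if x:`.
    dot = text.rfind('.')
    if dot != -1:
        return text[:dot]

    # Extra rule: accept only 'seq1_<digits>' and cut where 'seq1' starts.
    # Note: search() finds the first occurrence, while rfind finds the last.
    found = SEQ1_NUMERIC.search(text)
    if found:
        return text[:found.start()]

    # No rule applies: the caller can write the original header unchanged.
    return None
```

With made-up headers, header_key('>Genus_species_seq1_0042') gives 'Genus_species_', while header_key('>Genus_species_seq1_A_7') gives None, so that header would be left alone. In make_new_file the result could be passed to get_replacement_header whenever it is not None.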
Dec-19-2017, 03:51 PM
I've got to get some sleep for a few hours, please during that time, isolate the items that aren't being replaced.
Thanks.
Dec-19-2017, 04:01 PM
(Dec-19-2017, 03:51 PM) Larz60+ wrote: I've got to get some sleep for a few hours, please during that time, isolate the items that aren't being replaced.
Thanks.
I just modified the file1.txt to insert a '.' after 'seq1', which now gets picked up by the script. I appreciate all of your help.
Dec-19-2017, 07:11 PM
Great!