Help with python code to search string in one file & replace with line in other file

mforthman · (This post was last modified: Dec-16-2017, 06:44 PM by mforthman.)

(Dec-15-2017, 10:43 PM)Larz60+ Wrote: Questions:
The only thing you want to replace are the headers, not the data, correct?
Also, the header text does not match exactly between files:
Output:
Original: Clavigralla_tomentosicollis_gi_512427643_gb_GAJX01006991.1[b]_0_rc[/b]
Replacement: Clavigralla_tomentosicollis_gi_512427643_gb_GAJX01006991.1
How much is necessary to create a unique match? (I am assuming anything after the '.' is not part of the match.)
Almost there; give me another hour or so.

Yes, I only want to replace the headers, not the data.
The original header not matching between files was due to copying the examples from this site into them: the paste retained the bold formatting tags (whoops).
Anything after the '.' is not necessary for a unique match, but it can be used, except for the trailing _[digit]_rc or _[digit] suffix.
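
As a minimal sketch of that matching rule (not code from this thread), the trailing suffix can be stripped with a regular expression before comparing headers. The helper name match_key and the pattern are assumptions for illustration; the pattern treats "_<digits>" optionally followed by "_rc" at the end of the header as the part to ignore.

```python
import re

# Assumed suffix shape: "_<digits>" optionally followed by "_rc", at end of header.
_SUFFIX = re.compile(r'_\d+(?:_rc)?$')


def match_key(header: str) -> str:
    """Return the part of a '>' header line that should be used for matching.

    Hypothetical helper: drops the leading '>', surrounding whitespace,
    and the trailing _[digit] or _[digit]_rc suffix.
    """
    name = header.strip().lstrip('>')
    return _SUFFIX.sub('', name)


if __name__ == '__main__':
    print(match_key('>Clavigralla_tomentosicollis_gi_512427643_gb_GAJX01006991.1_0_rc'))
    print(match_key('>Anasa_tristis_comp3229_c0_seq1_136_rc'))
```

Headers that carry no such suffix are returned unchanged apart from the leading '>' and whitespace.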

(Dec-16-2017, 12:58 AM)Larz60+ Wrote: OK, check this out and get back to me. I think it's what you are looking for. It replaces everything from the match up to the next '>' record.
It looks for the files in a directory named data, which is a sub-directory of wherever the code is. You may want to change this.
You can run it from the command line with a command that looks like:

python WhateverYouCallIt.py -i File1.txt -b File2.txt -o Fileout.txt > data/results.txt

code:
# Replace header in bodyfile with header in header file, writing output to outputfile Larz60+
#

from pathlib import Path
import argparse

class SwapHeaders:
    def __init__(self, origfile=None, headerfile=None, outfile=None):
        self.home = Path('.')
        self.data = self.home / 'data'
        self.original_file = self.data / origfile
        self.header_file = self.data / headerfile
        self.out_file = self.data / outfile

        with self.header_file.open() as fh:
            self.new_data = fh.readlines()

        self.make_new_file()

    def get_orig_rec(self):
        with self.original_file.open() as forig:
            for line in forig:
                yield line

    def get_match(self, match_this, fo):
        found = False
        for line in self.new_data:
            if line.startswith('>'):
                if found:
                    break
                if match_this in line:
                    found = True
            if found:
                fo.write(line)

    def make_new_file(self):
        with self.out_file.open('w') as fo:
            skip = False
            for line in self.get_orig_rec():
                if line.startswith('>'):
                    if skip:
                        skip = False
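                    match = line[1:]
                    # Trim the match key at the last '.'. Note: rfind() returns -1
                    # when no '.' is present, which is truthy, so the slice then
                    # drops only the final character (normally the newline).
                    x = match.rfind('.')
                    if x:
                        match = match[:x]
                    # get_match() has no return statement, so skip is always None
                    # and the original line is still written below.
                    skip = self.get_match(match, fo)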
                if skip:
                    continue
                fo.write(line)


def debug_main():
    SwapHeaders(origfile='File1.txt', headerfile='File2.txt', outfile='Fileout.txt')

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--ifile",
                        dest='original_filename',
                        help="Filename where headers are to be replaced",
                        action="store")

    parser.add_argument("-b", "--bfile",
                        dest='replace_original_filename',
                        help="Filename containing body",
                        action="store")

    parser.add_argument("-o", "--ofile",
                        dest='out_filename',
                        help="Output filename",
                        action="store")

    args = parser.parse_args()
    original_filename = args.original_filename

    replace_original_filename = args.replace_original_filename

    out_filename = args.out_filename

    SwapHeaders(origfile=original_filename, headerfile=replace_original_filename, outfile=out_filename)

if __name__ == '__main__':
    main()
    # debug_main()

partial results:

Output:
>OFAS009268-RA-EXON07 |design:coreoidea-v1,designer:forthman,probes-locus:OFAS009268-RA-EXON07,probes-probe:,probes-source:Clavigralla_tomentosicollis_gi_512427643_gb_GAJX01006991.1
TTCTACACAAACTGCTTTGCACTGAGCACCATTAAAATCATCTGTTGACCTTGCAAGTTCTTCAAAATTTACATCAACGCTAATATTCATTTTCCGAGAATGTATTTGCATAATTCGAGCACGGGCATCTTCATTTGGATGAGGAAATTCAATTTTTCTGTCTAGCCTGCCTGATCGGAGAAGGGCTGGATCTAATATATCAACTCTGTTAGTTGCTGCAATG
>Clavigralla_tomentosicollis_gi_512427643_gb_GAJX01006991.1_0_rc
GCTCGAATTATGCAAATACATTCTCGGAAAATGAATATTAGCGTTGATGTAAATTTTGAAGAACTTGCAAGGTCAACAGATGATTTTAATGGTGCTCAGTGCAAAGCAGTTTGTGTAGAA
>OFAS009268-RA-EXON07 |design:coreoidea-v1,designer:forthman,probes-locus:OFAS009268-RA-EXON07,probes-probe:,probes-source:Clavigralla_tomentosicollis_gi_512427643_gb_GAJX01006991.1
TTCTACACAAACTGCTTTGCACTGAGCACCATTAAAATCATCTGTTGACCTTGCAAGTTCTTCAAAATTTACATCAACGCTAATATTCATTTTCCGAGAATGTATTTGCATAATTCGAGCACGGGCATCTTCATTTGGATGAGGAAATTCAATTTTTCTGTCTAGCCTGCCTGATCGGAGAAGGGCTGGATCTAATATATCAACTCTGTTAGTTGCTGCAATG
>Clavigralla_tomentosicollis_gi_512427643_gb_GAJX01006991.1_35_rc
AAATTGAATTTCCTCATCCAAATGAAGATGCCCGTGCTCGAATTATGCAAATACATTCTCGGAAAATGAATATTAGCGTTGATGTAAATTTTGAAGAACTTGCAAGGTCAACAGATGATT
>Anasa_tristis_comp3229_c0_seq1_136_rc
TCAGCCAATCATAGTGGAACCGATTTCCAGTGGAGACGAACTCCGAACTGATATTCATGGAATGGAAACACAAATAAACACTTTAGGTTCTAATAACATTGTATGTGTTCTTTCAACAAC
>uce-3225_p7 |design:hemiptera-v1,designer:faircloth,probes-locus:uce-3225,probes-probe:7,probes-source:halhal1,probes-global-chromo:Scaffold629,probes-global-start:410155,probes-global-end:410275,probes-local-start:0,probes-local-end:120
AAATCCATCAAGAAATACCAACAACAACTTAAGGATGTCCAGACCGCACTCGAGGAAGAACAAAGAGCTAGGGATGATGCCCGAGAACAACTTGGTATTGCCGAAAGGCGAGCCAACGCT

You state: "It replaces everything from the match up to the next '>' record."

If I'm reading correctly, that is also replacing the sequence data, not just the header.

The output is close to what I want, but it seems to miss a few headers that it should be replacing, e.g.:

Quote:
>Clavigralla_tomentosicollis_gi_512427643_gb_GAJX01006991.1_0_rc
GCTCGAATTATGCAAATACATTCTCGGAAAATGAATATTAGCGTTGATGTAAATTTTGAAGAACTTGCAAGGTCAACAGATGATTTTAATGGTGCTCAGTGCAAAGCAGTTTGTGTAGAA
>Clavigralla_tomentosicollis_gi_512427643_gb_GAJX01006991.1_35_rc
AAATTGAATTTCCTCATCCAAATGAAGATGCCCGTGCTCGAATTATGCAAATACATTCTCGGAAAATGAATATTAGCGTTGATGTAAATTTTGAAGAACTTGCAAGGTCAACAGATGATT
>Anasa_tristis_comp3229_c0_seq1_136_rc
TCAGCCAATCATAGTGGAACCGATTTCCAGTGGAGACGAACTCCGAACTGATATTCATGGAATGGAAACACAAATAAACACTTTAGGTTCTAATAACATTGTATGTGTTCTTTCAACAAC
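
For comparison, here is a rough sketch of a header-only replacement, which is not the script posted above. It assumes the replacement headers carry the source name in a "probes-source:" field, as in the partial results, and that the headers to be replaced end in _[digit] or _[digit]_rc. The names swap_headers and build_header_index are hypothetical. Sequence lines and unmatched headers pass through untouched.

```python
import re

# Assumed suffix on the headers to be replaced: "_<digits>" or "_<digits>_rc".
SUFFIX = re.compile(r'_\d+(?:_rc)?$')
# Assumed field in the replacement headers that names the source sequence.
SOURCE = re.compile(r'probes-source:([^,\s]+)')


def build_header_index(header_lines):
    """Map each probes-source value to the first header line that names it."""
    index = {}
    for line in header_lines:
        if line.startswith('>'):
            found = SOURCE.search(line)
            if found:
                index.setdefault(found.group(1), line.rstrip('\n'))
    return index


def swap_headers(orig_lines, header_lines):
    """Yield orig_lines with matching '>' headers replaced; data lines are kept."""
    index = build_header_index(header_lines)
    for line in orig_lines:
        if line.startswith('>'):
            key = SUFFIX.sub('', line[1:].strip())
            if key in index:
                yield index[key] + '\n'
                continue
        yield line


if __name__ == '__main__':
    # Tiny made-up records, only to show the shape of the input and output.
    original = ['>Clav.1_0_rc\n', 'GCTC\n', '>Anasa_seq1_136_rc\n', 'TCAG\n']
    headers = ['>OFAS |design:x,probes-source:Clav.1\n', 'TTCT\n']
    print(''.join(swap_headers(original, headers)), end='')
```

To use it on real files, the two line lists could come from open(...) handles and the result could be passed to writelines() on the output file.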

(Dec-16-2017, 02:39 AM)Larz60+ Wrote: Curious about your moniker.
Are (were) you a Forth programmer?

No, that's my last name.
