Python Forum
Help with python code to search string in one file & replace with line in other file
(Dec-15-2017, 10:43 PM)Larz60+ Wrote: Questions:
  • The only thing you want to replace are the headers, not the data, correct?
  • Also, the header text does not match exactly between files:
    Output:
    Original: Clavigralla_tomentosicollis_gi_512427643_gb_GAJX01006991.1[b]_0_rc[/b]
    Replacement: Clavigralla_tomentosicollis_gi_512427643_gb_GAJX01006991.1
  • How much is necessary to create a unique match? (I am assuming anything after the '.' is not part of the match.)

Almost there, give me another hour or so.

Yes, I only want to replace the headers, not the data.
The original headers didn't match between files because I copied the examples from this site into the post, and it kept the bolding HTML code (whoops).
Anything after the '.' is not necessary for a unique match, but it can be used, except for the trailing _[digit]_rc or _[digit].
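Going by the rule above, the part of a header used for matching is everything except a trailing _[digit]_rc or _[digit]. A minimal sketch of that rule as a regular expression, assuming headers look like the examples in this thread; `match_key` is a hypothetical helper name, not something from the posted script:

```python
import re

# Assumption: the only part to ignore is a trailing "_<digits>", optionally
# followed by "_rc" (e.g. "_0_rc", "_136_rc", "_7").
SUFFIX = re.compile(r"_\d+(?:_rc)?$")


def match_key(header):
    """Return the part of a FASTA header line that is used for matching."""
    name = header.strip().lstrip(">")  # drop surrounding whitespace and the '>'
    return SUFFIX.sub("", name)


print(match_key(">Clavigralla_tomentosicollis_gi_512427643_gb_GAJX01006991.1_0_rc\n"))
# -> Clavigralla_tomentosicollis_gi_512427643_gb_GAJX01006991.1
print(match_key(">Anasa_tristis_comp3229_c0_seq1_136_rc"))
# -> Anasa_tristis_comp3229_c0_seq1
```

Both the ..._0_rc and ..._35_rc Clavigralla headers reduce to the same key, so both would be matched to the same replacement header.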

(Dec-16-2017, 12:58 AM)Larz60+ Wrote: Ok, check this out and get back. I think it's what you are looking for. It replaces everything from the match up to the next '>' record.
It expects the files to be in a directory named data, which is a sub-directory of the current working directory (wherever you run the code from). You may want to change this.
You can run it from the command line with a command like:
python WhateverYouCallIt.py -i File1.txt -b File2.txt -o Fileout.txt > data/results.txt
code:
# Replace header in bodyfile with header in header file, writing output to outputfile Larz60+
#
from pathlib import Path
import argparse

class SwapHeaders:
    def __init__(self, origfile=None, headerfile=None, outfile=None):
        self.home = Path('.')
        self.data = self.home / 'data'
        self.original_file = self.data / origfile
        self.header_file = self.data / headerfile
        self.out_file = self.data / outfile

        with self.header_file.open() as fh:
            self.new_data = fh.readlines()

        self.make_new_file()

    def get_orig_rec(self):
        with self.original_file.open() as forig:
            for line in forig:
                yield line

    def get_match(self, match_this, fo):
        found = False
        for line in self.new_data:
            if line.startswith('>'):
                if found:
                    break
                if match_this in line:
                    found = True
            if found:
                fo.write(line)

    def make_new_file(self):
        with self.out_file.open('w') as fo:
            skip = False
            for line in self.get_orig_rec():
                if line.startswith('>'):
                    if skip:
                        skip = False
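                    # Header text without the leading '>' and the trailing newline
                    match = line[1:].rstrip('\n')
                    x = match.rfind('.')
                    # rfind() returns -1 (which is truthy) when there is no '.'
                    if x != -1:
                        match = match[:x]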
                    skip = self.get_match(match, fo)
                if skip:
                    continue
                fo.write(line)


def debug_main():
    SwapHeaders(origfile='File1.txt', headerfile='File2.txt', outfile='Fileout.txt')

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--ifile",
                        dest='original_filename',
                        help="Filename where headers are to be replaced",
                        action="store")

    parser.add_argument("-b", "--bfile",
                        dest='replace_original_filename',
                        help="Filename containing replacement headers",
                        action="store")

    parser.add_argument("-o", "--ofile",
                        dest='out_filename',
                        help="Output filename",
                        action="store")

    args = parser.parse_args()
    original_filename = args.original_filename

    replace_original_filename = args.replace_original_filename

    out_filename = args.out_filename

    SwapHeaders(origfile=original_filename, headerfile=replace_original_filename, outfile=out_filename)

if __name__ == '__main__':
    main()
    # debug_main()
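The script builds every path from Path('.') / 'data', i.e. relative to the current working directory. If you would rather pick the directory when running it, one option (a sketch, not part of the code above) is an extra argparse option; `-d/--data-dir` and `parse_paths` are hypothetical names. Marking the three file options `required=True` also gives a clear usage error instead of a `TypeError` from `Path / None` when one is left out.

```python
import argparse
from pathlib import Path


def parse_paths(argv=None):
    """Parse -i/-b/-o like the script above, plus a hypothetical -d/--data-dir."""
    parser = argparse.ArgumentParser(description="Swap FASTA headers")
    parser.add_argument("-i", "--ifile", required=True,
                        help="file whose headers are to be replaced")
    parser.add_argument("-b", "--bfile", required=True,
                        help="file containing the replacement headers")
    parser.add_argument("-o", "--ofile", required=True,
                        help="output file")
    # Hypothetical option; the default reproduces the script's ./data behavior.
    parser.add_argument("-d", "--data-dir", type=Path, default=Path("data"),
                        help="directory holding the input and output files")
    args = parser.parse_args(argv)
    return (args.data_dir / args.ifile,
            args.data_dir / args.bfile,
            args.data_dir / args.ofile)


print(parse_paths(["-i", "File1.txt", "-b", "File2.txt", "-o", "Fileout.txt"]))
print(parse_paths(["-i", "a.fa", "-b", "b.fa", "-o", "out.fa", "-d", "seqs"]))
```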
partial results:
Output:
>OFAS009268-RA-EXON07 |design:coreoidea-v1,designer:forthman,probes-locus:OFAS009268-RA-EXON07,probes-probe:,probes-source:Clavigralla_tomentosicollis_gi_512427643_gb_GAJX01006991.1
TTCTACACAAACTGCTTTGCACTGAGCACCATTAAAATCATCTGTTGACCTTGCAAGTTCTTCAAAATTTACATCAACGCTAATATTCATTTTCCGAGAATGTATTTGCATAATTCGAGCACGGGCATCTTCATTTGGATGAGGAAATTCAATTTTTCTGTCTAGCCTGCCTGATCGGAGAAGGGCTGGATCTAATATATCAACTCTGTTAGTTGCTGCAATG
>Clavigralla_tomentosicollis_gi_512427643_gb_GAJX01006991.1_0_rc
GCTCGAATTATGCAAATACATTCTCGGAAAATGAATATTAGCGTTGATGTAAATTTTGAAGAACTTGCAAGGTCAACAGATGATTTTAATGGTGCTCAGTGCAAAGCAGTTTGTGTAGAA
>OFAS009268-RA-EXON07 |design:coreoidea-v1,designer:forthman,probes-locus:OFAS009268-RA-EXON07,probes-probe:,probes-source:Clavigralla_tomentosicollis_gi_512427643_gb_GAJX01006991.1
TTCTACACAAACTGCTTTGCACTGAGCACCATTAAAATCATCTGTTGACCTTGCAAGTTCTTCAAAATTTACATCAACGCTAATATTCATTTTCCGAGAATGTATTTGCATAATTCGAGCACGGGCATCTTCATTTGGATGAGGAAATTCAATTTTTCTGTCTAGCCTGCCTGATCGGAGAAGGGCTGGATCTAATATATCAACTCTGTTAGTTGCTGCAATG
>Clavigralla_tomentosicollis_gi_512427643_gb_GAJX01006991.1_35_rc
AAATTGAATTTCCTCATCCAAATGAAGATGCCCGTGCTCGAATTATGCAAATACATTCTCGGAAAATGAATATTAGCGTTGATGTAAATTTTGAAGAACTTGCAAGGTCAACAGATGATT
>Anasa_tristis_comp3229_c0_seq1_136_rc
TCAGCCAATCATAGTGGAACCGATTTCCAGTGGAGACGAACTCCGAACTGATATTCATGGAATGGAAACACAAATAAACACTTTAGGTTCTAATAACATTGTATGTGTTCTTTCAACAAC
>uce-3225_p7 |design:hemiptera-v1,designer:faircloth,probes-locus:uce-3225,probes-probe:7,probes-source:halhal1,probes-global-chromo:Scaffold629,probes-global-start:410155,probes-global-end:410275,probes-local-start:0,probes-local-end:120
AAATCCATCAAGAAATACCAACAACAACTTAAGGATGTCCAGACCGCACTCGAGGAAGAACAAAGAGCTAGGGATGATGCCCGAGAACAACTTGGTATTGCCGAAAGGCGAGCCAACGCT

You said: "Ok, check this out and get back. I think it's what you are looking for. It replaces everything from the match up to the next '>' record."

If I'm reading correctly, that is also replacing the sequence data, not just the header.

The output is close to what I want, but it seems to miss a few headers that it should be replacing, e.g.:

Quote:
>Clavigralla_tomentosicollis_gi_512427643_gb_GAJX01006991.1_0_rc
GCTCGAATTATGCAAATACATTCTCGGAAAATGAATATTAGCGTTGATGTAAATTTTGAAGAACTTGCAAGGTCAACAGATGATTTTAATGGTGCTCAGTGCAAAGCAGTTTGTGTAGAA
>Clavigralla_tomentosicollis_gi_512427643_gb_GAJX01006991.1_35_rc
AAATTGAATTTCCTCATCCAAATGAAGATGCCCGTGCTCGAATTATGCAAATACATTCTCGGAAAATGAATATTAGCGTTGATGTAAATTTTGAAGAACTTGCAAGGTCAACAGATGATT
>Anasa_tristis_comp3229_c0_seq1_136_rc
TCAGCCAATCATAGTGGAACCGATTTCCAGTGGAGACGAACTCCGAACTGATATTCATGGAATGGAAACACAAATAAACACTTTAGGTTCTAATAACATTGTATGTGTTCTTTCAACAAC
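As far as I can tell from the posted code, get_match() never returns a value, so skip is always None: the matching record from the header file (header and its sequence) is written first, and then the original header and sequence are written as well. That is consistent with the partial results, where each OFAS009268-RA-EXON07 record is followed by the original Clavigralla record. A minimal sketch of a header-only swap follows. It assumes (1) each replacement header names its source record in a probes-source: field, as in the OFAS009268-RA-EXON07 example, and (2) a trailing _[digit]_rc or _[digit] is the only part to ignore. build_lookup and swap_headers are hypothetical names. The header file is read once into a dictionary, so each original header needs a single lookup instead of a rescan.

```python
import re

# Assumption: each replacement header names its source record in a
# "probes-source:" field, as in the OFAS009268-RA-EXON07 header above.
SOURCE = re.compile(r"probes-source:([^,\s]+)")
# Trailing "_<digits>" or "_<digits>_rc", which is ignored when matching.
SUFFIX = re.compile(r"_\d+(?:_rc)?$")


def build_lookup(replacement_lines):
    """Map each probes-source name to its full replacement header line."""
    lookup = {}
    for line in replacement_lines:
        if line.startswith(">"):
            found = SOURCE.search(line)
            if found:
                lookup[found.group(1)] = line.rstrip("\n")
    return lookup


def swap_headers(original_lines, lookup):
    """Yield the original lines, replacing only header lines that have a match."""
    for line in original_lines:
        if line.startswith(">"):
            key = SUFFIX.sub("", line[1:].strip())
            if key in lookup:
                yield lookup[key] + "\n"
                continue
        # Sequence lines and unmatched headers are copied through unchanged.
        yield line


# Small in-memory demo (sequences shortened for illustration).
replacements = [
    ">OFAS009268-RA-EXON07 |design:coreoidea-v1,designer:forthman,"
    "probes-locus:OFAS009268-RA-EXON07,probes-probe:,"
    "probes-source:Clavigralla_tomentosicollis_gi_512427643_gb_GAJX01006991.1\n",
    "TTCTACACAAACTGC\n",
]
original = [
    ">Clavigralla_tomentosicollis_gi_512427643_gb_GAJX01006991.1_0_rc\n",
    "GCTCGAATTATGCAA\n",
    ">Anasa_tristis_comp3229_c0_seq1_136_rc\n",
    "TCAGCCAATCATAGT\n",
]
result = list(swap_headers(original, build_lookup(replacements)))
print("".join(result), end="")

# With real files, the same functions work on open file objects:
# with open("data/File2.txt") as fh:
#     lookup = build_lookup(fh)
# with open("data/File1.txt") as fin, open("data/Fileout.txt", "w") as fout:
#     fout.writelines(swap_headers(fin, lookup))
```

In the demo the Anasa header is left untouched because the sample header list has no probes-source entry that reduces to Anasa_tristis_comp3229_c0_seq1; with the real File2.txt it would only be replaced if such an entry exists.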

(Dec-16-2017, 02:39 AM)Larz60+ Wrote: Curious about your moniker.
Are (were) you a Forth programmer?

No, that's my last name.


Messages In This Thread
RE: Help with python code to search string in one file & replace with line in other file - by mforthman - Dec-16-2017, 06:39 PM

