Search string in multiple .gz files - Python Forum (https://python-forum.io), General Coding Help
Search string in multiple .gz files - SARAOOF - Aug-18-2021
Hi, could you please help me with a Python script that searches for given strings (numbers, text, etc.) in multiple .gz text files? The directory contains multiple .gz files, organized by date.
I'm getting an error when running the code below:

    import glob
    import gzip

    matched_lines = []
    ZIPFILES = '/bkup/TC/XYZ/20210818/*.gz'
    grep = raw_input('Enter Search: ')
    filelist = glob.glob(ZIPFILES)
    for gzfile in filelist:
        #print("#Starting " + gzfile) #if you want to know which file is being processed
        with gzip.open(gzfile, 'rb') as f:
            # grep = raw_input('Enter Search: ')
            for line in f:  # read file line by line
                if grep in line:  # search for string in each line
                    matched_lines.append(line)  # keep a list of matched lines
    file_content = ''.join(matched_lines)  # join the matched lines
    print(file_content)

Output:
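For comparison, a minimal Python 3 sketch of the same search. One pitfall when moving this script to Python 3: opening the files with 'rb' yields bytes lines, and testing a str search term against bytes (grep in line) raises TypeError. This sketch opens each file in text mode instead. The function name search_gz_files is hypothetical, and UTF-8 is an assumed encoding for the files.

```python
import glob
import gzip


def search_gz_files(pattern, needle):
    """Return (filename, line) pairs for every line containing needle.

    Hypothetical helper. Files are opened in text mode ("rt") so each line
    is a str; in Python 3, mode "rb" would give bytes and `needle in line`
    would raise TypeError for a str needle.
    """
    matched = []
    for gzfile in sorted(glob.glob(pattern)):
        # Assumes UTF-8; errors="replace" keeps one bad byte from aborting the search
        with gzip.open(gzfile, "rt", encoding="utf-8", errors="replace") as f:
            for line in f:
                if needle in line:
                    matched.append((gzfile, line.rstrip("\n")))
    return matched
```

With the directory from this thread it could be called as search_gz_files("/bkup/TC/XYZ/20210818/*.gz", input("Enter Search: ")), where input() is the Python 3 replacement for raw_input().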
RE: Search string in multiple .gz files - Larz60+ - Aug-18-2021
The error is clear: you need to fix the indentation. (I did not run the following, so there may be additional errors.)

    import glob
    import gzip

    matched_lines = []
    ZIPFILES = '/bkup/TC/XYZ/20210818/*.gz'
    grep = raw_input('Enter Search: ')
    filelist = glob.glob(ZIPFILES)
    for gzfile in filelist:
        # print("#Starting " + gzfile) #if you want to know which file is being processed
        with gzip.open(gzfile, 'rb') as f:
            # grep = raw_input('Enter Search: ')
            for line in f:  # read file line by line
                if grep in line:  # search for string in each line
                    matched_lines.append(line)  # keep a list of matched lines
    file_content = ''.join(matched_lines)  # join the matched lines
    print(file_content)

RE: Search string in multiple .gz files - SARAOOF - Aug-18-2021
(Aug-18-2021, 11:26 AM) Larz60+ wrote: The error is clear. You need to fix indentation.

Thanks, the code now runs without errors, but I'm not getting any search results.

    import glob
    import gzip

    matched_lines = []
    ZIPFILES = '/bkup/TC/XYZ/20210818/*.gz'
    grep = raw_input('Enter Search: ')
    filelist = glob.glob(ZIPFILES)
    for gzfile in filelist:
        print("#Starting " + gzfile) #if you want to know which file is being processed
        with gzip.open(gzfile, 'rb') as f:
            # grep = raw_input('Enter Search: ')
            for line in f:  # read file line by line
                if grep in line:  # search for string in each line
                    matched_lines.append(line)  # keep a list of matched lines
    file_content = ''.join(matched_lines)  # join the matched lines
    print(file_content)

RE: Search string in multiple .gz files - DeaD_EyE - Aug-18-2021
It looks like an ancient example for Python 2, which is really out of date.
Here is a working example with some Python magic:

    #!/usr/bin/env python3
    # You should use Python 3 and not touch Python 2.
    # Development of Python 2 has stopped and
    # it won't get any security updates.

    import gzip
    import sys
    from collections import defaultdict
    from pathlib import Path


    def get_matching_files(root, contains):
        """
        Generator that iterates over the gz files in root and searches
        each file, line by line, for the given text.
        For each match, the generator yields:
        >>> gzfile, (line_number, line)
        """
        for gzfile in root.glob("*.gz"):
            # open in text mode
            # this may raise a UnicodeDecodeError
            # if the encoding is messed up
            with gzip.open(gzfile, "rt") as gz:
                for line_number, line in enumerate(gz, start=1):
                    if contains in line:
                        yield gzfile, (line_number, line)


    if __name__ == "__main__":
        if len(sys.argv) != 3:
            raise SystemExit(f"python3 {sys.argv[0]} path_to_directory matching_text")

        zipfiles = Path(sys.argv[1])
        search = sys.argv[2]
        results = defaultdict(list)
        for gzfile, line in get_matching_files(zipfiles, search):
            # line is a tuple of (line_number, line)
            results[gzfile].append(line)

        print(results)

The part that reads the command-line arguments should be done with argparse, click or typer.

RE: Search string in multiple .gz files - SARAOOF - Aug-24-2021
(Aug-18-2021, 02:35 PM) DeaD_EyE wrote: It looks like an ancient example for Python 2, which is really out of date.

Apologies for the delayed response; we have updated to Python 3.6.8. I ran the script above after putting the exact file path into this line:

    for gzfile in root.glob("/bkup/TC/XYZ/20210818/*.gz"):

Output:

    python3 ./srch.py path_to_directory matching_text

RE: Search string in multiple .gz files - ndc85430 - Aug-25-2021
What's the reason for reimplementing zgrep?
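Following the suggestion to use argparse, here is a sketch of the same script with argparse handling the command line. Passing the directory as an argument, rather than editing the pattern inside root.glob(), keeps the glob relative to that directory as the generator expects. build_parser and main are hypothetical names, and decoding as UTF-8 with errors="replace" is an assumption that trades the possible UnicodeDecodeError for replacement characters.

```python
#!/usr/bin/env python3
import argparse
import gzip
from pathlib import Path


def get_matching_files(root, contains):
    """Yield (gzfile, (line_number, line)) for each line in root/*.gz containing `contains`."""
    for gzfile in sorted(root.glob("*.gz")):
        # Assumed UTF-8; undecodable bytes become U+FFFD instead of raising
        with gzip.open(gzfile, "rt", encoding="utf-8", errors="replace") as gz:
            for line_number, line in enumerate(gz, start=1):
                if contains in line:
                    yield gzfile, (line_number, line.rstrip("\n"))


def build_parser():
    """Hypothetical parser: two positional arguments; argparse adds -h/--help itself."""
    parser = argparse.ArgumentParser(
        description="Search for a string in every .gz file of a directory."
    )
    parser.add_argument("directory", type=Path, help="directory containing .gz files")
    parser.add_argument("text", help="text to search for")
    return parser


def main(argv=None):
    # argv=None makes parse_args() read sys.argv[1:]
    args = build_parser().parse_args(argv)
    for gzfile, (line_number, line) in get_matching_files(args.directory, args.text):
        print(f"{gzfile}:{line_number}: {line}")
```

Saved as srch.py with if __name__ == "__main__": main() appended, it would be run as python3 srch.py /bkup/TC/XYZ/20210818 12345, and python3 srch.py --help prints the usage text that argparse generates.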
RE: Search string in multiple .gz files - DeaD_EyE - Aug-25-2021
(Aug-25-2021, 04:18 AM) ndc85430 wrote: What's the reason for reimplementing zgrep?

Reimplementing it in Python is better than doing something like this:

    import subprocess


    def zgrep(file, pattern):
        proc = subprocess.Popen(["zgrep", pattern, file], stdout=subprocess.PIPE)
        for line in proc.stdout:
            yield line.decode(errors="replace")

Code that relies on zgrep does not run on Windows. It also adds an external dependency to the Python program, and it's not Python. The next question could be: why implement cat, sort, awk, sed, ... if we already have them on our machines? An example of the growing number of non-Pythonic solutions: https://github.com/arunsivaramanneo/GPU-Viewer/blob/master/Files/VdpauViewer.py#L24

Regarding your output: yes, what could this mean? Have you tried python3 ./srch.py --help?

This is the normal way command-line tools are controlled: they take options, arguments and parameters. For example, to list a directory on Linux, you could type:

    ls -l /

The -l is an option, and / is an argument that points to the directory ls should show.
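To make the option/argument distinction concrete for this script, a sketch of an argparse parser with one option and two positional arguments, in the same shape as ls -l /. The -i/--ignore-case option and the line_matches helper are hypothetical additions, not part of the scripts in this thread.

```python
import argparse


def build_parser():
    # Same shape as `ls -l /`: "-i" is an option (a flag that changes behaviour),
    # "directory" and "text" are positional arguments.
    parser = argparse.ArgumentParser(prog="srch.py")
    parser.add_argument("-i", "--ignore-case", action="store_true",
                        help="match case-insensitively (hypothetical option)")
    parser.add_argument("directory", help="directory containing .gz files")
    parser.add_argument("text", help="text to search for")
    return parser


def line_matches(line, text, ignore_case=False):
    """Hypothetical substring test, optionally case-insensitive via str.casefold()."""
    if ignore_case:
        return text.casefold() in line.casefold()
    return text in line
```

For example, build_parser().parse_args(["-i", "/bkup/TC/XYZ/20210818", "error"]) sets ignore_case to True, while leaving out -i keeps it False; line_matches could then take the place of the plain contains in line test.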
RE: Search string in multiple .gz files - Gribouillis - Aug-25-2021
It seems that zgrep is just a shell script invoking gzip and grep. It could easily be rewritten in Python.

RE: Search string in multiple .gz files - ndc85430 - Aug-25-2021
But the point is, it exists, so why bother reimplementing it?

RE: Search string in multiple .gz files - Gribouillis - Aug-25-2021
As DeaD_EyE said above: to increase portability and reduce dependencies. Python has a built-in gzip module, and grep-like behavior can be obtained with re.search(). This means similar functionality is available from the standard library alone. Of course, someone has to make the effort (sorry, I don't currently have time to do that).
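A minimal sketch of the standard-library approach described here: gzip for decompression and re.search() for grep-like matching. It is not a full zgrep replacement: apart from a hypothetical ignore_case flag it supports no grep options, and Python's re syntax differs from grep's regular expressions. The function name zgrep and the UTF-8 assumption are illustrative.

```python
import gzip
import re


def zgrep(pattern, paths, ignore_case=False):
    """Yield (path, line_number, line) for each line matching a regular expression.

    Rough stdlib-only stand-in for `zgrep PATTERN FILE...` (hypothetical helper).
    """
    # Compile once; re.IGNORECASE mimics grep -i
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    for path in paths:
        # Assumes UTF-8 text; undecodable bytes become U+FFFD instead of raising
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
            for line_number, line in enumerate(f, start=1):
                if regex.search(line):
                    yield path, line_number, line.rstrip("\n")
```

For the directory in this thread, something like zgrep(r"\d{5}", sorted(Path("/bkup/TC/XYZ/20210818").glob("*.gz"))), with Path imported from pathlib, would search every .gz file there and run the same way on Windows and Linux.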