Posts: 4
Threads: 1
Joined: Aug 2021
Aug-18-2021, 11:07 AM
(This post was last modified: Aug-18-2021, 11:20 AM by Larz60+.)
Hi,
Could you please help with a Python script that searches for given strings (numbers, text, etc.) in multiple ".gz" text files?
The directory contains multiple ".gz" files, organized by date.
Example directory: /bkup/TC/XYZ/20210818
File Names:
A_7235818.csv.gz
A_7235819.csv.gz
.
.
Output: Content of sample file.
38486,22625,XYZ_06_0_20210817204446-3997
88279,77617,XYZ_06_0_20210817204846-3998
I get an error when running the code below.
import glob
import gzip
matched_lines = []
ZIPFILES='/bkup/TC/XYZ/20210818/*.gz'
grep = raw_input('Enter Search: ')
filelist = glob.glob(ZIPFILES)
for gzfile in filelist:
#print("#Starting " + gzfile) #if you want to know which file is being processed
with gzip.open( gzfile, 'rb') as f:
# grep = raw_input('Enter Search: ')
for line in f: # read file line by line
if grep in line: # search for string in each line
matched_lines.append(line) # keep a list of matched lines
file_content = ''.join(matched_lines) # join the matched lines
print(file_content)
Error:
$ ./srch6.py
  File "./srch6.py", line 17
    for line in f: # read file line by line
    ^
IndentationError: expected an indented block
Larz60+ wrote Aug-18-2021, 11:20 AM: Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
fixed for you this time, please use bbcode tags on future posts
Posts: 12,031
Threads: 485
Joined: Sep 2016
The error is clear. You need to fix indentation.
(I did not try to run the following, so there may be additional errors):
import glob
import gzip

matched_lines = []
ZIPFILES='/bkup/TC/XYZ/20210818/*.gz'
grep = raw_input('Enter Search: ')
filelist = glob.glob(ZIPFILES)
for gzfile in filelist:
    # print("#Starting " + gzfile) #if you want to know which file is being processed
    with gzip.open( gzfile, 'rb') as f:
        # grep = raw_input('Enter Search: ')
        for line in f: # read file line by line
            if grep in line: # search for string in each line
                matched_lines.append(line) # keep a list of matched lines
file_content = ''.join(matched_lines) # join the matched lines
print(file_content)
Posts: 4
Threads: 1
Joined: Aug 2021
Aug-18-2021, 01:37 PM
(This post was last modified: Aug-18-2021, 06:18 PM by Larz60+.)
(Aug-18-2021, 11:26 AM)Larz60+ Wrote: The error is clear. You need to fix indentation.
Thanks, the code now runs without errors, but it does not return any search results.
import glob
import gzip

matched_lines = []
ZIPFILES='/bkup/TC/XYZ/20210818/*.gz'
grep = raw_input('Enter Search: ')
filelist = glob.glob(ZIPFILES)
for gzfile in filelist:
    print("#Starting " + gzfile) #if you want to know which file is being processed
    with gzip.open( gzfile, 'rb') as f:
        # grep = raw_input('Enter Search: ')
        for line in f: # read file line by line
            if grep in line: # search for string in each line
                matched_lines.append(line) # keep a list of matched lines
file_content = ''.join(matched_lines) # join the matched lines
print(file_content)
Larz60+ wrote Aug-18-2021, 06:18 PM: Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
Please, as requested previously, use bbcode tags on posts, it's a forum requirement.
Posts: 2,126
Threads: 11
Joined: May 2017
Aug-18-2021, 02:35 PM
(This post was last modified: Aug-18-2021, 02:35 PM by DeaD_EyE.)
It looks like an ancient example for Python 2, which is really out of date.
Here is a working example with some Python magic:
#!/usr/bin/env python3
# Use Python 3 and stay away from Python 2:
# development of Python 2 has stopped and
# it no longer receives security updates.
import gzip
import sys
from collections import defaultdict
from pathlib import Path


def get_matching_files(root, contains):
    """
    Generator to iterate over gz-files in root and
    search each file line by line for a matching text.
    If a result was found the generator yields:
    >>> gzfile, (line_number, line)
    """
    for gzfile in root.glob("*.gz"):
        # open in text mode
        # this may raise a UnicodeDecodeError
        # if the encoding is messed up
        with gzip.open(gzfile, "rt") as gz:
            for line_number, line in enumerate(gz, start=1):
                if contains in line:
                    yield gzfile, (line_number, line)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        raise SystemExit(f"python3 {sys.argv[0]} path_to_directory matching_text")
    zipfiles = Path(sys.argv[1])
    search = sys.argv[2]
    results = defaultdict(list)
    for gzfile, line in get_matching_files(zipfiles, search):
        # line is tuple of (line_number, line)
        results[gzfile].append(line)
    print(results)
The part to get the arguments should be done with argparse, click or typer.
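As a rough illustration of the argparse suggestion, here is a minimal sketch of how the two command-line values could be declared. The argument names `directory` and `text` and the helper `build_parser` are made up for this example and are not part of the script above; the sample path and search value are taken from the first post.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser for the two values the script reads from the command line."""
    parser = argparse.ArgumentParser(
        description="Search .gz files in a directory for a text."
    )
    # Positional arguments; the names are illustrative, not from the original script.
    parser.add_argument("directory", type=Path, help="directory containing the .gz files")
    parser.add_argument("text", help="text to search for in each line")
    return parser


# Parse an explicit list here so the example does not depend on the real command line.
args = build_parser().parse_args(["/bkup/TC/XYZ/20210818", "38486"])
print(args.directory, args.text)
```

In a real script, calling `parse_args()` without a list reads the actual command line, and argparse adds a `-h`/`--help` option automatically.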
Posts: 4
Threads: 1
Joined: Aug 2021
(Aug-18-2021, 02:35 PM)DeaD_EyE Wrote: It looks like an ancient example for Python 2, which is really out of date.
Apologies for the delayed response; we have since upgraded to Python 3.6.8.
I ran the script above after putting the exact file path into the line below:
for gzfile in root.glob("/bkup/TC/XYZ/20210818/*.gz"):
Output:
python3 ./srch.py path_to_directory matching_text
Posts: 1,838
Threads: 2
Joined: Apr 2017
What's the reason for reimplementing zgrep?
Posts: 2,126
Threads: 11
Joined: May 2017
(Aug-25-2021, 04:18 AM)ndc85430 Wrote: What's the reason for reimplementing zgrep?
Reimplementing it in Python is better like this:
import subprocess

def zgrep(file, pattern):
    # Delegate the search to the external zgrep tool and stream its output.
    with subprocess.Popen(["zgrep", pattern, file], stdout=subprocess.PIPE) as proc:
        for line in proc.stdout:
            yield line.decode(errors="replace")
Code that relies on zgrep does not run on Windows.
In addition, it adds an external dependency to the Python program, and it's not Python.
The next question could be: why implement cat, sort, awk, sed, ... if we already have them on our machines?
The increase of non-pythonic solutions: https://github.com/arunsivaramanneo/GPU-...wer.py#L24
Output:
python3 ./srch.py path_to_directory matching_text
Yes, what could this mean?
Have you tried python3 ./srch.py --help?
That is the normal way command-line tools are controlled: they take options, arguments and parameters.
If you want to list a directory on Linux, you could type: ls -l /
The -l is an option and the / is an argument and points to the target directory which ls should show.
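To make this concrete for the script in this thread, here is a small sketch of how a command line is split into words and how the script's length check reacts to them. The helper name `check_usage` and the search value `38486` are only examples; `shlex.split` is used here to imitate the shell's word splitting.

```python
import shlex

# The shell splits the command line into words before the program starts.
print(shlex.split("ls -l /"))

# Hypothetical invocation of the script from this thread; the search text is an example.
words = shlex.split("python3 ./srch.py /bkup/TC/XYZ/20210818 38486")
argv = words[1:]  # what the script sees as sys.argv: script name plus two arguments


def check_usage(argv):
    """Mirror the script's check: exactly two arguments after the script name."""
    if len(argv) != 3:
        return f"python3 {argv[0]} path_to_directory matching_text"
    return None


print(check_usage(argv))           # None: directory and search text were both given
print(check_usage(["./srch.py"]))  # the usage line, because the arguments are missing
```

Under these assumptions, the usage line shown in the output above appears whenever the script is started without exactly two arguments; the directory and the search text belong on the command line rather than inside the glob pattern.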
Posts: 4,790
Threads: 76
Joined: Jan 2018
It seems that zgrep is just a shell script invoking gzip and grep. It could be easily rewritten in Python.
Posts: 1,838
Threads: 2
Joined: Apr 2017
But the point is, it exists, so why bother reimplementing it?
Posts: 4,790
Threads: 76
Joined: Jan 2018
As DeaD_EyE said above, to increase portability and to reduce dependencies. Python has a built-in gzip library, and a grep-like behavior can be obtained with re.search(). It means that a similar functionality can be obtained from the standard library. Of course someone has to make the effort (sorry I don't currently have time to do that).
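For what it's worth, a minimal sketch of that idea using only the standard library might look like the following. The function name `py_zgrep` and its parameters are invented for this example, it assumes UTF-8 text, and it covers only plain pattern matching, not the options of the real zgrep.

```python
import gzip
import re
from pathlib import Path


def py_zgrep(directory, pattern, encoding="utf-8"):
    """Yield (path, line_number, line) for every line matching the regex pattern."""
    regex = re.compile(pattern)
    for gzfile in sorted(Path(directory).glob("*.gz")):
        # Text mode decodes on the fly; undecodable bytes are replaced, not fatal.
        with gzip.open(gzfile, "rt", encoding=encoding, errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if regex.search(line):
                    yield gzfile, line_number, line.rstrip("\n")
```

A call such as `for path, number, line in py_zgrep("/bkup/TC/XYZ/20210818", "38486"): print(path, number, line)` would print each hit with its file and line number. Since the pattern is treated as a regular expression, a literal search string should be passed through `re.escape()` first.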