Python Forum
Pigz inside python - Reading compressed .gz file much faster
#1
Hello Pythoners-

I am a Linux admin, and one of our users is wondering how to make the script below faster using pigz or any other multi-threaded method. I have no experience with Python. Can someone please share how to make the part below a bit faster? She said it currently takes around 45 minutes to parse one compressed .gz file that is 1 GB in size.

import gzip

# Open the input, handling gzip-compressed files transparently
if infile.endswith(".gz"):
    data = gzip.open(infile, 'rb')
else:
    data = open(infile, "r")
outfile = infile.split(".txt")[0] + "_step1.gz"
outdata = gzip.open(outfile, "wb")

## take line by line
for line in data:
    line1 = line.rstrip()
    if line.startswith("@"):
        ....
        ....
        ....
        ....
        ....
outdata.close()
data.close()
print ">Output file: "+ outfile # end of run
Thank you. This is not a homework task. This is a biology lab's problem.
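
A minimal sketch of one way to try pigz from Python, assuming Python 3 and that the `pigz` binary may or may not be on the PATH (it falls back to the standard `gzip` module if it is missing). The helper names `open_text_in`, `open_gz_out`, and `filter_file` are hypothetical, and the loop body is only a stand-in for the elided per-line processing in the script above. Keep in mind that pigz compresses in parallel, but a gzip stream cannot be decompressed in parallel; on the read side the main gain is that decompression runs in a separate process. Whether this helps depends on where the 45 minutes actually go, so profiling the per-line work first may be worthwhile.

```python
import gzip
import io
import shutil
import subprocess
from contextlib import contextmanager


@contextmanager
def open_text_in(path):
    """Yield a text-mode file object for `path`, decompressing .gz input."""
    if not path.endswith(".gz"):
        with open(path, "r") as fh:
            yield fh
    elif shutil.which("pigz") is None:
        # pigz not installed: fall back to the standard library.
        with gzip.open(path, "rt") as fh:
            yield fh
    else:
        # `pigz -dc` decompresses to stdout in a separate process.
        proc = subprocess.Popen(["pigz", "-dc", path], stdout=subprocess.PIPE)
        try:
            with io.TextIOWrapper(proc.stdout) as fh:
                yield fh
        finally:
            proc.wait()


@contextmanager
def open_gz_out(path, threads=4):
    """Yield a text-mode file object whose contents are gzip-compressed into `path`."""
    if shutil.which("pigz") is None:
        with gzip.open(path, "wt") as fh:
            yield fh
    else:
        with open(path, "wb") as raw:
            # `pigz -c -p N` compresses stdin to stdout using N threads.
            proc = subprocess.Popen(
                ["pigz", "-c", "-p", str(threads)],
                stdin=subprocess.PIPE,
                stdout=raw,
            )
            try:
                with io.TextIOWrapper(proc.stdin) as fh:
                    yield fh
            finally:
                proc.wait()


def filter_file(infile, outfile):
    """Copy lines starting with '@' to `outfile`; return how many were written."""
    written = 0
    with open_text_in(infile) as data, open_gz_out(outfile) as outdata:
        for line in data:
            # Placeholder for the elided per-line processing in the original script.
            if line.startswith("@"):
                outdata.write(line)
                written += 1
    return written
```

Usage would look like `filter_file("sample.txt.gz", "sample_step1.gz")`, where both file names are made up for illustration. Because both helpers yield ordinary text-mode file objects, the existing `for line in data:` loop can stay as it is.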

Messages In This Thread
Pigz inside python - Reading compressed .gz file much faster - by jsmith7279 - Dec-21-2017, 07:21 PM
