Aug-19-2022, 08:28 PM
Pythonistas,
I am trying to improve the performance of a simple script that reads and compares two 20KB .dat files. The diff assignment takes 42 seconds and the sys.stdout.writelines call takes 82 seconds. Is there an approach to shaving down these times using standard library features? Would concurrent.futures work? It would be nice to have a settable thread pool like the one used in the multiprocessing module, but that doesn't seem to come in the standard library.
Dave
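For what it's worth, the standard library does ship a settable pool: concurrent.futures.ThreadPoolExecutor takes a max_workers argument, much like multiprocessing.Pool(processes=...). A minimal sketch, using two small throwaway files as stand-ins for the .dat files (note that for CPU-bound work like difflib the GIL means threads may not actually speed things up):

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def read_lines(path):
    # Read one file into a list of lines.
    with open(path, "r") as f:
        return f.readlines()

# Create two small sample files standing in for some1.dat / some2.dat.
tmpdir = tempfile.mkdtemp()
paths = []
for name, text in [("some1.dat", "a\nb\n"), ("some2.dat", "a\nc\n")]:
    p = os.path.join(tmpdir, name)
    with open(p, "w") as f:
        f.write(text)
    paths.append(p)

# max_workers makes the pool size settable, analogous to
# multiprocessing.Pool(processes=...).
with ThreadPoolExecutor(max_workers=2) as pool:
    fromlines, tolines = pool.map(read_lines, paths)
```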
with multiprocessing.Pool(processes=2) as pool:

Is there a way to use a number of threads or processes that can be "set" depending on the size of the .dat?
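One way to do that is to compute the pool size from os.path.getsize before creating the pool. The sizing rule below (roughly one worker per 10 KB, capped at 8) is a made-up heuristic for illustration, and multiprocessing.pool.ThreadPool is the standard library's thread-based counterpart to multiprocessing.Pool:

```python
import multiprocessing.pool

def pool_size_for(nbytes, bytes_per_worker=10_000, max_workers=8):
    # Hypothetical heuristic: roughly one worker per 10 KB of input,
    # capped at max_workers and never below 1.
    return max(1, min(max_workers, nbytes // bytes_per_worker + 1))

# A 20 KB file gets 3 workers under this heuristic; in the real script
# you would pass os.path.getsize("some1.dat") instead of a literal.
n = pool_size_for(20_000)
with multiprocessing.pool.ThreadPool(processes=n) as pool:
    squares = pool.map(lambda x: x * x, range(4))
```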
import difflib
import sys

fromfile = "some1.dat"
tofile = "some2.dat"

with open(fromfile, "r") as f:
    fromlines = f.readlines()
with open(tofile, "r") as f:
    tolines = f.readlines()

# takes 42 secs
diff = difflib.HtmlDiff().make_file(fromlines, tolines, fromfile, tofile)

# takes 82 secs
sys.stdout.writelines(diff)

Best,
Dave
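Before reaching for threads, it may be worth checking whether HtmlDiff itself is the cost: it performs per-character intraline analysis, while difflib.unified_diff works line by line and is usually far cheaper for large inputs. A minimal sketch with made-up stand-in lines (if you need the HTML report, this won't produce one, but it shows how much of the time is the rendering):

```python
import difflib

fromlines = ["alpha\n", "beta\n", "gamma\n"]   # stand-ins for some1.dat lines
tolines = ["alpha\n", "delta\n", "gamma\n"]    # stand-ins for some2.dat lines

# unified_diff is a generator producing plain-text diff lines; it skips
# HtmlDiff's character-level markup work entirely.
diff = list(difflib.unified_diff(fromlines, tolines,
                                 fromfile="some1.dat", tofile="some2.dat"))
```

Writing the result to a file rather than sys.stdout can also help, since console rendering is often slower than disk writes.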