Python Forum
Iterating Large Files
#4
Robotguy Wrote: I see Memory leaks due to append method.
Well, the most obvious recommendation would be to write the file sequentially instead of filling a numpy array until memory runs out. I'm not a numpy expert, but the strategy would be
import numpy as np

with open('export.txt', 'wb') as ofh:
    diff = []  # this is a Python list, not a numpy array
    for <iteration within the input>:
        diff.extend(...)  # use list's extend() or append() methods, which are fast
        if len(diff) > 10000:  # save the diff list and reset it when it becomes too long
            np.savetxt(ofh, diff)
            diff = []
    if diff:  # write whatever is left over once the loop has finished
        np.savetxt(ofh, diff)
It is probably not lightspeed, but it can do the work for reasonably sized files. As an example, the following Python loop over 1 billion numbers takes less than 2 minutes on my computer:

>>> import time
>>> def f():
...     L = []
...     start = time.time()
...     for i in range(10**9):
...         L.append(i)
...         if len(L) >= 10000:
...             L = []
...     print(time.time() - start, 'seconds')
Beware of numpy.append(), which copies the whole array into a new one on every call. Benchmark your procedures.
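For what it's worth, a comparison of the two ways of growing a result could look like the sketch below. The function names and the size n are only illustrative assumptions, and the timings will depend on your machine, so treat it as a starting point rather than a reference benchmark.

```python
import time

import numpy as np


def build_with_list(n):
    """Collect n values in a Python list, then convert to an array once."""
    values = []
    for i in range(n):
        values.append(i)
    return np.array(values)


def build_with_np_append(n):
    """Grow a numpy array with np.append(), which returns a new copy each call."""
    arr = np.empty(0, dtype=np.int64)
    for i in range(n):
        arr = np.append(arr, i)
    return arr


def timed(func, n):
    """Return (result, elapsed seconds) for func(n)."""
    start = time.perf_counter()
    result = func(n)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    n = 20000  # illustrative size; increase it to see the gap widen
    _, t_list = timed(build_with_list, n)
    _, t_np = timed(build_with_np_append, n)
    print(f"list.append: {t_list:.4f} s, np.append: {t_np:.4f} s")
```

Both functions produce the same values; only the cost of building them differs, because the np.append() version copies an ever-growing array on each iteration.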

As for the file, I sometimes use the trick of writing it to a ramdisk, which is fast and avoids real disk access. With this trick, you could perhaps write segments of numpy arrays directly to the file, as in
np.savetxt(ofh, x[i] - y[j:k])
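A minimal runnable sketch of that idea follows. It assumes x and y are 1-D arrays and that every x[i] is compared against all of y, which may not match your actual computation. The helper names write_segments and pick_fast_dir are hypothetical, and /dev/shm is only assumed to be a RAM-backed mount (it is on many Linux systems); the code falls back to the normal temp directory otherwise.

```python
import os
import tempfile

import numpy as np


def pick_fast_dir():
    """Return a RAM-backed directory if one seems available, else the temp dir."""
    candidate = "/dev/shm"  # assumption: tmpfs mount present on many Linux systems
    if os.path.isdir(candidate) and os.access(candidate, os.W_OK):
        return candidate
    return tempfile.gettempdir()


def write_segments(x, y, path, segment=10000):
    """Write x[i] - y[j:k] to path segment by segment; return the value count."""
    count = 0
    with open(path, "wb") as ofh:
        for i in range(len(x)):
            for j in range(0, len(y), segment):
                block = x[i] - y[j:j + segment]  # small array, never the full result
                np.savetxt(ofh, block)  # one value per line, appended to the open file
                count += len(block)
    return count


if __name__ == "__main__":
    x = np.arange(3, dtype=float)
    y = np.arange(5, dtype=float)
    fd, path = tempfile.mkstemp(suffix=".txt", dir=pick_fast_dir())
    os.close(fd)
    print(write_segments(x, y, path, segment=2), "values written to", path)
    os.remove(path)
```

Only one segment is held in memory at a time, so the memory use stays bounded by the segment size rather than by the size of the full result.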


Messages In This Thread
Iterating Large Files - by Robotguy - Jun-25-2020, 10:46 PM
RE: Iterating Large Files - by Gribouillis - Jun-26-2020, 10:00 AM
RE: Iterating Large Files - by Robotguy - Jul-15-2020, 08:54 PM
RE: Iterating Large Files - by Gribouillis - Jul-15-2020, 11:01 PM
RE: Iterating Large Files - by Robotguy - Jul-17-2020, 04:23 PM
RE: Iterating Large Files - by Gribouillis - Jul-16-2020, 07:11 AM
RE: Iterating Large Files - by Gribouillis - Jul-17-2020, 07:41 PM
RE: Iterating Large Files - by Robotguy - Jul-22-2020, 03:23 PM
RE: Iterating Large Files - by Gribouillis - Jul-22-2020, 06:09 PM
RE: Iterating Large Files - by Robotguy - Jul-22-2020, 08:46 PM
RE: Iterating Large Files - by Gribouillis - Jul-22-2020, 09:13 PM
