Iterating Large Files
#11
I'm really not a numpy expert, but np.loadtxt() accepts an iterator (such as an open file) instead of a file name, and, more importantly for us, it also has a max_rows argument, which lets us read only part of the file into a numpy array at a time.
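
For illustration, here is a minimal sketch of that chunked reading; the file name "data.txt" and the chunk size are assumptions:

import numpy as np

CHUNK_ROWS = 1_000_000  # rows read per call; adjust to the available memory

with open("data.txt") as f:
    while True:
        # Each call resumes where the previous one stopped in the open file.
        chunk = np.loadtxt(f, dtype=np.float64, max_rows=CHUNK_ROWS)
        if chunk.size == 0:  # file exhausted (some numpy versions warn here)
            break
        # ... process the chunk ...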

  1. I suggest that you read the file by chunks s this way and append s.byteswap().tobytes() to a temporary binary file opened for writing.
  2. In a second step, invoke the bsort command on the binary file.
  3. In a third step, read the binary file back by chunks of 8 * N bytes; each chunk can be converted to a numpy array and, if you want, written to a new text file (see the sketch after this list).
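
The whole pipeline could look roughly like the sketch below. It assumes a single column of float64 values, a little-endian machine, and hypothetical file names ("data.txt", "data.bin", "sorted.txt"); the bsort record-size option shown is also an assumption, so check the documentation of your bsort build:

import subprocess
import numpy as np

CHUNK_ROWS = 1_000_000  # rows converted per loadtxt call
N = CHUNK_ROWS          # rows per chunk when reading the sorted file back

# Step 1: text file -> temporary binary file of fixed 8-byte records.
with open("data.txt") as src, open("data.bin", "wb") as dst:
    while True:
        s = np.loadtxt(src, dtype=np.float64, max_rows=CHUNK_ROWS)
        if s.size == 0:
            break
        # byteswap() writes the values big-endian (on a little-endian host),
        # so the most significant byte comes first in each 8-byte record.
        dst.write(s.byteswap().tobytes())

# Step 2: sort the 8-byte records with the external bsort tool
# (the "-r 8" record-size option is an assumption; adjust to your bsort).
subprocess.run(["bsort", "-r", "8", "data.bin"], check=True)

# Step 3: read the sorted binary file back by chunks of 8 * N bytes.
with open("data.bin", "rb") as f, open("sorted.txt", "w") as out:
    while True:
        buf = f.read(8 * N)
        if not buf:
            break
        chunk = np.frombuffer(buf, dtype=">f8")  # big-endian float64
        np.savetxt(out, chunk)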

If there is more than one column in the file, you need to take that into account. What do the other columns contain? Several strategies are possible.

If all this works, further optimisations may be within reach; for example, some people report that loadtxt is slow and that faster alternatives exist.
