
Iterating Large Files
I'm really not a numpy expert, but np.loadtxt() accepts an iterator (such as an open file object) instead of a file name, and, more importantly for us, it also has a max_rows argument that lets us read only part of the file into a numpy array at a time.
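For illustration, a minimal sketch of that pattern; the file name "data.txt", the chunk size, and the int64 dtype are my assumptions, not something given in the thread:

```python
import numpy as np

CHUNK_ROWS = 1_000_000  # rows per chunk, tune to the available memory

with open("data.txt") as f:
    while True:
        # loadtxt consumes lines from the open file object, so each call
        # continues where the previous chunk stopped
        chunk = np.loadtxt(f, dtype=np.int64, max_rows=CHUNK_ROWS, ndmin=1)
        if chunk.size == 0:   # file exhausted
            break
        # ... process the chunk here ...
        print(chunk.shape)
```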

  1. I suggest that you read the file this way in chunks s, and append s.byteswap().tobytes() to a temporary binary file opened for writing.
  2. In a second step, invoke the bsort command on the binary file.
  3. In a third step, read the binary file in chunks of 8 * N bytes; each chunk can be converted to a numpy array and, if you want, written to a new text file. A sketch of all three steps follows this list.
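Here is a rough sketch of the whole pipeline, under these assumptions (mine, not the thread's): the values are non-negative 64-bit integers on a little-endian machine, so that big-endian bytes sort in numeric order; the bsort build accepts -r (record size) and -k (key size) options, which you should check against your bsort's help; the file names and chunk size are placeholders.

```python
import subprocess
import numpy as np

CHUNK_ROWS = 1_000_000
RECORD_SIZE = 8                      # 8 bytes per uint64 value

# Step 1: text -> temporary big-endian binary file
with open("data.txt") as src, open("data.bin", "wb") as dst:
    while True:
        s = np.loadtxt(src, dtype=np.uint64, max_rows=CHUNK_ROWS, ndmin=1)
        if s.size == 0:
            break
        dst.write(s.byteswap().tobytes())   # big-endian bytes sort numerically

# Step 2: sort the fixed-size binary records in place with bsort
# (the -r/-k option names are an assumption; check your bsort's documentation)
subprocess.run(
    ["bsort", "-r", str(RECORD_SIZE), "-k", str(RECORD_SIZE), "data.bin"],
    check=True,
)

# Step 3: read the sorted binary back in chunks of 8 * N bytes
# and write a sorted text file
with open("data.bin", "rb") as src, open("sorted.txt", "w") as dst:
    while True:
        buf = src.read(RECORD_SIZE * CHUNK_ROWS)
        if not buf:
            break
        chunk = np.frombuffer(buf, dtype=">u8").astype(np.uint64)
        np.savetxt(dst, chunk, fmt="%d")
```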

If there is more than one column in the file, you need to take that into account. What do the other columns contain? Several strategies are possible.

If all this works, further optimisations may be within reach; for example, some people report that loadtxt is slow and that there are faster alternatives.
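One commonly mentioned alternative (my addition, not from the thread) is pandas.read_csv with its chunksize parameter, which is usually faster than np.loadtxt on large text files. A sketch, assuming a single whitespace-separated column of non-negative integers and the same placeholder file name:

```python
import pandas as pd

# read_csv with chunksize yields DataFrames of at most chunksize rows each
for chunk in pd.read_csv("data.txt", header=None, sep=r"\s+",
                         dtype="uint64", chunksize=1_000_000):
    values = chunk[0].to_numpy()   # first (only) column as a numpy array
    # ... same per-chunk processing as above ...
```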