Jul-22-2020, 09:13 PM
I'm really not a numpy expert, but
np.loadtxt() accepts an iterator argument (such as an open file) instead of a file name, but more importantly for us, it also has a max_rows argument which allows us to read only a part of the file into an np.array.
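For illustration, here is a minimal sketch of such a chunked reading loop. The file name values.txt and the chunk size N = 100000 are placeholders, and I rely on loadtxt returning an empty array once the file is exhausted:

```python
import numpy as np

N = 100_000  # rows per chunk; a placeholder value

with open("values.txt") as f:
    while True:
        # ndmin=1 keeps the result 1-D even if the chunk has a single row
        s = np.loadtxt(f, dtype=np.float64, max_rows=N, ndmin=1)
        if s.size == 0:   # loadtxt returns an empty array at end of file
            break
        # ... process the chunk s here ...
```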
- I suggest that you read the file this way, chunk by chunk: for each chunk s, append s.byteswap().tobytes() to a temporary binary file open for writing.
- In a second step, you invoke the bsort command on the binary file.
- In a third step, you read the sorted binary file in chunks of 8 * N bytes; each chunk can be converted back to a numpy array and, if you want, written to a new text file. A sketch of the whole pipeline follows this list.
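Here is a hedged sketch of the whole pipeline, under some assumptions: the file names values.txt, values.bin and sorted.txt are placeholders; the -r (record size) and -k (key size) options are my guess at the bsort interface, so check the documentation of the bsort you installed; and a byte-wise sort of big-endian doubles only matches numeric order for non-negative values:

```python
import subprocess
import numpy as np

N = 100_000  # rows per chunk; a placeholder value

# Step 1: convert the text file to a temporary binary file, chunk by chunk.
# byteswap() writes the doubles in big-endian order so that a byte-wise
# sort agrees with numeric order (true for non-negative values only).
with open("values.txt") as src, open("values.bin", "wb") as dst:
    while True:
        s = np.loadtxt(src, dtype=np.float64, max_rows=N, ndmin=1)
        if s.size == 0:
            break
        dst.write(s.byteswap().tobytes())

# Step 2: sort the binary file in place with the external bsort command.
# The -r (record size) and -k (key size) options are my assumption about
# bsort's interface.
subprocess.run(["bsort", "-r", "8", "-k", "8", "values.bin"], check=True)

# Step 3: read the sorted binary file back in chunks of 8 * N bytes,
# rebuild numpy arrays and write them to a new text file.
with open("values.bin", "rb") as src, open("sorted.txt", "w") as dst:
    while True:
        buf = src.read(8 * N)
        if not buf:
            break
        s = np.frombuffer(buf, dtype=np.float64).byteswap()
        np.savetxt(dst, s)
```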
If there is more than one column in the file, then you need to take this into account. What do the other columns contain? Several strategies are possible.
If all this works, other optimisations may be reachable; for example, some people write that loadtxt is slow and that there are other solutions.
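As one example of such an alternative (a suggestion, not a guarantee that it is faster on your data), pandas.read_csv can also parse the file in chunks, and it is often reported to be faster than loadtxt; this sketch assumes a whitespace-separated one-column file:

```python
import pandas as pd

# A hypothetical replacement for the loadtxt loop of step 1: read_csv
# with chunksize yields DataFrames instead of arrays.
for df in pd.read_csv("values.txt", sep=r"\s+", header=None,
                      chunksize=100_000, dtype="float64"):
    s = df.to_numpy().ravel()
    # ... append s.byteswap().tobytes() to the binary file as before ...
```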