Read/Sort Large text file avoiding line-by-line read using mmap or hdf5 - Robotguy - Jul-22-2020

Hello,

I have a large data file of shape (N, 4) which I am currently copying into a memory-mapped array line by line. My files are around 10 GB each; a simplified implementation is given below. It works, but it takes a huge amount of time. I would like to implement this so that the text file is read directly, without a Python loop over every line, and I can still access the elements. After that, I need to sort the whole mapped file based on the column-2 elements.

The examples I see online assume a smaller piece of data (d) and use f[:] = d[:], but I can't do that since d is huge in my case and eats up my RAM.

PS: I know how to load the file using np.loadtxt and sort it using argsort, but that approach fails with a memory error for GB-sized files. I would appreciate any direction.

    import numpy as np

    nrows, ncols = 20000000, 4

    # Disk-backed array that will hold the parsed values.
    f = np.memmap('memmapped.dat', dtype=np.float32, mode='w+', shape=(nrows, ncols))

    filename = "my_file.txt"
    with open(filename) as file:
        # Slow part: one Python-level parse and one assignment per line.
        for i, line in enumerate(file):
            floats = [float(x) for x in line.split(',')]
            f[i, :] = floats

    del f  # deleting the memmap flushes pending writes to disk
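For reference, here is a minimal sketch of one way the workflow described above could be done in chunks instead of one line at a time. It is not a tested solution for the 10 GB case, and it makes several assumptions: the helper names text_to_memmap and sort_memmap_by_column and the chunk_rows parameter are made up for illustration, the row count is known in advance (as in the snippet above), "column-2" is taken to mean index 1, and a single column plus its argsort index fits in RAM even though the full array does not.

```python
import itertools
import numpy as np


def text_to_memmap(txt_path, dat_path, nrows, ncols, chunk_rows=1_000_000):
    """Parse a comma-separated text file into a float32 memmap, chunk by chunk.

    Assumes the file has at most `nrows` data rows of `ncols` values each.
    """
    out = np.memmap(dat_path, dtype=np.float32, mode="w+", shape=(nrows, ncols))
    start = 0
    with open(txt_path) as fh:
        while True:
            # Pull up to chunk_rows lines; only this chunk is held in RAM.
            lines = list(itertools.islice(fh, chunk_rows))
            if not lines:
                break
            # np.loadtxt accepts a list of strings; ndmin=2 keeps the block
            # two-dimensional even if the chunk has a single row.
            block = np.loadtxt(lines, delimiter=",", dtype=np.float32, ndmin=2)
            out[start:start + len(block)] = block
            start += len(block)
    out.flush()
    return out


def sort_memmap_by_column(src, dat_path, col, chunk_rows=1_000_000):
    """Write the rows of `src`, ordered by column `col`, to a new memmap."""
    # Only the key column is copied into RAM (assumption: it fits).
    keys = np.array(src[:, col])
    order = np.argsort(keys, kind="stable")
    dst = np.memmap(dat_path, dtype=src.dtype, mode="w+", shape=src.shape)
    for start in range(0, len(order), chunk_rows):
        idx = order[start:start + chunk_rows]
        # Fancy indexing gathers the selected rows from disk for this chunk.
        dst[start:start + len(idx)] = src[idx]
    dst.flush()
    return dst
```

With the names from the post, usage would look roughly like raw = text_to_memmap("my_file.txt", "memmapped.dat", 20000000, 4) followed by sort_memmap_by_column(raw, "sorted.dat", col=1). The gather step reads rows from disk in random order, so it may still be slow on a spinning disk; a larger chunk_rows reduces the number of Python-level iterations at the cost of more RAM per chunk.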