Python Forum
Read/Sort Large text file avoiding line-by-line read using mmap or hdf5

I have a large data file of shape (N, 4) that I am currently mapping line by line into a memory-mapped array. My files are about 10 GB each; a simplified implementation is given below. It works, but it takes a huge amount of time.

I would like to implement this so that the text file is read directly, without a Python-level loop over every line, and I can still access the elements. After that, I need to sort the whole (mapped) file based on the elements of column 2.

The examples I see online assume a smaller piece of data (d) and use f[:] = d[:], but I can't do that since d is huge in my case and eats all my RAM.

PS: I know how to load the file with np.loadtxt and sort it with argsort, but that approach fails with a memory error for files of several GB. I would appreciate any direction.

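import numpy as np

nrows, ncols = 20000000, 4
# Disk-backed array: rows are written to 'memmapped.dat' instead of being held in RAM.
f = np.memmap('memmapped.dat', dtype=np.float32,
              mode='w+', shape=(nrows, ncols))

filename = "my_file.txt"

with open(filename) as file:
    # Slow part: one Python-level parse and one assignment per line.
    for i, line in enumerate(file):
        floats = [float(x) for x in line.split(',')]
        f[i, :] = floats

f.flush()  # write pending changes to disk
del f      # release the memory map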
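
One possible direction, as a rough sketch rather than a definitive answer: parse the text file in blocks of many lines (so the per-line Python work is replaced by one np.loadtxt call per block), then sort by running argsort on the key column only and copying rows into a second memmap in blocks. The function names, the chunk_rows parameter, the second output file, and the choice of col=1 for "column 2" are all assumptions made for illustration; it also assumes nrows is known in advance and that one column plus its index array fits in RAM.

```python
import itertools
import numpy as np


def text_to_memmap(txt_path, dat_path, nrows, ncols, chunk_rows=1_000_000):
    """Fill a disk-backed float32 array from a comma-separated text file,
    parsing chunk_rows lines per np.loadtxt call. Returns the rows written."""
    out = np.memmap(dat_path, dtype=np.float32, mode='w+', shape=(nrows, ncols))
    start = 0
    with open(txt_path) as fh:
        while True:
            # Pull the next block of raw lines; an empty list means end of file.
            lines = list(itertools.islice(fh, chunk_rows))
            if not lines:
                break
            # ndmin=2 keeps a single-line block two-dimensional.
            block = np.loadtxt(lines, delimiter=',', dtype=np.float32, ndmin=2)
            out[start:start + len(block)] = block
            start += len(block)
    out.flush()
    del out
    return start


def sort_memmap_by_column(src_path, dst_path, nrows, ncols, col=1,
                          chunk_rows=1_000_000):
    """Write the rows of src_path, ordered by column `col`, to dst_path.
    Only the key column and the index array are held in RAM."""
    src = np.memmap(src_path, dtype=np.float32, mode='r', shape=(nrows, ncols))
    dst = np.memmap(dst_path, dtype=np.float32, mode='w+', shape=(nrows, ncols))
    # Copy just the key column into memory and compute the row order from it.
    order = np.argsort(np.array(src[:, col]), kind='stable')
    for s in range(0, nrows, chunk_rows):
        idx = order[s:s + chunk_rows]
        # Fancy indexing gathers this block of rows from disk in sorted order.
        dst[s:s + len(idx)] = src[idx]
    dst.flush()
    del src, dst
```

With the names from the question, this would be called as text_to_memmap("my_file.txt", "memmapped.dat", nrows, ncols) followed by sort_memmap_by_column("memmapped.dat", "sorted.dat", nrows, ncols), where "sorted.dat" is a hypothetical output name. The gather step reads rows from disk in scattered order, so it may still be slow on spinning disks; chunk_rows trades memory for fewer passes.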
