Python Forum
Read/Sort Large text file avoiding line-by-line read using mmap or hdf5
Hello,

I have a large data file of shape (N, 4) that I am mapping into a memmap line by line. My files are about 10 GB; a simplified implementation is given below. Although it works, it takes a huge amount of time.
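np.memmap needs the row count N before the file is created, and the snippet below hard-codes it. One way to get N without parsing any numbers is to count newline bytes while reading the file in large binary chunks. This is a minimal sketch: `count_lines` is a hypothetical helper name, and it assumes one row per line (blank lines would be counted as rows too).

```python
import os
import tempfile


def count_lines(path, chunk_size=1 << 20):
    """Count rows by counting newline bytes in binary chunks (hypothetical helper).

    Assumes one row per line; a last row without a trailing newline
    is still counted, and blank lines are counted as rows.
    """
    count = 0
    last_chunk = b""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            count += chunk.count(b"\n")
            last_chunk = chunk
    # The final row may lack a trailing newline; count it anyway.
    if last_chunk and not last_chunk.endswith(b"\n"):
        count += 1
    return count


# Tiny demo file: three rows, the last one without a trailing newline.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1.0,2.0,3.0,4.0\n5.0,6.0,7.0,8.0\n9.0,1.0,2.0,3.0")
    demo_path = tmp.name

demo_rows = count_lines(demo_path)
print(demo_rows)  # 3
os.remove(demo_path)
```

Reading in binary avoids decoding and building a Python string per line, so this pass is mostly I/O bound.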

I would like to implement this logic so that the text file is read directly (not one line at a time) and I can still access the elements. After that, I need to sort the whole (mapped) file by the values in column 2.
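Sorting by one column does not require the whole array in RAM: only the key column and the permutation returned by np.argsort do, and full rows can then be gathered into a second memmap in chunks. A minimal sketch, assuming the data already sits in a raw float32 memmap such as memmapped.dat and that "column 2" means the second column (index 1; change `key_col` if index 2 was meant). The helper `sort_memmap_by_column` and the file names are hypothetical.

```python
import os
import shutil
import tempfile

import numpy as np


def sort_memmap_by_column(src_path, dst_path, shape, key_col=1,
                          dtype=np.float32, chunk_rows=1_000_000):
    """Write a copy of a raw memmap with its rows sorted by one column.

    Hypothetical helper: only the key column and the argsort permutation
    live in RAM; full rows are copied over chunk by chunk.
    """
    src = np.memmap(src_path, dtype=dtype, mode="r", shape=shape)
    dst = np.memmap(dst_path, dtype=dtype, mode="w+", shape=shape)
    # Copy only the key column into RAM, then compute the row order.
    key = np.array(src[:, key_col])
    order = np.argsort(key, kind="stable")
    for start in range(0, shape[0], chunk_rows):
        idx = order[start:start + chunk_rows]
        # Fancy indexing copies just these rows from the source file.
        dst[start:start + len(idx)] = src[idx]
    dst.flush()
    del src, dst


# Demo with a tiny 4 x 4 file; column 1 holds the sort key.
tmpdir = tempfile.mkdtemp()
src_file = os.path.join(tmpdir, "unsorted.dat")
dst_file = os.path.join(tmpdir, "sorted.dat")
demo = np.memmap(src_file, dtype=np.float32, mode="w+", shape=(4, 4))
demo[:] = [[0, 3, 0, 0], [1, 1, 1, 1], [2, 4, 2, 2], [3, 2, 3, 3]]
demo.flush()
del demo

sort_memmap_by_column(src_file, dst_file, shape=(4, 4), chunk_rows=3)
result = np.array(np.memmap(dst_file, dtype=np.float32, mode="r", shape=(4, 4)))
print(result[:, 1])  # [1. 2. 3. 4.]
shutil.rmtree(tmpdir, ignore_errors=True)
```

Gathering rows through a random permutation means random reads on the source file, so this step can be slow on a spinning disk; a larger `chunk_rows` means fewer loop iterations at the cost of more RAM per chunk.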

The examples I see online assume a small piece of data (d) and use f[:] = d[:], but I can't do that because d is huge in my case and eats up my RAM.
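Instead of building one big d and doing f[:] = d[:], the memmap can be filled slice by slice: read a block of lines, parse the whole block with np.loadtxt (which accepts a list of strings), and assign it to f[start:stop]. Peak RAM is then bounded by the block size rather than the file size. A sketch under those assumptions; `text_to_memmap`, `chunk_rows` and the demo file contents are hypothetical, and nrows must be known in advance.

```python
import itertools
import os
import tempfile

import numpy as np


def text_to_memmap(txt_path, dat_path, nrows, ncols=4, dtype=np.float32,
                   chunk_rows=500_000, delimiter=","):
    """Fill a raw memmap from a delimited text file, one block at a time.

    Hypothetical helper: assumes nrows is known beforehand and every
    non-empty line holds exactly ncols numbers.
    """
    out = np.memmap(dat_path, dtype=dtype, mode="w+", shape=(nrows, ncols))
    start = 0
    with open(txt_path) as fh:
        while True:
            # Pull the next chunk_rows lines; an empty list means EOF.
            lines = list(itertools.islice(fh, chunk_rows))
            if not lines:
                break
            # Parse the whole block in one call; ndmin=2 keeps a
            # one-line block two-dimensional.
            block = np.loadtxt(lines, delimiter=delimiter, dtype=dtype, ndmin=2)
            out[start:start + len(block)] = block
            start += len(block)
    out.flush()
    del out
    return start  # number of rows written


tmpdir = tempfile.mkdtemp()
txt_file = os.path.join(tmpdir, "my_file.txt")
dat_file = os.path.join(tmpdir, "memmapped.dat")
with open(txt_file, "w") as fh:
    fh.write("1.5,2.0,3.0,4.0\n5.0,6.0,7.0,8.0\n9.0,0.5,2.0,3.0\n")

rows_written = text_to_memmap(txt_file, dat_file, nrows=3, chunk_rows=2)
check = np.array(np.memmap(dat_file, dtype=np.float32, mode="r", shape=(3, 4)))
print(rows_written, check[2, 1])  # 3 0.5
```

Since NumPy 1.23, np.loadtxt is implemented largely in C, so parsing per block should be considerably faster than the per-line Python loop; on older NumPy versions the gain is smaller.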

PS: I know how to load the file with np.loadtxt and sort it with argsort, but that approach fails with a memory error at GB file sizes. I would appreciate any direction.
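For a sense of scale, using the snippet's nrows = 20000000 with float32 storage, and assuming argsort returns 8-byte indices (int64 on a 64-bit platform), the binary pieces involved are:

```latex
\begin{aligned}
\text{full array:}\quad & 20\,000\,000 \times 4 \times 4\ \text{B} = 320\ \text{MB}\\
\text{key column:}\quad & 20\,000\,000 \times 4\ \text{B} = 80\ \text{MB}\\
\text{argsort indices:}\quad & 20\,000\,000 \times 8\ \text{B} = 160\ \text{MB}
\end{aligned}
```

So an argsort on one memmapped column needs roughly 240 MB plus one chunk of rows, while the full array itself can stay on disk.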

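import numpy as np

# The shape must be known up front: 20 million rows x 4 columns here.
nrows, ncols = 20000000, 4
f = np.memmap('memmapped.dat', dtype=np.float32,
              mode='w+', shape=(nrows, ncols))

filename = "my_file.txt"

# Slow part: every line is split and converted to floats in pure Python.
with open(filename) as file:
    for i, line in enumerate(file):
        floats = [float(x) for x in line.split(',')]
        f[i, :] = floats

f.flush()  # write all pending changes to memmapped.dat
del f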
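Since the title also mentions HDF5: a lighter option that stays within NumPy is to write a .npy file through numpy.lib.format.open_memmap. It behaves like np.memmap, but the file header stores the shape and dtype, so the file can be reopened later with np.load(..., mmap_mode='r') without repeating them. A minimal sketch with a hypothetical file name and a tiny 3 x 4 array standing in for the real data:

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

tmpdir = tempfile.mkdtemp()
npy_path = os.path.join(tmpdir, "memmapped.npy")  # hypothetical file name

# Create a .npy file on disk; its header records shape and dtype.
arr = open_memmap(npy_path, mode="w+", dtype=np.float32, shape=(3, 4))
arr[:] = np.arange(12, dtype=np.float32).reshape(3, 4)
arr.flush()
del arr

# Reopen lazily: no need to pass shape or dtype again.
reopened = np.load(npy_path, mmap_mode="r")
shape_seen, dtype_seen = reopened.shape, reopened.dtype
print(shape_seen, dtype_seen)  # (3, 4) float32
```

The chunked fill and chunked sort work the same way on an array returned by open_memmap, since it is an ordinary memmap.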