Python Forum
Read/Sort Large text file avoiding line-by-line read using memmap or hdf5
#1
Hello,

I have a large data file of shape (N, 4) that I am copying into a memory-mapped array line by line. My files are 10 GB; a simplistic implementation is given below. It works, but it takes a huge amount of time.

I would like to implement this so that the text file is read directly, without the line-by-line loop, and I can still access the elements. After that, I need to sort the whole (mapped) file by the values in column 2.

The examples I see online assume a smaller piece of data (d) and use f[:] = d[:], but I can't do that because d is huge in my case and eats up my RAM.

PS: I know how to load the file with np.loadtxt and sort it with argsort, but that approach fails with a memory error at GB file sizes. I would appreciate any direction.

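import numpy as np

nrows, ncols = 20000000, 4
# Disk-backed array, so the parsed data never has to fit in RAM
f = np.memmap('memmapped.dat', dtype=np.float32,
              mode='w+', shape=(nrows, ncols))

filename = "my_file.txt"

with open(filename) as file:
    # Parse one comma-separated line at a time (this is the slow part)
    for i, line in enumerate(file):
        floats = [float(x) for x in line.split(',')]
        f[i, :] = floats
f.flush()  # write pending changes to disk
del f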
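
One possible direction, offered as a minimal sketch and not a definitive solution: parse the text in large blocks instead of single lines, write each block into the memmap, then sort by pulling only the key column into RAM and copying rows in sorted order into a second memmap. The helper names text_to_memmap and sort_memmap_by_column, the chunk_rows parameter, and the output file paths are hypothetical. The sketch also assumes that "column-2" means index 1 (the second column) and that a single column plus its index array fits in memory (for 20,000,000 rows, roughly 80 MB of float32 keys and 160 MB of int64 indices).

```python
from itertools import islice

import numpy as np


def text_to_memmap(txt_path, dat_path, nrows, ncols, chunk_rows=1_000_000):
    """Parse a comma-separated text file into a float32 memmap, block by block.

    Hypothetical helper: nrows and ncols must be known in advance, as in the
    original loop.
    """
    out = np.memmap(dat_path, dtype=np.float32, mode="w+", shape=(nrows, ncols))
    start = 0
    with open(txt_path) as fh:
        while start < nrows:
            # Pull at most chunk_rows lines, never more than the rows left.
            lines = list(islice(fh, min(chunk_rows, nrows - start)))
            if not lines:
                break  # file ended earlier than expected
            # np.loadtxt accepts a list of strings; ndmin=2 keeps a
            # single-row block two-dimensional.
            block = np.loadtxt(lines, delimiter=",", dtype=np.float32, ndmin=2)
            out[start:start + block.shape[0]] = block
            start += block.shape[0]
    out.flush()
    return out


def sort_memmap_by_column(src, dat_path, col=1, chunk_rows=1_000_000):
    """Write the rows of src, ordered by column `col`, into a new memmap.

    Only the key column and the index array are held in RAM; the rows
    themselves are copied chunk by chunk.
    """
    keys = np.ascontiguousarray(src[:, col])  # one column only
    order = np.argsort(keys, kind="stable")
    dst = np.memmap(dat_path, dtype=src.dtype, mode="w+", shape=src.shape)
    for start in range(0, len(order), chunk_rows):
        idx = order[start:start + chunk_rows]
        # Fancy indexing gathers just this chunk of rows into memory.
        dst[start:start + len(idx)] = src[idx]
    dst.flush()
    return dst
```

Assuming the file from the post, this could be called as raw = text_to_memmap("my_file.txt", "memmapped.dat", 20000000, 4) followed by sort_memmap_by_column(raw, "sorted.dat", col=1). The gather step reads rows in random order from disk, so it may still be slow on spinning disks; lowering chunk_rows reduces peak memory at the cost of more passes.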