Python Forum
Iterating Large Files
#9
Here is a devilish suggestion. I ran the following experiment:
>>> s
array([9.94223926e-01, 7.55188959e-01, 2.87075284e-04, 6.60265593e-01,
       3.12176498e-02, 3.01580980e-01, 9.79960201e-01, 2.37826251e-01,
       1.74042656e-01, 1.39546100e-02, 2.14055048e-01, 8.73880775e-01,
       5.12656017e-01])
>>> filename = 'paillasse/foo.bin'
>>> Path(filename).write_bytes(s.tobytes())
104
>>> subprocess.call('bsort -k 8 -r 8 ' + filename, shell=True)
0
>>> ss = np.frombuffer(Path(filename).read_bytes())
>>> ss
array([2.87075284e-04, 3.12176498e-02, 9.79960201e-01, 2.37826251e-01,  # STILL WRONG, SEE BELOW
       9.94223926e-01, 5.12656017e-01, 6.60265593e-01, 8.73880775e-01,
       3.01580980e-01, 7.55188959e-01, 1.39546100e-02, 2.14055048e-01,
       1.74042656e-01])
In other words, I save the array as raw bytes in a file, use the external program bsort to sort that file in place as fixed-size binary records, and then get the sorted numpy array back by reading the file as bytes.

bsort has a reputation for being extremely efficient, and it can presumably handle files too large for memory, much as GNU sort does.

This could be a solution to this problem.

OOPS, there is still an issue: the array that comes back is not sorted. It could be a byte-order problem.
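To illustrate the suspected byte-order problem without needing bsort at all, here is a small sketch. The helper `bytewise_sorted` is hypothetical (it is not part of numpy or bsort): it mimics a binary sort by ordering the 8-byte records lexicographically in pure Python. The sample values are made up for the demonstration.

```python
import numpy as np

values = np.array([0.5, 2.0, 0.25, 1.0])

def bytewise_sorted(arr, dtype):
    """Sort the 8-byte records of arr lexicographically, the way a
    byte-wise binary sort would, and decode them with the same dtype.
    (Hypothetical helper, for illustration only.)"""
    raw = arr.astype(dtype).tobytes()
    records = [raw[i:i + 8] for i in range(0, len(raw), 8)]
    return np.frombuffer(b"".join(sorted(records)), dtype=dtype)

little = bytewise_sorted(values, "<f8")  # least significant byte first
big = bytewise_sorted(values, ">f8")     # most significant byte first

print("little-endian:", little.tolist())
print("big-endian:   ", big.tolist())
```

With little-endian records the byte-wise order does not match the numeric order, while with big-endian records it does, at least for non-negative values like these.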

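YES! The following version works on my machine. On a little-endian machine, byteswap() stores each float64 with its most significant byte first, so bsort's byte-wise comparison agrees with the numeric order (for non-negative values such as these).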
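from pathlib import Path
import numpy as np
import subprocess

s = np.array([9.94223926e-01, 7.55188959e-01, 2.87075284e-04, 6.60265593e-01,
       3.12176498e-02, 3.01580980e-01, 9.79960201e-01, 2.37826251e-01,
       1.74042656e-01, 1.39546100e-02, 2.14055048e-01, 8.73880775e-01,
       5.12656017e-01])
filename = 'paillasse/foo.bin'  # the directory must already exist

# Swap the byte order before writing: on a little-endian machine this
# stores each float64 most-significant byte first.
Path(filename).write_bytes(s.byteswap().tobytes())

# Sort the file in place with bsort (same 8-byte flags as in the experiment).
# check=True raises CalledProcessError if bsort exits with an error.
subprocess.run(['bsort', '-k', '8', '-r', '8', filename], check=True)

# Read the sorted bytes back and swap them to the native byte order again.
ss = np.frombuffer(Path(filename).read_bytes()).byteswap()
print(s)
print(ss)
s.sort()  # reference result: numpy's own in-place sort
print('Sorted ?', np.array_equal(s, ss))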
Output:
[9.94223926e-01 7.55188959e-01 2.87075284e-04 6.60265593e-01 3.12176498e-02 3.01580980e-01 9.79960201e-01 2.37826251e-01 1.74042656e-01 1.39546100e-02 2.14055048e-01 8.73880775e-01 5.12656017e-01]
[2.87075284e-04 1.39546100e-02 3.12176498e-02 1.74042656e-01 2.14055048e-01 2.37826251e-01 3.01580980e-01 5.12656017e-01 6.60265593e-01 7.55188959e-01 8.73880775e-01 9.79960201e-01 9.94223926e-01]
Sorted ? True
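For a really large file, read_bytes() would pull everything into memory at once. Here is a possible variant, only a sketch and not tested against bsort itself: write the data with an explicit big-endian dtype (">f8") instead of byteswap(), and map the file with np.memmap so that elements are loaded on demand. The helper names `write_big_endian` and `open_big_endian` are my own, and the temporary file and sample values are only there for the demonstration.

```python
import tempfile
from pathlib import Path

import numpy as np

def write_big_endian(path, arr):
    """Write arr as raw big-endian float64 records, 8 bytes each."""
    np.asarray(arr, dtype=">f8").tofile(path)

def open_big_endian(path):
    """Map the file read-only; the OS loads pages lazily on access."""
    return np.memmap(path, dtype=">f8", mode="r")

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "foo.bin"
    data = np.array([0.75, 0.125, 0.5])
    write_big_endian(path, data)
    # An external in-place sort such as bsort would run on `path` here.
    mapped = open_big_endian(path)
    roundtrip = np.array(mapped)  # copy out, only needed for this demo
    del mapped  # release the mapping before the directory is removed

print(roundtrip.tolist())
```

The explicit dtype has the advantage that the file layout no longer depends on the byte order of the machine running the script. The same caveat applies as before: byte-wise order matches numeric order only for non-negative floats.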


Messages In This Thread
Iterating Large Files - by Robotguy - Jun-25-2020, 10:46 PM
RE: Iterating Large Files - by Gribouillis - Jun-26-2020, 10:00 AM
RE: Iterating Large Files - by Robotguy - Jul-15-2020, 08:54 PM
RE: Iterating Large Files - by Gribouillis - Jul-15-2020, 11:01 PM
RE: Iterating Large Files - by Robotguy - Jul-17-2020, 04:23 PM
RE: Iterating Large Files - by Gribouillis - Jul-16-2020, 07:11 AM
RE: Iterating Large Files - by Gribouillis - Jul-17-2020, 07:41 PM
RE: Iterating Large Files - by Robotguy - Jul-22-2020, 03:23 PM
RE: Iterating Large Files - by Gribouillis - Jul-22-2020, 06:09 PM
RE: Iterating Large Files - by Robotguy - Jul-22-2020, 08:46 PM
RE: Iterating Large Files - by Gribouillis - Jul-22-2020, 09:13 PM
