Chunks | Sort | Merge
Split the data of the HDF5 file into chunks.
Then sort each chunk and write the sorted chunks to disk.
Then open all chunk files and merge them.
Write the output into a different file.
You need heapq.merge, which returns a generator.
Here is an example of how it could work.
import heapq
import random
from contextlib import ExitStack
from itertools import islice
from pathlib import Path


def producer(size):
    """
    Return `size` random integers in the range 0 <= n < 1024
    """
    return [random.randrange(1024) for _ in range(size)]


# You could use chunked from more_itertools instead:
# https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.chunked
def chunker(iterable, chunksize):
    """
    Split an iterable into smaller chunks.
    The last chunk may be shorter than chunksize.
    """
    iterator = iter(iterable)
    while chunk := list(islice(iterator, chunksize)):
        yield chunk


def sorter(iterable, filename):
    """
    Sort each chunk and save it into its own file.
    The files are named filename_0, filename_1, ...
    """
    for n, chunk in enumerate(iterable):
        chunk = "\n".join(map(str, sorted(chunk)))
        with open(f"{filename}_{n}", "w") as fd:
            fd.write(chunk)


def merger(filename, output):
    """
    Find all files matching filename_*
    Sort the files by their trailing number
    Then open the files,
    merge the chunks and write the result to output
    """
    path = Path(filename)
    key = lambda x: int(x.name.replace(f"{path.name}_", ""))
    files = sorted(path.parent.glob(f"{path.name}_*"), key=key)
    with ExitStack() as stack, open(output, "w") as fd_out:
        # ExitStack closes all chunk files when the block ends
        files = [stack.enter_context(file.open()) for file in files]
        map_to_int = [map(int, fd) for fd in files]
        for number in map(str, heapq.merge(*map_to_int)):
            fd_out.write(number)
            fd_out.write("\n")


sorter(chunker(producer(100), 10), "sorting")
merger("sorting", "result.txt")

By the way, do not try to save the output data into the source HDF5 file.
Make a new HDF5 file.
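
A small sketch of why heapq.merge fits the merge step: it takes inputs that are already sorted and hands the values back lazily, one at a time, so the chunks never have to be in memory together. The function name merge_sorted_chunks and the three sample lists are made up for this illustration and are not part of the example above.

```python
import heapq


def merge_sorted_chunks(*chunks):
    """Lazily merge already sorted iterables into one sorted stream."""
    # heapq.merge expects every input to be sorted already.
    # It returns an iterator and does not build a list.
    return heapq.merge(*chunks)


# Hypothetical sample data, each list stands for one sorted chunk
chunk_a = [1, 4, 9]
chunk_b = [2, 3, 10]
chunk_c = [0, 5]

merged = merge_sorted_chunks(chunk_a, chunk_b, chunk_c)
first = next(merged)  # values are pulled only when requested
rest = list(merged)   # consume the remaining values

print(first, rest)
```

In the file-based version, the lists are replaced by open file objects wrapped in map(int, fd), which are iterated line by line in the same lazy way.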
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!