Jun-02-2019, 06:56 AM
I have a massively huge list of numbers and I am trying to think of efficient ways to process them with Python. There are about 360 billion numbers in the list, so obviously I cannot hold them all in memory. Fortunately, the numbers are already sorted. There are known to be many ranges where the numbers leave no gaps (or very few), and many more ranges that contain no numbers at all (or very few).

I want to scan the numbers sequentially and see whether I can encode the ranges somehow, compactly enough to keep them in memory for other processing that accesses them in random order to test whether a given number is in the list or not. Are there any existing object classes that could do this?

I don't know how long the ranges can span, but I do know the values go up to around 2**48. The numbers are currently stored compressed on an external 2 TB USB hard drive, taking up about 1 TB; uncompressed it is one number per 8 bytes in little-endian binary, which comes to more than 2 TB.
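To make it concrete, this is roughly the kind of thing I have in mind: a rough sketch only, RangeSet is a made-up name (not an existing library class), and it assumes the ranges are built up in ascending order during the sequential scan.

```python
import bisect

class RangeSet:
    """Sketch: store the sorted numbers as half-open ranges [start, end)
    in two parallel sorted lists, and test membership with binary search."""

    def __init__(self):
        self.starts = []   # sorted range start values
        self.ends = []     # parallel list of range end values (exclusive)

    def add_range(self, start, end):
        # Assumes ranges arrive in ascending order (from the sequential scan).
        if self.ends and start <= self.ends[-1]:
            # Contiguous or overlapping with the previous range: merge them.
            self.ends[-1] = max(self.ends[-1], end)
        else:
            self.starts.append(start)
            self.ends.append(end)

    def __contains__(self, value):
        # Find the rightmost range whose start is <= value,
        # then check whether value falls before that range's end.
        i = bisect.bisect_right(self.starts, value) - 1
        return i >= 0 and value < self.ends[i]


# Example usage:
rs = RangeSet()
rs.add_range(10, 20)       # covers 10..19
rs.add_range(20, 25)       # contiguous, merges into 10..24
rs.add_range(1000, 1001)   # a single isolated number
print(15 in rs, 24 in rs, 25 in rs, 1000 in rs)   # True True False True
```

For the real data, plain Python lists of ints would probably carry too much per-element overhead; something like array('Q', ...) from the array module, or a numpy uint64 array, would keep each boundary at 8 bytes (2**48 fits easily in 64 bits). Whether the whole thing fits in memory then depends entirely on how many distinct ranges the scan actually produces.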