Python Forum
huge list of whole numbers
#1
I have a massively huge list of numbers and I am trying to think of efficient ways to process it with Python. There are about 360 billion numbers in this list, so I obviously cannot hold them all in virtual memory. Fortunately, the numbers are already sorted. There are known to be many ranges where the numbers leave no gaps, or only a few, and many more ranges where there are no numbers, or very few. I want to scan the numbers sequentially to determine whether I could encode these ranges compactly enough to keep them in memory for other processing that accesses them in random order to test if a given number is in the list or not. Are there any existing object classes that could do this? I don't know how long these ranges can be, but I do know the numbers go up to around 2**48 in value. They are currently stored compressed on an external 2 TB USB hard drive, taking up about 1 TB. Uncompressed, the data is one number per 8 bytes in little-endian binary, which comes to more than 2 TB (360 billion × 8 bytes ≈ 2.9 TB).
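A minimal sketch of the sequential scan described above, assuming the uncompressed format is exactly as stated (one unsigned 64-bit little-endian integer per 8 bytes) and that the values are sorted ascending. `iter_numbers` and `iter_runs` are hypothetical helper names. The runs come out as half-open `(start, stop)` pairs, the same convention Python's `range` uses. The thread doesn't say which compressor was used; if it happens to be gzip, `gzip.open(path, "rb")` returns a stream this can read directly.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator, Optional, Tuple


def iter_numbers(stream: BinaryIO, chunk_bytes: int = 8 * 65536) -> Iterator[int]:
    """Yield unsigned 64-bit little-endian integers from a binary stream."""
    leftover = b""
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:
            break
        data = leftover + chunk
        usable = len(data) - len(data) % 8  # only decode whole 8-byte records
        leftover = data[usable:]            # carry a partial record to the next read
        for (n,) in struct.iter_unpack("<Q", data[:usable]):
            yield n
    if leftover:
        raise ValueError("stream length is not a multiple of 8 bytes")


def iter_runs(numbers: Iterable[int]) -> Iterator[Tuple[int, int]]:
    """Coalesce sorted integers into half-open runs [start, stop)."""
    start: Optional[int] = None
    stop: Optional[int] = None
    for n in numbers:
        if stop is not None and n <= stop:
            # n == stop extends the current run; n < stop is a duplicate.
            stop = max(stop, n + 1)
        else:
            if start is not None:
                yield start, stop
            start, stop = n, n + 1
    if start is not None:
        yield start, stop


if __name__ == "__main__":
    raw = struct.pack("<7Q", 1, 2, 3, 3, 7, 8, 10)
    print(list(iter_runs(iter_numbers(io.BytesIO(raw), chunk_bytes=12))))
    # [(1, 4), (7, 9), (10, 11)]
```

Both functions are generators, so memory use stays at one read buffer no matter how large the file is; only the runs you decide to keep accumulate.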
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
https://stackoverflow.com/questions/2832...f-integers
Craig "Ichabod" O'Brien - xenomind.com
I wish you happiness.
Recommended Tutorials: BBCode, functions, classes, text adventures
#3
(Jun-02-2019, 06:56 AM)Skaperen Wrote: to keep them in memory for other processing that accesses them in random order to test if a given number is in the list or not

I have no experience with lists this huge, but my simplistic approach would be to convert the list of integers into a list of ranges (since the list is sorted) and then use the built-in any() to check whether a number is in the list:

>>> lst = [range(1, 1000000000), range(2000000000, 18000000000)]
>>> any(12345 in r for r in lst)
True
>>> any(1100000000 in r for r in lst)
False
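The any() version checks every range on each lookup, so its cost grows with the number of ranges, which could be very large here. Since the ranges are sorted and don't overlap, one possible alternative is to flatten them into a single sorted array of boundaries and binary-search it with the standard `bisect` module. `build_bounds` and `contains` are hypothetical names, and the half-open convention matches `range`:

```python
from array import array
from bisect import bisect_right
from typing import Iterable, Tuple


def build_bounds(runs: Iterable[Tuple[int, int]]) -> array:
    """Flatten sorted, non-overlapping half-open runs into toggle points.

    Layout: start0, stop0, start1, stop1, ... stored as 8-byte unsigned
    ints, which is plenty for values up to around 2**48.
    """
    bounds = array("Q")
    for start, stop in runs:
        bounds.append(start)
        bounds.append(stop)
    return bounds


def contains(bounds: array, x: int) -> bool:
    # bisect_right counts the toggle points <= x; an odd count means x
    # falls inside some run [start, stop).
    return bisect_right(bounds, x) % 2 == 1


bounds = build_bounds([(1, 1000000000), (2000000000, 18000000000)])
print(contains(bounds, 12345))       # True
print(contains(bounds, 1100000000))  # False
```

Each run costs 16 bytes in the array, so memory depends only on how many runs there are, not on how many numbers each run covers, and a lookup is O(log n) instead of O(n).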
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
#4
Yes, I was thinking of converting each cluster into a range plus subranges, or a bitmap for the exceptions. What I haven't settled is whether alternating runs (a range of "on", then a range of "off", then "on" again) are better than one big "on" range with a subrange of "off" exceptions. The latter may be more efficient as long as the nesting doesn't go deep. Then I need to figure out when subranges are small enough that a bitmap would be better. I may well prototype this in Python and then redo it in C.
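One way to make the runs-versus-bitmap decision automatically is to split the number space into fixed-size chunks and pick, per chunk, whichever encoding is smaller. This is loosely modeled on Roaring bitmaps (which also keep a third, sorted-array container type, omitted here). The sketch below is a hypothetical prototype, not a tuned design: the 2**16 chunk size is an assumption, and every name in it is made up for illustration. Chunks that contain no numbers get no entry at all, which suits data with large empty stretches.

```python
from array import array
from bisect import bisect_right
from itertools import groupby
from typing import Dict, Iterable, Union

CHUNK_BITS = 16                    # assumption: 65,536 values per chunk
CHUNK_SIZE = 1 << CHUNK_BITS
LOW_MASK = CHUNK_SIZE - 1
BITMAP_BYTES = CHUNK_SIZE // 8     # 8 KiB for a dense chunk

Container = Union[array, bytearray]


def encode_chunk(lows: Iterable[int]) -> Container:
    """Encode one chunk's sorted low bits as run toggle points or a bitmap."""
    bounds = array("L")            # start0, stop0, start1, stop1, ... (items >= 4 bytes)
    for x in lows:
        if bounds and x <= bounds[-1]:
            bounds[-1] = max(bounds[-1], x + 1)   # extend run or skip duplicate
        else:
            bounds.extend((x, x + 1))
    if len(bounds) * bounds.itemsize <= BITMAP_BYTES:
        return bounds              # few runs: the run list is smaller
    bitmap = bytearray(BITMAP_BYTES)
    for start, stop in zip(bounds[0::2], bounds[1::2]):
        for x in range(start, stop):
            bitmap[x >> 3] |= 1 << (x & 7)
    return bitmap                  # many short runs: the bitmap is smaller


def build(numbers: Iterable[int]) -> Dict[int, Container]:
    """Group sorted numbers by chunk (their high bits) and encode each chunk."""
    return {
        high: encode_chunk(n & LOW_MASK for n in group)
        for high, group in groupby(numbers, key=lambda n: n >> CHUNK_BITS)
    }


def contains(chunks: Dict[int, Container], x: int) -> bool:
    container = chunks.get(x >> CHUNK_BITS)
    if container is None:
        return False               # empty chunk: nothing is stored for it
    low = x & LOW_MASK
    if isinstance(container, bytearray):
        return bool(container[low >> 3] & (1 << (low & 7)))
    return bisect_right(container, low) % 2 == 1
```

Because `build` consumes its input as a stream, it could be fed from a sequential reader over the whole file. Flat chunking like this sidesteps the "how deep do the exceptions nest" question: each chunk is either a flat run list or a bitmap, never a hierarchy. The layout (a hash table keyed by the high bits, each value a run array or an 8 KiB bitmap) should also carry over fairly directly to C.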