Python Forum
Efficient way of iterating a list of records
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Efficient way of iterating a list of records
#1
Hi

First post for me so pls be gentle Smile

There's a lot of pretext here for a fairly simple question but I thought it better to explain the context than just post a newbie question and invite a barrage of RTFM LMGTFY replies. I'm scoping for ways of doing a particular task, and Python is one of the options I have. This is what I need to do:

I have a small database of locations plus some other information fields which will look like

Lat,Long,Name,data1,data2 etc

and I expect to have 100-1000 order of magnitude number of records. When an event happens I need to iterate through the list as fast as possible and find the closest Lat/Long to my current lat/long then return the rest of the data associated with that record.

I'm planning to do this on a GPS module with embedded Python and this is the main reson for this post. Because as soon as I start searching for database/field/record etc I mostly get results about SQL plugins or suchlike which I doubt I will be able to use. Also, being embedded means resources will be pretty tight so what may work in PyCharm for example won't necessarily work for this platform.

What I think I need is to store my data in a list of dictionaries but I'd appreciate any [polite!] pointers or advice how to get started. Once I'm headed in the right direction I'm happy enough plugging away, RTFMing & Googling as I firmly believe doing is the best way of learning. (NB I'm a complete newbie to Python although I have dabbled in a fair few other languages over the years)

Thanks!
Reply
#2
In case of traditional computational environment built on top of Python (numpy+pandas+python), that would be a quite simple problem. NumPy is implemented in C and aimed at computationally efficient workflow with arrays in Python.
But it is not clear, would it possible to install NumPy on your GPS module?!
You will likely need to implement Haversine distance using NumPy to get
numpy-vectorized form of the distance function.
Reply
#3
(Feb-18-2019, 11:48 AM)scidam Wrote: In case of traditional computational environment built on top of Python (numpy+pandas+python), that would be a quite simple problem. NumPy is implemented in C and aimed at computationally efficient workflow with arrays in Python.
But it is not clear, would it possible to install NumPy on your GPS module?!
You will likely need to implement Haversine distance using NumPy to get
numpy-vectorized form of the distance function.

Sadly I think the answer is no Sad That's the trouble with embedded implementations; they often contain a pretty basic or stripped down
interpreter. I'm aware of the Haversine calculation thanks, and fortunately trig functions are supported although in this case I'm after relative comparisons rather than absolute distances so I'm hoping I can work with lat/long values directly (happy to be corrected though!)

So it's back to manually trawling through the records and the best way to do it...
Reply
#4
Python dictionaries are fastest for key lookup (keys are/must be immutable and dict is hash table, so time complexity is O(1)).

You could use coordinates (tuple) as your dictionary keys. However, this probably will not help you as much as you don't look for exact match but nearest. Still there are some possibilities:

- if you have Python 3.7 (dictionaries are insertion ordered) or can use OrderedDict then you can have dictionary ordered by keys and use bisect to find insertion point and compare two adjacent keys to find which is closer
- you can use min() with keyfunc to find smallest difference in coordinates/keys.
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Reply
#5
Quote:python -m timeit -s "import numpy as np; np.random.seed(30); data = {(a, b): '' for a, b in np.random.rand(100, 2)}; point = (10, 20);" "min(data, key=lambda x: (x[0]-point[0])**2 + (x[1]-point[1])**2)"

Output:
10000 loops, best of 3: 192 usec per loop
Quote:python -m timeit -s "import numpy as np; np.random.seed(30); data = [(a, b) for a, b in np.random.rand(100, 2)]; point = (10, 20);" "min([(a - point[0]) ** 2 + (b - point[1]) ** 2 for a,b in data])"

Output:
10000 loops, best of 3: 159 usec per loop
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  A more efficient code titanif 2 492 Oct-17-2023, 02:07 PM
Last Post: deanhystad
  Cleaning my code to make it more efficient BSDevo 13 1,359 Sep-27-2023, 10:39 PM
Last Post: BSDevo
  Making a function more efficient CatorCanulis 9 1,827 Oct-06-2022, 07:47 AM
Last Post: DPaul
  Delete list while iterating shantanu97 1 1,888 Jun-06-2021, 11:59 AM
Last Post: Yoriz
  I there a more efficient way of printing ? Capitaine_Flam 7 3,489 Dec-01-2020, 10:37 AM
Last Post: buran
  Creating a list of dictionaries while iterating pythonnewbie138 6 3,271 Sep-27-2020, 08:23 PM
Last Post: pythonnewbie138
  python3: iterating through list not working wardancer84 3 2,357 Jul-08-2020, 04:30 PM
Last Post: DPaul
  Indexing problem while iterating list and subtracting lbtdne 2 2,130 May-14-2020, 10:19 PM
Last Post: deanhystad
  A more efficient way of comparing two images in a python vukan 0 2,018 Mar-17-2020, 11:39 AM
Last Post: vukan
  how to make iterative search more efficient renergy 2 2,266 Jan-03-2020, 03:43 PM
Last Post: stullis

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020