Python Forum
fast lookup for array
#1
Hi all,

I have a large array (1M+ values), and for each value I want to look something up in a table.
What would be an efficient way to do this?

For instance, as a small example:
import pandas as pd
A = pd.DataFrame([67,67,67,67,68,69,69,69,70,70])
# Built from a dict so 'Index' stays integer; np.array([[67,'a'],...]) would turn 67 into the string '67'
Table = pd.DataFrame({'Index': [67,68,69,70], 'Item': ['a','b','c','d']})
Result = ['a','a','a','a','b','c','c','c','d','d']
How do I best get to the desired result, especially when A is very large?

Thanks,
#2
The DataFrame.isin and Series.isin methods are efficient. Have you tried them?
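As a rough illustration of what Series.isin provides: it returns a boolean membership mask, not the looked-up items themselves, so on its own it answers "is this value in the table?" rather than "which item belongs to this value?". The names `a` and `keys` and the sample values below are made up for this sketch.

```python
import pandas as pd

# Hypothetical sample data: 71 is deliberately not among the table keys.
a = pd.Series([67, 67, 68, 69, 71])
keys = [67, 68, 69, 70]

# isin gives one True/False per element of `a`.
mask = a.isin(keys)
print(mask.tolist())  # [True, True, True, True, False]

# The mask can be used to keep only the values that have a table entry.
known = a[mask]
print(known.tolist())  # [67, 67, 68, 69]
```

To get the items themselves, the mask would still have to be combined with an actual mapping step.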
#3
Hash tables are good for fast lookups, but they take more memory than simpler data structures.
In Python you can use a dict, which is implemented as a hash table.
In Python 3.6+, a dict also preserves insertion order (an implementation detail in CPython 3.6, guaranteed by the language from 3.7), which is not common for hash tables.

I don't know pandas and its implementation very well. If pandas uses hash lookups, then it's fast.
If not, then it's slow.

How big is the table you want to look up values in?
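
A minimal sketch of the dict idea, assuming the keys are plain integers and every value has an entry in the table. The names `values` and `lookup` are made up for this example; Series.map is used as one possible way to apply the dict to a whole pandas Series at once.

```python
import pandas as pd

# Sample data modeled on the first post (integer keys are an assumption).
values = pd.Series([67, 67, 67, 67, 68, 69, 69, 69, 70, 70])
lookup = {67: "a", 68: "b", 69: "c", 70: "d"}  # a dict is a hash table

# Pure-Python version: one hash lookup per element.
result_list = [lookup[v] for v in values.tolist()]

# pandas version: Series.map accepts a dict and applies it element-wise.
# Values missing from the dict would come back as NaN instead of raising.
result_series = values.map(lookup)

print(result_list)
print(result_series.tolist())
```

Both variants give the desired result from the first post; for 1M+ values the map version avoids building the result in an explicit Python loop.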
#4
In general, pandas delegates this kind of lookup to the underlying NumPy arrays, e.g. df.loc[:, 'somecolumn'] == 'somevalue' is almost equivalent (internally) to df.loc[:, 'somecolumn'].values == 'somevalue', where .values points to the corresponding NumPy array. This comparison runs in O(N) time, but it is quite fast because it is implemented in C. At the same time, pandas index-based lookups use hash tables under the hood, so looking up rows by index is much faster (for large N) than looking them up by column value.
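
To make the difference concrete, here is a small sketch that contrasts the two kinds of lookup on the data from the first post. It assumes integer keys and a table with unique keys; the lowercase names `a`, `table` and `items` are chosen for this example only.

```python
import pandas as pd

a = pd.Series([67, 67, 67, 67, 68, 69, 69, 69, 70, 70])
table = pd.DataFrame({"Index": [67, 68, 69, 70], "Item": ["a", "b", "c", "d"]})

# Lookup by column value: builds a boolean mask over all rows (O(N) per query).
by_column = table.loc[table["Index"] == 69, "Item"].iloc[0]

# Lookup by index: move the key column into the index first,
# then .loc resolves the label through the index.
items = table.set_index("Index")["Item"]
by_index = items.loc[69]

# Looking up the whole array at once against the indexed Series.
result = a.map(items).tolist()

print(by_column, by_index)
print(result)
```

With the key column in the index, the per-value work no longer depends on scanning the table, which is what matters when A has 1M+ entries.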