Python Forum
Python 2.7 Addition to dict is too slow
#1
I'm working on a Python script. It reads information about subscribers' traffic from files and puts it into special structures. It works, but too slowly. I've written the same algorithm in PHP and it runs much faster. I noticed that Python spends a lot of time putting the data into the dict. The PHP script takes 6 sec to process my test file, but the Python script takes 12 sec (about 7 sec to get the data from the file and 5 sec to fill the structures). My structures look like this: struct[subscriberId][protocolId] = octents

And I use the following function to fill them:

def addBytesToStatStruct(struct, subscriberId, protocolId, octents):
  if subscriberId in struct:
    if protocolId in struct[subscriberId]:
      struct[subscriberId][protocolId] += octents
      return
    else:
      struct[subscriberId][protocolId] = octents
      return
  else:
    struct[subscriberId] = {protocolId : octents}
Maybe I'm doing something wrong? I suspect the problem is caused by collisions during insertion. As far as I know, PHP uses chaining while Python uses open addressing. Could you give me a hint on how to make the Python dict faster?
Reply
#2
You constantly check whether subscriberId and protocolId already exist in the struct. I would bet that is what slows it down.
You may want to explore the dict.setdefault() and dict.get() methods, and defaultdict from collections.
Check them out, and if you are not able to figure it out, post some sample data and I will make an example.
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Reply
#3
(May-03-2018, 06:07 AM)buran Wrote: You constantly check whether subscriberId and protocolId already exist in the struct. I would bet that is what slows it down.
You may want to explore the dict.setdefault() and dict.get() methods, and defaultdict from collections.
Check them out, and if you are not able to figure it out, post some sample data and I will make an example.

Thank you for your reply! I've rewritten my function in this way:
struct.setdefault(contactId, {sectionId:0})
struct[contactId].setdefault(sectionId, 0)
struct[contactId][sectionId] += octents
I also tried collections.defaultdict instead of Python's plain dict, but unfortunately it didn't make the script significantly faster :(
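One thing that may be worth noting about the three-line rewrite above: the first line builds a fresh {sectionId: 0} dict on every call, even when contactId is already present, and struct[contactId] is then looked up again on each of the next two lines. A minimal sketch that keeps a reference to the inner dict instead, assuming the same names as above (add_bytes is a made-up wrapper, not part of the original script):

```python
def add_bytes(struct, contactId, sectionId, octents):
    # One lookup on the outer dict; the empty dict is stored only if the key is new.
    inner = struct.setdefault(contactId, {})
    # get() returns 0 for a section that has not been seen yet.
    inner[sectionId] = inner.get(sectionId, 0) + octents


# Made-up sample values, only to show the call pattern.
struct = {}
add_bytes(struct, 1, 6, 1500)
add_bytes(struct, 1, 6, 500)
add_bytes(struct, 1, 17, 64)
print(struct)
```

Whether this is measurably faster depends on the data, so it is worth timing on a real file before drawing conclusions.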
Reply
#4
Is this way faster?
from collections import defaultdict
struct = defaultdict(lambda: defaultdict(int))
struct['foo']['bar'] += 7
print(struct)
The following version should work in practically every version of Python (the += operator needs 2.0 or later):
def addBytesToStatStruct(struct, subscriberId, protocolId, octents):
    try:
        d = struct[subscriberId]
    except KeyError:
        struct[subscriberId] = {protocolId: octents}
        return
    try:
        d[protocolId] += octents
    except KeyError:
        d[protocolId] = octents
Other solutions involve setdefault() and get() as buran said.
Reply
#5
Can you provide your code in a broader context, i.e. reading from the file and creating the structure?
A sample data file would help too.
Reply
#6
Hi again, guys! I've solved my problem, and you both helped. First, I checked my file-reading function again and saw that it called the line-parsing function twice. Removing the extra call saved 3 sec. Then I added the defaultdict initialization with lambda as Gribouillis advised, which saved another 1 sec. Finally, I compiled my regexp, and now my Python script runs faster than the PHP one. Thank you, guys!
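A rough sketch of how those three changes could fit together. The record layout ("<subscriberId> <protocolId> <octets>"), the LINE_RE pattern and the load_stats name are assumptions made up for illustration; the actual file format was not posted in this thread.

```python
import re
from collections import defaultdict

# Hypothetical record layout: "<subscriberId> <protocolId> <octets>".
# Compiled once at import time instead of passing the pattern string on every line.
LINE_RE = re.compile(r"^(\d+)\s+(\d+)\s+(\d+)$")


def load_stats(lines):
    struct = defaultdict(lambda: defaultdict(int))
    match = LINE_RE.match  # bind the method once, outside the loop
    for line in lines:
        m = match(line.strip())  # each line is parsed exactly once
        if m is None:
            continue  # skip lines that do not fit the assumed layout
        subscriber_id, protocol_id, octets = m.groups()
        struct[subscriber_id][protocol_id] += int(octets)
    return struct


# Made-up sample records standing in for a real traffic file.
sample = ["1001 6 1500", "1001 6 500", "1001 17 64", "garbage", "1002 6 40"]
stats = load_stats(sample)
print(dict((k, dict(v)) for k, v in stats.items()))
```

The re module keeps its own cache of recently compiled patterns, so the gain from re.compile is mostly the skipped cache lookup per call. That can still add up over millions of lines.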
Reply
#7
You could probably gain a little more with
from functools import partial
struct = defaultdict(partial(defaultdict, int))
instead of lambda.
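If you want to check whether it matters on your machine, here is a small timeit sketch. The fill function and its synthetic keys are made up for the benchmark, and the numbers will vary with the Python version and the data:

```python
import timeit
from collections import defaultdict
from functools import partial


def fill(struct, n=1000):
    # Synthetic keys, invented purely for the benchmark.
    for i in range(n):
        struct[i % 50][i % 7] += i
    return struct


def with_lambda():
    return fill(defaultdict(lambda: defaultdict(int)))


def with_partial():
    return fill(defaultdict(partial(defaultdict, int)))


# Both factories must build the same totals before timing means anything.
assert with_lambda() == with_partial()

for fn in (with_lambda, with_partial):
    # Take the best of three runs to reduce noise from other processes.
    best = min(timeit.repeat(fn, number=200, repeat=3))
    print("%-12s %.4f s" % (fn.__name__, best))
```

The factory is only called when a new outer key appears, so any difference should shrink as the number of distinct subscribers gets small relative to the number of records.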
Reply

