Python Forum

Full Version: Python 2.7 Addition to dict is too slow
I'm working on a Python script. It reads information about subscribers' traffic from files and puts it into special structures. It works, but too slowly. I've written the same algorithm in PHP and it runs much faster. I noticed that Python spends a lot of time putting the data into the dict. The PHP script takes 6 sec to process my test file, but the Python script takes 12 sec (about 7 sec to read the data from the file and 5 sec to fill the structures). My structures look like this: struct[subscriberId][protocolId] = octents

I use the following function to fill them:

def addBytesToStatStruct(struct, subscriberId, protocolId, octents):
  if subscriberId in struct:
    if protocolId in struct[subscriberId]:
      # Known subscriber and protocol: accumulate
      struct[subscriberId][protocolId] += octents
      return
    else:
      # Known subscriber, first record for this protocol
      struct[subscriberId][protocolId] = octents
      return
  else:
    # First record for this subscriber
    struct[subscriberId] = {protocolId: octents}
Maybe I'm doing something wrong? I suspect the problem is caused by collisions during insertion. As far as I know, PHP uses chaining while Python uses open addressing. Could you give me a hint on how to make the Python dict faster?
You constantly check whether subscriberId and protocolId already exist in the struct. I would bet that is what slows it down.
You may want to explore the dict.setdefault() and dict.get() methods, and defaultdict from collections.
Check them out, and if you can't figure it out, post some sample data and I will make an example.
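For reference, a minimal sketch of what a dict.get() based version could look like. The function name add_bytes_get and the sample values are made up for illustration; only the struct[subscriberId][protocolId] layout comes from the question, and whether this is actually faster would have to be measured.

```python
def add_bytes_get(struct, subscriberId, protocolId, octents):
    # Hypothetical helper name; same arguments as the function in the question.
    # One lookup for the outer dict; create the inner dict on first sight.
    inner = struct.get(subscriberId)
    if inner is None:
        inner = struct[subscriberId] = {}
    # dict.get() returns 0 when this protocol has not been seen yet.
    inner[protocolId] = inner.get(protocolId, 0) + octents


# Made-up sample records: (subscriberId, protocolId, octents)
stats = {}
add_bytes_get(stats, "sub1", 6, 100)
add_bytes_get(stats, "sub1", 6, 50)
add_bytes_get(stats, "sub1", 17, 10)
add_bytes_get(stats, "sub2", 6, 1)
print(stats)
```

The inner dict is fetched once and reused, so each call does at most two lookups per level instead of repeating struct[subscriberId] several times.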
(May-03-2018, 06:07 AM) buran wrote: You constantly check whether subscriberId and protocolId already exist in the struct. I would bet that is what slows it down.
You may want to explore the dict.setdefault() and dict.get() methods, and defaultdict from collections.
Check them out, and if you can't figure it out, post some sample data and I will make an example.

Thank you for your reply! I've rewritten my function in this way:
struct.setdefault(contactId, {sectionId:0})
struct[contactId].setdefault(sectionId, 0)
struct[contactId][sectionId] += octents
I also tried collections.defaultdict instead of Python's built-in dict, but unfortunately it didn't make the script significantly faster :(
Is this way faster?
from collections import defaultdict
struct = defaultdict(lambda: defaultdict(int))
struct['foo']['bar'] += 7
print(struct)
The following version should work on practically every version of Python (the += operator needs Python 2.0 or later):
def addBytesToStatStruct(struct, subscriberId, protocolId, octents):
    try:
        d = struct[subscriberId]
    except KeyError:
        struct[subscriberId] = {protocolId: octents}
        return
    try:
        d[protocolId] += octents
    except KeyError:
        d[protocolId] = octents
Other solutions involve setdefault() and get() as buran said.
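To see which variant wins on a given machine, it may be easiest to just time them. Below is a rough sketch using timeit on synthetic records; the data, sizes and function names are all made up for illustration, so the numbers it prints say nothing about the real traffic files.

```python
import random
import timeit
from collections import defaultdict

# Synthetic (subscriberId, protocolId, octents) records; purely made-up data.
random.seed(0)
RECORDS = [(random.randrange(1000), random.randrange(20), random.randrange(1500))
           for _ in range(20000)]


def fill_membership():
    # "in" checks, as in the original question
    struct = {}
    for sub, proto, octents in RECORDS:
        if sub in struct:
            inner = struct[sub]
            if proto in inner:
                inner[proto] += octents
            else:
                inner[proto] = octents
        else:
            struct[sub] = {proto: octents}
    return struct


def fill_try_except():
    # try/except KeyError, as in the version above
    struct = {}
    for sub, proto, octents in RECORDS:
        try:
            inner = struct[sub]
        except KeyError:
            struct[sub] = {proto: octents}
            continue
        try:
            inner[proto] += octents
        except KeyError:
            inner[proto] = octents
    return struct


def fill_defaultdict():
    # nested defaultdict
    struct = defaultdict(lambda: defaultdict(int))
    for sub, proto, octents in RECORDS:
        struct[sub][proto] += octents
    return struct


def as_plain(struct):
    # Convert nested (default)dicts to plain dicts so results can be compared.
    return {sub: dict(inner) for sub, inner in struct.items()}


if __name__ == "__main__":
    for fn in (fill_membership, fill_try_except, fill_defaultdict):
        best = min(timeit.repeat(fn, number=5, repeat=3))
        print("%-18s %.4f s" % (fn.__name__, best))
```

All three functions build the same totals, so any difference in the printed times comes from the lookup strategy alone.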
Can you provide your code in a broader context, i.e. reading from the file and creating the structure?
A sample data file would help too.
Hi again guys! I've solved my problem. Actually, you both helped me. First, I checked my file-reading function again and saw that I was calling the line-parsing function twice there. Removing the extra call saved 3 sec. Then I added the defaultdict initialization with lambda, as Gribouillis advised, and that saved another 1 sec. Finally, I compiled my regexp, and now my Python script runs faster than the PHP one. Thank you guys!
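The actual file format and regexp were never posted, so the following is only a sketch of the three fixes described above under an assumed line format of "<subscriberId> <protocolId> <octets>". The pattern, parse_line and load_stats are hypothetical names, not the original code.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) line format: "<subscriberId> <protocolId> <octets>".
# Compiled once at module level instead of on every line.
LINE_RE = re.compile(r"^(\S+)\s+(\d+)\s+(\d+)$")


def parse_line(line):
    # Returns (subscriberId, protocolId, octents) or None for unparsable lines.
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3))


def load_stats(lines):
    struct = defaultdict(lambda: defaultdict(int))
    for line in lines:
        parsed = parse_line(line)  # parse each line exactly once
        if parsed is None:
            continue
        sub, proto, octents = parsed
        struct[sub][proto] += octents
    return struct


# Made-up sample input; with a real file this would be load_stats(open(path)).
sample = ["alice 6 100\n", "alice 6 50\n", "bob 17 10\n", "garbage\n"]
stats = load_stats(sample)
print({sub: dict(inner) for sub, inner in stats.items()})
```

Keeping the parse result in a local variable is what avoids the double call; the precompiled pattern skips the per-call pattern lookup that re.match(pattern_string, ...) would otherwise do.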
You could probably gain a little more with
from collections import defaultdict
from functools import partial
struct = defaultdict(partial(defaultdict, int))
instead of lambda.
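A quick sketch to check that the two factories behave the same. The likely reason partial can be a bit faster is that creating a missing inner dict no longer runs a Python-level function, but that is an assumption worth timing on your own data. The keys and the as_plain helper are made up for illustration.

```python
from collections import defaultdict
from functools import partial

by_lambda = defaultdict(lambda: defaultdict(int))
by_partial = defaultdict(partial(defaultdict, int))

# Apply the same made-up updates to both structures.
for struct in (by_lambda, by_partial):
    struct["foo"]["bar"] += 7
    struct["foo"]["baz"] += 1
    struct["qux"]["bar"] += 2


def as_plain(struct):
    # Hypothetical helper: convert nested defaultdicts to plain dicts.
    return {key: dict(inner) for key, inner in struct.items()}


print(as_plain(by_lambda) == as_plain(by_partial))
```

The factory is only called when a new subscriber key is first seen, so the gain depends on how many distinct subscribers the data contains.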