Python Forum
Fastest dict/map method when 'key' is already a hash?
#7
@DeaD_EyE Thanks for the detailed reply.

Some background: my program is intended to address the shortcomings of borg and similar tools, since it can generate incremental deltas instantly from volume snapshot metadata. It also doesn't require a high degree of network interactivity (which creates multiple security issues), and it doesn't cache and retransmit data chunks.

Although I've run some tests with borg, I haven't looked into its dedup indexing yet, since I thought a simple dict would be worth trying first:

dedup_idx[hash_i] = (session_obj, address_i)
session_obj is an object reference that holds a chunk's backup session context (like source volume ID, file path, etc.). Typically, there are only about 10-100 session_obj instances so the relationship is many:1.

hash_i and address_i are integers representing the chunk's sha256 hash and its 64-bit source-volume address, respectively. Using integers and object refs this way reduces the memory footprint by almost half compared with strings.
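To make the layout described above concrete, here is a minimal sketch of such an index. The names chunk_key, Session and add_chunk are hypothetical and only stand in for the real program's code; the sketch assumes the sha256 digest is converted to an int with int.from_bytes, which may not be how the actual tool does it.

```python
import hashlib


def chunk_key(chunk: bytes) -> int:
    """Return the chunk's SHA-256 digest as a Python int (used as the dict key)."""
    return int.from_bytes(hashlib.sha256(chunk).digest(), "big")


class Session:
    """Hypothetical stand-in for session_obj (source volume ID, file path, etc.)."""

    def __init__(self, volume_id: str):
        self.volume_id = volume_id


# Maps hash_i -> (session_obj, address_i); many entries share one Session.
dedup_idx = {}


def add_chunk(session: Session, address_i: int, chunk: bytes):
    """Record a chunk. Return the existing (session, address) entry if the
    hash was already indexed (a duplicate chunk), otherwise None."""
    hash_i = chunk_key(chunk)
    existing = dedup_idx.get(hash_i)
    if existing is None:
        dedup_idx[hash_i] = (session, address_i)
    return existing
```

Because every tuple stores only a reference to one of the few Session objects, the per-entry cost is dominated by the hash integer, the address integer, the tuple, and the dict slot.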

Before reading your post, I had found a way to poll memory use:
resource.getrusage(resource.RUSAGE_SELF).ru_maxrss * resource.getpagesize()
Output:
Without index: 65MB
With index: 611MB (690135 elements)
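One caveat that may be worth checking with this measurement: the Linux getrusage man page documents ru_maxrss in kilobytes, and macOS reports it in bytes, so multiplying by the page size could overstate the figure. The sketch below is one way to normalize the value to bytes. The helper name peak_rss_bytes is hypothetical, and treating every non-macOS platform as reporting kilobytes is an assumption.

```python
import resource
import sys


def peak_rss_bytes() -> int:
    """Peak resident set size of this process, in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Assumption: macOS reports bytes; Linux (and other platforms here) report kilobytes.
    return peak if sys.platform == "darwin" else peak * 1024


before = peak_rss_bytes()
# Throwaway stand-in for the real index, just to make the peak move.
index = {i: (None, i) for i in range(100_000)}
after = peak_rss_bytes()
print(f"peak RSS grew by about {(after - before) / 1e6:.1f} MB for {len(index)} entries")
```

Note that ru_maxrss is a high-water mark, so it never decreases; it shows how much the peak grew while building the index, not the index's exact size.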
Taking your suggestion about sorted collections into account, I'm trying to figure out what advantage they provide for building and searching an index of 'random' hashes. The point is to detect collisions, not to keep elements sorted. What I do take from your size profiling is that lists are more space-efficient than dicts, but I don't know how to efficiently search for a key in a list of tuples.
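For reference, one common way to search a sorted list is binary search with the standard bisect module. The sketch below is only an illustration of that idea, not something taken from the earlier replies: build_sorted_index and lookup are hypothetical names, and it assumes the index is built once and then searched, with keys and values kept in two parallel lists.

```python
from bisect import bisect_left


def build_sorted_index(entries):
    """entries: iterable of (hash_i, value) pairs.
    Returns two parallel lists (keys, values), sorted by key."""
    pairs = sorted(entries, key=lambda p: p[0])
    keys = [k for k, _ in pairs]
    values = [v for _, v in pairs]
    return keys, values


def lookup(keys, values, hash_i):
    """Binary search: return the value stored for hash_i, or None if absent."""
    pos = bisect_left(keys, hash_i)
    if pos < len(keys) and keys[pos] == hash_i:
        return values[pos]
    return None
```

Each lookup is O(log n) rather than the dict's average O(1), and inserting into a sorted list is O(n) because later elements must shift. The layout therefore trades speed for a smaller footprint, and fits best when entries are added in batches and sorted afterwards.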


Messages In This Thread
RE: Fastest dict/map method when 'key' is already a hash? - by tasket - Apr-20-2019, 06:40 PM
