Python Forum
Fastest dict/map method when 'key' is already a hash?
#7
@DeaD_EyE Thanks for the detailed reply.

Some background: My program is intended to address the shortcomings of borg and similar tools, as it can generate incremental deltas instantly from volume snapshot metadata. It also doesn't require a high degree of network interactivity (which creates multiple security issues), and doesn't cache + retransmit data chunks.

Although I've done some tests of borg, I haven't looked into its dedup indexing yet, as I thought a simple dict would be worth trying first:

dedup_idx[hash_i] = (session_obj, address_i)
session_obj is an object reference that holds a chunk's backup session context (source volume ID, file path, etc.). Typically there are only about 10-100 session_obj instances, so the chunk-to-session relationship is many-to-one.

hash_i and address_i are integers holding the chunk's SHA-256 hash and its 64-bit source-volume address, respectively. Storing integers and object references this way reduces the memory footprint by almost half compared with strings.
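
To make the layout concrete, here is a minimal sketch of such an index. The Session class, its fields, and the add_chunk helper are hypothetical names made up for illustration; only the dedup_idx[hash_i] = (session_obj, address_i) shape comes from the description above.

```python
import hashlib


class Session:
    """Hypothetical stand-in for a backup session context."""
    __slots__ = ("volume_id", "path")

    def __init__(self, volume_id, path):
        self.volume_id = volume_id
        self.path = path


def add_chunk(dedup_idx, session_obj, address_i, chunk):
    """Record a chunk; return the earlier (session, address) if it is a duplicate."""
    # Store the 256-bit digest as a single Python int rather than a hex string.
    hash_i = int.from_bytes(hashlib.sha256(chunk).digest(), "big")
    existing = dedup_idx.get(hash_i)
    if existing is None:
        dedup_idx[hash_i] = (session_obj, address_i)
    return existing


dedup_idx = {}
sess = Session("vol-a", "/hypothetical/path")
first = add_chunk(dedup_idx, sess, 0x0000, b"A" * 4096)
second = add_chunk(dedup_idx, sess, 0x1000, b"A" * 4096)  # same content, new address
third = add_chunk(dedup_idx, sess, 0x2000, b"B" * 4096)
```

Here the second call returns the entry stored by the first one, which is the duplicate signal; the other two calls return None.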

Before reading your post, I had found this way to poll memory use:
resource.getrusage(resource.RUSAGE_SELF).ru_maxrss * resource.getpagesize()
Output:
Without index: 65MB
With index: 611MB (690135 elements)
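
A caveat on units: according to the getrusage man pages, ru_maxrss is reported in kilobytes on Linux and in bytes on macOS, not in pages, so multiplying by the page size may overstate the totals (by about 4x with 4 KiB pages). A unit-aware sketch, where peak_rss_bytes is a hypothetical helper name and platforms other than macOS are assumed to follow the Linux convention:

```python
import resource
import sys


def peak_rss_bytes():
    """Peak resident set size of this process, in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports ru_maxrss in kilobytes; macOS reports it in bytes.
    # Other platforms are assumed to follow the Linux convention here.
    return peak if sys.platform == "darwin" else peak * 1024


before = peak_rss_bytes()
filler = {i: (None, i) for i in range(200_000)}  # throwaway dict to grow memory
after = peak_rss_bytes()
print(f"peak RSS before: {before / 2**20:.0f} MiB, after: {after / 2**20:.0f} MiB")
```

Since this is a peak value, it never goes down, so it is only useful for measuring growth.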
Taking your suggestion about sorted collections into account, I'm trying to figure out what advantage they provide for building and searching an index of 'random' hashes. The point is to detect collisions, not to keep elements sorted. What I do get from your size profiling is that lists are more space-efficient than dicts. But I don't know how to efficiently search for a key in a list of tuples.
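
For reference, the usual way to search a sorted list is binary search via the standard bisect module. A minimal sketch, assuming the index is kept as two parallel lists sorted by hash (build_sorted_index and lookup are hypothetical helper names, not anything from the earlier replies):

```python
from bisect import bisect_left


def build_sorted_index(entries):
    """entries: iterable of (hash_i, value) pairs. Returns parallel sorted lists."""
    ordered = sorted(entries, key=lambda pair: pair[0])
    keys = [k for k, _ in ordered]
    values = [v for _, v in ordered]
    return keys, values


def lookup(keys, values, hash_i):
    """Binary search: O(log n) comparisons, versus O(1) on average for a dict."""
    pos = bisect_left(keys, hash_i)
    if pos < len(keys) and keys[pos] == hash_i:
        return values[pos]
    return None


keys, values = build_sorted_index([(90, "c"), (15, "a"), (42, "b")])
hit = lookup(keys, values, 42)
miss = lookup(keys, values, 43)
```

Inserting into a sorted list costs O(n) per insert because list.insert shifts elements, so this approach suits an index that is built once and then queried more than one that is updated chunk by chunk.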


Messages In This Thread
RE: Fastest dict/map method when 'key' is already a hash? - by tasket - Apr-20-2019, 06:40 PM
