Python Forum
Code that generates MD5 hashes from IPv6 addresses giving different answers?
#1
I have some code that generates MD5 hashes from IPv6 addresses, then checks them against a list of known MD5 hashes. While trying to speed it up, I profiled it and found that the string conversion was chewing up a lot of CPU time: each IPv6 address has to be converted to a string, then to bytes, before it can be fed to _hashlib.

So, I attempted to speed it up. Here's some code documenting my attempt:
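# _hashlib is CPython's private OpenSSL-backed module; hashlib.md5 is the public, documented interface
from _hashlib import openssl_md5 as hashMD5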
from ipaddress import IPv6Address as IPv6

starting_ip='2001:4958::'
ip = IPv6(starting_ip)
aa = 208000000000

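# Attempt 1: format the address directly into bytes
hashgen = hashMD5((b'%u' % (ip+aa))).hexdigest()
# Attempt 2: format the address as a str, then encode it to bytes
hashgen2 = hashMD5(('%s' % (ip+aa)).encode('utf-8')).hexdigest()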

print(hashgen)
print(hashgen2)
Output:
d6f76fb9ca27fdae847af8ea2f3797e2
6e217802558e0534bfb91f694e045f5e
I know the second one (hashgen2) is correct, but why is the first one (hashgen) not returning the correct MD5 hash? If Python 3.5.2 is using Unicode as a default, then specifying the 'b' string literal should implicitly encode it as Unicode, right?

What am I doing wrong?

Ah, figured it out. 'u' is not a Unicode string literal. Apparently it's for an integer.
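To see why the two hashes differ, it helps to compare the bytes each line actually feeds to MD5. This is a minimal sketch, assuming the public hashlib.md5 in place of the private _hashlib.openssl_md5, and passing int(ip) explicitly so the integer conversion is visible:

```python
import hashlib
from ipaddress import IPv6Address

ip = IPv6Address('2001:4958::') + 208000000000

# '%u' is an integer conversion: the address becomes one long decimal number
as_integer = b'%u' % int(ip)
# str() gives the compressed colon-hex text, i.e. what '%s' produced in hashgen2
as_text = str(ip).encode('ascii')

print(as_text)     # b'2001:4958::30:6dc4:2000'
print(as_integer)  # the decimal digits of int(ip)

# Different input bytes, so MD5 produces different digests
print(hashlib.md5(as_integer).hexdigest() == hashlib.md5(as_text).hexdigest())  # False
```

Both inputs are valid bytes; MD5 is working correctly in both cases, it is simply hashing two different representations of the same address.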
#2
Hello,

You are correct, the u is for unsigned character. It originated in C as a data type, originally for an 8 bit byte, where you wanted to use the full 255 possible values.
Without it, the range would be  -128 to 127.
#3
Yeah. It's too bad, too... hashgen was about 4 times faster than hashgen2.
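Since the integer shortcut hashes different bytes, any speed-up has to produce exactly the same colon-hex text as hashgen2. One way to vet a candidate is to assert it matches the known-correct version and then time both with timeit. The helper names below are hypothetical, and timings depend on the machine and Python version, so no numbers are claimed here; whether the second variant is faster at all has to be measured.

```python
import hashlib
import timeit
from ipaddress import IPv6Address

ip = IPv6Address('2001:4958::') + 208000000000


def md5_percent_s(addr):
    # hashgen2's approach: '%s' formatting, then UTF-8 encoding
    return hashlib.md5(('%s' % addr).encode('utf-8')).hexdigest()


def md5_str_ascii(addr):
    # Same bytes without the %-formatting step; IPv6 text is pure ASCII
    return hashlib.md5(str(addr).encode('ascii')).hexdigest()


# Any faster variant must hash exactly the same bytes as the known-correct one
assert md5_percent_s(ip) == md5_str_ascii(ip)

for fn in (md5_percent_s, md5_str_ascii):
    seconds = timeit.timeit(lambda: fn(ip), number=10_000)
    print(f'{fn.__name__}: {seconds:.4f} s for 10,000 calls')
```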
#4
If you know C (I expect the hash algorithm is written in C), you might want to take a look at the two algorithms.
Most of the ones I have written or borrowed (I used a modified version of Aho's from the dragon book) were actually quite simple:
usually fed a seed that was the size of the hash table, they manipulated the key through an iterative process of masks and
bit shifts. Only a few lines of code.
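As an illustration of this kind of function, here is a Python sketch of hashpjw (attributed to P. J. Weinberger), the shift-and-mask string hash presented in the dragon book. It is an assumption that it resembles the modified version mentioned above; the 32-bit mask emulates C's unsigned int, and the table size (ideally a prime) is applied as the final modulus.

```python
def hashpjw(key: str, table_size: int) -> int:
    """Return a bucket index for key using hashpjw-style shifts and masks."""
    h = 0
    for ch in key:
        # Shift in the next character, masking to emulate a 32-bit unsigned int
        h = ((h << 4) + ord(ch)) & 0xFFFFFFFF
        g = h & 0xF0000000  # the top 4 bits
        if g:
            h ^= g >> 24  # fold the top bits back into bits 4-7
            h ^= g        # then clear the top bits
    # The table size (ideally a prime) maps the hash onto a bucket
    return h % table_size


print(hashpjw('abc', 211))  # 124
```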

What you did with it afterwards is where it can get more complicated (although, with care, this can be simple as well). The one that I used for processing
a day's worth of phone calls (~80 million calls) used a lateral extension, which was actually a linked list, when a collision was encountered. By using the
size of the table as part of the hash, the distribution was very even. The linked-list handling of collisions had the (very good) side effect of never running
out of space.
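A minimal sketch of collision handling by chaining (commonly called separate chaining), which is what the lateral extension above describes. Assumptions: Python lists stand in for the linked lists, the built-in hash() replaces a custom hash function, and the class and method names are hypothetical.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining entries per bucket."""

    def __init__(self, size: int = 211):
        self.size = size
        self.buckets = [[] for _ in range(size)]  # one chain per slot

    def _index(self, key):
        # The table size is part of the hash: it picks the bucket
        return hash(key) % self.size

    def put(self, key, value):
        chain = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # existing key: replace its value
                return
        chain.append((key, value))  # new key: the chain grows, so no overflow

    def get(self, key, default=None):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return default

    def longest_chain(self):
        return max(len(chain) for chain in self.buckets)


table = ChainedHashTable(size=7)
for n in range(20):
    table.put(n, n * n)
print(table.get(9))           # 81
print(table.longest_chain())  # 3
```

Because a full bucket just gets a longer chain, the table never runs out of slots; performance stays good as long as the hash spreads keys evenly enough to keep the chains short.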

With this algorithm, a full day's calls could be processed (identifying the customer, distance between points, number of points, segment rating, etc.) in twenty minutes.
The lateral lists never got too long, so they caused little delay.

I got into hashing in a big way; saving a few computer cycles on a single call really added up when you were processing so many.

Should you get interested and investigate the Python hashes, I'd be interested in what you find.

Larz60+
#5
(Oct-16-2016, 02:59 AM)Larz60+ Wrote: Hello,

You are correct, the u is for unsigned character. It originated in C as a data type, originally for an 8 bit byte, where you wanted to use the full 255 possible values.
Without it, the range would be  -128 to 127.
It's for unsigned (int is implied, 32 bits on common platforms) in C, and Python accepts it too, where it is an obsolete alias for %d:

Output:
lt1/forums /home/forums 10> py2
Python 2.7.12 (default, Jul  1 2016, 15:12:24)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> '%u' % (2**30,)
'1073741824'
>>>
lt1/forums /home/forums 11> py3
Python 3.5.2 (default, Sep 10 2016, 08:21:44)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> '%u' % (2**30,)
'1073741824'
>>>
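A quick check of how far the C analogy goes in Python: the snippet compares '%u' with '%d' for a few integers, including a negative one and one too large for a 32-bit unsigned int, which C's %u could not represent.

```python
# '%u' and '%d' produce identical text for every integer in Python
for n in (2**30, -5, 2**64):
    assert '%u' % n == '%d' % n
    print('%u' % n)  # 1073741824, then -5, then 18446744073709551616

# bytes formatting (Python 3.5+) accepts %u the same way
print(b'%u' % 2**30)  # b'1073741824'
```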