Python Forum
is ValueError a class?
#1
is ValueError a class? can a class be hashed? should a class be hashable? should the result of ValueError() be hashable?

one of the things i have noticed is:
Output:
>>> d={}
>>> d[ValueError('woot')]=0
>>> d[ValueError('woot')]=1
>>> print(repr(d))
{ValueError('woot'): 0, ValueError('woot'): 1}
>>>
which suggests to me that it should not be hashable.
Tradition is peer pressure from dead people

What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
#2
>>> ValueError
<class 'ValueError'>
>>> e = ValueError('spam')
>>> import collections.abc
>>> isinstance(e, collections.abc.Hashable)
True
Like most objects in Python, ValueError instances are hashable.
Python documentation Wrote:The only required property is that objects which compare equal have the same hash value
>>> ValueError('Woot') == ValueError('Woot')
False
Remember that == defaults to is (identity comparison) for Python objects unless the __eq__ method of a class redefines equality.
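To illustrate the last point, here is a minimal sketch of a subclass that redefines equality. The class name ComparableValueError is hypothetical (it is not part of the standard library), and the sketch assumes the exception arguments are themselves hashable. Note that a class which defines __eq__ without __hash__ becomes unhashable, so both are defined together.

```python
class ComparableValueError(ValueError):
    """Hypothetical subclass: instances are equal when type and args match."""

    def __eq__(self, other):
        if not isinstance(other, ComparableValueError):
            return NotImplemented
        return type(self) is type(other) and self.args == other.args

    def __hash__(self):
        # Must agree with __eq__: equal objects need equal hashes.
        return hash((type(self), self.args))


# Plain ValueError instances compare by identity, so both stay in the set.
plain = {ValueError('woot'), ValueError('woot')}

# The subclass compares by value, so the second instance is a duplicate.
merged = {ComparableValueError('woot'), ComparableValueError('woot')}

print(len(plain), len(merged))
```

With plain ValueError the set keeps two members; with the subclass it keeps one.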
#3
i ran into an issue yesterday in some code that was collecting exceptions in a set and things became confused when two like ValueError exceptions left the set with len() of 2. while debugging that, it was a bit of a shock to see two of the same thing in a set. i was sure this kind of "bug" would have been seen in some way, so i dug into it for a couple hours. i'll go back and do the Exception measurements a different way. it would have been nice if classes were unhashable. IMHO, anything that can change should be unhashable.

classes i defined were hashing to what appeared to be their memory address. repr() was not showing the hash for the standard exception classes. i'm curious how that was done, if they really are classes (maybe they just pretend to be a class).
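One possible "different way" to collect exception measurements is sketched below. It assumes that two exceptions should count as alike when they have the same class and the same args, and that the args are hashable. The helper name summarize and the sample inputs are made up for illustration.

```python
from collections import Counter


def summarize(exceptions):
    """Count exceptions by (class, args) instead of by object identity."""
    return Counter((type(exc), exc.args) for exc in exceptions)


# Collect a few real ValueError instances; the inputs are illustrative only.
caught = []
for text in ["12", "x", "x", "7", "y"]:
    try:
        int(text)
    except ValueError as exc:
        caught.append(exc)

counts = summarize(caught)
for (exc_type, args), n in counts.items():
    print(exc_type.__name__, args, n)

# A plain set still treats every instance as distinct.
print(len(set(caught)), len(counts))
```

The Counter groups the two failures for "x" under one key, while set(caught) still has one member per instance.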
#4
(Mar-21-2023, 01:36 AM)Skaperen Wrote: i'm curious how that was done
I think I've found the code in cpython/pyhash.c
Output:
Py_hash_t
_Py_HashPointerRaw(const void *p)
{
    size_t y = (size_t)p;
    /* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
       excessive hash collisions for dicts and sets */
    y = (y >> 4) | (y << (8 * SIZEOF_VOID_P - 4));
    return (Py_hash_t)y;
}
Here is my own implementation of this in pure Python (it assumes 64-bit pointers):
>>> def myhash(x):
...     y = id(x)
...     return ((y & 15) << 60) | (y >> 4)
... 
>>> x = object()
>>> hash(x) == myhash(x)
True
>>> x = ValueError('woot')
>>> hash(x) == myhash(x)
True
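A slightly more general sketch of the same rotation follows. It is only an illustration of the C code quoted above, not CPython's own code: the name pointer_hash is made up, the pointer width is taken from struct.calcsize("P"), and the result is folded into a signed value because Py_hash_t is signed. The final -1 to -2 adjustment mirrors what CPython does for pointer hashes, since -1 is reserved as an error marker.

```python
import struct


def pointer_hash(addr, pointer_bits=None):
    """Rotate addr right by 4 bits within the pointer width (illustrative)."""
    if pointer_bits is None:
        pointer_bits = struct.calcsize("P") * 8  # width of void* on this build
    mask = (1 << pointer_bits) - 1
    addr &= mask
    # Same expression as the C code, masked to emulate size_t overflow.
    y = ((addr >> 4) | (addr << (pointer_bits - 4))) & mask
    # Reinterpret the unsigned result as a signed Py_hash_t.
    if y >= 1 << (pointer_bits - 1):
        y -= 1 << pointer_bits
    # -1 is reserved for errors, so it is replaced by -2.
    return -2 if y == -1 else y


x = object()
# On CPython, where id() is the memory address, this is expected to match.
print(hash(x) == pointer_hash(id(x)))
```

On other Python implementations id() need not be an address, so the comparison at the end is only meaningful for CPython.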
#5
i don't think that C code creates this issue (was that what it was intended to show?). whether it reduces collisions or not should not cause set duplication. IMHO, this issue is the perception that the hashes are the same because the creation looks the same. then, repr() does not show the hash or key, so it masks that cause. so, there really is no duplication of keys because the hashes are unique. i'll try to expose it here:

i create this try_hash.py file:
s = set()
for x in range(4):
    s.add(ValueError('Woot'))
for x in s:
    print(hash(x))
print(repr(s))
and ran it:
Output:
lt1a/forums/1 /home/forums 12> python3.8 try_hash.py
8726699187329
8726699214099
8726699187324
8726699211127
{ValueError('Woot'), ValueError('Woot'), ValueError('Woot'), ValueError('Woot')}
lt1a/forums/1 /home/forums 13>
it looks to me like:

1. dictionaries and sets key on the hash of the member. i think of a set as a dictionary with no values.

2. ValueError and the like are implemented in a way that repr() will not show its hash (it does show it for other classes used as a key).

3. since each ValueError('Woot') is not destructed, each will have a distinct memory location and thus a distinct hash.

4. set and dictionary will see each ValueError('Woot') as different because their hash is different.
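Points 2 and 3 can be checked directly. The sketch below assumes CPython 3.7 or later: exception classes supply their own __repr__ built from the class name and args, while the default repr inherited from object is the one that shows the address. Calling object.__repr__ explicitly bypasses the exception's version.

```python
e1 = ValueError('Woot')
e2 = ValueError('Woot')

# The exception repr is built from the class name and args only.
print(repr(e1))

# The default object repr is the one that includes the memory address.
print(object.__repr__(e1))

# Both instances are alive at once, so their ids and hashes differ.
print(hex(id(e1)), hash(e1))
print(hex(id(e2)), hash(e2))
```

So the exceptions are ordinary classes; they only look the same in the set because their repr leaves the address out.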

IMHO, that C code just makes slightly smaller internal hashes for laying out the memory of the object. i don't know if it really reduces collisions, but it might speed up the whole hash object other C code is working with as most hash values will be more compacted.

part of the confusion of all this is two meanings for the term "hash". one is a data structure where various data objects are laid out for rapid access in most cases. the other is a value associated with Python objects (C can do this, too, but has to know the object's bounds). this issue involves both meanings.
#6
(Mar-21-2023, 05:34 PM)Skaperen Wrote: set and dictionary will see each ValueError('Woot') as different because their hash is different.
This is not true. Sets and dictionaries see each ValueError('Woot') as different because they compare different, in the sense of the == operator, or the __eq__ method. Below is an example where I create a hashable subclass of int which shows this behavior with two objects having the same hash value
>>> class I(int):
...     def __eq__(self, other):
...         return isinstance(other, I) and int.__eq__(self, other)
...     def __hash__(self):
...         return int.__hash__(self)
... 
>>> 
>>> x = (1, 2, 3)
>>> y = (I(1), 2, 3)
>>> hash(x) == hash(y)
True
>>> x == y
False
>>> set([x, y])
{(1, 2, 3), (1, 2, 3)}
In a hash table (such as a set or a dict), the hash value is only used to determine the bucket containing a value, but a bucket can contain several objects, so having the same hash value does not make two objects the same entry in a hash table.

Mathematically, there are two equivalence relations here: the relation x == y and the relation hash(x) == hash(y). The constraint on the hash function is that the equality relation is finer than the hash equality relation, that is, x == y implies hash(x) == hash(y), but not the other way around.
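The bucket idea can be sketched with a toy table. This is a simplified model with chained buckets, not how CPython actually lays out sets and dicts, and the class name TinySet is made up. The lookup order it uses (stored hash first, then identity, then ==) is the part that matters for this thread.

```python
class TinySet:
    """Toy chained hash table, for illustration only."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def add(self, item):
        """Add item; return False if an equal item is already stored."""
        h = hash(item)
        bucket = self.buckets[h % len(self.buckets)]
        for stored_hash, stored in bucket:
            # Different hashes mean different items; == is not even tried.
            if stored_hash == h and (stored is item or stored == item):
                return False
        bucket.append((h, item))
        return True

    def __len__(self):
        return sum(len(bucket) for bucket in self.buckets)


t = TinySet()
first = t.add(1)       # new entry
same = t.add(1.0)      # same hash as 1 and equal to it: rejected
neighbor = t.add(9)    # same bucket as 1 (9 % 8 == 1), different hash: kept
err_a = t.add(ValueError('Woot'))
err_b = t.add(ValueError('Woot'))  # not equal to the first one: kept
print(first, same, neighbor, err_a, err_b, len(t))
```

Sharing a bucket, or even sharing a hash, never merges two entries on its own; only the identity or equality test does.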
#7
i thought it checked the hash value first and if that was different then it would have been considered different to avoid extra time comparing. as far as i could know, these instances of ValueError('Woot') were alike but ended up appearing different because their hash() value was derived from the memory address they were stored at.

you say they are really different. what detail is different?
#8
(Mar-23-2023, 05:59 PM)Skaperen Wrote: you say they are really different. what detail is different?
They are different because the ValueError type does not redefine the __eq__() method, which means that equality testing between ValueError instances defaults to equality testing between two object instances, which means that all instances differ from one another.
>>> a = object()
>>> b = object()
>>> a == b
False
Basically, they differ because they don't have the same memory addresses.
>>> set([a, b])
{<object object at 0x7f13fd924910>, <object object at 0x7f13fd9248f0>}
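The same can be checked for ValueError itself. This short sketch assumes CPython, where an inherited slot method is the very same object as the one on the base class, so an identity test shows where __eq__ comes from.

```python
a = ValueError('Woot')
b = ValueError('Woot')

# No class between ValueError and object defines __eq__.
inherits_object_eq = ValueError.__eq__ is object.__eq__
print(inherits_object_eq)

# Equal-looking contents, but equality falls back to identity.
print(a.args == b.args)
print(a == b)
print(a == a)
```

The args are equal, yet a == b is false because nothing in the exception hierarchy tells == to look at them.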
#9
isn't the memory address what you get for hash() of some generic object in the CPython implementation (implementation specific in other cases)?
#10
well, close, anyway. it looks to be right shifted by 4 bits:
Output:
lt1a/forums/1 /home/forums 4> python3
Python 3.8.10 (default, Mar 13 2023, 10:26:41)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = object()
>>> b = object()
>>> a == b
False
>>> set([a, b])
{<object object at 0x7f3914dc7e00>, <object object at 0x7f3914dc7df0>}
>>> hex(hash(a))
'0x7f3914dc7df'
>>> hex(hash(b))
'0x7f3914dc7e0'
>>>