Mastering Objectoriented Python
上QQ阅读APP看书,第一时间看更新

Deciding what to hash

Not every object should provide a hash value. Specifically, if we're creating a class of stateful, mutable objects, the class should never return a hash value. The definition of __hash__ should be None.

Immutable objects, on the other hand, might sensibly return a hash value so that the object can be used as the key in a dictionary or a member of a set. In this case, the hash value needs to parallel the way the test for equality works. It's bad to have objects that claim to be equal and have different hash values. The reverse—objects with the same hash that are actually not equal—is acceptable.

The __eq__() method, which we'll also look at in the section on comparison operators, is intimately tied up with hashing.

There are three tiers of equality comparison:

  • Same Hash Value: This means that two objects could be equal. The hash value provides us with a quick check for likely equality. If the hash value is different, the two objects cannot possibly be equal, nor can they be the same object.
  • Compare As Equal: This means that the hash values must also have been equal. This is the definition of the == operator. The objects may be the same object.
  • Same IDD: This means that they are the same object. They also compare as equal and will have the same hash value. This is the definition of the is operator.

The Fundamental Law of Hash (FLH) is this: objects that compare as equal have the same hash value.

We can think of a hash comparison as being the first step in an equality test.

The inverse, however, is not true. Objects can have the same hash value but compare as not equal. This is valid and leads to some expected processing overhead when creating sets or dictionaries. We can't reliably create distinct 64 bit hash values from much larger data structures. There will be unequal objects that are reduced to coincidentally equal hash values.

Coincidentally, equal hash values are an expected overhead when working with sets and dicts. These collections have internal algorithms to use alternate locations in the event of hash collisions.

There are three use cases for defining equality tests and hash values via the __eq__() and __hash__() method functions:

  • Immutable objects: These are stateless objects of types such as tuples, namedtuples, and frozensets that cannot be updated. We have two choices:
    • Define neither __hash__() nor __eq__(). This means doing nothing and using the inherited definitions. In this case, __hash__() returns a trivial function of the ID value for the object, and __eq__() compares the ID values. The default equality test may sometimes be counterintuitive. Our application might require two instances of Card( 1, Clubs ) to test as equal and compute the same hash; this won't happen by default.
    • Define both __hash__() and __eq__(). Note that we're expected to define both for an immutable object.
  • Mutable objects: These are stateful objects that can be modified internally. We have one design choice:
    • Define __eq__() but set __hash__ to None. These cannot be used as dict keys or items in sets.

Note that there's an additional possible combination: defining __hash__() but using a default definition for __eq__(). This is simply a waste of code, as the default __eq__() method is the same as the is operator. The default __hash__() method would have involved writing less code for the same behavior.

We'll look at each of the three situations in detail.