Based on a talk presented at the Swift Language User Group in San Francisco.
Hash values are all over our nerdy programmer lives: our git commits have SHA-1 hashes as their unique identifiers, common data structures such as sets and dictionaries use hash values for speedy lookups, and so on.
Things that can have a hash value conform to the Hashable
protocol in Swift. What is the protocol all about? What is a hash value? And what’s different in the post-Swift-4.2 world?
Hashable
The Hashable
protocol inherits from Equatable
as a starting point:
public protocol Hashable : Equatable {
Before Swift 4.2, the protocol had a single requirement:
var hashValue: Int { get }
That means you need to implement a single property that returns an Int
for the hash value.
Hash value, you say? 🤔
Hash Values
A hash value is an integer representation of some other value. In Swift that’s a 64-bit integer, which gives quite a bit of room for all your things: just over eighteen quintillion, or 18,446,744,073,709,551,615.
But how many possible values are there out there in the universe? Every possible string value, every possible type you might define, and then every possible instance of each type — turns out it’s a lot!
If we have an infinite number of things to represent then there’s no way each thing can have its own hash value, since there are only so many 64-bit integers.
There’s our first bit of intuition: there’s no way hash values can be completely unique; that is, there will surely be hash collisions where two different values out there in the universe will have the same hash value.
Rules and Requirements
To conform to the Hashable
protocol, we need to return a hash value. What are the requirements around the values we return?
There’s one big rule around hash values:
If two instances are equal, they must produce the same hash value.
Said another way: if a == b
then a.hashValue == b.hashValue
.
That’s it, that’s the one rule.
From that one rule though, we can derive some additional conclusions. And even some incorrect conclusions that our intuition might initially tell us are correct. 😈
What Values Tell Us
We’ve already established what happens when two values are equal: their hashes are equal too.
What about two non-equal values? Can we say anything conclusive about their hashes?
Perhaps their hash values are also non-equal. That sounds intuitive, but we know about hash collisions, so this can’t be the case—it’s possible for two different values to have the same hash.
So two non-equal values could have either the same or different hash values; we can conclude nothing about their hashes.
What Hash Values Tell Us
What if we have two values, but we don’t know anything about them except for their hash values?
If the hash values are the same, we might conclude that the values are also the same. Again though, we remember hash collisions. Based on equal hashes, we can’t conclude anything about their original values — they could be the same, or not.
If the hash values are different then it’s possible for the original values to be different. But could the original values be the same? This would violate the One Big Rule of hash values, that equal values mush have equal hashes.
So we can conclude something here: if two hash values are different, then their corresponding original values must be different too.
Although Equatable
and Hashable
often go together, I find it’s best to think of them as separate concepts to avoid coming to incorrect conclusions.
Hashers
In Swift 4.2, there’s an additional level of abstraction around calculating hash values.
Rather than leave us to our own devices on how to create a hash value from a collection of properties, Swift 4.2 introduces some structure around a hash combining function.
The Hashable
protocol now has this requirement:
func hash(into hasher: inout Hasher)
The compiler will pass in a Hasher
for you, which represents a particular hash combining function.
Swift uses the SipHash algorithm. You can see the implementation in SipHash.swift in the standard library sources.
All you need to do is call the combiner function with each component of your type.
func hash(into hasher: inout Hasher) {
hasher.combine(self.property1)
hasher.combine(self.property2)
// etc.
}
When your instance is asked for its hash value, this function will get called and then the hasher will be asked to provide the final value.
Hash Stability
Hash values must be consistent: if I have some immutable value and ask for its hash 1000 times during the life of the program, it should return the same value to me all 1000 times.
However, hash values aren’t guaranteed to be the same across launches of your program. Turns out they are stable in Swift 4.1, although the documentation is very clear that you shouldn’t rely on this.
Starting in Swift 4.2, hash values are seeded with a random value at launch. During the lifetime of the program, the hash values will be stable.
But if you store the hash value somewhere and expect it to remain the same forever, across multiple launches, then you’ll run into trouble. The big takeaway: don’t store hash values! They are not unique IDs or checksums that you can persist and use as stable identifiers.
The Closing Brace
Hash values are a mapping between some arbitrary data and an integer.
You might not use them directly, but they power many popular data structures such as sets and dictionaries.
Although Swift 4.2 introduces a more convenient way to calculate hashes, in many cases you don’t have to write the implementation yourself! If your struct or enum exclusively incorporates values that are themselves Hashable
, the compiler handle the Hashable
conformance automatically for you.
If you’re curious how the compiler does this, see Synthesized Conformance to Equatable. That article talks about the
Equatable
protocol, but the compiler synthesized conformance toHashable
works in a similar way.
}
Based on a talk presented at the Swift Language User Group in San Francisco, August 2018.