[swift unboxed]

Safely unboxing the Swift language & standard library


Hashable

Hash values in Swift: more than just XOR.

15 August 2018 ∙ Protocols ∙ written by Swift 4.2

Based on a talk presented at the Swift Language User Group in San Francisco.

Hash values are all over our nerdy programmer lives: our git commits have SHA-1 hashes as their unique identifiers, common data structures such as sets and dictionaries use hash values for speedy lookups, and so on.

Things that can have a hash value conform to the Hashable protocol in Swift. What is the protocol all about? What is a hash value? And what’s different in the post-Swift-4.2 world?

Hashable

The Hashable protocol inherits from Equatable as a starting point:

public protocol Hashable : Equatable {

Before Swift 4.2, the protocol had a single requirement:

var hashValue: Int { get }

That means you need to implement a single property that returns an Int for the hash value.

Hash value, you say? 🤔

Hash Values

A hash value is an integer representation of some other value. In Swift that’s a 64-bit integer, which gives quite a bit of room for all your things: just over eighteen quintillion, or 18,446,744,073,709,551,615.

But how many possible values are there out there in the universe? Every possible string value, every possible type you might define, and then every possible instance of each type — turns out it’s a lot!

If we have an infinite number of things to represent then there’s no way each thing can have its own hash value, since there are only so many 64-bit integers.

Hash values map data to an integer

There’s our first bit of intuition: there’s no way hash values can be completely unique; that is, there will surely be hash collisions where two different values out there in the universe will have the same hash value.

Rules and Requirements

To conform to the Hashable protocol, we need to return a hash value. What are the requirements around the values we return?

There’s one big rule around hash values:

If two instances are equal, they must produce the same hash value.

Said another way: if a == b then a.hashValue == b.hashValue.

That’s it, that’s the one rule.

Equal values must have equal hashes too

From that one rule though, we can derive some additional conclusions. And even some incorrect conclusions that our intuition might initially tell us are correct. 😈

What Values Tell Us

We’ve already established what happens when two values are equal: their hashes are equal too.

What about two non-equal values? Can we say anything conclusive about their hashes?

Perhaps their hash values are also non-equal. That sounds intuitive, but we know about hash collisions, so this can’t be the case—it’s possible for two different values to have the same hash.

So two non-equal values could have either the same or different hash values; we can conclude nothing about their hashes.

What Hash Values Tell Us

What if we have two values, but we don’t know anything about them except for their hash values?

If the hash values are the same, we might conclude that the values are also the same. Again though, we remember hash collisions. Based on equal hashes, we can’t conclude anything about their original values — they could be the same, or not.

If the hash values are different then it’s possible for the original values to be different. But could the original values be the same? This would violate the One Big Rule of hash values, that equal values mush have equal hashes.

So we can conclude something here: if two hash values are different, then their corresponding original values must be different too.

Non-equal hashes mean the corresponding values must also be non-equal

Although Equatable and Hashable often go together, I find it’s best to think of them as separate concepts to avoid coming to incorrect conclusions.

Hashers

In Swift 4.2, there’s an additional level of abstraction around calculating hash values.

Rather than leave us to our own devices on how to create a hash value from a collection of properties, Swift 4.2 introduces some structure around a hash combining function.

The Hashable protocol now has this requirement:

func hash(into hasher: inout Hasher)

The compiler will pass in a Hasher for you, which represents a particular hash combining function.

Swift uses the SipHash algorithm. You can see the implementation in SipHash.swift in the standard library sources.

All you need to do is call the combiner function with each component of your type.

func hash(into hasher: inout Hasher) {
  hasher.combine(self.property1)
  hasher.combine(self.property2)
  // etc.
}

When your instance is asked for its hash value, this function will get called and then the hasher will be asked to provide the final value.

Hash Stability

Hash values must be consistent: if I have some immutable value and ask for its hash 1000 times during the life of the program, it should return the same value to me all 1000 times.

However, hash values aren’t guaranteed to be the same across launches of your program. Turns out they are stable in Swift 4.1, although the documentation is very clear that you shouldn’t rely on this.

Starting in Swift 4.2, hash values are seeded with a random value at launch. During the lifetime of the program, the hash values will be stable.

But if you store the hash value somewhere and expect it to remain the same forever, across multiple launches, then you’ll run into trouble. The big takeaway: don’t store hash values! They are not unique IDs or checksums that you can persist and use as stable identifiers.

The Closing Brace

Hash values are a mapping between some arbitrary data and an integer.

You might not use them directly, but they power many popular data structures such as sets and dictionaries.

Although Swift 4.2 introduces a more convenient way to calculate hashes, in many cases you don’t have to write the implementation yourself! If your struct or enum exclusively incorporates values that are themselves Hashable, the compiler handle the Hashable conformance automatically for you.

If you’re curious how the compiler does this, see Synthesized Conformance to Equatable. That article talks about the Equatable protocol, but the compiler synthesized conformance to Hashable works in a similar way.

}

Based on a talk presented at the Swift Language User Group in San Francisco, August 2018.