Tomer Aberbach

The Making of Keyalesce

Published ·

Have you ever wanted to use tuples or objects for the keys of a Map or the values of a Set? It’s a very common question because the following code doesn’t do what you might expect:

const map = new Map()
map.set([1, 2], 3)
console.log(map.get([1, 2]))
//=> undefined

const set = new Set()
set.add([1, 2])
console.log(set.has([1, 2]))
//=> false

The code behaves this way because a Map’s keys and a Set’s values are compared using reference equality, as specified by the SameValueZero algorithm. Two deeply equal arrays are not referentially equal:

console.log([1, 2] === [1, 2])
//=> false

The following code behaves more predictably:

// A reference to a single array instance!
const tuple = [1, 2]

const map = new Map()
map.set(tuple, 3)
console.log(map.get(tuple))
//=> 3

const set = new Set()
set.add(tuple)
console.log(set.has(tuple))
//=> true

But that isn’t particularly useful because typically you’re constructing tuples on the fly using values from some external source:

const f = (x, y) => {
  // This will return undefined because `[x, y]` is a new array!
  const value = map.get([x, y])

  // ...
}

A common solution is to use JSON.stringify:

const map = new Map()
map.set(JSON.stringify([1, 2]), 3)
console.log(map.get(JSON.stringify([1, 2])))
//=> 3

const set = new Set()
set.add(JSON.stringify([1, 2]))
console.log(set.has(JSON.stringify([1, 2])))
//=> true

This works because strings are primitives, which are compared by value rather than by reference. Unfortunately, using JSON.stringify requires that the inner values are stringifiable. If they’re not, then you’re forced to write a custom serialization function.

Plus, sometimes you do want reference equality, per inner value, but serialization doesn’t preserve reference equality:

const person1 = { name: `Tomer`, occupation: `Software Engineer` }
const person2 = { name: `Tomer`, occupation: `Software Engineer` }

const salaryApplications = new Map()
salaryApplications.set(JSON.stringify([person1, Number.MAX_VALUE]), `approved`)

// Oh no! Two different software engineers named Tomer are considered the same due to stringifying!
console.log(salaryApplications.get(JSON.stringify([person2, Number.MAX_VALUE])))
//=> approved

Surely there’s a better way!

The solution: <code>keyalesce</code> permalinkThe solution: keyalesce

keyalesce is a module that returns the same unique key for the same value sequence1. It’s perfect for this use case:

const person1 = { name: `Tomer`, occupation: `Software Engineer` }
const person2 = { name: `Tomer`, occupation: `Software Engineer` }

const map = new Map()
map.set(keyalesce([1, 2]), 3)
map.set(keyalesce([2, `b`, 3]), 4)
map.set(keyalesce([person1, Number.MAX_VALUE]), `approved`)
map.set(keyalesce([person2, Number.MAX_VALUE]), `totally approved`)

console.log(map.get(keyalesce([1, 2])))
//=> 3

console.log(map.get(keyalesce([2, `b`, 3])))
//=> 4

console.log(map.get(keyalesce([person1, Number.MAX_VALUE])))
//=> approved

console.log(map.get(keyalesce([person2, Number.MAX_VALUE])))
//=> totally approved

const set = new Set()
set.add(keyalesce([1, 2, 3, 4, 5]))
set.add(keyalesce([3, 3, 2, 2, 1]))

console.log(set.has(keyalesce([1, 2, 3, 4, 5])))
//=> true

console.log(set.has(keyalesce([3, 3, 2, 2, 1])))
//=> true

console.log(keyalesce([1, 2, 3, 4, 5]) === keyalesce([1, 2, 3, 4, 5]))
//=> true

How does it work? permalinkHow does it work?

keyalesce internally maintains a trie containing the sequences of values it has been called with. It creates and returns new keys for new sequences and returns previously created keys for known sequences!

For example, the following code:

const key1 = keyalesce([1, 2, 3, 4])
const key2 = keyalesce([1, 2, 3])
const key3 = keyalesce([1, 2, 7, 8])
const key4 = keyalesce([`a`, `b`, `c`])

Would result in the following trie:

1
2
3
4
7
8
root
key1
key2
key3
key4
a
b
c

And calling keyalesce([1, 2, 3, 4]) again would return key1 after traversing nodes 1, 2, 3, and 4 in the trie.

Does <code>keyalesce</code> cause memory leaks? permalinkDoes keyalesce cause memory leaks?

A long running program using a naive implementation of keyalesce would have a memory leak due to unbounded growth of the trie. How are unreachable nodes pruned?

Sequence value reachability permalinkSequence value reachability

Consider the following code:

let object1 = {}
let object2 = {}

const key1 = keyalesce([1, object1, 4])
const key2 = keyalesce([1, 2, object2, 3])

// ...

Which would result in the following trie:

1
2
3
4
root
key1
key2
object1
object2

If the code continues like so:

object2 = null

Then object2’s original value is only reachable from the trie. keyalesce can now prune the associated sequence and its key from the trie because it has become impossible for keyalesce to be called with that sequence ever again.

I made the trie hold only weak references to objects passed to keyalesce and pruned the trie when the objects are garbage-collected using FinalizationRegistry.

After pruning in this case the trie would look like so:

1
4
root
key 1
object1
Note

Although 1 was in the sequence it was not pruned because it is still needed for another sequence.

Created key reachability permalinkCreated key reachability

Consider the following code:

let key1 = keyalesce([1, 2, 4])
let key2 = keyalesce([1, 2, 5, 7])

// ...

Which would result in the following trie:

1
2
4
5
7
root
key1
key2

The previous section’s logic does not apply because all of the sequence values are primitives, which are always reachable and not eligible for garbage collection. So how is the trie pruned in this case?

If the code continues like so:

key2 = null

Then key2’s original value is only reachable from the trie. Although it’s still possible to call keyalesce with [1, 2, 5, 7], keyalesce can actually prune the key and its associated value sequence because there isn’t any code that depends on receiving that specific key anymore. keyalesce doesn’t need to return the same unique key for a given value sequence. It only needs to prevent multiple keys existing simultaneously for the same value sequence.

Similarly to the handling of object sequence values, I made the trie hold only weak references to created keys and pruned the trie when the keys are garbage-collected using FinalizationRegistry.

After pruning in this case the trie would look like so:

1
2
4
root
key1
Note

Although 1 was in the sequence it was not pruned because it is still needed for another sequence.

In summary, the trie is pruned whenever object sequence values or keys have only weak references to them.

Go use it! permalinkGo use it!

Install keyalesce:

$ npm i keyalesce

And import it:

import keyalesce from 'keyalesce'

const hangouts = new Set()

const createHangoutKey = (person1, person2) =>
  keyalesce([person1, person2].sort())
const hangOut = (person1, person2) =>
  hangouts.add(createHangoutKey(person1, person2))
const didTheyHangOut = (person1, person2) =>
  hangouts.has(createHangoutKey(person1, person2))

hangOut(`Tomer`, `Samuel`)
hangOut(`Tomer`, `Amanda`)

console.log(didTheyHangOut(`Tomer`, `Samuel`))
console.log(didTheyHangOut(`Samuel`, `Tomer`))
//=> true
//=> true

console.log(didTheyHangOut(`Tomer`, `Amanda`))
console.log(didTheyHangOut(`Amanda`, `Tomer`))
//=> true
//=> true

console.log(didTheyHangOut(`Samuel`, `Amanda`))
console.log(didTheyHangOut(`Amanda`, `Samuel`))
//=> false
//=> false

Footnotes

  1. Two value sequences are considered equal if each of their values are equal using the SameValueZero algorithm.