Saturday, September 29, 2012

Reducing hash table resizing cost

I found the approach outlined here to be an interesting approach to reducing the cost of hash table resizing: Hash_table#Monotonic_keys:

"If it is known that key values will always increase (or decrease) monotonically, then a variation of consistent hashing can be achieved by keeping a list of the single most recent key value at each hash table resize operation. Upon lookup, keys that fall in the ranges defined by these list entries are directed to the appropriate hash function—and indeed hash table—both of which can be different for each range. Since it is common to grow the overall number of entries by doubling, there will only be O(lg(N)) ranges to check, and binary search time for the redirection would be O(lg(lg(N))). As with consistent hashing, this approach guarantees that any key's hash, once issued, will never change, even when the hash table is later grown."

Consistent hashing appears to be used widely in dealing with this cost.  I can imagine something similar of the sorts must be used to keep consistent large sets of data by search engines.

No comments:

Post a Comment