These questions are designed to accompany the task "Implementing a Hash Map in Python" in the "Data Structures and Algorithms" module. The questions are intended to test your understanding of hash maps, their implementation in Python, and the process of integrating data from a double linked list into a hash map. You will also be asked to reflect on your learning and the challenges you faced during the task.
The following are all examples of hash functions:
# (1) the simplest hash function (Stupidly Simple Hash)
def ssh(key):
return 1
# (2) hash function that sums the ASCII values of the characters in the key
def sum_of_ascii_values(key: str, size: int) -> int:
total = 0
for char in key:
total += ord(char)
return total % size
A more Pythonic version
# (2a)
def sum_of_ascii_values(key: str, size: int) -> int:
return sum(ord(char) for char in key) % size
A Pearson Hash function
# (3) Pearson hash function
# https://en.wikipedia.org/wiki/Pearson_hashing
import random
random.seed(42)
# This is INCORRECT:
# pearson_table = [random.randint(0, 255) for _ in range(256)]
pearson_table = list(range(256))
random.shuffle(pearson_table)
def pearson_hash(key: str, size: int) -> int:
hash_ = 0
for char in key:
hash_ = pearson_table[hash_ ^ ord(char)]
return hash_ % size
The following is a hash function that uses the built-in hash
function in Python
# (4) hash function that uses the built-in hash function
def built_in_hash(key: str, size: int) -> int:
return hash(key) % size
Finally, the following is a hash function that uses the SHA256
hash function from the hashlib
module
# (5) hash function that uses the SHA256 hash function
# https://docs.python.org/3/library/hashlib.html
# https://en.wikipedia.org/wiki/SHA-2
# https://en.wikipedia.org/wiki/SHA-2#Pseudocode
import hashlib
def sha256_hash(key: str, size: int) -> int:
return int(hashlib.sha256(key.encode()).hexdigest(), 16) % size
- All of the above functions are hash functions. Explain how so - what key properties do they all share?
Your answer here
- What are the advantages and disadvantages of each of the above hash functions? Evaluate in terms of uniformity, determinism, efficiency, collision resistance, sensitivity to input changes, and security1. You may need to do some reasearch to answer this question 😱
Your answer here
- List the three most important attributes (arranged from most to least) in the context of a hash map? Justify your answer.
Your answer here
- Which of the above hash functions would you choose to implement the requirements of the task? Why?
Your answer here
- In your own words, explain each line in the pearson hash function above in terms of the criteria you listed in question 2.
Your answer here
- Write pseudocode of how you would store Players in PlayerLists in a hash map.
Your answer here
- What was the most challenging aspect of this task?
Your answer here
- If you didn't have to use a PlayerList, how would you have changed them implementation of the hash map and why?
Your answer here
-
Uniformity: the probability of any given hash value within the range of possible hash values should be approximately equal.
-
Determinism: a given input will always produce the same output.
-
Efficiency: the time complexity of computing the hash value should be constant, the hash function should be fast to compute, and utilize the architecture of the computer effectively
-
Collision Resistance: minimize the probability of collisions, through a variety of mechanisms.
-
Sensitivity to input changes: small changes in the input should produce large changes in the output.
-
Security
- It should be computationally infeasible to find an input key that produces a specific hash value (non-reversibility)
- The output hash values should appear random and unpredictable.