Hashing is mainly used to implement dictionaries where you have key and value pairs. It is also used to implement sets, where you have set of keys.
The major use of Hashing is to implement the Search, Insert and Delete operation in O(1) on an average.
For example, you have a set of phone numbers of different people. You can search, insert or delete a specific phone number out of that set using the search operation. You can also have a key and value pair, where you can store the phone numbers as the key, and the details of the person which that phone number belongs to as the value.
Remember, a Hash Table or Hashing does not allow duplicates to be inserted. All the keys inserted are unique, and overwriting takes place in case, a duplicate value is passed to be inserted into that table.
Let’s analyze other data structures in comparison to the Hashing.
The hashing performs really well in terms of Search, Insert and Delete operation than other data structures.
For example, an Array data structure takes O(N) time for searching a specific element into that array, or O(Log(n)) time in case the Array is sorted. Whereas, the Hashing takes O(1) time on an average to perform the search operation.
In an array, the delete and insert operation takes O(N) time when the array is sorted.
The delete and insert operation takes O(1) time when array is not sorted, but in that case, the search operation takes O(N) time.
In the binary search trees, the search, insert and delete operations take O(Log(N)) time on an average, which is still higher than O(1) time provided by Hashing. Certain functions are bulky in Hashing which are provided by the Binary search trees, specially, the AVL trees or Red Black Trees, such as Cell Balancing, which provides a feature to return a value higher or lower than a specific key in O(Log(N)) time, in case that key is not present in the structure.
Hashing is also not recommended when you have to implement the prefix searching, where the Trie data structure is recommended.
So, you can consider Hashing when you have to perform Search, Insert and Delete operations majorly.
Applications of Hashing:
- Dictionaries
Dictionaries can be implemented using the Hashing where the valid words can be searched in minimal time using the Hash Tables. It can also be used to return the similar matches of a word when the required word is not available in the Hash Table.
- Database Indexing
Hashing can be very useful in the indexing of a database. There are other data structures too which can be used to index a database like B/B+. Most of the real-world applications use Hashing and B/B+ as the indexing technique.
- Cryptography
Hashing is very useful in the Cryptography of passwords. Most of the big websites we use in our daily life uses hashing technique to convert the password to a hash and whenever a user tries to login to that service, the hash function computes the hash again and matches the previous stored hash with the newly generated hash. If they match, the user gets logged-in. This has increased the security of online apps from the brute force login attempts.
These are just a few applications of Hashing; the application list of hashing is endless as it is largely used technique with 2nd largest popularity after the Arrays data structures.
Direct Address Tables:
The direct address tables are the simplest implementation of Hashing where we can perform Search, Insert and Delete operation in O(1) time.
Let us take an example, we want to insert the value ranging from 0 to 999 in a table and want to search and delete them in O(1) time. For this, we can use an array of size 999 where we would be initializing all the elements of that array to be 0 initially.
To insert an element into the array, we can write an insert function,
This function will set the value of that specific index to be 1, which will help us check for the value whether it is present.
To delete an element from the array, we can write a delete function,
We can check whether a key is available or not by checking the value at that index.
This search function will return the value at the required index in the array. If the element to be searched is present in the array, it will return 1, else 0.
Now let’s take a dry run of these functions combined.
The Direct Address Tables are not preferred too much because of some restrictions.
For example, if we want to store phone numbers in the table, we cannot initialize a table with these large values (Phone numbers) as index.
Also, we cannot use the combination of characters or strings as index of an array. This is where the idea of Hashing comes in.
The basic approach of Hashing is to use a Hashing function which converts the various types of data such as phone numbers, strings, characters to the small numbers which can be used as the index of an array, also called as Hash Table.
Now, how to write those functions which does this type of work.
What does a Hash Function need to achieve?
- It should always map a large key to small key.
- It should always generate the values from 0 to m-1, where the m is the size of the Hash Table.
- It should be fast, i.e., it should take O(1) time for integers and O(len) time for Strings of length ‘len’.
- It should uniformly distribute the large keys into Hash Table slots. This is the most crucial and difficult task to achieve as we need to create such a Hash Function which can generate a unique index in the range of 0 to m-1, which as the same time does not coincides with any other index of different key. It is possible for more than one key to have a same index generated by a Hash Function which causes collisions and possible data loss. We try to minimalize this by writing good Hash Functions.
Some Examples of Hash Functions:
1. Integral Keys Hashing:
h(large_key) = large_key%m
This function will generate the values which would be in the range of 0 to m-1, where m is the size of the Hash Table.
This function might produce the same value for different keys for example, storing 100 with a Hash Table of size 5 would give the value as 0 (100%5) and storing 200 will also give the value of Hash Function as 0 (200%5), which will produce collisions. We would discuss about the collision handling in the upcoming sections.
2. Strings Hashing
For strings, weighted sums can be used where the ASCII values of the characters of that string can be summed up to produce a value. The problem in this Hash Function is that, it creates the same value for the different permutations of a string. For example, “ABCD”, “BCDA”, “CDAB”, etc., will have the Hash value, which will increase the chances of collision.
There’s an other approach to this problem, where we can use a multiplier with each character of the String to generate unique Hash values.
For example,
str[] = “ABCD”
(str[0] * X0 + str[1] * X1 + str[2] * X2 + str[3] * X3)%m
This Hash function will generate the value based on the ASCII value of the characters of the String, where X is any constant of our choice and m is the size of Hash Table.
3. Universal Hashing
This is generally a group of Hash Functions which are pre-defined to help us in storing of different types of keys and achieve the computation in O(1) time on an average.