Serial and binary search methods in C++


A function called the hash function maps keys to array indices. Suppose we name our hash function hash. If a record has a key of k, then we will try to store that record at location data[hash(k)]. Using the hash function to compute the correct array index is called hashing the key to an array index. The hash function must be chosen so that its return value is always a valid index for the array. Given this hash function and keys that all happen to be multiples of some fixed value, every key produces a different index when it is hashed.

Thus, hash is a perfect hash function. Unfortunately, a perfect hash function cannot always be found. Suppose we no longer have one of the original student IDs, but a different ID instead.

The record with the original student ID will be stored in data[3] as before, but where will the record for the new student ID be placed? The new ID also hashes to 3, so there are now two different records that belong in data[3].

This situation is known as a collision. In this case, we could redefine our hash function to avoid the collision, but in practice you do not know the exact numbers that will occur as keys, and therefore, you cannot design a hash function that is guaranteed to be free of collisions. Typically, though, you do know an upper bound on how many keys there will be.

The usual approach is to use an array size that is larger than needed. The extra array positions make collisions less likely. A good hash function will distribute the keys uniformly throughout the locations of the array. If the array indices range from 0 to 99, for example, you would use a hash function that maps each key to an index in that range. One way to resolve collisions is to place the colliding record in another location that is still open.
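One such hash function, sketched here under the assumption that keys are non-negative integers and the array has 100 components, simply takes the key modulo the array size:

```cpp
#include <cstddef>

// Map a non-negative integer key to an index in [0, 99].
// Assumes a 100-component array.
std::size_t hash(int key) {
    return static_cast<std::size_t>(key) % 100;
}
```

With this function, the keys 3 and 103 both hash to index 3, which is exactly the kind of collision discussed here.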

This storage algorithm is called open addressing. Open addressing requires that the array be initialized so that the program can test whether an array position already contains a record. With this method of resolving collisions, we still must decide how to choose the locations to search for an open position when a collision occurs. There are two main ways to do so: linear probing and double hashing.

There is a problem with linear probing. When several different keys hash to the same location, the result is a cluster of elements, one after another. As the table approaches its capacity, these clusters tend to merge into larger and larger clusters. This is the problem of clustering. Clustering makes insertions take longer because the insert function must step all the way through a cluster to find a vacant location.
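Linear probing searches forward one slot at a time, wrapping around at the end of the array, until it finds a vacant position. A minimal sketch, assuming a 100-slot table of optional integer keys (the names table and insert are illustrative, not from the original):

```cpp
#include <array>
#include <optional>
#include <cstddef>

constexpr std::size_t SIZE = 100;
// An empty optional marks a vacant slot, so the program can test
// whether a position already contains a record.
std::array<std::optional<int>, SIZE> table;

// Insert a key using linear probing; returns false if the table is full.
bool insert(int key) {
    std::size_t i = static_cast<std::size_t>(key) % SIZE;
    for (std::size_t probes = 0; probes < SIZE; ++probes) {
        if (!table[i]) {      // vacant: claim this slot
            table[i] = key;
            return true;
        }
        i = (i + 1) % SIZE;   // occupied: step to the next slot
    }
    return false;             // every slot was occupied
}
```

Keys that hash to the same start index settle into consecutive slots, which is precisely the clustering effect described above.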

Searches require more time for the same reason. The most common technique to avoid clustering is called double hashing.

With double hashing, a second hash function (call it hash2) determines the step size between the locations we probe, so keys that collide at the same starting position follow different probe sequences. There is a catch, however: with a poorly chosen step size, we could return to our starting position before we have examined every available location.

An easy way to avoid this problem is to make sure that the array size is relatively prime to the values returned by hash2; in other words, the two numbers must have no common factor apart from 1. (Choosing a prime number as the array size is a simple way to guarantee this.) In open addressing, each array element can hold just one entry.
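A double-hashing probe sequence can be sketched as follows; hash2 here is an assumed illustrative step function, not the one from the original text, and the prime table size 101 guarantees that every step size from 1 to 100 is relatively prime to it:

```cpp
#include <cstddef>

constexpr std::size_t SIZE = 101;   // prime: any step 1..100 is coprime to SIZE

std::size_t hash1(int key) { return static_cast<std::size_t>(key) % SIZE; }

// Step size for probing; never returns 0, so probing always advances.
std::size_t hash2(int key) { return 1 + static_cast<std::size_t>(key) % (SIZE - 1); }

// The i-th location examined when searching for key's slot.
// Because the step is coprime to SIZE, the sequence visits every
// slot before returning to the start.
std::size_t probe(int key, std::size_t i) {
    return (hash1(key) + i * hash2(key)) % SIZE;
}
```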

When the array is full, no more records can be added to the table. To recap the general scheme: we store the records in an array called data, placing the record with student ID k at the location given by the hash function, so the record for a given ID ends up in a specific component such as data[7]. This general technique is called hashing. Each record requires a unique value called its key.

In our example the student ID is the key, but other, more complex keys are sometimes used.

A linear (serial) search examines the items of a list one after another, which results in O(n) performance on a given list. A binary search comes with the prerequisite that the data must be sorted. We can leverage this fact to decrease the number of items we need to look at to find our target. If we look at some item in the data (say, the middle item) and that item is greater than our target, then all items to the right of it must also be greater than our target.

This means that we only need to look at the left part of the data. Essentially, each time we search for the target and miss, we can eliminate half of the remaining items. This gives us a nice O(log n) time complexity. Since sorting itself costs more than a single linear scan, you should never sort data just to perform a single binary search later on. But if you will be performing many searches (say, at least O(log n) searches), it may be worthwhile to sort the data so that you can perform binary searches.
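The halving strategy described above can be sketched as an iterative function (the name is illustrative):

```cpp
#include <vector>
#include <cstddef>

// Binary search over a sorted vector: returns the index of target,
// or -1 if it is absent. Every miss discards half the remaining range.
int binarySearch(const std::vector<int>& data, int target) {
    std::size_t lo = 0, hi = data.size();   // search the range [lo, hi)
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (data[mid] == target) return static_cast<int>(mid);
        if (data[mid] < target)
            lo = mid + 1;   // target can only be in the right half
        else
            hi = mid;       // target can only be in the left half
    }
    return -1;              // range is empty: not found
}
```

The standard library offers the same behavior through std::binary_search and std::lower_bound in &lt;algorithm&gt;.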

You might also consider other data structures, such as a hash table, in such situations. A linear search starts at the beginning of a list of values and checks them one by one, in order, for the result you are looking for.
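That one-by-one check can be sketched as follows (the name is illustrative):

```cpp
#include <vector>
#include <cstddef>

// Linear (serial) search: examine each element in order from the start.
// Work grows in direct proportion to the number of elements: O(n).
int linearSearch(const std::vector<int>& data, int target) {
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (data[i] == target)
            return static_cast<int>(i);   // found: report its position
    }
    return -1;                            // reached the end without a match
}
```

Unlike binary search, this works on unsorted data.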

A binary search starts in the middle of a sorted array and determines which side (if any) the value you are looking for is on. That half of the array is then searched again in the same fashion, halving the remaining portion each time. Make sure to deliberate about whether the win of the quicker binary search is worth the cost of keeping the list sorted to be able to use the binary search. As an analogy, suppose you are looking up a name in a directory: open the book at the halfway point and look at the page.

Ask yourself: should this person's entry be to the left or to the right? Repeat this procedure until you find the page where the entry should be, and then either apply the same process to the columns or just search linearly along the names on the page as before. Linear search (also referred to as sequential search) looks at each element in sequence from the start to see if the desired element is present in the data structure.

When the amount of data is small, this search is fast. It is easy to implement, but the work needed is in proportion to the amount of data to be searched: doubling the number of elements will double the time to search if the desired element is not present. Binary search is efficient for larger arrays.

In this method we check the middle element. If its value is bigger than what we are looking for, we look in the first half; otherwise, we look in the second half. We repeat this until the desired item is found. The table must be sorted for binary search. It eliminates half the data at each iteration: if we have 1,000 elements to search, binary search takes about 10 steps, while linear search takes up to 1,000 steps.
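The step counts quoted above can be checked with a small calculation; these helper functions are illustrative, not part of any library:

```cpp
#include <cstddef>

// Worst-case comparisons for a linear search of n elements.
std::size_t linearSteps(std::size_t n) { return n; }

// Approximate comparisons for a binary search of n elements:
// count how many times the range can be halved before it is empty.
std::size_t binarySteps(std::size_t n) {
    std::size_t steps = 0;
    while (n > 0) {
        n /= 2;      // each probe halves the remaining range
        ++steps;
    }
    return steps;
}
```

For n = 1000, binarySteps returns 10 while linearSteps returns 1000, matching the comparison in the text.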