Edit

C++ hashing table in data structure

Last Edited By Krjb Donovan

Last Updated: Mar 11, 2014 07:51 PM GMT

1 Question
2 Question

Question

QUESTION: Hello..Zlatko,, I have some question to ask you regarding my project question.

This is the question :

Write a program to perform hash table using two different hashing functions and two different resolution methods to solve the collision when happened. The record to be inserted in the hashing table, must be contains of at least 4 fields.

Can you please help me on how i should start. Actually, I understand the hashing function but I got stuck on how i should start to code this program.

The question ask to choose any methods of hashing and i intend to choose the simplest one first, which is the direct method. Then, i should have at least 4 field of records. Is it means that i will going to make an output file like the .txt in order to save the records? Or just by make it in array? And also, I will going to have a dynamic array or just a static array? Can you please guide me steps by steps what i should make first to code this program because it really make me stuck..

Your kindness is much appreciated..Thank you so much..

ANSWER: Hello wale89. I would like some more time to think about some parts of your question but I can give you some ideas now. The problem states that the record must have at least 4 fields. I think this has nothing to do with input or output files. The data for the records could come from a file, but it's not necessary. Certainly there is no need for an output file.

Each record is a struct or a class that is stored in the hash table. For example, lets have a hash table that holds instances of type MyRecord. The record could be declared like this:

struct MyRecord {

   int field1;
   int field2;
   int field3;
   char field4[32];

};

In this example, I chose a record with 3 integers and a character array. The problem does not say what the fields should be, and it does not say how many fields are used in the hash function, so I'm not really sure why at least 4 fields were requested.

The hash table itself is an array of buckets. You can put the hash table into a class, with methods for adding and finding instances of MyRecord. At first its good to code the class specifically for your record type, but later you can use templates to make a generic hash table. For example:

class MyHashTable { public:

   MyHashTable(int bucketCount);

   bool add(const MyRecord* record);
   MyRecord* find(const MyRecord& key) const;

private:

   // hashValue calculates an index into the bucket array
   int hashValue(const *MyRecord record) const;

   // A record holder used for chaining
   struct RecordHolder
   {
       RecordHolder() { pRecord = NULL; pNext = NULL; }

       MyRecord* pRecord;
       struct RecordHolder* pNext;
   };

   RecordHolder *mBucketArray;
   int mBucketCount;

};

I like to start all my member variables with an 'm' character, but that's just my habit.

Here, I'm thinking about having a bucket array of type RecordHolder. The RecordHolder has a pointer to a MyRecord type for holding data, and a pointer to another RecordHolder. This type of record holder would be used when chaining is used to resolve collisions.

The constructor would look something like this:

MyHashTable::MyHashTable(int bucketCount) {

   mBucketCount = bucketCount;
   mBucketArray = new MyRecordHolder[bucketCount];

}

If you are allowed to use any part of the standard template library, then you can make your coding easier by using a std::list for chaining instead of making your own linked list. You should ask your instructor if that is allowed.

Now you need to implement the other three methods in your hash table class.

Does this help you to get started ? I will think about your problem a little more, and add maybe add to this answer. If you have more questions, just ask.

---------- FOLLOW-UP ----------

QUESTION: Hi..Zlatko,

Thanks so much for your kindness to help me.. I'm a begginers in Data Structure, therefore i really needs your help.

Just want to know, therefore we will going to have a hash table that implement what method? So far in the class, I have learned the theory of 6 methods which is Direct, Subtraction, Modulo-Division,Digit Extraction and Midsquare Rotation. In the class my lecturer just explain it by thoery that's why i having difficulty in order to implement it. Starting from scratch..

So far after reading your code, it make understand little bit..what i should do..The idea to have 3 int fields and 1 char fields is a good idea for me..my lecturer just gived the question and ask us to make it using our own way, therefore i thought, dont have to use template library.. because i'm not really well on it..unless you have the strong to teach me..

So, you said that i will have to implement the other three methods in my hash table class. Is it like the find, insert and delete function? Or like the one you have created.. bool add(const MyRecord* record); MyRecord* find(const MyRecord& key) const;

Why did you use the bool type for the add function? Can you explain it..

Therefore the program will behaves something like this..?

struct MyRecord {

  int field1;
  int field2;
  int field3;
  char field4[32];

};

class MyHashTable { public:

  MyHashTable(int bucketCount);

  bool add(const MyRecord* record);
  MyRecord* find(const MyRecord& key) const;

private:

  // hashValue calculates an index into the bucket array
  int hashValue(const *MyRecord record) const;

  // A record holder used for chaining
  struct RecordHolder
  {
      RecordHolder() { pRecord = NULL; pNext = NULL; }

      MyRecord* pRecord;
      struct RecordHolder* pNext;
  };

  RecordHolder *mBucketArray;
  int mBucketCount;

};

MyHashTable::MyHashTable(int bucketCount) {

  mBucketCount = bucketCount;  //can you explain this
  mBucketArray = new MyRecordHolder[bucketCount]; //can you explain this

}

Thanks so much for your kindness..

ANSWER: Hello wale. Before I go on to your new questions, I want to continue with what I started yesterday. I want to talk about how you would organize your program in a more professional way. You see yesterday that I had set up a hashValue function inside the hash table class to calculate the hash on a MyRecord. That is not really the best place to put the hash calculation code. If you put calculation code into MyHashTable, the hash table will forever be tied to MyRecord. It will not be useful for storing any other types of records. That is a poor program design.

It might make more sense to put the calculation code into MyRecord, since the calculation code will need access to the data in the record. However, that would not be a poor design too because MyRecord should not be aware that it is being stored on a hash table. After all, if MyRecord is going to have functions to help it be stored on a hash table, why not functions to help it be stored on a linked list, or a tree as well?

The other issue is that you need to implement more than one way of calculating a hash. One way of doing this is this:

int hashCalculation(MyRecord) {

  if (usingFunction_1)
  {
     calculate hash using some method
  }
  else if (usingFunction_2)
  {
      calculate hash using some other method
  }

}

This method has many disadvantages which I don't want to talk about right now.

The correct way to organize the program is to take what is changing, encapsulate it into a class, and have users of the class call it through an interface. In this case, it is the hash calculation that is changing. Don't worry, I will give you a complete sample program at the end.

First we define an interface to the hash function like this:

class HashCalculator { public:

   virtual int calculateHash(const MyRecord* record) const = 0;

};

Hash calculations will be done only through this interface. Do you understand the idea of virtual functions ?

Then you implement different hash functions, each in its own class, like this:

class HashFunction_1 : public HashCalculator { public:

   int calculateHash(const MyRecord* record) const
   {
       cout << "HashFunction_1 being called\n";
       // TODO implement calculation on the record
       return 0; // TODO return the real result
   }

};

Now, you need to pass a hash calculator object to the constructor of MyHashTable, so the constructor changes to look like this:

class MyHashTable { public:

   MyHashTable(int bucketCount, const HashCalculator& calc);

Finally, when the MyHashTable needs to calculate a hash value, it calls the calculator like this

int hashValue = myCalculator.caclculateHash(record);

You see that the calculation code is not in the MyHashTable class so the class is not tied to the contents of the records it is storing.

What I have described to you is called the strategy pattern, and it is used when you want different algorithms to be available without tying the code to any one particular algorithm. You should notice that when adding a second hash calculation method, no existing code has to be changed. Only new code has to be written. This a big advantage.

You can read more about the strategy pattern at http://en.wikipedia.org/wiki/Strategy_pattern

Ok, as I promised here is a sample program for you to experiment with. The things that you need to do are shown with comments like // TODO

// START OF CODE

include <stdlib.h>
include <iostream>

using namespace std;

struct MyRecord {

   int field1;
   int field2;
   int field3;
   char field4[32];

};

/*The HashCalculator is the interface to many different calculation functions Notice that the calculator is specific to MyRecord.

class HashCalculator { public:

   virtual int calculateHash(const MyRecord* record) const = 0;

};

//Here is one hash calculation function for MyRecord class HashFunction_1 : public HashCalculator { public:

   int calculateHash(const MyRecord* record) const
   {
       // TODO implement calculation on the record
       cout << "HashFunction_1 being called\n";
       return 0; // TODO return result
   }

}; /*The HashCalculator has no data, so we can have one instance of it and share it among all hash tables storing MyRecord

HashFunction_1 hf1;

// Here is another hash calculation function for MyRecord class HashFunction_2 : public HashCalculator { public:

   virtual int calculateHash(const MyRecord* record) const
   {
       cout << "HashFunction_2 being called\n";
       // TODO implement calculation on the record
       return 1; // TODO return result
   }

}; // Again, we need just 1 instance of the calculator class HashFunction_2 hf2;

// Here is the hash table class MyHashTable { public:

   // The hash table class has a reference to a hash calculator.
   // The calculator is set when the the hash table is created, and
   // cannot be changed for the life of the hash table
   MyHashTable(int bucketCount, const HashCalculator& calc) 
       : mHashCalculator(calc)
   {
       // Store the bucket count into a variable to be used later.
       mBucketCount = bucketCount;

       // Allocate memory for an array of buckets. 
       // The buckets will be of type MyRecordHoder
       // MyRecordHolder is a struct that points to a MyRecord
       // and a "next" pointer to point to another MyRecordHolder.
       // The next pointer is used to make a linked list for chaining.
       mBucketArray = new MyRecordHolder[bucketCount];
   }

   bool add(const MyRecord* record)
   {
       int hashValue = mHashCalculator.calculateHash(record);
       // TODO insert into bucket array based on hash value
       // TODO resolve collisions as needed
       return true; // TODO return false if record could not be added
   }

   MyRecord* find(const MyRecord& key)
   {
       // TODO return pointer to record or return NULL if not found
   }

private:

   // Hash calculation section
   const HashCalculator& mHashCalculator;

   // Bucket array section
   struct MyRecordHolder
   {
       MyRecordHolder() { pRecord = NULL; pNext = NULL; }

       MyRecord* pRecord;
       struct MyRecordHolder* pNext;
   };

   MyRecordHolder *mBucketArray;
   int mBucketCount;

};

/* Sample main creating 2 hash tables, each with its own type of hash calculator.

int main(void) {

   MyHashTable ht1(99, hf1);
   MyHashTable ht2(99, hf2);

   ht1.add(new MyRecord);
   ht2.add(new MyRecord);

   return 0;

}

// END OF CODE

That was a lot of information for you, and I've only spoken about organizing your program. Please read it slowly and ask questions about what you don't understand. You still need to know about hash functions and collision resolution. Your assignment says you need 2 different collision resolution methods. The collision resolution is probably the biggest part of the hash table. It will dictate how the table will be set up. For example, I set up the table in my sample code to use chaining. To use another type of collision resolution, you will probably need another type of hash table class. I don't think we can use the strategy pattern to encapsulate the collision resolution algorithm. For now, just work on one collision method.

When I started explaining things, I used pretty simple names, like MyRecord. That's OK for an explanation, but it would be better to use a more realistic example. Perhaps a student record, with a name, address, telephone number, student ID, etc.

Instead of a hash table named MyHashTable, you could have a class ChainingHashTable, or a class LinearProbingHashTable.

Again, I ask you to be patient. I need a little more time to refresh my memory about the different hashing functions you spoke about. It has been a long time since I was a student.

---------- FOLLOW-UP ----------

QUESTION: Hi my dear Zlatko..

Again, thanks so much for your much help that you have gaved to me.. I really appreciate it.. I will deeply immersed in the code that you gave me but it will take time for me. Actualy, now i'm in the examination week..on these coming tuesday i have Statistic paper and also coming saturday will going to have another paper. Therefore, I have to delay some time for this project in order to concentrate on it.

I really hope you can understand..I will give the follow up question after immersed in the code.This project will be due on the 8th April.. Therefore, I hope i will going to have enough time to complete it..

May God bless you..for all your kindness..Thank you,

Best Regards, wale89

Answer

Hello Wale.

I wrote a sample hash program in the hope that it will be useful to you. The hash table uses chaining to resolve collisions, and the program has 2 hash functions using the strategy pattern I wrote about earlier. The program compiles with the latest Microsoft compiler and with the GNU compiler.

I don't know how experienced you are in C++. Perhaps my talk of interfaces, and virtual functions has made things more confusing for you. Anyway, I am showing you the code as it would be written in C++.

If you get stuck on your assignment, ask me for help and I will work with whatever design you are using.

Good luck on your exams.

zlatko.c.help at gmail.com

include <stdlib.h>
include <iostream>
include <string>

using namespace std;

struct StudentRecord {

   // Some  convenience constructors
   StudentRecord(std::string name) 
   {
      mName = name; 
   }

   StudentRecord(std::string name, int id, std::string telephone, std::string email)
   {
       mName = name;
       mId = id;
       mTelephone = telephone;
       mEmail = email;
   }

   std::string mName;
   int mId;
   std::string mTelephone;
   std::string mEmail;

   /*
   Two student record are equal if they share the same student name
   (even though that is unrealistic in the real world)
   */
   bool operator==(const StudentRecord& other) const
   {
       return this->mName.compare(other.mName) == 0;
   }

};

/* A convenience function to print out a record

ostream& operator<<(ostream& out, const StudentRecord& rec) {

   out << rec.mName << "\n\t" 
       << "ID " << rec.mId << "\n\t" 
       << "Ph " << rec.mTelephone << "\n\t" 
       << "Em " << rec.mEmail;
   return out;

}

/* The HashCalculator is the interface to many different calculation functions Notice that the calculator is specific to StudentRecord.

class HashCalculator { public:

   virtual unsigned int calculateHash(const StudentRecord& record) const = 0;

};

/* Here is one hash calculation function for StudentRecord. It uses just the name field This was in the first edition of the K&R book. It is also found at http://www.cse.yorku.ca/~oz/hash.html

class VeryBadHashFunction : public HashCalculator { public:

   unsigned int calculateHash(const StudentRecord& record) const
   {
       // cout << "Very Bad Hash Function being called\n";
       int result = 0;
       for(size_t ix = 0; ix < record.mName.size(); ++ix) result += record.mName[ix];
       return result;
   }

}; /* The HashCalculator has no data, so we can have one instance of it and share it among all hash tables storing StudentRecord

VeryBadHashFunction veryBadHashFunction;

/* Here is another hash calculation function for StudentRecord. It also uses just the name field It is also found at http://www.cse.yorku.ca/~oz/hash.html

class Djb2HashFunction : public HashCalculator { public:

   virtual unsigned int calculateHash(const StudentRecord& record) const
   {
       // cout << "Djb2 Hash Function being called\n";
       int result = 5381; // Some magic number
       for(size_t ix = 0; ix < record.mName.size(); ++ix) 
           result = ((result << 5) + result) + record.mName[ix];
       return result;
   }

}; // Again, we need just 1 instance of the calculator class Djb2HashFunction djb2HashFunction;

// Here is a hash table that uses chaining to resolve collisions. class ChainingHashTable { public:

   /*
   The hash table class has a reference to a hash calculator.
   The calculator is set when the the hash table is created, and
   cannot be changed for the life of the hash table
   */
   ChainingHashTable(int bucketCount, const HashCalculator& calc) 
       : mHashCalculator(calc)
   {
       // create the array of buckets
       mBucketCount = bucketCount;
       mBucketArray = new RecordHolder[bucketCount];
   }

   bool add(StudentRecord* record)
   {
       // get the hash value using the calculator that was given to the constructor
       unsigned int hashValue = mHashCalculator.calculateHash(*record);
       // make the hash value fit into the bucket array
       hashValue = hashValue % mBucketCount;

       // Check for collisions
       if (mBucketArray[hashValue].pRecord == NULL)
       {
           // No collisions, put the record on the record holder
           mBucketArray[hashValue].pRecord = record;
       }
       else
       {
           // There is a collision.
           // Create a new record holder
           RecordHolder* new_rh = new RecordHolder;
           // put the record on the new record holder
           new_rh->pRecord = record;
           // put the new record holder on the end of the list
           RecordHolder* last_rh = findLastRecordHolder(hashValue);
           last_rh->pNext = new_rh;
       }
       // Would return false if for some reason the add could not be done
       return true;
   }

   StudentRecord* find(const StudentRecord& key)
   {
       // get the hash value using the calculator that was given to the constructor
       unsigned int hashValue = mHashCalculator.calculateHash(key);
       // make the hash value fit into the bucket array
       hashValue = hashValue % mBucketCount;

       if (mBucketArray[hashValue].pRecord == NULL) 
       {
           // There is no record for that hash value
           return NULL;
       }

       /*
       At this point, there is a record for the hash value.
       We go along the chain to find the record that matches the key.
       */
       RecordHolder* pRh = &mBucketArray[hashValue];
       while(pRh != NULL)
       {
           if (*(pRh->pRecord) == key)
           {
               // The desired record has been found, return it.
               return pRh->pRecord;
           }
           pRh = pRh->pNext;
       }

       // At this point, the desired record does not exist in the hash table
       return NULL;
   }

private:

   // Hash calculation section
   const HashCalculator& mHashCalculator;

   /********************************
   // Bucket array section
   ********************************/
   struct RecordHolder
   {
       RecordHolder() { pRecord = NULL; pNext = NULL; }

       StudentRecord* pRecord;
       struct RecordHolder* pNext;
   };

   /*
   findLastRecordHolder finds the last record in a chain
   at the given bucket array index
   */
   RecordHolder *findLastRecordHolder(unsigned int ix)
   {
       RecordHolder *last = &mBucketArray[ix];
       while(last->pNext != NULL) last = last->pNext;
       return last;
   }

   RecordHolder *mBucketArray; // the bucket array
   int mBucketCount; // the size of the bucket array

};

/* Sample main creating 2 hash tables, each with its own type of hash calculator.

int main(void) {

   // Create 2 hash tables, each with a different hash function
   ChainingHashTable ht1(10, djb2HashFunction);
   ChainingHashTable ht2(10, veryBadHashFunction);

   // For this test, we use only ht2
   ht1.add(new StudentRecord("Wale", 1, "555-1000", "wale@yyy.com"));
   ht1.add(new StudentRecord("Zlatko", 2, "555-1001", "zlatko@yyy.com"));
   ht1.add(new StudentRecord("Stephen", 3, "555-1002", "stephen@yyy.com"));

   cout << "\nTest 1\n";

   StudentRecord key("Stephen");
   StudentRecord *found = ht1.find(key);
   if (found) 
   {
       cout << "The key was found " << *found << endl;
   }
   else
   {
       cout << "The key " << key.mName << " was not found\n";
   }

   cout << "\nTest 2\n";

   key.mName = "Wale";
   found = ht1.find(key);
   if (found) 
   {
       cout << "The key was found " << *found << endl;
   }
   else
   {
       cout << "The key " << key.mName << " was not found\n";
   }

   cout << "\nTest 3\n";

   key.mName = "XXX";
   found = ht1.find(key);
   if (found) 
   {
       cout << "The key was found " << *found << endl;
   }
   else
   {
       cout << "The key " << key.mName << " was not found\n";
   }

   return 0;

}

Question

Hello, I have some question to ask you regarding this question.

This is the question :

Can you please help me on how i should start. Actually, I understand the hashing function but I got stuck on how i should start to code this program.

Your kindness is much appreciated..Thank you so much..

Answer

> Then, i should have at least 4 field of records. > Is it means that i will going to make an output file like the .txt in order to save the records? > Or just by make it in array?

To start with, just have a struct or class with four member variables to represent a record of four fields. Select one of them as the 'key' and write a hash function for that. The hash function would have to take into account the size of the array for the hash table.

> And also, I will going to have a dynamic array or just a static array?

Start with a simple static array; complete the rest of the code and make sure that it works correctly. You can easily convert it to use a dynamic array, if it is required later. The most commonly used hash table size is a prime number.

For more information and code fragments, see: http://www.eternallyconfuzzled.com/tuts/datastructures/jsw_tut_hashtable.aspx

GETTING STARTED
Invite Others
Create

C++ hashing table in data structure

Contents

Question

Answer

Question

Answer