CS 241 Lab 6: Hashing frequencies (English words)

For this assigment:

  1. get the code we wrote in class, then clean it up (it needs it!);

  2. get a large word list of English words, such as the one located here, copy the list and save it to a text file;

  3. write Java code which will declare an array of size 2 to the 20 and increment a count in the array for each word it processes, reading the words from the text file you saved above (remember, you increment the count at the position given by the hash function of the string);

  4. print out a report describing the frequencies you encounter: for example, how many slots in the array went "empty" (had zero count), versus one entry, two, etc. What is the maximum collision rate? (Maybe 40 words all hashed to the same place? Maybe more?) If the maximum is reasonably small, you should be able to list the number of words with a given frequency of collision up through the maximum (i.e., for zero through forty, print the number of words that had that collision frequqncy, i.e., that number as their entry in the table).