CS 241 Lab 6: Hashing frequencies (English words)
For this assigment:
- get the code we wrote in class,
then clean it up (it needs it!);
- get a large word list of English words, such as the one located
here,
copy the list and save it to a text file;
- write Java code which will declare an array of size 2 to the 20 and
increment a count in the array for each word it processes, reading the
words from the text file you saved above (remember, you increment the
count at the position given by the hash function of the string);
- print out a report describing the frequencies you encounter: for example,
how many slots in the array went "empty" (had zero count), versus one
entry, two, etc. What is the maximum collision rate? (Maybe 40 words all
hashed to the same place? Maybe more?) If the maximum is reasonably small,
you should be able to list the number of words with a given frequency of
collision up through the maximum (i.e., for zero through forty, print the
number of words that had that collision frequqncy, i.e., that number as their
entry in the table).