Dictionary Examples

Sets

In the next section we will have an example making central use of a dictionary. It will also make use of a set. The generic C# version is a HashSet, which models a mathematical set: a collection with no repetitions and no defined order. We use a HashSet for the words to be ignored. We use a HashSet rather than a List because the Contains method for a List has linear order, while the Contains method for a HashSet uses the same trick as in a Dictionary to be of constant order on average.

Here is a csharp session using the type HashSet of strings. The Add method, like the Remove method for Lists, returns true or false depending on whether the method changes the set:

csharp> var set = new HashSet<string>();
csharp> set;
{  }
csharp> set.Add("hi");
true
csharp> set;
{ "hi" }
csharp> set.Add("up");
true
csharp> set;
{ "hi", "up" }
csharp> set.Add("hi");  // already there
false
csharp> set;
{ "hi", "up" }
csharp> set.Contains("hi");
true
csharp> set.Contains("down");
false
csharp> var set2 = new HashSet<string>(new string[]{"a", "be", "see"});
csharp> set2;
{ "a", "be", "see" }

That lack of order for a HashSet means it cannot be indexed, but otherwise it has mostly the same methods and constructors that have been discussed for a List, including Add and Contains and a constructor that takes a collection as parameter.

Word Count Example

Counting the number of repetitions of words in a text provides a realistic example of using a Dictionary. With each word that you find, you want to associate a number of repetitions. A complete program is in the example file count_words/count_words.cs.

The central functions are excerpted below, and they also introduce some extra features from the .Net libraries.

This constructor pattern taking the elements of one collection and creating another collection, possibly of another type, is used twice: first to create a HashSet from an array, and later to create a List from a HashSet. The latter is needed so the List can be sorted in alphabetical order with its Sort method, used here for the first time. Our table contains the words in alphabetical order.

Also used for the first time are two string methods: the pretty clearly named ToCharArray and another variation on Split. An alternative to supplying a single character to split on, is to use a char array as parameter, and the string is split at an occurrence of any of the characters in the array. This allows a split on all punctuation and special symbol characters, as well as a blank.

We separate the processing into two functions, one calculating the dictionary, and one printing a table. To reduce the amount of clutter in the Dictionary, the function GetCounts takes as a parameter a set of words to ignore.

      /// Return a Dictionary of word:count pairs from parsing s,
      ///  excluding all strings in ignore.
      public static Dictionary<string, int> GetCounts(string s,
            HashSet<string> ignore)
      {
         char[] sep = "\n\t !@#$%^&*()_+{}|[]\\:\";<>?,./".ToCharArray();
         string[] words = s.ToLower().Split(sep);
         ignore.Add(""); //generated from consecutive splitting characters
         var wc = new Dictionary<string, int>();
         foreach (string w in words) {
            if (!ignore.Contains(w)) {
               if (wc.ContainsKey(w)) { //increase count of word already seen
                  wc[w]++;
               }
               else {                   // make a first entry
                  wc[w] = 1;
               }
            }
         }
         return wc;
      }

      /// Print each word and its count, if the count is at least minCount.
      public static void PrintCounts(Dictionary<string, int> wc, int minCount)
      {
         List<string> words = new List<string>(wc.Keys);
         words.Sort();
         foreach (string w in words) {
            if (wc[w] >= minCount) {
               Console.WriteLine("{0}: {1}", w, wc[w]);
            }
         }
      }

Look at the code carefully, and look at the whole program that analyses the Gettysburg Address.