An Example-based Approach to Code Discovery

One of the hardest parts about programming is knowing where to start. For some problems, we've seen that a standard "template" can give us a foot-in-the-door on starting up a function which takes a list (or two) argument. But this doesn't work in all cases, for example if there are no lists or if we want to base our definition on pre-existing Prelude functions: what to do then?

This short note explains a possible strategy for developing a function definition by starting with an example and working systematically with a specific two-column layout, toward a solution. You'll still need to try bits of code in your Haskell system window, and you'll still need some familiarity with the relevant Prelude functions (perhaps supported by tools like Hoogle), but I hope that the overall strategy and the layout idea will give you a place to start and a convenient way to record your progress.

As a working example, let's consider the function from lab called longest, where you are trying to get a list of all the longest words contained in some text, where words are defined as in the Prelude function words. There are several steps to solve here, but one of the main ones is to find out the (common) length of those longest words in the text. We can do this by starting with the text itself, then breaking it into words, then finding the length of each word, and finally getting the maximum of these lengths. Of course, that's easy to say once you have some experience, but how do you get there, and how do you keep track of your progress toward a solution?

Let's start by writing the name of a variable on the left, representing the information you (or your function) would have to start with. In this case it could be the name of the parameter of the function we are trying to define; let's use "text". Now let's put a good sample value, one with which you're familiar, on the right-hand side, using Haskell notation for the specific, concrete value. Here I'll draw my sample directly from the one given in the write-up (always a natural choice):

              text  =   "the biggest words here have seven letters"

Now let's work our way down from the top to a solution at the bottom: as we go, we'll first write the next step in our example on the right, putting down a related value that in some sense brings us closer to our ultimate goal (the maximum length). In this case, it seems natural that we have to break the text up into words, so we write down the list of words. Then we should fill in an expression on the left, again with an equals sign between, that will get us to that next data value, in terms of what we had on the previous line: in this case, it turns out, perhaps after some experimentation in the Haskell window or searching with Hoogle, that the expression words text will do the trick.

              text  =   "the biggest words here have seven letters"
              
                            { "OK, here's the text: how can I turn it into words?" }
            
        words text  =   ["the","biggest","words","here","have","seven","letters"]

(I've also added a line here in between the two steps that expresses what you might be thinking about the problem as you go along, a sort of "thought balloon comment": you probably wouldn't have to write this part down explicitly if you were doing it yourself, although it could be helpful to document how you came up with your solution.)

Now the idea is to continue to follow the template to guide you through: in each step, you need to come up with a next concrete (sample) value that will get you a little closer to your goal, write it down on the next line on the right-hand side, then find some code that will get you to that step, and write it down on the left-hand side, in terms of the variables and code that were already there. For example, a natural next step here is to get the lengths of each of the words, which suggests the use of the length function, along with a map to spread the length across the list of words. (Terms such as "each" and "spread" are often associated with a use of map, so it can actually help here to formulate the problem in English terms in your own words, and then look for such clues.)

                  text  =   "the biggest words here have seven letters"
                  
                                { "OK, here's the text: how can I turn it into words?" }
                
            words text   =  ["the","biggest","words","here","have","seven","letters"]
                  
                                { "Now how can I get the length of each word?" }
        
map length (words text)  =  [3,7,5,4,4,5,7]

Note how we write down the concrete sample value (here the list of actual lengths, [3,7,5,4,4,5,7]) first, as a goal, then search out a bit of code to do the trick and add it on the left. Also note how we have to "wrap" the code from the previous right-hand line in an application of map length, putting in parentheses around the previous line's code to make it an argument (this wasn't necessary in the previous step because the variable text was atomic, and thus already clearly a single argument).

For the next step in the development (and the final one for this example), we need to find the maximum of the numbers in the list: this is of course pretty easy to do by eye, on the right, but you need to discover the existence of the maximum function from the Prelude to get the code shown here on the left (or else find something else equivalent).

                          text  =   "the biggest words here have seven letters"
                          
                                        { "OK, here's the text: how can I turn it into words?" }
                        
                    words text  =   ["the","biggest","words","here","have","seven","letters"]
                          
                                        { "Now how can I get the length of each word?" }
                
        map length (words text) =   [3,7,5,4,4,5,7]
                          
                                        { "And how can I get the maximum of those lengths?" }
                
maximum (map length (words text)) =   7

Overall, the process involves a successive "honing in" on a final sample value on the right, and to a progressive accumulation of code on the left, as in this picture:

In general, of course, the sample data might not decrease in size (for example, when transforming one list into another) or might even involve an increase in size. But I hope the diagram is suggestive of how the two sides complement each other and help to both prompt and record a final solution. (And, in real cases, you would likely need multiple stages of this process, and to stitch the results together in just the right way: here, for example, you still need to determine how to use the maximum value to pull out all of the longest words, and how to make sure you have access to the text argument while you are computing the maximum (hint: think sub-definitions).

Summary

Of course, you might be able to solve a simple problem like this in one or two steps just in your head, without the need for a "strategy" or a "template". But for harder problems, or if you are having trouble finding a way to start, this approach has the advantage of prompting, of stressing concrete values and their abstract code equivalents, and of "automating" some of the book-keeping involved in re-constructing a solution from an example.