Understanding list comprehensions

A list comprehension is a special kind of expression that generates list elements in a complex but intuitive way (well, “intuitive” once you understand it). Enclosed in brackets we have a main expression, then a vertical bar, and then a series of parts which are either a generator or a guard. The generators are made of a pattern, then a left arrow and then an expression. The guards are just boolean-valued (True/False) expressions.

The overall idea is that each generator generates values from a list; these values are then put together in all possible combinations to create a result for the overall list comprehension. The order of generating combinations is one of the tricky parts: the rightmost generators "cycle" through faster, whereas the leftmost ones go more slowly. You can remember this by thinking of an odomoeter on a car, where we eventually see all combinations of digits, but the righthand digit cycles the fastest:

How can we think about the process of generation? Imagine that we flow back and forth from left to right and back again across the parts of the generator according to these steps:

  1. We start at the left-hand side with the opening bracket: think of this as setting up the left-hand bracket of a list for the results.
  2. Next we see the main expression: skip over that for now (but you might remember that this is the overall form for the results) to the vertical bar: this forms a kind of boundary that we will come back to.
  3. Say a generator comes first: we will evaluate the expresion part of it to get a list of values (call them v1, v2, etc.). We can imagine circling the first result from this list for future reference (later we will cross it off and move to circle the second one, and so on through the list).
  4. Now that we have a value v1 circled from the first list, we go back to the pattern of the first generator: we will match this against the value v1 and bind any variables in the usual way. Of course, in many cases this pattern will be a simple variable, in which case we just bind that variable to the value v1. If the pattern fails to match, however, then the generating step fails and we continue on to the next value.
  5. Once we have a value from the first generator, we proceed to the right: say the next part of the comprehension is a guard. The guard is a boolean-valued expression (True or False) which may involve variables "from the outside" (the rest of the program), but also (and usually) the variable we just bound from the generator. If this guard returns False, then the generation fails, and we will have to go back leftwards for another one. But if it returns True, we continue on to the right. (Recall from lecture that the guard "faces" left: he may turn you back on the way in from the left, but he doesn't bother you if you come back from the right as part of the "ebb and flow" of the process.)
  6. (and 7.) The next generator we come across works just like the last one: we generate values in a list from the expression (say w1, w2, etc.) and match them (one by one) against the pattern in order to bind variables. If we succeed, we proceed to the right; if not, we fail and go back to try another match from the next value. You can imagine again circling the current successful match value, or crossing it out if it failed.
  7. (and 9.) When we finally reach the right end of the comprehension (signalled by the right-hand bracket), we circle back temporarily all the way to the far left, past the vertical bar boundary, and generate a result value (say r1) to add to the final result list for the whole comprehension. The result is generated by evaluating the main expression on the left, given all the variables we have bound during the generation process, from the various pattern matches we have made.
  8. Finally, we come back to the last generator we encountered, cross off the circled value from its list (say w1) and move to the next value (in this case w2). We again try to make a match, bind variables if successful, go to the right to generate a value, etc. (Or, failing to make a match, we cross off that value and continue.)

When we have exhausted the rightmost list of values, we go back to the next generator to the left (skipping over any guards in the way) in order to cross off their current value and go to the next one. As we go from left to right again, we have to pass through the guards again, and we start completely fresh on any generators to the right. Specifically, we may have generated a new value for some variable from the left, and that variable might be crucial to some generating expression on the right. So we re-evaluate the expression to the right, and cycle through all its values, etc., as before.

Once again, the overall process is one of ebb-and-flow from left to right, with the rightmost generator cycling the fastest, as on an odometer, and the leftmost generator cycling most slowly. In the end, we generate a list of result values r1, r2, r3, etc., which are made by combining values from v1, v2, w1, w2, etc. in all combinations (well, all those which are allowed by the guards and by the successful pattern matches).

One final (and subtle!) wrinkle: we actually don't generate these values immediately, but rather "on demand": in other words, as the values of the result list r1, r2 and so on are demanded by some part of the Haskell evaluation process, we slowly eke them out according to this process. But at any point along the way, we may have "paused" the process, holding all our circled and crossed-out values, etc. in "stasis" in order to keep track of where we were for the next time a result is "demanded". This is a pretty subtle issue, and certainly not one that we will worry about on the exam.

By way of an example, here's the list comprehension we did in class: we are generating pairs of values, (x,y), based on generators for the variables x and y. We also have a guard in between, which is preventing the generation of any x values which are not even.

In the picture, we have generated "candidate" values for x, namely 1, 2, 3 and 4, where 4 is the "current" value. Some of tese (namely 1 and 3) have failed the guard and were turned back. But the 4 value has passed through the guard and we have made it to the y part of the generation process. Here we have generated 4 successfully and so have added the pair (4,4) to the result list. Next we will cross off the y = 4 binding and move on to y = 5. Once we are done with this, we will circle back through the guard and try x = 5; this will fail to pass the guard and, sicne we willthen be done with all x values, we will be done with generating the list of pairs as a whole