A list comprehension is a special kind of expression that generates list elements in
a complex but intuitive way (well, “intuitive” once you understand it). Enclosed in
brackets we have a main expression, then a vertical bar, and then a series of parts
which are either a generator or a guard. The generators are made of a pattern, then
a left arrow and then an expression. The guards are just boolean-valued
(True
/False
) expressions.
The overall idea is that each generator generates values from a list; these values are then put together in all possible combinations to create a result for the overall list comprehension. The order of generating combinations is one of the tricky parts: the rightmost generators "cycle" through faster, whereas the leftmost ones go more slowly. You can remember this by thinking of an odomoeter on a car, where we eventually see all combinations of digits, but the righthand digit cycles the fastest:
How can we think about the process of generation? Imagine that we flow back and forth from left to right and back again across the parts of the generator according to these steps:
v1
, v2
, etc.). We can imagine circling the first result from this
list for future reference (later we will cross it off and move to circle the second one,
and so on through the list).
v1
circled from the first list, we go back to the pattern
of the first generator: we will match this against the value v1
and bind any variables
in the usual way. Of course, in many cases this pattern will be a simple variable, in
which case we just bind that variable to the value v1
. If the pattern fails to match,
however, then the generating step fails and we continue on to the next value.
True
or
False
) which may involve variables "from the outside" (the rest of the program), but also
(and usually) the variable we just bound from the generator. If this guard returns False
, then
the generation fails, and we will have to go back leftwards for another one. But if it
returns
w1
, w2
, etc.) and match them (one by one) against the
pattern in order to bind variables. If we succeed, we proceed to the right; if not, we fail
and go back to try another match from the next value. You can imagine again circling the
current successful match value, or crossing it out if it failed.
r1
) to add to the final result list for the
whole comprehension. The result is generated by evaluating the main expression on the left,
given all the variables we have bound during the generation process, from the various
pattern matches we have made.
w1
) and move to the next value (in this case w2
).
We again try to make a match, bind variables if successful, go to the right to generate a value, etc.
(Or, failing to make a match, we cross off that value and continue.)
When we have exhausted the rightmost list of values, we go back to the next generator to the left (skipping over any guards in the way) in order to cross off their current value and go to the next one. As we go from left to right again, we have to pass through the guards again, and we start completely fresh on any generators to the right. Specifically, we may have generated a new value for some variable from the left, and that variable might be crucial to some generating expression on the right. So we re-evaluate the expression to the right, and cycle through all its values, etc., as before.
Once again, the overall process is one of ebb-and-flow from left to right, with the rightmost
generator cycling the fastest, as on an odometer, and the leftmost generator cycling most
slowly. In the end, we generate a list of result values r1
, r2
, r3
,
etc., which are made by combining values from v1
, v2
, w1
, w2
,
etc. in all combinations (well, all those which are allowed by the guards and by the successful pattern matches).
One final (and subtle!) wrinkle: we actually don't generate these values immediately, but
rather "on demand": in other words, as the values of the result list r1
, r2
and so on are
demanded by some part of the Haskell evaluation process, we slowly eke them out according
to this process. But at any point along the way, we may have "paused" the process, holding
all our circled and crossed-out values, etc. in "stasis" in order to keep track of where we
were for the next time a result is "demanded". This is a pretty subtle issue, and certainly
not one that we will worry about on the exam.
By way of an example, here's the list comprehension we did in class: we are generating pairs of
values, (x,y)
, based on generators for the variables x
and y
.
We also have a guard in between, which is preventing the generation of any x
values
which are not even
.
In the picture, we have generated "candidate" values for x
, namely 1, 2, 3 and 4,
where 4 is the "current" value. Some of tese (namely 1 and 3) have failed the guard and were turned
back. But the 4 value has passed through the guard and we have made it to the y
part of the generation process. Here we have generated 4 successfully and so have added the pair
y = 4
binding and move on to y = 5
. Once we are done with this, we will circle
back through the guard and try x = 5
; this will fail to pass the guard and,
sicne we willthen be done with all x
values, we will be done with generating the list of pairs
as a whole