Fritz Ruehr, for CS 254: Intro FP
Willamette University, Spring 2009
This is a "literate Haskell" file which describes several different solutions to
the segment problem, from
CS 254 Lab #3.
You should be able to select this whole page, copy and paste it into a new file, name
it Segment.lhs
(notice the "l
" in the extension!), then save
and load it into Hugs.
According to the lab page, the problem is this:
Write a function that will take a "relation"r
and break a list of values into "segments", where each of the successive pairs of list elements bear the relation to each other. In other words, the segments "break" the list between elements where the relation fails. This will allow us, for example, to break up a list into increasing segments (or non-decreasing ones). Your function should be realized assegment :: (a -> a -> Bool) -> [a] -> [[a]]
, and should work as shown below. (This one is pretty hard!)
The following two examples are also given:
Hugs> segment (<) [1,2,3,0,1,2,2,4,6,8]
[[1,2,3],[0,1,2],[2,4,6,8]]
Hugs> segment (<=) [1,2,3,0,1,2,2,4,6,8]
[[1,2,3],[0,1,2,2,4,6,8]]
The problem can be solved using a number of different techniques, but all of these must address certain awkward aspects:
takeWhile
);
The last issue suggests that the function is not really well-specified; certainly it seems intended that:
for all r, xs:
xs == concat (segment r xs)
But this leaves open the possibility of returning [[]]
for an empty list
argument, or for that matter scattering some empty lists elsewhere among the results.
We will hope to employ Occam's razor and leave out any and all empty lists from
our results, while perhaps allowing for the possibility that we might need the other
option to make some recursion work out (although we won't).
To begin with, we name the module: in the literate style of file, we prefix each line of
code with an explicit marker ">
" (the non-code lines are left unmarked and
are treated as comments).
> module Segment where
Before we get down to defining the various versions of the function, it makes sense to
define some simple test cases: we'll define a few sample arguments, one longer sample
result, and then also a function called check
which will run a candidate segment
solution on all the sample test data to see if it works (this makes testing alternate
versions of the functions especially easy).
> fritz = "fritzruehr"
>
> text = filter (/=' ')
> "ask not what your country can do for you but what you can do for your country"
>
> segtext = [ "as","knotw","h","aty","ou","r","cou","nt","ry","c","an","do","fory","ou",
> "bu","tw","h","aty","ou","c","an","do","fory","ou","r","cou","nt","ry" ]
>
> try s = s (<=) text
>
> nums = [1,2,3,0,1,2,2,4,6,8]
>
> check :: (forall a. (a -> a -> Bool) -> [a] -> [[a]]) -> Bool
>
> check s = and [ s (<=) fritz == ["fr","itz","ru","ehr"],
> s (<=) text == segtext,
> s (<) nums == [ [1,2,3], [0,1,2], [2,4,6,8] ],
> s (<=) nums == [ [1,2,3], [0,1,2,2,4,6,8] ] ]
For a first attempt at a solution, let's try just using direct recursion. The usual
idea here is to define the function for two cases, a nil list and one built from cons,
then call the function recursively on the rest of the cons list, and ask how a larger
result can be assembled from the smaller one. But for this function we will need to use
deeper patterns: to make any real progress, we need to compare two consecutive elements,
so we should use a pattern for at least two elements, such as (x:y:xs)
.
(We will also need then to include cases for both a nil or empty list and one for a
one-element list.)
segment1 r (x:y:ys) = ... segment1 r (y:ys) ...
Note that we want to "recurse" on the (slightly) shorter list (y:ys)
rather
than on the usual ys
, since we want to compare y
to its
neighbor (if any) the next time around.
What else can we do on the right-hand side? Well, clearly we want our result to depend on
a test (using r
) comparing the two elements x
and y
:
if they are not related, we can just depend on the recursive result to work out and add a
singleton list containing x
on the front of this result using cons.
segment1 r (x:y:ys) = if r x y then ... else [x] : segment1 r (y:ys)
But what about the other case? Well, we'll want to "snug" x
into the list
which is itself the first element of the recursive result.
This suggests two things: first, we could use an auxiliary function to do the "snugging"
and, second, we could name the recursive result, so that we can refer to it more easily (and
obviously), by name, in the two branches of the if
:
segment1 r (x:y:ys) = if ... then ...(rest)... else ...(rest)...
where rest = segment1 r (y:ys)
The snug
function is easily enough defined by cases, although it's not
clear we'll ever need the case for the empty list (it's probably still a good idea to
define it, since we might use this function from some other context another day, and
the result we want is pretty clear). Putting it all together we get this:
> segment1 r [] = []
> segment1 r [x] = [[x]]
> segment1 r (x:y:ys) = if r x y then snug x rest else [x] : rest
> where rest = segment1 r (y:ys)
>
> snug x [] = [[x]]
> snug x (y:ys) = (x:y) : ys
Now we can test the function out using try
, check
, or just by hand.
Segment> try segment1
["as","knotw","h","aty","ou","r","cou","nt","ry","c","an","do","fory","ou",
"bu","tw","h","aty","ou","c","an","do","fory","ou","r","cou","nt","ry"]
Segment> check segment1
True
Another very general approach to problems is called "divide and conquer": we break the problem up into smaller parts, then solve each one and combine them back together. Of course, the direct recursive solution does this as well, in some sense, but here we shoot for a more balanced and intuitive breakdown of the problem.
Specifically, we can imagine taking a list and "snapping off" the relevant front part
(up to the point where the relation r
fails), then repeating this process
over again in order to "layer" the results back together again into the final nested lists.
Snapping of the front of the list naturally returns a pair of results: the snapped off
part and the remainder of the original list, so we can write a snap
function
which takes a list and returns a pair of lists. We can once again use a deeper pattern to
bring out the two elements to be compared, then call recursively on all but the first
element. In this case, however, we can name the parts of the pair which come out of the
recursive call (here called front
and back
). If the relation holds,
we can just cons the first element x
onto the first part of the pair; if it
doesn't hold, we can return the singleton list of x
as the whole first
half, and the remainder of the list as the second half. Note in particular in the latter
case that we don't even need to make the recursive call, since we don't need its results.
> snap r [] = ([], [])
> snap r [x] = ([x],[])
> snap r (x:y:xs) = if x `r` y then (x:front, back) else ([x], y:xs)
> where (front, back) = snap r (y:xs)
Now the layer
function just repeats this process, but in a more general way
(i.e., we might be able to use layer
for other purposes elsewhere). Given a
function f
which breaks a list into a pair, we just continually cons the first
half of the results of f
onto the front of a list of results, again naming the
results of a recursive call for easy access. Now a version of segment (segment2
)
can be written just as the composition of layer
and snap
:
> layer f [] = []
> layer f xs = a : layer f b where (a,b) = f xs
>
> segment2 = layer . snap
Note that, although we are using layer
on a function which happens to
return a pair of lists, so that our final result is a list of lists, we might also
use it on functions which return some other value and a list, so that we would get
a list of that other value as our final result.
layer :: ([a] -> (b,[a])) -> [a] -> [b]
Something about the way we separated a list off the front of snap's argument above
suggests that we might use functions like takeWhile
and dropWhile
to split off the successive elements of the result.
But these functions take predicates of a single argument, not relations of two.
Still, think back to the Fibonacci example from lecture, where we zipped a list with its
own tail together using the plus function: perhaps we could do something similar here.
Specifically, if we use zipWith r
on the list and its own tail, we will
get a list of booleans; we can join these together with the original values by zipping
the two lists together, calling the resulting function mark
:
> mark' r xs = zip (zipWith r xs (tail xs)) xs
This gives us almost what we need, but note that the position of the booleans is
"off by one": we'd probably prefer to have the booleans corresponding to a particular
list element be True
right up through the element we want to include
in the "snapped off" list ... but with this formulation, the last element we want to take
from the list corresponds to a False
value. Note also that we lose the last
element off the list, since the list of booleans is one shorter and the zip
therefore drops off the last element of the original list
Segment> mark' (<) [1,2,0,3,5,7,4]
[(True,1),(False,2),(True,0),(True,3),(True,5),(False,7)]
What we would really prefer is that we could "shift" the booleans over by one and keep all the original list elements, like this:
Segment> mark (<) [1,2,0,3,5,7,4]
[(True,1),(True,2),(False,0),(True,3),(True,5),(True,7),(False,4)]
We can get this latter version of mark
(now without the "prime") by adding an
extra True
value at the front of the zipWith
result: this has
the effect both of treating the first element correctly as a non-boundary case and also
fills out the length of the lists to be equal, so that we don't lose the last element:
> mark r xs = zip (True : zipWith r xs (tail xs)) xs
Now we might define a function which "peels" apart the list into sub-lists according to
a predicate p
(using the booleans generated by mark
); this is
like snap
above, but doing the whole (recursive) job, as follows:
> peel' p xs = takeWhile p xs : peel' p (dropWhile p xs)
This idea is in roughly the right direction, but it has a problem: after the first
sub-list is peeled off, the takeWhile
function will get a False
value, peel off an empty list, and then recursively continue to do this over and over
again, leading to an infinite list of empty lists:
Segment> peel' even [2,4,6,7,8,10,11,14,16,18]
[[2,4,6],[],[],[],[],[],[],[],[],[],...
To prevent this problem, we can just make sure that we make progress in every step, by always
including at least the first element in the sub-list: for the first sub-list this is always
correct, and for subsequent ones it makes sure that we include the elements corresponding to
False
values in the next sub-list:
> peel p [] = []
> peel p (x:xs) = (x : takeWhile p xs) : peel p (dropWhile p xs)
Using this version of peel
, we can define a version of segment
(segment3
) as follows: we mark
the list argument using the
relation r
, then peel
the result based on the first part
of the pairs (the one holding the booleans). This will give us a list of lists
(those that were peeled off) whose ultimate elements are pairs: we can strip off the
boolean parts we no longer need using map (map snd)
:
> segment3 r xs = map (map snd) (peel fst (mark r xs))
Finally, for this last version of segment, we know from studying the Prelude
that combinations of takeWhile
and dropWhile
can sometimes be
expressed using the span
function. If we first mark
the list
as above, giving is the list of pairs of booleans and values, we can then use span
to separate it into a pair of results (not unlike snap
, but let's call it
snip
here). Now, if we repeatedly snip
sub-lists off using
segment4
:
> snip r xs = snds (span fst (mark r xs))
> where snds (x,y) = (map snd x, map snd y)
>
> segment4 = layer . snip
Now that we have several different versions of the segment function, it's
interesting to compare how efficient they are for some of our standard test cases.
Note that these numbers are from the second run of the function on the argument,
which can sometimes be a bit "quicker" (in reduction numbers): this is due to some of the
work having been resolved the first time around. (In this case most of the extra work
the first time around is probably due to the filter
function used in the
definition of the text
sample).
version | fritz | text | change |
segment1 | 333 | 1978 | -- |
segment2 | 363 | 2158 | + 10% |
segment3 | 428 | 2550 | + 30% |
segment4 | 558 | 6930 | + 250% |
-- Minus the discussion, without the "false start" functions
-- and inlining the mark function into segment3 and segment4
module Segment where
----- segment1
segment1 r [] = []
segment1 r [x] = [[x]]
segment1 r (x:y:ys) = if r x y then snug x rest else [x] : rest
where rest = segment1 r (y:ys)
snug x [] = [[x]]
snug x (y:ys) = (x:y) : ys
----- segment2
snap r [] = ([], [])
snap r [x] = ([x],[])
snap r (x:y:xs) = if x `r` y then (x:front, back) else ([x], y:xs)
where (front, back) = snap r (y:xs)
layer f [] = []
layer f xs = a : layer f b where (a,b) = f xs
segment2 = layer . snap
----- segment3
peel p [] = []
peel p (x:xs) = (x : takeWhile p xs) : peel p (dropWhile p xs)
segment3 r xs = map (map snd) (peel fst (zip (True : zipWith r xs (tail xs)) xs))
----- segment4 (also uses layer from above)
snip r xs = snds (span fst (zip (True : zipWith r xs (tail xs)) xs))
where snds (x,y) = (map snd x, map snd y)
segment4 = layer . snip
------ testing
fritz = "fritzruehr"
text = filter (/=' ')
"ask not what your country can do for you but what you can do for your country"
segtext = [ "as","knotw","h","aty","ou","r","cou","nt","ry","c","an","do","fory","ou",
"bu","tw","h","aty","ou","c","an","do","fory","ou","r","cou","nt","ry" ]
try s = s (<=) text
nums = [1,2,3,0,1,2,2,4,6,8]
check :: (forall a. (a -> a -> Bool) -> [a] -> [[a]]) -> Bool
check s = and [ s (<=) fritz == ["fr","itz","ru","ehr"],
s (<=) text == segtext,
s (<) nums == [ [1,2,3], [0,1,2], [2,4,6,8] ],
s (<=) nums == [ [1,2,3], [0,1,2,2,4,6,8] ] ]