Four Solutions to the Segment Problem

Fritz Ruehr, for CS 254: Intro FP
Willamette University, Spring 2009

Segment: the problem specification

This is a "literate Haskell" file which describes several different solutions to the segment problem, from CS 254 Lab #3. You should be able to select this whole page, copy and paste it into a new file, name it Segment.lhs (notice the "l" in the extension!), then save and load it into Hugs.

According to the lab page, the problem is this:

Write a function that will take a "relation" r and break a list of values into "segments", where each of the successive pairs of list elements bear the relation to each other. In other words, the segments "break" the list between elements where the relation fails. This will allow us, for example, to break up a list into increasing segments (or non-decreasing ones). Your function should be realized as segment :: (a -> a -> Bool) -> [a] -> [[a]], and should work as shown below. (This one is pretty hard!)

The following two examples are also given:

Hugs> segment (<)  [1,2,3,0,1,2,2,4,6,8]
[[1,2,3],[0,1,2],[2,4,6,8]]

Hugs> segment (<=) [1,2,3,0,1,2,2,4,6,8]
[[1,2,3],[0,1,2,2,4,6,8]]

The problem can be solved using a number of different techniques, but all of these must address certain awkward aspects:

the need to compare two successive elements at a time using a relation (versus the use of a unary predicate with functions like takeWhile);
the need to "overlap" the pairings for successive comparisons;
the fact that the initial elements of segments are included unconditionally;
and the question of what to return for an empty list argument.

The last issue suggests that the function is not really well-specified; certainly it seems intended that:

for all r, xs:

    xs == concat (segment r xs)

But this leaves open the possibility of returning [[]] for an empty list argument, or for that matter scattering some empty lists elsewhere among the results. We will hope to employ Occam's razor and leave out any and all empty lists from our results, while perhaps allowing for the possibility that we might need the other option to make some recursion work out (although we won't).

Test cases and a simple test harness

To begin with, we name the module: in the literate style of file, we prefix each line of code with an explicit marker "> " (the non-code lines are left unmarked and are treated as comments).

> module Segment where

Before we get down to defining the various versions of the function, it makes sense to define some simple test cases: we'll define a few sample arguments, one longer sample result, and then also a function called check which will run a candidate segment solution on all the sample test data to see if it works (this makes testing alternate versions of the functions especially easy).

> fritz = "fritzruehr"
> 
> text = filter (/=' ') 
>        "ask not what your country can do for you but what you can do for your country"
> 
> segtext = [ "as","knotw","h","aty","ou","r","cou","nt","ry","c","an","do","fory","ou",
>             "bu","tw","h","aty","ou","c","an","do","fory","ou","r","cou","nt","ry" ]
> 
> try s = s (<=) text
> 
> nums = [1,2,3,0,1,2,2,4,6,8]
> 
> check :: (forall a. (a -> a -> Bool) -> [a] -> [[a]]) -> Bool
> 
> check s = and [ s (<=) fritz == ["fr","itz","ru","ehr"],
>                 s (<=) text  == segtext,
>                 s (<)  nums  == [ [1,2,3], [0,1,2], [2,4,6,8] ],
>                 s (<=) nums  == [ [1,2,3], [0,1,2,2,4,6,8] ]   ]

Segment1: "brute force" recursion

For a first attempt at a solution, let's try just using direct recursion. The usual idea here is to define the function for two cases, a nil list and one built from cons, then call the function recursively on the rest of the cons list, and ask how a larger result can be assembled from the smaller one. But for this function we will need to use deeper patterns: to make any real progress, we need to compare two consecutive elements, so we should use a pattern for at least two elements, such as (x:y:xs). (We will also need then to include cases for both a nil or empty list and one for a one-element list.)

segment1 r (x:y:ys) = ... segment1 r (y:ys) ...

Note that we want to "recurse" on the (slightly) shorter list (y:ys) rather than on the usual ys, since we want to compare y to its neighbor (if any) the next time around.

What else can we do on the right-hand side? Well, clearly we want our result to depend on a test (using r) comparing the two elements x and y: if they are not related, we can just depend on the recursive result to work out and add a singleton list containing x on the front of this result using cons.

segment1 r (x:y:ys) = if r x y then ... else [x] : segment1 r (y:ys)

But what about the other case? Well, we'll want to "snug" x into the list which is itself the first element of the recursive result.

This suggests two things: first, we could use an auxiliary function to do the "snugging" and, second, we could name the recursive result, so that we can refer to it more easily (and obviously), by name, in the two branches of the if:

segment1 r (x:y:ys) = if ... then ...(rest)... else ...(rest)...
                      where rest = segment1 r (y:ys)

The snug function is easily enough defined by cases, although it's not clear we'll ever need the case for the empty list (it's probably still a good idea to define it, since we might use this function from some other context another day, and the result we want is pretty clear). Putting it all together we get this:

> segment1 r []       = []
> segment1 r [x]      = [[x]]
> segment1 r (x:y:ys) = if r x y then snug x rest else [x] : rest
>                       where rest = segment1 r (y:ys)
> 
> snug x [] = [[x]]
> snug x (y:ys) = (x:y) : ys

Now we can test the function out using try, check, or just by hand.

Segment> try segment1
["as","knotw","h","aty","ou","r","cou","nt","ry","c","an","do","fory","ou",
"bu","tw","h","aty","ou","c","an","do","fory","ou","r","cou","nt","ry"]
Segment> check segment1
True

Segment2: divide and conquer ("snapping" off one list at a time)

Another very general approach to problems is called "divide and conquer": we break the problem up into smaller parts, then solve each one and combine them back together. Of course, the direct recursive solution does this as well, in some sense, but here we shoot for a more balanced and intuitive breakdown of the problem.

Specifically, we can imagine taking a list and "snapping off" the relevant front part (up to the point where the relation r fails), then repeating this process over again in order to "layer" the results back together again into the final nested lists.

Snapping of the front of the list naturally returns a pair of results: the snapped off part and the remainder of the original list, so we can write a snap function which takes a list and returns a pair of lists. We can once again use a deeper pattern to bring out the two elements to be compared, then call recursively on all but the first element. In this case, however, we can name the parts of the pair which come out of the recursive call (here called front and back). If the relation holds, we can just cons the first element x onto the first part of the pair; if it doesn't hold, we can return the singleton list of x as the whole first half, and the remainder of the list as the second half. Note in particular in the latter case that we don't even need to make the recursive call, since we don't need its results.

> snap r []       = ([], [])
> snap r [x]      = ([x],[])
> snap r (x:y:xs) = if x `r` y then (x:front, back) else ([x], y:xs)
>                   where (front, back) = snap r (y:xs)

Now the layer function just repeats this process, but in a more general way (i.e., we might be able to use layer for other purposes elsewhere). Given a function f which breaks a list into a pair, we just continually cons the first half of the results of f onto the front of a list of results, again naming the results of a recursive call for easy access. Now a version of segment (segment2) can be written just as the composition of layer and snap:

> layer f [] = []
> layer f xs = a : layer f b  where (a,b) = f xs
> 
> segment2 = layer . snap

Note that, although we are using layer on a function which happens to return a pair of lists, so that our final result is a list of lists, we might also use it on functions which return some other value and a list, so that we would get a list of that other value as our final result.

layer :: ([a] -> (b,[a])) -> [a] -> [b]

Segment3: using zipWith, takeWhile and dropWhile

Something about the way we separated a list off the front of snap's argument above suggests that we might use functions like takeWhile and dropWhile to split off the successive elements of the result. But these functions take predicates of a single argument, not relations of two. Still, think back to the Fibonacci example from lecture, where we zipped a list with its own tail together using the plus function: perhaps we could do something similar here.

Specifically, if we use zipWith r on the list and its own tail, we will get a list of booleans; we can join these together with the original values by zipping the two lists together, calling the resulting function mark:

> mark' r xs = zip (zipWith r xs (tail xs)) xs

This gives us almost what we need, but note that the position of the booleans is "off by one": we'd probably prefer to have the booleans corresponding to a particular list element be True right up through the element we want to include in the "snapped off" list ... but with this formulation, the last element we want to take from the list corresponds to a False value. Note also that we lose the last element off the list, since the list of booleans is one shorter and the zip therefore drops off the last element of the original list

Segment> mark' (<) [1,2,0,3,5,7,4]
[(True,1),(False,2),(True,0),(True,3),(True,5),(False,7)]

What we would really prefer is that we could "shift" the booleans over by one and keep all the original list elements, like this:

Segment> mark (<) [1,2,0,3,5,7,4]
[(True,1),(True,2),(False,0),(True,3),(True,5),(True,7),(False,4)]

We can get this latter version of mark (now without the "prime") by adding an extra True value at the front of the zipWith result: this has the effect both of treating the first element correctly as a non-boundary case and also fills out the length of the lists to be equal, so that we don't lose the last element:

> mark r xs = zip (True : zipWith r xs (tail xs)) xs

Now we might define a function which "peels" apart the list into sub-lists according to a predicate p (using the booleans generated by mark); this is like snap above, but doing the whole (recursive) job, as follows:

> peel' p xs = takeWhile p xs : peel' p (dropWhile p xs)

This idea is in roughly the right direction, but it has a problem: after the first sub-list is peeled off, the takeWhile function will get a False value, peel off an empty list, and then recursively continue to do this over and over again, leading to an infinite list of empty lists:

Segment> peel' even [2,4,6,7,8,10,11,14,16,18]
[[2,4,6],[],[],[],[],[],[],[],[],[],...

To prevent this problem, we can just make sure that we make progress in every step, by always including at least the first element in the sub-list: for the first sub-list this is always correct, and for subsequent ones it makes sure that we include the elements corresponding to False values in the next sub-list:

> peel p [] = []
> peel p (x:xs) = (x : takeWhile p xs) : peel p (dropWhile p xs)

Using this version of peel, we can define a version of segment (segment3) as follows: we mark the list argument using the relation r, then peel the result based on the first part of the pairs (the one holding the booleans). This will give us a list of lists (those that were peeled off) whose ultimate elements are pairs: we can strip off the boolean parts we no longer need using map (map snd):

> segment3 r xs = map (map snd) (peel fst (mark r xs))

Segment4: using zipWith and span

Finally, for this last version of segment, we know from studying the Prelude that combinations of takeWhile and dropWhile can sometimes be expressed using the span function. If we first mark the list as above, giving is the list of pairs of booleans and values, we can then use span to separate it into a pair of results (not unlike snap, but let's call it snip here). Now, if we repeatedly snip sub-lists off using layer, we can get a version called segment4:

> snip r xs = snds (span fst (mark r xs))
>             where snds (x,y) = (map snd x, map snd y)
> 
> segment4 = layer . snip

Testing time efficiency

Now that we have several different versions of the segment function, it's interesting to compare how efficient they are for some of our standard test cases. Note that these numbers are from the second run of the function on the argument, which can sometimes be a bit "quicker" (in reduction numbers): this is due to some of the work having been resolved the first time around. (In this case most of the extra work the first time around is probably due to the filter function used in the definition of the text sample).

version	fritz	text	change
segment1	333	1978	--
segment2	363	2158	+ 10%
segment3	428	2550	+ 30%
segment4	558	6930	+ 250%

Appendix: short, consolidated version of the code


-- Minus the discussion, without the "false start" functions
-- and inlining the mark function into segment3 and segment4

module Segment where

----- segment1

segment1 r []       = []
segment1 r [x]      = [[x]]
segment1 r (x:y:ys) = if r x y then snug x rest else [x] : rest
                      where rest = segment1 r (y:ys)

snug x [] = [[x]]
snug x (y:ys) = (x:y) : ys


----- segment2

snap r []       = ([], [])
snap r [x]      = ([x],[])
snap r (x:y:xs) = if x `r` y then (x:front, back) else ([x], y:xs)
                  where (front, back) = snap r (y:xs)
layer f [] = []
layer f xs = a : layer f b  where (a,b) = f xs

segment2 = layer . snap


----- segment3

peel p [] = []
peel p (x:xs) = (x : takeWhile p xs) : peel p (dropWhile p xs)

segment3 r xs = map (map snd) (peel fst (zip (True : zipWith r xs (tail xs)) xs))


----- segment4  (also uses layer from above)

snip r xs = snds (span fst (zip (True : zipWith r xs (tail xs)) xs))
            where snds (x,y) = (map snd x, map snd y)

segment4 = layer . snip


------ testing

fritz = "fritzruehr"

text = filter (/=' ') 
       "ask not what your country can do for you but what you can do for your country"

segtext = [ "as","knotw","h","aty","ou","r","cou","nt","ry","c","an","do","fory","ou",
            "bu","tw","h","aty","ou","c","an","do","fory","ou","r","cou","nt","ry" ]

try s = s (<=) text

nums = [1,2,3,0,1,2,2,4,6,8]

check :: (forall a. (a -> a -> Bool) -> [a] -> [[a]]) -> Bool

check s = and [ s (<=) fritz == ["fr","itz","ru","ehr"],
                s (<=) text  == segtext,
                s (<)  nums  == [ [1,2,3], [0,1,2], [2,4,6,8] ],
                s (<=) nums  == [ [1,2,3], [0,1,2,2,4,6,8] ]   ]