CS 241 Lab 2: Timing Java Programs

[Note: this is the new, fuller version of this lab. I originally posted a simpler summary in graphic form, which you can still find here … but this version now becomes the official one you should work against. Nothing of substance in the lab itself has changed, but there is more expository background and detail here.]

[… and this is the second, revised edition of the lab, with the new directional strategy descriptions. Thanks and a hat tip to Mason and Izzy!]


 

In this lab you will gather some empirical results about actual running programs and gain some personal appreciation for how extra-linear running times can affect an algorithm’s performance.

In lecture we have seen that accumulated linear effort can lead to quadratic running times: as in Joel Spolsky’s story of Schlemiel the painter, if the work we do at each step grows in proportion to the step number, then the total work is proportional to the square of the number of steps. This is the so-called “triangle rule”, whereby the sum of the integers from 1 to n is proportional to n².
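(For the record, the closed form behind the triangle rule is 1 + 2 + ⋯ + n = n(n+1)/2, which is roughly n²/2, i.e., quadratic in n.)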


Why is this insight important as we look at linear structures like arrays, ArrayLists, and linked lists? There are many operations which can be naïvely implemented with linear effort at each stage, and whose total effort can therefore go quadratic. The dangers of making such a strategic mistake are magnified if the linear behavior is hidden behind an innocent-seeming method call on a library structure. It’s not that these structures are poorly implemented or wrong to use, but rather that we should understand the behavior of any structure well before using it, and then be careful to use it wisely.

 

The “shoving” problem

Let’s look in more detail at one specific operation supported by the ArrayList structure, namely the ability to add a new item at the left end. (The problem actually crops up when adding items in the middle of the structure as well, but it is easiest to see if we focus on the left end.) If we were to implement this operation in an array, we would see that we have to make room for the new item by “shoving” the current items down the array by one slot each, in order to leave room at the left end for the new item at index 0. Why? Because we cannot just allocate new memory at the left/low end of the array, nor anywhere else in the array, not even at the right/high end: arrays are allocated once, when they are created, and always remain the same size.
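As a quick reminder of that fixed-size behavior (a tiny sketch, not part of the lab code itself):

    int[] a = new int[10];   // exactly ten slots, allocated once
    // a.length is always 10: there is no way to grow or shrink this array,
    // only to allocate a new, larger one and copy the items over.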


But moving all the current items out of the way takes time proportional to the number of items (n) held in the array, i.e., it is O(n). Because of the way we ignore constant factors with O-notation, this is true even if we are only moving half the items, or any other fixed fraction of n, just so long as the number moved grows with n rather than staying constant.

The actual shoving process is itself a bit of a “bother”, if only because it is easy to get it wrong the first time (or two, or three, or …): we have to move the items starting at the right end and working left, in order not to overwrite them as we go (i.e., we can’t work in the “natural” left-to-right order here). In addition, it is easy to get even the central assignment statement wrong, as the “physical” movement of the data is in the opposite direction from the assignment statement as written. (In other words, we write a[i+1] = a[i];, so the element on the left in the code actually corresponds to the slot on the right, and vice versa.)
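To make that concrete, here is one plausible sketch of the shoving step for an int array; names like size and newItem are illustrative assumptions, not a required interface:

    // Insert newItem at index 0 of a, assuming a currently holds `size` items
    // in slots 0..size-1 and has room for at least one more (a.length > size).
    static void addAtLeft(int[] a, int size, int newItem) {
        for (int i = size - 1; i >= 0; i--) {
            a[i + 1] = a[i];   // shove each item one slot to the right,
        }                      // working right to left so nothing is overwritten
        a[0] = newItem;        // slot 0 is now free for the new item
    }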

In any case, if we implement this ourselves for arrays, we can at least see the lurking danger of the quadratic time build-up: the shoving process involves a loop over as many as n items, and we have to run it for each item we add at the left. One time around, or two, or even a few, is OK; but doing this once for each of n items means that we will be doing far more work than we might think.

But the ArrayList does this shoving (or something very much like it) internally as well: if we know this, and keep our guard up, it’s not a problem. But if we forget and blithely add to the left end of an ArrayList in a loop, we may be incurring much more cost than we need to. The problem then is not the ArrayList itself so much as our own forgetfulness or carelessness. But these mistakes are almost encouraged by the fact that the ArrayList presents the operation .add(0, item) as if it were “atomic”: in particular, it looks rather parallel to adding at the other end, which we do using the call .add(item). But adding at the right end of the structure is (often) relatively simple, just putting an item into an unused slot in the array and increasing the size by 1, and thus only O(1).
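Concretely, the pattern to be wary of might look like the sketch below; the method names fillAtLeft and fillAtRight are just placeholders, and java.util.ArrayList is assumed to be imported:

    static void fillAtLeft(int n) {
        ArrayList<Integer> list = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            list.add(0, i);   // looks harmless, but each call shoves everything
        }                     // already in the list one slot right: O(n^2) overall
    }

    static void fillAtRight(int n) {
        ArrayList<Integer> list = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            list.add(i);      // appending at the right end is (usually) O(1) per
        }                     // call, so this loop is only about O(n) overall
    }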

Of course, if the array held internally in the ArrayList is full, then even the relatively benign right-end addition can lead to a linear cost, since the ArrayList must then re-allocate a new array and copy the n items into it. Thus we have to keep track of the current size of the ArrayList carefully in order to recognize that we might “suddenly” incur such a big cost. On the other hand, this right-end extension cost should not be incurred too often, as it only happens when the internal array gets full: the left-end extension will incur a linear cost every time we add to the left end … so be careful not to do it in a loop!

 

The content of the lab, part I: the StopWatch class

Whew! Sounds like managing an ArrayList (or an array, for that matter) is pretty complicated stuff! What’s a poor Journeyman Programmer™ to do?

Well, that’s the point of this lab: to build up your intuition, to give you some practice (and practical experience), and to drive home the idea that certain kinds of operations should raise your suspicions (or perhaps better: your concerns, your caution). In this lab, you will write some code that adds items to both arrays and ArrayLists, in a couple of different ways, and for several different large values of n, so to speak. In addition to simply coding up these operations, you’ll also perform some timing experiments, i.e., you will keep track of and report the amount of time each operation takes. Hopefully the experience will help you understand where and when you need to tread carefully with these structures.

In order to time your code fragments, you should implement a Java StopWatch class. This is a fairly intuitive way to time algorithms, using the usual metaphor in which Java objects model real-world ones. Once you have the class, you should be able to grab yourself a fresh StopWatch and use it to keep track of the running time of any code you wish. I won’t specify the interface for this class completely for you, but it would be natural to have at least some methods to start the watch, to stop it, to read its current time, and perhaps to reset it to zero. (You might choose to reset it each time it starts, but that doesn’t allow for cumulative timings, i.e., those which add up across stops and starts, and that might turn out to be useful someday.)

Internally, you can use the Java system method call java.lang.System.currentTimeMillis() to get the current time in milliseconds. Technically:

This method returns the difference, measured in milliseconds, between the current time and midnight, January 1, 1970 UTC (coordinated universal time).
You needn’t worry too much about UTC and so on, but the idea is that if you “note” the time at one point (by saving the result of this call into a variable) and then subtract it from some later-noted time, the difference will be independent of the starting point, and will give you an elapsed time measured in milliseconds.
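In other words, the raw pattern is something like this (a sketch only):

    long before = System.currentTimeMillis();             // "note" the time
    // ... the code being timed goes here ...
    long elapsed = System.currentTimeMillis() - before;   // elapsed milliseconds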

[Note that you should use variables of Java type long to store the time measurements.]
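Putting those pieces together, one possible shape for the class might look like the sketch below; the method names here (start, stop, read, reset) are only suggestions, and your own version may well differ:

    public class StopWatch {
        private long startTime;     // time (in ms) when the watch was last started
        private long accumulated;   // time gathered across earlier start/stop pairs
        private boolean running;

        public void start() {
            startTime = System.currentTimeMillis();
            running = true;
        }

        public void stop() {
            if (running) {
                accumulated += System.currentTimeMillis() - startTime;
                running = false;
            }
        }

        // elapsed milliseconds so far, including the current run if still going
        public long read() {
            long extra = running ? (System.currentTimeMillis() - startTime) : 0;
            return accumulated + extra;
        }

        public void reset() {
            accumulated = 0;
            running = false;
        }
    }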

 

The content of the lab, part II: filling on the left edge versus on the right edge

OK, back to the actual tasks that we want to time: what should they be? The idea will be to compare and contrast a couple of different structures (arrays versus ArrayLists) for various relatively large values of n, where this measures the number of items stored. But what actual operations should you perform?

Well, the idea is to underscore the danger of the “shoving process” from the perspective of timing, so of course we ought to measure that. But what about a similar operation, one which does not involve this shoving? From the perspective of ArrayLists, we can see two different forms of the add method, one which adds an item at a specified index and one which adds the item at the “right end” of the structure:
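    public boolean add(E e)                  // appends e at the right end of the list
    public void add(int index, E element)    // inserts element at position index

(Those are the signatures as given in the documentation for java.util.ArrayList.)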

(Technically, of course, these are two different methods, with different parameter lists and presumably method bodies. But we think of them as related because of their names and their behaviors, even though their implementations may be different.)

On the surface, these look like very similar things, and we might (naïvely) assume that they both incur the same sorts of costs. We could use the simple, no-index form of the method to fill up an ArrayList from the left (low-index) end toward the right (high-index) end, simply by adding repeatedly. Likewise, we could use the indexed version to do the same job simply by using a loop variable to index the slots from low to high. For either of these, if we use a structure of Java ints (integers), we might just fill the structure with the running loop index values, naturally from low to high. This should yield a structure which, when printed, shows the numbers from 0 to n-1. Since these are the indices of the structure, perhaps it would be better to fill it with items that run from 1 to n, just to avoid confusion.
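As a sketch (assuming n is defined and java.util.ArrayList is imported), both of these benign, low-to-high fills might look like this:

    ArrayList<Integer> byAppend = new ArrayList<>();
    for (int i = 1; i <= n; i++) {
        byAppend.add(i);           // no-index form: each call appends at the right end
    }

    ArrayList<Integer> byIndex = new ArrayList<>();
    for (int i = 1; i <= n; i++) {
        byIndex.add(i - 1, i);     // indexed form: the index is always the current size,
    }                              // so nothing ever needs to be shoved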

But this approach won’t exhibit the problematic behavior, as it won’t require the dreaded shoving: in order to illustrate that, we could shoot for structures filled in order by the integers, but insist that we fill them starting with the largest index and work our way down, rather than the other way ’round. This can be justified by imagining that we are receiving the values from some other source, in an order that it determines and that we don’t control. This sort of circumstance arises all the time in real-world programs, where there is a clearly-desired outcome (integer items in increasing order), but we face some difficulty in actual practice, such as receiving them over time in the opposite order. The question then is, how do we manage to produce the desired result, and what traps lurk for us on the way?

With regard to the filling approach or strategy, we might try one of two things:

So, we will assume that:

Then, in order to get the maximum possible effect (and experience), you should implement all of the following (at least):

    *   arrays, filled on the left edge
    *   arrays, filled on the right edge
    *   ArrayLists, filled on the left edge
    *   ArrayLists, filled on the right edge

… each timed for n = 1,000, 10,000, 100,000, and 1,000,000.

Of course, you should probably test with much lower values of n, both so that you don’t have to wait so long for the output, and so that you can print out intermediate values (like the array) in order to see that you are doing the work correctly. (Note that you can print out ArrayLists in Java without using a loop: just print the relevant variable; arrays you must print with a loop, or with java.util.Arrays.toString.) For this reason, it might be a good idea to make your code take the n value as a parameter, and allow the user to test some small values first, perhaps with debugging output turned on. But when you actually run the comparison timings, for all those combinations of structures and strategies, and all those high values of n, you’ll want to turn debugging output off, so that you don’t see millions of values, and don’t count the printing in the timings.
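For example, a debugging printout along these lines (the names debug, list, a, and size are just placeholders for whatever you use):

    if (debug) {
        System.out.println(list);            // an ArrayList prints directly, e.g. [1, 2, 3]
        for (int i = 0; i < size; i++) {
            System.out.print(a[i] + " ");    // an array needs a loop (or Arrays.toString)
        }
        System.out.println();
    }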

 

The content of the lab, part III: summary of your output

What, then, should your output look like? For all the combinations above, you should show the actual timings, in milliseconds, of the structure-filling involved: that’s sixteen timings altogether, since there are two structures, times two directions (strategies), times four values of n.

By way of example, you might output something like this:

Timings for filling a structure with n items (integers from 1 to n)
(all timings are in milliseconds):

                Values of n:    1,000       10,000      100,000     1,000,000
            
Structure:

Arrays with 
    strategy:
    
    *   on left edge:              23          123        2,017        94,242
    
    *   on right edge:             42          126        1,280        12,734
    
ArrayLists with
    strategy:
    
    *   on left edge:              58          179        3,112       137,086
    
    *   on right edge:             37          140        6,803        62,115

[Note: these numbers are completely made up; the whole point is to get your own numbers, and to sit through the run-time yourself!]

As mentioned above, you might also want to include a short test phase first, allowing you (or me) to try some smaller sample values of n (like 10) and to verify (via debugging output) that the implementation is correct (both in filling strategy and in the final result). This will likely be necessary for debugging anyway, so why not set out to do it right from the beginning?

 

Appendix: using Java’s printf method

“OK, Fritz, we see your pretty, columnar output on the web page, but how are we supposed to get such nice-looking output from Java?”

… you might say in dismay after seeing the sample above.

Well, as the saying goes, do I have a method for you! Namely, the printf method from the Java standard libraries. (Technically, in all fairness, we should note that printf comes from the C language, back in the 1970s … but Java’s version is based on it, and similar in many details as well.) The idea behind printf is that you hand it a kind of “template string” along with some actual values to print, and the printed forms of the values get substituted into the template string at particular places. The places in turn are marked by special codes in the template string (technically, the template string is known as a format string and the codes are known as format specifiers).

An example may be more helpful; consider the line of code below:

    System.out.printf("Here are %d and %d\n", 2, 3);

The first argument to printf is a string with a few "%d" codes thrown in; the other two arguments are some numbers: 2 and 3. The string gets printed out just as it looks, but with the printed versions of 2 and 3 substituted in where the %d codes were, so we see this on output:

    Here are 2 and 3

(Notice that this is not a println, so we have to add the explicit newline character at the end of the format string on our own.)

The %d codes here are fairly simple: the “d” stands for decimal, which is the number base (10) the numbers are printed in. By using more complicated codes, we can also specify: