CS 448 Lab 3, Due 10/18/2011
Rote learning game player.
Phase 1: Pawn
Write a program that will allow a person to play hexapawn. Note: if you have a strong desire to program some other game, I'm willing to negotiate.
The rules
Hexapawn is played on a 3x3 board with 6 pawns. A player can
win in two ways: 1) advance a pawn to the opposite side (like promoting
a pawn in chess), or, 2) play in such a way that the other player has no
legal moves on their turn. It starts like this.
xxx
...
ooo
The pieces move like pawns in chess; they move one square straight ahead,
and take diagonally forward. Assuming x goes first there are three possible
first moves:
xx.   x.x   .xx
..x   .x.   x..
ooo   ooo   ooo
From the leftmost position, the second player has three possible responses:
xx.   xx.   xx.
o.x   .ox   ..o
.oo   o.o   o.o
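The movement rule above can be sketched in a few lines of Python. This is only one possible approach, not a required design: the board is assumed to be a tuple of row strings with '.' for an empty square, x at the top moving down, and o at the bottom moving up. The names initial_board and successors are made up for this illustration.

```python
def initial_board(n):
    """x pawns on the top row, o pawns on the bottom row, '.' elsewhere."""
    return ("x" * n,) + ("." * n,) * (n - 2) + ("o" * n,)


def successors(board, player):
    """Return every board reachable by one legal move of `player`.

    Assumption for this sketch: x moves down the board (row + 1) and
    o moves up the board (row - 1).
    """
    n = len(board)
    step = 1 if player == "x" else -1
    enemy = "o" if player == "x" else "x"
    result = []
    for r in range(n):
        for c in range(n):
            if board[r][c] != player:
                continue
            nr = r + step
            if not 0 <= nr < n:
                continue
            # Straight ahead needs an empty square; a diagonal needs an enemy.
            for nc, needed in ((c - 1, enemy), (c, "."), (c + 1, enemy)):
                if 0 <= nc < n and board[nr][nc] == needed:
                    grid = [list(row) for row in board]
                    grid[r][c] = "."
                    grid[nr][nc] = player
                    result.append(tuple("".join(row) for row in grid))
    return result


if __name__ == "__main__":
    for b in successors(initial_board(3), "x"):
        print("\n".join(b), end="\n\n")
```

Nothing in the sketch mentions the board size, so the same function serves 3x3, 4x4, and 5x5. An empty list from successors means the player to move has no legal move and therefore loses.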
User Interface
Display the initial board, and allow the user to click and drag one pawn.
Make the board clear and easy to use; perhaps display the pieces as filled
ovals of different colors? Only allow legal moves. Inform the user
when the game is over and who won.
Include a control that will allow them to choose hexapawn, or octapawn
(same game, only 4x4 with 8 pawns), or decapawn (5x5, with 10).
Strategy
Your program, on its turn, should play at random (this will give the user
a feeling of prowess!).
Representation
As usual, how you choose to represent the information is critically important.
With a good representation, 3x3, 4x4, and 5x5 are identical, algorithm-wise;
with a poorly thought out representation they are 3 separate problems.
Phase 2: rote learning
Enhance your Pawn program so that it learns to play perfectly.
Start your program without any information about how to win. Do
not program any strategy; rather, make your program learn so that it does not
make the same losing move twice.
3x3
To begin, work with hexapawn. In this case it is almost sufficient to eliminate
moves which are answered by your opponent winning immediately. Record the positions
which lead to an immediate loss, and subsequently avoid making any move
that creates that position. E.g. when the first player plays:
xx.
..x (a)
ooo
we have three possible responses. If we play
xx.
.ox (b)
o.o
then they can play
.x.
xox (c)
o.o
and we lose. At that point, if we add the position (b) to our
list of positions to avoid, then the next time we see (a) we can instead
choose one of the other two moves.
But... if, in position (b), x always takes, like this:
.x.
.xx (d)
o.o
... then even though it loses over and over,
your program will never learn not to make the bad (b) move.
So... If your
program reaches a position where every possible move is known to lead to a losing position,
then it should add the position created by its previous move to the list of positions to avoid.
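The two learning rules described in this section might be sketched as follows. This is a rough illustration under stated assumptions, not the expected solution: positions are any hashable values (for instance tuples of row strings), the caller supplies the list of positions that the program's legal moves can create, and the class name RoteLearner and its methods are invented for this example.

```python
import random


class RoteLearner:
    """Remembers positions it must not create again (hypothetical helper)."""

    def __init__(self, rng=None):
        self.avoid = set()        # positions known to lose for us
        self.last_choice = None   # position created by our most recent move
        self.rng = rng or random.Random()

    def new_game(self):
        self.last_choice = None

    def choose(self, candidates):
        """Pick one position from a non-empty list of reachable positions."""
        safe = [p for p in candidates if p not in self.avoid]
        if not safe:
            # Every move from here is known to lose, so the position our
            # previous move created is itself a losing one: record it.
            if self.last_choice is not None:
                self.avoid.add(self.last_choice)
            safe = list(candidates)   # we must still move; the game is lost
        self.last_choice = self.rng.choice(safe)
        return self.last_choice

    def lost(self):
        """Call when the opponent wins right after our move."""
        if self.last_choice is not None:
            self.avoid.add(self.last_choice)


if __name__ == "__main__":
    learner = RoteLearner()
    b = ("xx.", ".ox", "o.o")
    replies = [("xx.", "o.x", ".oo"), b, ("xx.", "..o", "o.o")]
    learner.last_choice = b   # pretend we played (b) and then lost
    learner.lost()
    learner.new_game()
    print(learner.choose(replies) != b)
```

Using a set keeps the "have I seen this losing position?" check cheap, and its final size is the number of boards stored.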
4x4
Once your 3x3 plays perfectly, move on to 4x4.
You will find that it
is a little tedious to play against it as it learns. So, add a button to
train it, so you don't have to lead it down every losing path (if the first player runs minimax the second player will
learn *much* faster than if the first player plays at random!). I leave
it to you to decide whether the first player or the second wins with best
play.
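One way to check who wins with best play is an exhaustive game-tree search, which is also a natural opponent for the training button. The sketch below is a plain win/lose minimax with memoization, shown only as an illustration. It assumes boards are tuples of row strings with '.' for empty, x at the top moving down, and o at the bottom moving up. The names initial_board, successors, and mover_wins are hypothetical.

```python
from functools import lru_cache


def initial_board(n):
    """x pawns on the top row, o pawns on the bottom row, '.' elsewhere."""
    return ("x" * n,) + ("." * n,) * (n - 2) + ("o" * n,)


def successors(board, player):
    """All boards reachable by one legal move of `player` (x moves down)."""
    n = len(board)
    step = 1 if player == "x" else -1
    enemy = "o" if player == "x" else "x"
    result = []
    for r in range(n):
        for c in range(n):
            nr = r + step
            if board[r][c] != player or not 0 <= nr < n:
                continue
            # Straight ahead needs an empty square; a diagonal needs an enemy.
            for nc, needed in ((c - 1, enemy), (c, "."), (c + 1, enemy)):
                if 0 <= nc < n and board[nr][nc] == needed:
                    grid = [list(row) for row in board]
                    grid[r][c] = "."
                    grid[nr][nc] = player
                    result.append(tuple("".join(row) for row in grid))
    return result


@lru_cache(maxsize=None)
def mover_wins(board, player):
    """True if `player`, about to move on `board`, can force a win."""
    enemy = "o" if player == "x" else "x"
    for nxt in successors(board, player):
        far_row = nxt[-1] if player == "x" else nxt[0]
        if player in far_row:
            return True               # a pawn reached the opposite side
        if not mover_wins(nxt, enemy):
            return True               # the opponent is left in a lost position
    return False                      # no moves at all, or every move loses


if __name__ == "__main__":
    print("first player wins 3x3:", mover_wins(initial_board(3), "x"))
```

Calling mover_wins(initial_board(4), "x") applies the same search to octapawn.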
Free advice
Let the computer play second for hexapawn (since the second player wins with best play).
Work on hexapawn first; the state space is much smaller.
Credit
Demo your working learning octapawn code by the due date, and send me email
containing an explanation of:
- what data structures you used
- which player wins at 4x4
- how many boards you had to store to play perfectly
- what was the most difficult part of completing this lab
- any extra special techniques you implemented.
Alternative
Implement rote learning on some other smallish game.