Recently there was a high-profile set
of games between the reigning champion chess AI, Stockfish, and a
newcomer called AlphaZero. AlphaZero was created with the same deep
learning system that created AlphaGo, an AI that beat the world's
best at the game of Go. In the 50 games that AlphaZero played as
white, it won 25 and drew the other 25. In the 50 games that it
played as black, it won three and drew the other 47.
This advantage towards the white player
may seem startling; however, it's not out of line with other matches
between artificial intelligence programs at the world-class level,
nor with matches between world-class human players. Stockfish, which
evaluates positions in a chess game in terms of pawns of advantage,
starts the game with an advantage towards white of 0.1 pawns.
AlphaZero, on the other hand, has no idea how many pawns of advantage
white has, because it looks at the game holistically, which is a
radically different method of analysis compared to other modern AIs.
Part of the reason I bring up issues of
artificial intelligence is to look at how well these various
systems will carry over to different chess variants rather than just
the orthodox version of the game.
Let's start with Stockfish. Stockfish
is a system very much like Deep Blue and many of the engines that
came between Deep Blue and Stockfish. The difference is that
Stockfish is open source, meaning anyone can examine the code and
edit it. Stockfish and many of the artificial intelligence programs
that came before it run on a minimax principle: they choose each move
on the assumption that their opponent will choose the best counter
move in response to it, so they try to pick the move whose best
counter move is the weakest. (They try to minimize their opponent's
maximum move quality.)
To simplify, consider this abstract
game. You have two options: Option A allows your opponent to score 5
points. Option B allows your opponent to choose between a move that
scores 6 points, and a move that scores 2 points. Assuming that your
opponent will choose their best move, your best choice is to select
Option A, because it limits their score to 5. The fact that Option B
provides the possibility for your opponent to score only 2 points is
irrelevant. This is the minimax principle.
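The abstract game above can be written out as a small tree search.
This is only a sketch: the function name `opponent_score` and the
nested-list encoding of the game are my own assumptions for
illustration, not code from Stockfish. It also includes alpha-beta
pruning, the standard shortcut that stops looking at an option once
it is already known to be worse than one found earlier.

```python
def opponent_score(node, opponent_to_move,
                   alpha=float("-inf"), beta=float("inf")):
    """Return the opponent's score when both sides play their best.

    A node is either a number (the opponent's final score) or a list of
    child nodes. You choose children that minimise the score; the
    opponent chooses children that maximise it. alpha and beta are the
    best scores already guaranteed to the opponent and to you, and let
    the search skip branches that cannot change the answer.
    """
    if not isinstance(node, list):
        return node
    if opponent_to_move:
        best = float("-inf")
        for child in node:
            best = max(best, opponent_score(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:  # you already have a better option elsewhere
                break
        return best
    best = float("inf")
    for child in node:
        best = min(best, opponent_score(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:  # the opponent already has a better option elsewhere
            break
    return best


# Option A lets the opponent score 5; Option B lets them pick 6 or 2.
GAME = {"A": [5], "B": [6, 2]}


def best_option(game):
    """Pick the option whose best opponent reply scores the least."""
    return min(game, key=lambda name: opponent_score(game[name], True))


print(best_option(GAME))                           # A
print(opponent_score(list(GAME.values()), False))  # 5
```

When the whole game is searched from the top, the search looks at
Option B, sees the 6, and stops: the 2 is never examined, which is
exactly the sense in which it is irrelevant.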
Most of what a traditional chess AI
does when selecting a move is to evaluate what a particular position
is worth in terms of some abstract score, such as 'number of pawns'.
The value of pieces is straightforward: a pawn is worth approximately
1, a bishop or knight is worth about 3, and a queen is worth roughly
9. However, the position of these pieces also matters. Having a piece
in the middle, or able to reach the middle at any point, is worth a
premium. A 'passed pawn', or one which has no opposing pawn ahead of
it on its own file or an adjacent file, is worth more than it would
otherwise be, because of its greater potential to be promoted. The
evaluation function that an alpha-beta search (not related to
AlphaZero) relies on contains a set of parameters which decide how
much each piece is worth on each square. Different machine
learning methods such as neural networks can be used to determine
what these parameters should be.
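As a rough illustration of this kind of scoring, here is a minimal
sketch of an evaluation function. Everything in it is an assumption
made for the example: the position encoding, the rook and king
values, and the size of the centre bonus. A real engine uses far
more parameters than this.

```python
# Assumed piece values in pawns. The rook and king entries are
# conventional placeholders, not figures taken from any engine.
PIECE_VALUE = {"P": 1.0, "N": 3.0, "B": 3.0, "R": 5.0, "Q": 9.0, "K": 0.0}

# The four central squares d4, e4, d5, e5 as (file, rank) indices from 0.
CENTRE = {(3, 3), (4, 3), (3, 4), (4, 4)}
CENTRE_BONUS = 0.25  # hypothetical premium for occupying the centre


def evaluate(position):
    """Score a position in pawns from White's point of view.

    position maps (file, rank) to a piece letter: uppercase for White,
    lowercase for Black.
    """
    score = 0.0
    for square, piece in position.items():
        value = PIECE_VALUE[piece.upper()]
        if square in CENTRE:
            value += CENTRE_BONUS
        score += value if piece.isupper() else -value
    return score


# White knight on e4 against a black bishop on a8, plus the two kings.
example = {(4, 3): "N", (0, 7): "b", (4, 0): "K", (6, 7): "k"}
print(evaluate(example))  # 0.25
```

Material is level here, so the whole 0.25 comes from the knight
sitting on a central square.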
For variants of chess that are very
close to the original game, such as Chess 960 (a.k.a. Fischer Random
Chess) or Really Bad Chess, which both feature 8 by 8 grids, 16
pieces per side, and only the orthodox six piece types, an AI using
the alpha-beta algorithm should be able to play with few if any
complications.
These AIs work even after pieces have
been removed from the game, so variants that use fewer pieces don't
produce any difficulties either. In practice, variants with different
board sizes or different arrangements, such as Martin Gardner's mini
chess or Romanchenko Chess (shown in the figure, source: Jocly), work
well too, as long as the value of any square beyond the board is
hard-coded to zero. This also means in practice that an alpha-beta
algorithm can produce a viable chess AI on a board that is not a
perfect square or rectangle. However, it can increase the
computational load if the non-viable squares are considered, as they
are in Jocly's implementation of alpha-beta on Romanchenko Chess.
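To make the idea of zeroed-out squares concrete, here is a small
sketch. The layout below is a made-up irregular board, not the real
Romanchenko Chess board, and the function names are hypothetical. It
is not Jocly's code either.

```python
# Hypothetical irregular board inside a 5 by 6 bounding grid.
# '#' is a playable square, '.' is a square beyond the board.
LAYOUT = [
    "..####",
    ".#####",
    "######",
    "#####.",
    "####..",
]


def square_values(layout, value_of):
    """Build a value table where squares beyond the board are fixed at zero."""
    return [
        [value_of(r, c) if cell == "#" else 0.0 for c, cell in enumerate(row)]
        for r, row in enumerate(layout)
    ]


def playable_squares(layout):
    """List only the real squares, so a search never visits the others."""
    return [
        (r, c)
        for r, row in enumerate(layout)
        for c, cell in enumerate(row)
        if cell == "#"
    ]


values = square_values(LAYOUT, lambda r, c: 1.0)
total_cells = sum(len(row) for row in LAYOUT)
print(values[0][0], values[2][2])                # 0.0 1.0
print(len(playable_squares(LAYOUT)), total_cells)  # 24 30
```

Looping over the full grid touches 30 cells, while looping over
`playable_squares` touches only 24. That gap is the extra load an
implementation pays when it considers the non-viable squares.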
Some systems, including Deep Blue, also take
advantage of chess literature written specifically for the orthodox
game: openings and their refutations, and methods for winning
particular endgames, such as when you have a rook and a
bishop against an opponent who has just a rook. However, after the
opening and before the endgame it's pretty much alpha-beta all the
way. [1]
Variants that include new pieces, such
as fairy chess, or non-linear board movement, such as Smess: The
Ninny's Chess, can also be supported by AI programs that use the
alpha-beta algorithm. However, these programs will need additional
manual training to be able to evaluate the value of the different
pieces and spaces.
AlphaZero works on an entirely
different principle; it does not assume that its opponent is the best
possible opponent, one which will make the best possible counter
move. Instead, AlphaZero evaluates a candidate position by simulating
games of weighted random moves starting from the position to be
evaluated. The evaluation is simply the proportion of those
random-move games that are won* from AlphaZero's side. It evaluates
the position
this way for each move that it could make, and simply chooses the
move that results in the best win proportion.
In these simulation games that
AlphaZero uses, the weighting favours moves that are likely to lead
to a win, judging by games that AlphaZero played against itself. For
example, AlphaZero may assign more weight towards a move
that takes a piece over one that doesn't. It may also assign greater
weight towards moves that give it control of the centre of the board.
But these weight assignments would not be the result of any human
supervision.
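The evaluation described in the last two paragraphs can be sketched
on a much smaller game. The example below uses tic-tac-toe, and the
move weights are hand-picked stand-ins (centre 3, corner 2, edge 1)
for weights that would really be learned from self-play. All of the
names are my own. This is a sketch of the simplified scheme as
described in this post, not DeepMind's implementation, which guides
a tree search with a neural network rather than playing games out at
random.

```python
import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]
# Assumed fixed weights per square; a real system would learn these.
WEIGHTS = [2, 1, 2, 1, 3, 1, 2, 1, 2]


def winner(board):
    """Return 'X' or 'O' for a win, 'draw' for a full board, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return "draw" if "." not in board else None


def rollout(board, to_move, rng):
    """Play weighted random moves until the game ends; return the result."""
    board = list(board)
    while winner(board) is None:
        empty = [i for i, s in enumerate(board) if s == "."]
        move = rng.choices(empty, weights=[WEIGHTS[i] for i in empty])[0]
        board[move] = to_move
        to_move = "O" if to_move == "X" else "X"
    return winner(board)


def evaluate(board, to_move, player, rng, games=200):
    """(proportion of wins) + 1/2 * (proportion of ties) for player."""
    results = [rollout(board, to_move, rng) for _ in range(games)]
    return (results.count(player) + 0.5 * results.count("draw")) / games


def choose_move(board, player, rng, games=200):
    """Evaluate the position after each legal move and keep the best."""
    opponent = "O" if player == "X" else "X"

    def value(move):
        after = board[:move] + player + board[move + 1:]
        return evaluate(after, opponent, player, rng, games)

    empty = [i for i, s in enumerate(board) if s == "."]
    return max(empty, key=value)


rng = random.Random(0)
# X X .
# O O .
# X O .   with X to move; square 2 completes the top row.
print(choose_move("XX.OO.XO.", "X", rng))  # 2
```

Square 2 ends the game at once, so every simulated game from that
position is a win and it scores a perfect 1.0, which no other move
can beat.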
Similarly, AlphaZero has no concept of
chess theory such as openings or their refutations, and it doesn't
have a book of endgames to rely upon. AlphaZero was trained simply by
giving the system the rules of chess, and letting it play many games
against different versions of itself. It's reasonable to assume from
here that AlphaZero would be able to handle many chess variants
without any additional modifications other than informing it of the
new rules. Furthermore, a very similar training system could be
applied to nearly any chess variant to produce an AI program that
could play that particular game.
* More exactly, the evaluation is
(Proportion of Wins) + 1/2*(Proportion of Ties)
[1] Monty Newborn, Beyond Deep Blue: Chess in the Stratosphere