This class continues topic introduced in the previous lab. This time we will focus on the famous Monte Carlo Tree Search (MCTS). While MiniMax with alpha/beta pruning uses branch and bound approach to search the game tree exhaustively, some game trees are just too big to be explored, e.g. Go game).
MCTS on the other hand is a probabilistic algorithm, only sampling the game tree in the indeterministic manner.
Related info:
The Monte Carlo algorithms are named after the famous casions, where you can play the 'one-armed bandit' slot machines. The idea is that one has many different slot machines to choose from in the casino and every machine has different unknown winning chance. Problem to solve: having a fixed amount of game tokens, how to find a machine that maximizes profits?
In games a “slot-machine” is represented by a possible move with unknown impact on the game, and we have to find the best move, having limited amount of computational resources (tokens). The simplest approach would be to choose moves at random and simulate the game (such simulation is called playout) until the end. We do it as long as we can, then we check which move had the best wins to playouts ratio.
as long as we can: move = choose_random_move() score = payout(move) save_score(move) calculate_average_win_ratio() choose_move_with_the_best_ratio()
SampleMonteCarloGamer.java
Monte Carlo wasn't the smartest approach to the problem - we can be more intelligent than some random sampling. The idea behind a more advanced Monte Carlo Tree Search is to choose moves, according to a simple scoring formula:
where:
As you see, the UCT assigns high scores to machines that:
This way we balance both exploration and exploitation of the search space. The algorithm is called Tree Search because in essence it explores tree in a intelligent manner — you can say that it approximates MiniMax by randomized exploration.
The algorithm always starts in the current game state (root of the tree) and consists in four phases:
We repeat those phases as long as we can - finally we select a move that had the highest number of playouts - we know it's the best because algorithm favors the most promising moves when sampling.
Below you can see illustration of one MCTS cycle. There are two things worth of noting:
SampleMonteCarloGamer.java
, implement the SampleMCTSGamer.java
playerSampleMCTSGamer.java
vs SampleMonteCarloGamer.java
at some big game (chess, checkers?). Does any of them have an upper hand?