====== Classical Game Theory Algorithms ======

This class introduces classical game theory algorithms for zero sum games with alternate moves. "Zero sum" means that one player's victory is at the same time a defeat for the other player (sum of the rewards/penalties is equal zero). All presented algorithms will based on so called  [[https://en.wikipedia.org/wiki/Game_tree|"game tree" representation]].

Game tree, is (as name suggests) a directed acyclic graph where nodes represent possible game states, root is the initial state and edges are possible players' move changing the game state to another.

Below you can see a game tree of the 'tic-tac-toe' game, starting three moves before the ending.

{{ :pl:dydaktyka:ggp:tic-tac-toe-tree.png?500 | }}

We will use the 'ggp-base' framework, [[en:dydaktyka:ggp:gdl#validation|introduced together with the gdl language]].

===== First Legal Move =====

The simplest strategy is to perform the first legal move:

<code python>
move = first_legal_move()
perform(move)
</code>

This complicated algorithm is implementd in the ''ggp-base'', in ''org.ggp.base.player.gamer.statemachine.sample.SampleAlphabetGamer.java'' class. The bot sorts the legal moves lexically, then performs the first one.

The few things should catch our attention:

  * this class inherits from the ''SampleGamer.java'' class, which implements all the boring communication stuff
  * it is enough to implement the <code java>public Move stateMachineSelectMove(long timeout) throws Something</code> methods, which returns a move chosen by bot in the current turn.
  * <code java>getStateMachine();</code> returns state machine, simulating the game
  * <code java>getCurrentState();</code> return the current game state
  * <code java>getRole();</code> return player's role (as ''role'' in gdl)
  * State Machine's method <code java>getLegalMoves(<state>,<role>)</code> returns list of the possible moves for the given state and player

==== Testing ====

Testing of the algorithm can be done in two ways:

  - via the Kiosk app (''org.ggp.base.apps.kiosk''), introduced on the previous class. We can select the bot there and play with it a sparing.
  - via the GameServer (''org.ggp.base.app.server'') and Player (''org.ggp.base.app.player'') apps: 
    - run the GameServer and select a game
    - run the Player app and create two players (''Create'' button, you can also select player's type)
    - start new match in the GameServer app

==== Assignments ====

  - based on the ''SampleAlphabetGamer.java'' create a ''SampleRandomGamer.java'' bot, that performs random action. 
  - run tic-tac-toe match''SampleAlphabetGamer.java'' vs ''SampleRandomGamer.java''
 
===== Two-moves Lookahead =====

The more complicated algorithms search the game tree to find a better move. The strategy looking two moves ahead, could be:
  * Choose a move if:
    * it's a winning move 
    * it's not a losing move
    * it doesn't lead to a state, where your opponent can win in one move
Or:

<code>
current_move_candidate = null

for every possible move:
  if is a winning move:
    choose the move
  if is a losing move:
    forget about it
  if is worse than the current_move_candidate
    forget about it
  simulate performing the move and for every opponent's move after that:
    if the move is a losing move (for you):
      forget about it
  current_move_candidate = current_move
 
perform(current_move_candidate)
</code>

This strategy is implemented in the ''SampleSearchLightGamer.java'' class. 

It's worth to notice few things:

  * algorithms stops work after the timeou
  * you can simulate the move via: ''getNextState'' which takes as arguments state and all the players' moves. Thanks to the empty (''noop'') move, we can get the move via the ''getRandomJointMove'' method.
  * <code java>isTerminal(state);</code> checks if the state is teminal
  * <code java>getGoal(state, role);</code> returns the result of the game for the player. In the terminal states 100 means victory; 0, defeat. The intermediate values (e.g. 50) represent ties. In some games we can check this value to check which player is in a better situation.

==== Assignments ====

  * Run tic-tac-toe match: ''SampleSearchLightGamer.java'' vs ''SampleRandomGamer.java''.

===== MiniMax =====

The [[https://en.wikipedia.org/wiki/Minimax|MiniMax]] algorithm is a classical game-theory algorithms which results in an optimal move (as long as we have enough time to run it to the end).
It assumes, that our opponent will always perform the best move (for him). The MiniMax player always chooses a move that minimizes our loss. 

To learn more check the links below:

  * [[http://www.flyingmachinestudios.com/programming/minimax/]]
  * [[http://sandbox.thimbleby.net/algorithms/doku.php?id=minimax_search|an interactive visualization]]
  * [[http://neverstopbuilding.com/minimax]]
  * [[https://www.youtube.com/watch?v=OkP8BAwfO24|a video presenting the algorithm]]

Please make sure that you understand the algorithm.

==== Assignments ====

  - Create a MiniMax player based on the code below: <code java>
public class SampleMiniMaxGamer extends SampleGamer {

        // our role's index
	Integer roleIndex = 0;

	@Override
	public Move stateMachineSelectMove(long timeout)
			throws TransitionDefinitionException, MoveDefinitionException,
			GoalDefinitionException {
		long start = System.currentTimeMillis();
                
                // find our player's index
		roleIndex = getStateMachine().getRoleIndices().get(getRole());
		// find the best move
                Move selection = getBestMove(getCurrentState());
		

                // just to satisfy the ggp interface, not important
                long stop = System.currentTimeMillis();
                List<Move> moves = getStateMachine().getLegalMoves(getCurrentState(), getRole());
		notifyObservers(new GamerSelectedMoveEvent(moves, selection, stop - start));
		return selection;
	}

        // finds the best move using the MiniMax algorithm
	private Move getBestMove(MachineState state) throws MoveDefinitionException, TransitionDefinitionException, GoalDefinitionException{
		List<List<Move>> moves = getStateMachine().getLegalJointMoves(state);
		Integer score = 0;
		Move bestMove = moves.get(0).get(roleIndex);

		for(List<Move> move : moves){
			Integer result = getMinScore(move, getCurrentState());

			if (result > score){
				score = result;
				bestMove = move.get(roleIndex);
			}
		}
		return bestMove;
	}


	private Integer getMinScore(List<Move> jointMove, MachineState state) throws MoveDefinitionException, TransitionDefinitionException, GoalDefinitionException {
	     // TODO: 
             // 1) find a state after the move
             // 2) if the state is terminal, return result
             // 3) check all the possible moves
             // 4) find a move that gives as the minimum reward in the end 
             // 5) return the best reward we can achieve from the resulting state
             return 0;
	}

	private Integer getMaxScore(List<Move> jointMove, MachineState state) throws MoveDefinitionException, TransitionDefinitionException, GoalDefinitionException {
	     // TODO: 
             // 1) find a state after the move
             // 2) if the state is terminal, return result
             // 3) check all the possible moves
             // 4) find a move that gives as the highest reward in the end 
             // 5) return the best reward we can achieve from the resulting state
             return 100;
	}
}</code>
  - test if the MiniMax wins with other bots in 'tic-tac-toe'
  - the same with 'checkers'
  - implement timeout handling (return a random move)
  - test the 'checkers' again
  - bonus assignment: implement the [[https://thimbleby.gitlab.io/algorithm-wiki-site/wiki/minimax_search/|NegaMax algorithm]]
 

===== Alpha-Beta Pruning =====

{{ :pl:dydaktyka:ggp:tree_pruning.jpg?200|}}
The MiniMax algorithm search all the tree's nodes which is unnecessary. To avoid it we can use so called [[https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning|Alpha-Beta cuts]]. The idea is that MiniMax doesn't have to check nodes, which are known to not have the better result than one we already have. 

Ending of [[https://www.youtube.com/watch?v=Ewh-rF7KSEg|the video]] shows an example of the Alpha-Beta pruning.

<WRAP center round tip 60%>[[https://thimbleby.gitlab.io/algorithm-wiki-site/wiki/minimax_search_with_alpha-beta_pruning/|Interactive Visualization of the algorithm]].
</WRAP>
 
Please make sure you understand the algorithm, it belongs to the [[https://en.wikipedia.org/wiki/Branch_and_bound|branch and bound family]], a popular optimization technique.

==== Assignments ====

  - Implement the ''SampleAlphaBetaGamer.java''  based on the MiniMax player.
  - Run several matches ('checkers' or something similar): SampleAlphaBetaGamer.java'' vs ''SampleMiniMaxGamer.java''.