An Informatio Offering: Minmax search

GreedyPlayerもSimpleHeuristicsPlayerも自分の手しか考えていない。これを、次の相手の手も考えて手を打つことにする。いわゆる先読みを行うことを考える。

相手の手を考えるというのは、相手がどの手を打ってくるかを予想することではなく、相手の最善手を考えること。各自分の手に対する相手の手の最善手を考え、その結果、最も有利になる自分の手を調べるということ。この際、相手の最善手もこちらの最善手を考慮してと再帰的に考えることができ、この過程はゲームの最終状態まで続く。これをminmaxアルゴリズム[1]といい、プログラムで表すと次のようになる。

abstract class MinmaxPlayer[N <: Node[N]] extends Player[N] {

  override def play(ply: Int, node: N, last: Move): Move = {
    val (m, s) = play(node)
    m
  }

  def play(node: N): (Move, Int) = {
    if (node.isTerminal) {
      return (Move.empty, score(node))
    }
    val moves = node.possibleMoves()
    var bestMove = Move.empty
    var maxS = Int.MinValue
    for (m <- moves) {
      val n = node.play(m).get
      val s = playOpponent(n)
      if (s > maxS) {
        bestMove = m
        maxS = s
      }
    }
    (bestMove, maxS)
  }

  def playOpponent(node: N): Int = {
    if (node.isTerminal) {
      return score(node)
    }
    val moves = node.possibleMoves()
    var minS = Int.MaxValue
    for (m <- moves) {
      val n = node.play(m).get
      val s = play(n)._2
      if (s < minS) {
        minS = s
      }
    }
    minS
  }

  def score(node: N): Int
}

しかし、このコードはreversiに対しては実際には動かない。このコードはゲームが終了するまでのすべての盤面を展開し、その中で最善となる手を探す。reversiの場合、その盤面数は10⁵⁸程度といわれている[2]。そこで、探索の深さを制限し、ある一定の深さまでで探索を打ち切ることによって、次の手を考える。

abstract class MinmaxPlayer[N <: Node[N]](val maxDepth: Int) extends Player[N] {

  override def play(ply: Int, node: N, last: Move): Move = {
    val (m, s) = play(node, maxDepth)
    m
  }

  def play(node: N, depth: Int): (Move, Int) = {
    if (depth == 0 || node.isTerminal) {
      return (Move.empty, score(node))
    }
    val moves = node.possibleMoves()
    var bestMove = Move.empty
    var maxS = Int.MinValue
    for (m <- moves) {
      val n = node.play(m).get
      val s = playOpponent(n, depth - 1)
      if (s > maxS) {
        bestMove = m
        maxS = s
      }
    }
    (bestMove, maxS)
  }

  def playOpponent(node: N, depth: Int): Int = {
    if (depth == 0 || node.isTerminal) {
      return score(node)
    }
    val moves = node.possibleMoves()
    var minS = Int.MaxValue
    for (m <- moves) {
      val n = node.play(m).get
      val s = play(n, depth - 1)._2
      if (s < minS) {
        minS = s
      }
    }
    minS
  }

  def score(node: N): Int
}

打ち切る際の盤面から、最終的な盤面でのscoreをいかに正確に近似できるかが鍵となるが、ここでは単純にそのときの盤面でのマーカーの数の差をそのまま用いると、depth=2での対戦成績は次のようになる。Greedyはちょうど深さ1の探索に相当するため、それには負けないが、Heuristicsを用いたものに負けている。

minmax2 D: 69, random L: 27, -: 4
random D: 29, minmax2 L: 68, -: 3
(28s)

minmax2 D: 68, greedy L: 28, -: 4
greedy D: 22, minmax2 L: 75, -: 3
(31s)

minmax2 D: 37, simple_heuristics L: 61, -: 2
simple_heuristics D: 70, minmax2 L: 29, -: 1
(27s)

depth=3での対戦成績は次のようになる。深さが奇数の探索は、自分の手番までなのが気になるが、戦績は向上している。

minmax3 D: 77, random L: 21, -: 2
random D: 23, minmax3 L: 75, -: 2
(161s)

minmax3 D: 84, greedy L: 14, -: 2
greedy D: 16, minmax3 L: 78, -: 6
(220s)

minmax3 D: 32, simple_heuristics L: 61, -: 7
simple_heuristics D: 56, minmax3 L: 41, -: 3
(208s)

depth=4での対戦成績は次のようになる。計算時間が指数的に増大するため、10回のみの対戦になっている。 simple heuristicsとやっと互角程度になっている。

minmax4 D: 7, random L: 2, -: 1
random D: 1, minmax4 L: 9, -: 0
(174s)

minmax4 D: 9, greedy L: 1, -: 0
greedy D: 2, minmax4 L: 8, -: 0
(216s)

minmax4 D: 7, simple_heuristics L: 2, -: 1
simple_heuristics D: 5, minmax4 L: 5, -: 0
(198s)

実際の対戦を見てみると、隅を取られながらも勝っていることが多いよう。

60: L-h8
 abcdefgh 
1OOOXXXXO1
2XOOXXXXO2
3XXOOOXXO3
4XXXOXOOO4
5XXOXOOXO5
6XXXXOOXO6
7XXXOXXOO7
8XXXXXXXO8
 abcdefgh 

Dark

直接、depthが異なるもので対戦してみても、探索の深さが増すにつれ強くなっていることが分かる。

minmax2 D: 5, minmax3 L: 4, -: 1
minmax3 D: 7, minmax2 L: 3, -: 0

minmax2 D: 0, minmax4 L: 10, -: 0
minmax4 D: 8, minmax2 L: 2, -: 0

minmax3 D: 2, minmax4 L: 8, -: 0
minmax4 D: 9, minmax3 L: 1, -: 0

[1] Von Neumann, J: Zur Theorie der Gesellschaftsspiele Math. Annalen. 100 (1928) 295-320 [2] Victor Allis (1994). Searching for Solutions in Games and Artificial Intelligence. Ph.D. Thesis, University of Limburg, Maastricht, The Netherlands.

1 件のコメント:

raffyrabas2022年11月9日 12:11
Try your hand in demo mode, and whenever you're ready, come back with actual bets to take part in bonus games and get presents for deposits. You will in a position to|be succesful of|have the ability to} discover a game with a live supplier that fits you. The games differ within the variety of playing cards and specific guidelines, which probably to|are inclined to} range from area to area. There are also extra tables for VIP players who need to wager massive sums and get great prizes. Every player's record is comprised of each money and bonus adjusts. On the https://thekingofdealer.com/pharaoh-casino/ off probability that you just put down a wager on on line casino game using a No-Deposit bonus and win the wager, you will rise up to $100 in rewards (contingent upon your record's favored money).
返信削除
返信

コメントを追加

2011/05/29

Minmax search

1 件のコメント: