An Informatio Offering: 2011/06

2011/06/27

Benchmark of the FFO test suite

これまでbenchmarkは、random playerを相手に、先手で100戦、後手で100戦を行い、各plyごとに平均をとってきた。回数が多いのは分散が小さくないため、安定した結果を得るには、ある程度の数が必要であるからである。しかし、この方法ではreversiがほとんどの場合で1試合に32 plyかかることから、6400 ply程度の実行が必要となる。1 plyを平均5秒で実行したとしても結果を得るのに1時間程度の時間がかかる。また、実際の大会などでは1 plyに1分程度の時間が与えられており、探索の深さももっと深く設定することが普通である。しかし、それでは結果を得るのに丸1日かかってしまう。また、random playerを相手にしているため、試合の終盤では圧勝の状況になっていることが多く、実践的なデータがとれているとはいえない。

そこで、Zebraの作者であるGunnarさんのホームページで紹介されている、The FFO endgame test suiteを用いてbenchmarkを行うことにする。

終盤だけしかないという欠点はあるが、まだ序盤や終盤で変化するようなプログラムは作成していないので現在のところ問題ない。

ここまでの手法のbenchmarkをとってみると次のようになる。

まず、negamaxにくらべてnegaalphaが、ずっと効率が良いことが確認できる。
negaalphaでは、odd depthよりも、その次のeven depthの方がノード数が少ないときがある。internal node数や実行時間ではそのような逆転は起きていない。実行時間はマシン環境によって変化するが、最も重要な指標である。leaf node数よりも、internal node数で比較した方がよいという意見もある[?]。

> xtabs(node ~ name + test, data=data)
                  test
name               end40.pos end41.pos end42.pos end43.pos end44.pos
  negamax2                30       126        68        84       106
  negamax3               305      1193       586       578      1061
  negamax4              1325     13165      4517      7297     10652
  negamax5             12843    122563     37324     55434    103153
  negamax6             63589   1211104    282095    625740    987883
  negaalpha2              13        30        30        49        37
  negaalpha3             117       289       223        77       208
  negaalpha4             139       525       338       673       575
  negaalpha5             866      4332      3747      1388      4023
  negaalpha6            1278      6865      4063     10677      6969
                  test
name               end45.pos end46.pos end47.pos end48.pos end49.pos
  negamax2                88       111        86        98       116
  negamax3              1112      1143       736      1127      1092
  negamax4              7913     11557      8327      8756     13930
  negamax5             92259    107326     70239     92677    142210
  negamax6            707766   1093863    791984    727446   1635128
  negaalpha2              34        25        47        29        34
  negaalpha3             370       335        77       313       273
  negaalpha4             329       594       788       470       494
  negaalpha5            5777     10135       843      6681      2408
  negaalpha6            2723     11688     14354      9337      6707
                  test
name               end50.pos end51.pos end52.pos end53.pos end54.pos
  negamax2               131        94       111       143        63
  negamax3              1942       959      1089      1573       707
  negamax4             18665      9470     10863     20035      4796
  negamax5            260643     96971    104217    219668     54825
  negamax6           2491678    962986    978789   2691658    415888
  negaalpha2              33        59        63        43        23
  negaalpha3             452       183       204       350       239
  negaalpha4             626      1160       992      1214       586
  negaalpha5           15963      2032      1946     11389      2323
  negaalpha6           10816     31843      6498     29471     14222
                  test
name               end55.pos end56.pos end57.pos end58.pos end59.pos
  negamax2               143       134        85       193        66
  negamax3              1990      1319       758      2448       760
  negamax4             21662     18921      7733     35050      5828
  negamax5            283164    191384     74328    442133     65369
  negamax6           3175727   2631485    745547   6005655    562206
  negaalpha2              49        56        38        55        18
  negaalpha3             322       148       289       828       240
  negaalpha4             841      1532       458       904       210
  negaalpha5            5657      2013      5651     15508      3747
  negaalpha6           20688     26438      4485     15619      2592

> xtabs(inode ~ name + test, data=data)
                  test
name               end40.pos end41.pos end42.pos end43.pos end44.pos
  negamax2                11        11        10         7        11
  negamax3                41       137        78        91       117
  negamax4               346      1330       664       669      1178
  negamax5              1671     14495      5181      7966     11830
  negamax6             14514    137058     42505     63400    114983
  negaalpha2              11        11        10         7        11
  negaalpha3              23        71        49        36        50
  negaalpha4             132       190       189       141       252
  negaalpha5             223      1358       979       729      1181
  negaalpha6            1302      2769      2491      2416      3143
                  test
name               end45.pos end46.pos end47.pos end48.pos end49.pos
  negamax2                15        13         9        14         9
  negamax3               103       124        95       112       125
  negamax4              1215      1267       831      1239      1217
  negamax5              9128     12824      9158      9995     15147
  negamax6            101387    120150     79397    102672    157357
  negaalpha2              15        13         9        14         9
  negaalpha3              56        64        23        51        67
  negaalpha4             251       287       220       256       150
  negaalpha5            1126      2781       241      1412       752
  negaalpha6            2685      5602      4855      5112      2270
                  test
name               end50.pos end51.pos end52.pos end53.pos end54.pos
  negamax2                16        11        11        12        11
  negamax3               147       105       122       155        74
  negamax4              2089      1064      1211      1728       781
  negamax5             20754     10534     12074     21763      5577
  negamax6            281397    107505    116291    241431     60402
  negaalpha2              16        11        11        12        11
  negaalpha3              58        41        50        73        44
  negaalpha4             357       305       273       358       251
  negaalpha5            2672       607       686      2355       509
  negaalpha6            6816      8116      2085      8971      5509
                  test
name               end55.pos end56.pos end57.pos end58.pos end59.pos
  negamax2                15        10         9        14        12
  negamax3               158       144        94       207        78
  negamax4              2148      1463       852      2655       838
  negamax5             23810     20384      8585     37705      6666
  negamax6            306974    211768     82913    479838     72035
  negaalpha2              15        10         9        14        12
  negaalpha3              55        45        52       143        40
  negaalpha4             350       291       167       290       192
  negaalpha5            1157       818      1505      3752       735
  negaalpha6            8782      5578      1911      5514      2643

> xtabs(time ~ name + test, data=data)
                  test
name               end40.pos end41.pos end42.pos end43.pos end44.pos
  negamax2               162       212       183       190       221
  negamax3               386       671       520       545       625
  negamax4               794      1755      1229      1400      1773
  negamax5              2035     11600      4724      6012      8419
  negamax6              6909     79483     21643     40787     73706
  negaalpha2             148       166       166       172       187
  negaalpha3             278       446       405       326       382
  negaalpha4             467       600       569       614       689
  negaalpha5             689      1651      1463      1106      1514
  negaalpha6            1201      2089      1835      2181      2389
                  test
name               end45.pos end46.pos end47.pos end48.pos end49.pos
  negamax2               231       238       204       217       217
  negamax3               613       641       581       649       672
  negamax4              1648      1846      1521      1605      1851
  negamax5              7495      8294      6351      9081     11898
  negamax6             58783     84254     57276     56096    103962
  negaalpha2             202       200       186       188       175
  negaalpha3             437       436       261       433       442
  negaalpha4             649       741       715       662       580
  negaalpha5            1544      2261       778      1909      1202
  negaalpha6            1950      3732      3420      2964      2204
                  test
name               end50.pos end51.pos end52.pos end53.pos end54.pos
  negamax2               237       226       222       252       201
  negamax3               771       641       668       741       538
  negamax4              2292      1779      1805      2269      1359
  negamax5             18820      7853     10665     16513      5537
  negamax6            157724     75202     66660    171968     37524
  negaalpha2             197       208       200       208       177
  negaalpha3             469       356       404       485       388
  negaalpha4             762       858       751       862       704
  negaalpha5            2648      1105      1207      2321      1065
  negaalpha6            3720      5664      1937      5738      3626
                  test
name               end55.pos end56.pos end57.pos end58.pos end59.pos
  negamax2               266       229       200       285       205
  negamax3               807       697       570       896       616
  negamax4              2642      2161      1473      3234      1425
  negamax5             19360     16159      7499     32512      6098
  negamax6            227045    167510     48530    366104     46530
  negaalpha2             225       203       187       239       174
  negaalpha3             434       370       428       669       379
  negaalpha4             837       857       592       811       602
  negaalpha5            1641      1281      1814      3272      1327
  negaalpha6            5587      4255      1793      4273      2051

nagealphaとkiller heuristic, history heuristic, transposition tableの比較は次の通り。
多くの盤面でkiller heuristicとtransposition tableの組み合わせが最もよい成績を収めている。

> xtabs(node ~ name + test, data=data)
                  test
name               end40.pos end41.pos end42.pos end43.pos end44.pos
  negaalpha6            1278      6865      4063     10677      6969
  killer6_32            1259      4463      2894      3733      4865
  history6              1445      6983      3980      3720      7391
  transposition6        1205      5159      3818      9939      6347
  transposition_k6      1220      3338      2695      3327      4397
  transposition_h6      1329      5744      3669      3511      6947
                  test
name               end45.pos end46.pos end47.pos end48.pos end49.pos
  negaalpha6            2723     11688     14354      9337      6707
  killer6_32            2491      6313      8238      6105      6608
  history6              2902      6068     16889      6299      9535
  transposition6        2120     10377     12686      8672      5633
  transposition_k6      2035      5122      7778      5599      5928
  transposition_h6      2179      5533     14725      6113      8294
                  test
name               end50.pos end51.pos end52.pos end53.pos end54.pos
  negaalpha6           10816     31843      6498     29471     14222
  killer6_32            8119     11697      3956     13370      4067
  history6              6781     19981      8367     13158      4789
  transposition6        9222     29411      5734     25134     12567
  transposition_k6      6826     11206      3598     11054      3530
  transposition_h6      5375     20090      8070     11771      4416
                  test
name               end55.pos end56.pos end57.pos end58.pos end59.pos
  negaalpha6           20688     26438      4485     15619      2592
  killer6_32           13649      6337      4357     12740      2370
  history6             30422      7353      5982     12378      2264
  transposition6       19080     22952      4152     14000      2248
  transposition_k6     12423      5298      4193     10983      2127
  transposition_h6     27967      6018      5387     10597      2062

> xtabs(inode ~ name + test, data=data)
                  test
name               end40.pos end41.pos end42.pos end43.pos end44.pos
  negaalpha6            1302      2769      2491      2416      3143
  killer6_32            1277      2255      2170      1186      2595
  history6              1459      2849      2504      1151      3245
  transposition6        1302      2338      2446      2306      3017
  transposition_k6      1277      1988      2151      1118      2429
  transposition_h6      1430      2592      2439      1124      3287
                  test
name               end45.pos end46.pos end47.pos end48.pos end49.pos
  negaalpha6            2685      5602      4855      5112      2270
  killer6_32            2683      3511      2781      4105      1901
  history6              2927      3243      4742      3879      2924
  transposition6        2581      5307      4504      5037      2040
  transposition_k6      2578      3368      2682      4026      1765
  transposition_h6      2461      3127      4395      3947      2646
                  test
name               end50.pos end51.pos end52.pos end53.pos end54.pos
  negaalpha6            6816      8116      2085      8971      5509
  history6              4578      5938      2701      4077      3167
  killer6_32            6011      4977      2069      4757      3260
  transposition6        6376      7878      1990      8316      5345
  transposition_k6      5750      4947      1940      4282      3038
  transposition_h6      4142      6099      2649      3764      3046
                  test
name               end55.pos end56.pos end57.pos end58.pos end59.pos
  negaalpha6            8782      5578      1911      5514      2643
  killer6_32            6394      2333      1810      3845      2562
  history6             11729      2467      2252      3589      2374
  transposition6        8537      5139      1839      5233      2529
  transposition_k6      6183      2109      1759      3691      2485
  transposition_h6     11182      2267      2100      3325      2300

> xtabs(time ~ name + test, data=data)
                  test
name               end40.pos end41.pos end42.pos end43.pos end44.pos
  negaalpha6            1201      2089      1835      2181      2389
  killer6_32            1236      1871      1672      1489      2107
  history6              1272      2244      1812      1382      2538
  transposition6        1227      1951      1890      2198      2426
  transposition_k6      1308      1740      1822      1470      2123
  transposition_h6      1387      2220      2041      1514      2712
                  test
name               end45.pos end46.pos end47.pos end48.pos end49.pos
  negamax6             58783     84254     57276     56096    103962
  killer6_32            1971      2809      2460      2640      2095
  history6              2091      2592      3613      2470      2483
  transposition6        1859      3714      3293      3026      2061
  transposition_k6      1939      2698      2497      2691      2005
  transposition_h6      2026      2714      3627      2730      2532
                  test
name               end50.pos end51.pos end52.pos end53.pos end54.pos
  negaalpha6            3720      5664      1937      5738      3626
  killer6_32            3410      3707      1849      3680      2527
  history6              2797      4391      2313      3294      2521
  transposition6        3542      5527      1928      5426      3566
  transposition_k6      3335      3733      1824      3418      2431
  transposition_h6      2708      4745      2521      3297      2670
                  test
name               end55.pos end56.pos end57.pos end58.pos end59.pos
  negaalpha6            5587      4255      1793      4273      2051
  killer6_32            4559      2274      1807      3399      1981
  history6              7623      2309      1972      3188      1853
  transposition6        5498      4029      1893      4106      1991
  transposition_k6      4437      2232      1820      3298      2069
  transposition_h6      7493      2321      2116      3143      1963

2011/06/18

Transposition Table

Revised: 2011/06/25

探索の手順が変わっても同じ盤面にたどり着くことがある。その盤面からの探索の結果は当然、同じ結果となる。そこで、探索の過程で出てくる各盤面の探索結果を記憶しておき、同じ盤面にたどり着いたときにその探索結果を利用することで、探索にかかるコストを大幅に削減することができる。

記憶する探索結果はminmax値、最善手、そのノードからの探索の深さがある。同じ盤面にたどり着いたときには、探索の深さを比較しそれが十分であれば記憶してあるminmax値と最善手を用いる。もし、探索の深さが足りないときには記憶してある最善手をmove orderingで優先的に用い、新たに得られたminmax値、最善手、探索の深さでtransposition tableを更新する。

しかし、αβ探索では、必ずしもノードの正確な値が得られているわけではない。そこで、αカット、あるいはβカットされた結果のスコアである場合はそのことも保持しておくことで、記録されているスコアが正確な値であるのか、下限値あるいは上限値であるのかが分かる。

別の見方をすると、transposition tableを用いた探索は木構造を対象としていた探索をグラフに対する探索として捉え直したものである。ゲーム木は本来グラフ構造であるといえる。

しかし、すべてのノードを記憶することは探索の最大深さが大きくなるにつれノード数が指数的に増すため難しくなる。そこで、transposition tableはサイズの上限が決められたhash tableを用いて実装される。hash tableのconflictが起こった場合、tableに格納されているノードを書き換えるかどうかでいくつかの戦略が考えられる[1]。

探索の最大の深さが6plyではノード数はそれほど多くなく、すべてのノードをメモリ内に記憶することができるため、ここでは通常のhash tableを用いて実装している。

object TranspositionTable {
  val UNKNOWN = Int.MinValue

  val EXACT = 0
  val ALPHA = -1
  val BETA = 1
}

trait TranspositionTable[N <: Node[N]] {

  var transpositionTable: mutable.Map[BigInt, (Int, Int, Int, Move)] = _

  def initTranspositionTable() {
    transpositionTable = mutable.Map.empty
  }

  def probeNode(node: N, depth: Int, alpha: Int, beta: Int)
        : (Move, Int) = {
    val key = node.toSignature
    if (transpositionTable contains key) {
      val (d, score, flag, best) = transpositionTable(key)
      if (d >= depth) { // d == depth is safer.
        if (flag == TranspositionTable.EXACT) {
          return (best, score)
        } else if (flag == TranspositionTable.ALPHA) {
          if (score <= alpha)
            return (best, alpha)
        } else if (flag == TranspositionTable.BETA) {
          if (score >= beta)
            return (best, beta)
        }
      }
      return (best, TranspositionTable.UNKNOWN)
    }
    return (Move.empty, TranspositionTable.UNKNOWN)
  }

  def recordNode(node: N, depth: Int, score: Int, flag: Int, best: Move) {
    if (best == Move.empty) return
    val key = node.toSignature
    if (transpositionTable contains key) {
      val (d, s, f, b) = transpositionTable(key)
      if (d > depth
          || (d == depth && (f == TranspositionTable.EXACT
                          || flag != TranspositionTable.EXACT)))
        return
    }
    transpositionTable(key) = (depth, score, flag, best)
  }
}

abstract class TranspositionTablePlayer[N <: Node[N]](val maxDepth: Int) extends Player[N] with TranspositionTable[N] {

  override def play(ply: Int, node: N, last: Move): Move = {
    initTranspositionTable()
    val (m, s) = play(node, Int.MinValue + 1, Int.MaxValue, maxDepth)
    m
  }

  def play(node: N, alpha: Int, beta: Int, depth: Int): (Move, Int) = {
    if (depth == 0 || node.isTerminal) {
      return (Move.empty, score(node))
    }

    // check transposition table
    val (recordedMove, recordedScore) = probeNode(node, depth, alpha, beta)
    if (recordedScore != TranspositionTable.UNKNOWN)
      return (recordedMove, recordedScore)
    
    var moves = node.possibleMoves().toList

    // use the recorded move if it is available
    if (recordedMove != Move.empty && (moves contains recordedMove)) {
      moves = recordedMove :: (moves filterNot {_ == recordedMove})
    }

    var bestMove = Move.empty
    var alpha_ = alpha
    for (m <- moves) {
      val n = node.play(m).get
      val (_, s) = play(n, -beta, -alpha_, depth - 1)
      if (-s >= beta) {
        // record the node into transposition table
        recordNode(node, depth, beta, TranspositionTable.BETA, m)
        return (m, beta)
      }
      if (-s > alpha_) {
        bestMove = m
        alpha_ = -s
      }
    }

    // record the node into transposition table
    if (alpha_ > alpha)
      recordNode(node, depth, alpha_, TranspositionTable.EXACT, bestMove)
    else
      recordNode(node, depth, alpha, TranspositionTable.ALPHA, bestMove)

    (bestMove, alpha_)
  }

  def score(node: N): Int
}

transposition tableはkiller heuristicやhistory heuristicと組み合わせて用いることができる。
組み合わせることで、同じ盤面がより出やすくなるため、相乗効果が期待できる。

実験結果は、reversiのdepth=6の探索では、killer heuristicとの組み合わせで最も効果があることを示している。

[1] Breuker et al. Replacement schemes for transposition tables. ICCA Journal (1994) vol. 17 (4) pp. 183-193

2011/06/15

History Heuristic

History Heuristicではsufficient moveを記録する。sufficient moveとは、cutoffとなった手あるいは、cutoffがなかった場合のbest minmax scoreとなった手のことである。

各moveごとにスコアを保持し、sufficient moveとなるごとにスコアを増やしていく。
各ノードにおいて、move orderingを考える際に、sufficient moveのスコアを用いることで「有望そうな」手を先に調べるというheuristicである。

また、killer heuristicと異なり、同じdepthのmoveだけではなく、すべてのdepthのmoveを考慮する。

trait HistoryHeuristic[N <: Node[N]] {

  val MIN_DEPTH_FOR_HISTORY = 2

  val numHistories: Int

  var historyMoves: mutable.Map[Move, Int] = _

  def initHistory() {
    historyMoves = mutable.Map.empty
  }

  def reorderByHistory(moves: List[Move]): List[Move] = {
    val histList = historyMoves.toList.sortBy(- _._2) map (_._1)
    var is = histList intersect moves
    if (!is.isEmpty)
      is ++ (moves diff is)
    else
      moves
  }

  def recordHistory(best: Move, depth: Int) {
    // depth > 1: Schaeffer, History Heuristic and
    // Alpha-Beta Search Enhancements (1989).
    if (depth >= MIN_DEPTH_FOR_HISTORY) {
      if (historyMoves contains best) {
        historyMoves(best) = historyMoves(best) + (1 << (depth - MIN_DEPTH_FOR_HISTORY))
      } else {
        historyMoves(best) = 1 << (depth - MIN_DEPTH_FOR_HISTORY)
      }
    }
  }

}


abstract class HistoryPlayer[N <: Node[N]](val maxDepth: Int, override val numHistories: Int) extends Player[N] {

  override def play(ply: Int, node: N, last: Move): Move = {
    initHistory()
    val (m, s) = play(node, Int.MinValue + 1, Int.MaxValue, maxDepth)
    m
  }

  def play(node: N, alpha: Int, beta: Int, depth: Int): (Move, Int) = {
    if (depth == 0 || node.isTerminal) {
      return (Move.empty, score(node))
    }

    var moves = node.possibleMoves().toList
    var bestMove = Move.empty
    var alpha_ = alpha

    // use history
    moves = reorderByHistory(moves)

    for (m <- moves) {
      val n = node.play(m).get
      val (_, s) = play(n, -beta, -alpha_, depth - 1)
      if (-s >= beta) { // beta cut
        // record the move into history
        recordHistory(m, depth)
        return (m, beta)
      }
      if (-s > alpha_) {
        bestMove = m
        alpha_ = -s
      }
    }

    // record the best move into history
    if (alpha_ > alpha)
      recordHistory(bestMove, depth)

    (bestMove, alpha_)
  }

  def score(node: N): Int
}

スコアには2^{(leafまでのdepth)}を用い、また、leafのそばのnodeではsufficient moveという扱いにはしていない[1]。

実際にreversiでrandom playerを相手にした結果は、killer moveの方がむしろいいという結果に。
reversiだと手が進むと盤面のmarkerの数は増えていくので、ply違いで共通の盤面というのは基本的になく、何手か進んだ後の盤面で共通のいい手というのは盤面の右のほうと左のほうで別々に展開するという状況しかない。

[1] Schaeffer. The history heuristic and alpha-beta search enhancements in practice. Pattern Analysis and Machine Intelligence, IEEE Transactions on (1989) vol. 11 (11) pp. 1203-1212

2011/06/13

Killer Moves

αβ探索で最も効率良く枝刈りが行われるのは、一番、最初に選択される探索パスが常に最善のものであるときである。

探索木がuniformであるとは、すべての中間ノードの分岐数が一定の数wであり、すべての端末ノードが同じ高さdにあることという。探索木がuniformであるとすると、MinMax探索はすべてのパスを探索することから調べる端末ノードの数はw^d個だけあるのに対して、αβ探索では最善の場合、w^floor(d/2)+w^ceil(d/2)-1個となる[1]。

つまり、αβ探索を効率良く実行するためには、どの手から探索をするかが重要となり、これをmove orderingとよぶ。

move orderingを改善するための一つの手法としてkiller heuristicという手法がある。これは、探索木のある場所で枝刈りが行われたときに、枝刈りをおこした"killing" moveを、次に同じ深さのノードに到達した際に、その手が可能であれば先に調べるというものである。

NegaAlphaを元にした、プログラムは次のとおり、

trait KillerHeuristic[N <: Node[N]] {

  val numKillerMoves: Int

  var killerMoves: Array[List[Move]] = _

  def initKillerMoves(maxDepth: Int) {
    killerMoves = Array.fill(maxDepth) { List.empty }
  }

  def reorderByKillerMoves(i: Int, moves: List[Move]): List[Move] = {
    val km = killerMoves(i)
    var is = km intersect moves
    if (!is.isEmpty)
      is ++ (moves diff is)
    else
      moves
  }

  def recordKillerMove(i: Int, move: Move) {
    val km = killerMoves(i)
    if (km contains move) {
      killerMoves(i) = move :: (km filterNot (_ == move))
    } else {
      killerMoves(i) = (move :: km) take numKillerMoves
    }
  }

}


abstract class KillerHeuristicPlayer[N <: Node[N]](val maxDepth: Int, override val numKillerMoves: Int) extends Player[N] with KillerHeuristic[N] {

  override def play(ply: Int, node: N, last: Move): Move = {
    initKillerMoves(maxDepth)
    val (m, s) = play(node, Int.MinValue + 1, Int.MaxValue, maxDepth)
    Log.i("killer_moves", numKillerMoves.toString + "," + (killerMoves map { _.length }).mkString(","))
    m
  }

  def play(node: N, alpha: Int, beta: Int, depth: Int): (Move, Int) = {
    if (depth == 0 || node.isTerminal) {
      return (Move.empty, score(node))
    }

    var moves = node.possibleMoves().toList

    // reorder by killer moves
    moves = reorderByKillerMoves(depth - 1, moves)

    // nega-alpha search
    var bestMove = Move.empty
    var alpha_ = alpha
    for (m <- moves) {
      val n = node.play(m).get
      val (_, s) = play(n, -beta, -alpha_, depth - 1)
      if (-s >= beta) { // beta cut
        // record the killer move
        recordKillerMove(depth - 1, m)
        return (m, beta)
      }
      if (-s > alpha_) {
        bestMove = m
        alpha_ = -s
      }
    }
    (bestMove, alpha_)
  }

  def score(node: N): Int
}

深さ6の探索で、保持するkiller move数を変化させて実験した結果は次の通り。αβ探索の結果も比較のためにプロットしている。

killer move数は1つだけのときは、少し悪いが他はほぼ同程度である。

また、1試合あたりの総ノード数の平均を表にすると次のようになる。最も小さなkiller move数が32のときで、αβ探索に対して48.4%の改善となっている。

1        128         16          2         32          4 
 112789.62   98864.71   97492.46  108800.69   93997.23   94617.41 
        64          8 alpha beta 
  96379.85   98554.55  182336.39

ここで、32のときに最小となっているが、depthが深くなるにつれてkiller move数は増える傾向にあり、depth==6のときに最大30~40個のkiller moveが保持されていた。

killer moveの処理の分、探索が遅くなるが、時間でみてもほぼ同じく33.4%の改善となっている。interior nodeでの重い処理以上に、枝刈りの効果のほうが大きいことが分かる。

1        128         16          2         32          4 
  16365.91   15681.85   15086.16   16000.39   15284.74   14969.75 
        64          8 alpha beta 
  15488.62   15192.91   22964.40

[1] Knuth, D. E., and Moore, R. W., "An Analysis of Alpha-Beta Pruning". Artificial Intelligence Vol. 6, No. 4, pp. 293–326 (1975).

[2] Akl, S.G., Newborn, M.M., "The principle continuation and the killer heuristic". In Proceedings of the ACM Annual Conference, pp. 82–118 (1977).

2011/06/10

MinmaxとAlphaBetaの探索ノード数

実際のreversiで、Minmaxに対してAlphaBetaはどの程度、調べるnode数を削減しているのかを調べた。

ルールを守る範囲で無作為に手を打つplayerを相手に、各plyごとに調べた端末node数をプロットしたものが次の図である。

縦軸はnode数で対数軸となっている。ばらつきがかなりあるが、平均をとったものが次の図となる。
＃偶数手の方が奇数手より調べるnode数が少なくなる傾向があるのはどうしてだろう？

plyごとのnode数の変化に関する傾向はMinmaxとAlphaBetaで違いがみられないので、単純に1試合あたりのnode数の平均を調べると次のようになる。

depth      MinMax   AlphaBeta
    2    1569.445     734.205
    3   13429.935    3182.210
    4  116136.220   11957.795
    5 1225244.320   42139.270
    6          NA  182336.385

大雑把にMinmax手法では探索の深さが１増すごとに9倍程度のnodeを調べていることが分かる。それに対して、AlphaBetaでは探索の深さが１増すごとにおよそ4倍程度のnode数の増加となっている。その結果、探索の深さが深くなるにつれてAlphaBetaの探索node数はMinmaxに比べると劇的に小さなものとなる。

各depthにおける探索時間(ms)は次の通り。

depth    MinMax AlphaBeta
    2   123.035    95.960
    3   910.690   311.085
    4  7931.905  1559.270
    5 73328.305  4320.155
    6        NA 22964.395

2011/06/08

Nega Alpha-Beta

Alpha-Beta探索もMinmax探索同様にnega-実装によってコードサイズをほぼ半分にできる。
コードサイズが半分になるとキャッシュが効いて速く・・・となるかどうかは確認していない。

abstract class NegaAlphaBetaPlayer[N <: Node[N]](val maxDepth: Int) extends Player[N] {

  override def play(ply: Int, node: N, last: Move): Move = {
    val (m, s) = play(node, Int.MinValue + 1, Int.MaxValue, maxDepth)
    m
  }

  def play(node: N, alpha: Int, beta: Int, depth: Int): (Move, Int) = {
    if (depth == 0 || node.isTerminal) {
      return (Move.empty, score(node))
    }
    val moves = node.possibleMoves()
    var bestMove = Move.empty
    var alpha_ = alpha
    for (m <- moves) {
      val n = node.play(m).get
      val (_, s) = play(n, -beta, -alpha_, depth - 1)
      if (-s >= beta) { // beta cut
        return (m, beta)
      }
      if (-s > alpha_) {
        bestMove = m
        alpha_ = -s
      }
    }
    (bestMove, alpha_)
  }

  def score(node: N): Int
}

- Int.MinValue がIntから桁あふれしてしまうことに注意。

探索木はminノードのスコアが反転する以外はalpha-betaと同じになる。

Alpha-Beta

branch-and-boundはmin{bound, max{...}, ...}という計算を考えるとmaxの中にbound以上の値が出たときには、それ以上maxの中を考える必要がないということに基づいて枝刈りを行う。max{bound, min{...}, ...}も同様である。

さらに、この考えを進めることができる。max{2, min{max{min{3, 2, ...}, max{...}, ....}, min{...}, ...}, ...}という探索木を考える。min{3, 2, となったところで、このminが2以下の値を返すことが確実になる。するとこの値を取り囲むmaxおよびminがこの値をたとえ採用したとしても、もっとも外側のmaxがこの値（関係した手）を採用することがないことが分かる。

つまり、maxの計算である時点での最大値αはそれが含むすべてのminの計算を続けるかを判断する際の下限値となる。同様に、minの計算における最小値βはmaxの計算の下限値となる。

この上限値αと下限値βを用いた探索をαβ探索と呼ぶ。プログラムは次のようになる。

abstract class AlphaBetaPlayer[N <: Node[N]](val maxDepth: Int) extends Player[N] {

  override def play(ply: Int, node: N, last: Move): Move = {
    val (m, s) = play(node, Int.MinValue, Int.MaxValue, maxDepth)
    m
  }

  def play(node: N, alpha: Int, beta: Int, depth: Int): (Move, Int) = {
    if (depth == 0 || node.isTerminal) {
      return (Move.empty, score(node))
    }
    val moves = node.possibleMoves()
    var bestMove = Move.empty
    var alpha_ = alpha
    for (m <- moves) {
      val n = node.play(m).get
      val s = playOpponent(n, alpha_, beta, depth - 1)
      if (s >= beta) { // beta cut
        return (m, beta)
      }
      if (s > alpha_) {
        bestMove = m
        alpha_ = s
      }
    }
    (bestMove, alpha_)
  }

  def playOpponent(node: N, alpha: Int, beta: Int, depth: Int): Int = {
    if (depth == 0 || node.isTerminal) {
      return score(node)
    }
    val moves = node.possibleMoves()
    var beta_ = beta
    for (m <- moves) {
      val n = node.play(m).get
      val s = play(n, alpha, beta_, depth - 1)._2
      if (s <= alpha) { // alpha cut
        return alpha
      }
      if (s < beta_) {
        beta_ = s
      }
    }
    beta_
  }

  def score(node: N): Int
}

探索木は次のようになる。branch-and-boundとの差は4段以上の探索で見られ、3, 2, 7となるところの7や、0, 9, 7の9, 7など、探索された端末ノードはさらに5つ減り31個になった。このようにbranch-and-boundに比べ深いノードへと影響を与えることから、このような枝刈りをdeep-cutoffという。

Branch and Bound

次の図は[Knuth75]に出てくる円周率πの少数以下80桁目までの数をノードのスコアとみなした分岐数3、深さ4ゲーム木をMinmaxによって探索した様子である。

# 解像度が...

この探索木を左下からみていく。左下は相手の番である。このとき3つの手に対する3, 1, 4というスコアに対して、相手は最も小さなスコアmin{3, 1, 4} = 1となる手を選択すると考えられる。つまり、その直上のノードである、自分の番では最初の手を一番左となる手を選ぶと1となることが分かった。そこで、このノードでの自分のスコアはmax{1, _, _}であることから、少なくとも1以上であることが保証された。では、その次の手を選んだときの相手の手を考えると、まず1となる手があることが分かる。すると、このあと2つの手を考慮することなく、min{1, _ , _} <= 1であることから、自分がこの手を選んでも1より大きなスコアとなることはないことが分かる。つまり、残り2つの手を調べる必要がないことが分かる。このように探索木の一部をスキップすることを枝刈りという。

自分の番においても同様に枝刈りを行うことができる。3141..265まで調べたところで、自分の番において、少なくとも2となる手があることが分かる。すると、その上のノードである相手の番では2より小さなスコアとなる手を探すことになる。しかし、その次の探索でmin{3, 5, 8} = 3ことから、この手を選ぶことで自分は少なくとも3以上のスコアを得ることができることが分かる。もちろん、相手はこの手を選ばないだろう。そこで、残りの979323となる探索はすべてスキップされる。

このような手法をbranch-and-bound探索と呼ぶ。プログラムにすると次のようになる。

abstract class BranchAndBoundPlayer[N <: Node[N]](val maxDepth: Int) extends Player[N] {

  override def play(node: N, last: Move): Move = {
    val (m, s) = play(node, Int.MaxValue, maxDepth)
    m
  }

  def play(node: N, bound: Int, depth: Int): (Move, Int) = {
    if (depth == 0 || node.isTerminal) {
      return (Move.empty, score(node))
    }
    val moves = node.possibleMoves(marker)
    var nextMove = Move.empty
    var maxS = Int.MinValue
    breakable {
      for (m <- moves) {
        val n = node.play(m).get
        val s = playOpponent(n, maxS, depth - 1)
        if (s > maxS) {
          nextMove = m
          maxS = s
        }
        if (s >= bound) break
      }
    }
    (nextMove, maxS)
  }

  def playOpponent(node: N, bound: Int, depth: Int): Int = {
    if (depth == 0 || node.isTerminal) {
      return score(node)
    }
    val moves = node.possibleMoves(opponentMarker)
    var minS = Int.MaxValue
    breakable {
      for (m <- moves) {
        val n = node.play(m).get
        val s = play(n, minS, depth - 1)._2
        if (s < minS) {
          minS = s
        }
        if (s <= bound) break
      }
    }
    minS
  }

  def score(node: N): Int
}

このプログラムを用いることで得られる探索木は次のとおり。

Minmaxは探索木に対する全探索手法であり、81個の端末ノードを調べたのに対して、branch-and-boundでは36個の端末ノードしか調べていない。

[Knuth75] Knuth, D. E., and Moore, R. W. (1975). "An Analysis of Alpha-Beta Pruning". Artificial Intelligence Vol. 6, No. 4: 293–326.

2011/06/06

Minmax searchの視覚化

2011/06/05

サブクラスで返り値がそのサブクラスへと変化するメソッドの定義

Scalaは静的な型システムを持つ言語で、Javaだとcastですましてしまうところも、きちんと定義しなくてはならない。もちろん、castももちろんできるけれども、それは本当にJavaのクラスを使用するときなどにとどめておいたほうがいい。
その結果、少しに分かりにくいところを解説しておく。

次のようなNodeの定義を考える。

trait Node {
  def play(move: Move): Option[Node]
  def possibleMoves(m: Marker): Seq[Move]
  def isTerminal: Boolean
}

探索は、この3つのメッソドとあとは適切なscore関数があれば可能である。
そして、このNodeを実装したReversiNodeを定義することを考える。ReversiNodeは当然、この3つのメソッドに対する実装を与える必要がある。さらに、ReversiNodeにはNodeにはない、winnerメソッドなどが含まれる。

class ReversiNode extends Node {
  override def play(move: Move): Option[Node] = ...
  override def possibleMoves(m: Marker): Seq[Move] = ...
  override def isTerminal: Boolean = ...
  def winner: Marker = ...
}

ここで、問題となるのはplayの返り値の型である。このままでは、playから帰ってきたオブジェクトに対してwinnerメソッドなどReversiNode特有のメソッドを呼び出せない。

val n2 = n1.play(move)
n2.winner // Error!

Javaであれば、こういうときにはcastを使うことになるが、動作時にエラーとなる危険がある。

ReversiNode n2 = (ReversiNode) n1.play(move);
n2.winner();

Scalaでは、型パラメータを用いることでcastを使わずに安全なコードを作成することができる。
まず、Nodeの宣言を次のように修正する。

trait Node[Repr <: Node[Repr]] {
  def play(move: Move): Option[Repr]
  def possibleMoves(m: Marker): Seq[Move]
  def isTerminal: Boolean
}

この[Repr <: Node[Repr]]というのは、 1. Reprという型変数を用いる（playの返り値に用いている）、 2. ReprはNode[Repr]のサブクラスである、という意味である。そして、実装クラスであるReversiNodeを次のようにする。

class ReversiNode extends Node[ReversiNode] {
  override def play(move: Move): Option[ReversiNode] = ...
  override def possibleMoves(m: Marker): Seq[Move] = ...
  override def isTerminal: Boolean = ...
  def winner: Marker = ...
}

これで、playの返り値がサブクラスのReversiNodeであることが保証された。

最後に、Nodeを使うクラスを作る場合は、Nodeが型パラメータを必要とするために、次のように用いる。

trait Player[N <: Node[N]] {
  def play(node: N, last: Move): Move
}