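https://blog.csdn.net/u014397729/article/details/27366363 This blog post implements a Connect Four AI with the UCT algorithm. It gives complete pseudocode for UCT and comes with working code for reference.
https://blog.csdn.net/yw2978777543/article/details/70799799 This article goes further and explains how UCT works in mathematical terms and pseudocode.
https://jeffbradberry.com/posts/2015/09/intro-to-monte-carlo-tree-search/ This English article has a clear diagram that gives an intuitive picture of the UCT algorithm.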
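
    class Board {
        // stores the board plus the color and position of the last move
        int chessboard[8][8], latestColor;
        chesspos latestStep;
    public:
        // board manipulation
        Board();                                     // initialize and place the four starting pieces
        bool isEnd();                                // the game is over when neither black nor white has a playable position
        bool notConj(chesspos a, chesspos b);        // used to decide whether a position is playable
        bool search(chesspos p, int color, int d);   // checks whether direction d satisfies the flipping condition
        void rev(chesspos p, int color, int d);      // flips the pieces once search has succeeded
        void oneMove(chesspos p);                    // given a position known to be legal, carries out the whole move
        vector<chesspos> getValidPos(int, chesspos); // given a color and the position of the last move, returns the playable positions
        // operations used by the UCT algorithm
        int calScore();                              // compute the score
        chesspos randomMove();                       // play one random move
        int simulate();                              // play the board out with random moves
        // operations required by the specific assignment; they can be ignored
        void graphBoard();
        void graphBoard(string path);
        void printScore(string path);
    };
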
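- To measure the area of a circle inscribed in a square, we can scatter beans over the square at random, count the beans that land inside the circle, and compare that count with the total number of beans. The ratio, multiplied by the area of the square, gives the area of the circle.
- By the same reasoning, I can pick a random point among all playable positions and then finish the game with random moves. As long as there are enough simulations, the estimated win rates of the different positions will diverge. Pick the position with the best win rate and you are on the road to success!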
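The bean-scattering idea can be sketched in a few lines. This is only an illustration of the principle, not code from the post; the function name `estimateCircleArea`, the sample count and the seed are assumptions.

```cpp
#include <random>

// Estimate the area of the circle inscribed in a square of side `side`
// by scattering `samples` random points ("beans") over the square.
double estimateCircleArea(double side, int samples, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> coord(-side / 2, side / 2);
    const double r = side / 2;
    int inside = 0;
    for (int i = 0; i < samples; ++i) {
        double x = coord(rng), y = coord(rng);
        if (x * x + y * y <= r * r) ++inside;  // this bean landed in the circle
    }
    // fraction of beans inside the circle, times the area of the square
    return side * side * inside / samples;
}
```

For a square of side 2 the true area is pi, so `estimateCircleArea(2.0, 200000, 42)` should land close to 3.14; more beans give a tighter estimate, which is the same reason more random playouts give more reliable win rates.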
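- First, intuition says to pull every slot machine once and see how each one performs.
- Then, suppose that of the four machines A, B, C and D, only B paid out a coin. What do you choose next? Judging from the current situation, you keep pulling B; after all, statistically speaking, B's payout rate so far is 100%.
- However, the next two pulls pay nothing. B's payout rate is still higher than that of A, C and D (33% versus 0%), but you have reason to suspect that one of A, C and D is the better choice and simply has not shown it yet because the sample is too small. So you set B aside for now and try the other machines.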
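- Use the results of many simulations to find the node with the highest win probability, and spend most of your effort on that node to avoid waste. This is called exploitation.
- But also look after the nodes that have been "neglected", so that you do not miss an opportunity. This is called exploration.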
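The slot-machine reasoning can be tried out in a small simulation. The sketch below is an illustration under assumptions: the function name `playBandits`, the payout probabilities passed to it and the seed are made up, and the selection rule is the common "average payout plus exploration bonus" form rather than anything quoted from the post.

```cpp
#include <cmath>
#include <random>
#include <vector>

// Play `rounds` pulls on slot machines with the given (hidden) payout
// probabilities, choosing each pull by a UCB rule with coefficient `c`.
// Returns how often each machine was pulled.
std::vector<int> playBandits(const std::vector<double>& payout, int rounds,
                             double c, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    const int k = static_cast<int>(payout.size());
    std::vector<int> pulls(k, 0), wins(k, 0);
    for (int t = 0; t < rounds; ++t) {
        int pick = 0;
        if (t < k) {
            pick = t;  // first, pull every machine once
        } else {
            double best = -1.0;
            for (int i = 0; i < k; ++i) {
                double exploit = 1.0 * wins[i] / pulls[i];  // observed payout rate
                double explore = c * std::sqrt(std::log((double)t) / pulls[i]);
                if (exploit + explore > best) {
                    best = exploit + explore;
                    pick = i;
                }
            }
        }
        ++pulls[pick];
        if (u(rng) < payout[pick]) ++wins[pick];
    }
    return pulls;
}
```

With, say, `playBandits({0.2, 0.8, 0.3, 0.4}, 5000, 1.38, 7)`, most pulls should go to the second machine, while the other three still receive an occasional pull so that an unlucky start cannot rule them out for good.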
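- Cw is the score of the node.
- Cv is the total number of visits to this node.
- Pv is the total number of visits to all the nodes being compared (in the code, the visit count of their parent).
- C is the scaling coefficient. The larger it is, the more weight goes to exploration; the smaller it is, the more weight goes to exploitation.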
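Read together with `bestChild` in the code later in this post, which computes `score / total + cof * sqrt(log(parent total) / total)`, these symbols combine into the following UCB value. The expression is a reconstruction from that code, so its exact typesetting is an assumption.

```latex
\mathrm{UCB} = \frac{C_w}{C_v} + C \sqrt{\frac{\ln P_v}{C_v}}
```

The first term is the exploitation part (the average score of the node); the second is the exploration part, which grows for nodes that have been visited rarely compared with their siblings.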
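- First, the node corresponding to the current board state becomes the root of the game tree.
- Each UCT search checks whether the node it has reached is not yet fully expanded. This is like checking whether there is a slot machine whose lever has not been pulled yet.
- If the node is fully expanded, we compute the UCB value of each child and descend into the one with the largest value.
- The descent can end in one of two ways: we reach a node that is not fully expanded, or we reach a terminal node.
- A terminal node is easy: we walk back along the path we just took and back up the game result node by node.
- Otherwise, we have found a slot machine that has not been pulled. We pick one and pull it: starting from one of the feasible states, we run a random simulation. The simulation keeps playing random legal moves until the game ends, and nothing is recorded during it. The result is then backed up, starting from the newly created 0/0 node and moving upward.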
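To see selection, expansion, simulation and backup in one place, here is a minimal sketch on a toy game instead of the post's board: players alternately take 1 to 3 stones from a pile, and whoever takes the last stone wins. The game, `NimNode` and `uctNim` are assumptions made for this illustration, and unlike `expend` in the post, untried moves are expanded in a fixed order rather than at random.

```cpp
#include <cmath>
#include <memory>
#include <random>
#include <vector>

struct NimNode {
    int stones;                // stones left after the move that led here
    int mover;                 // player (0 or 1) who made that move
    int take;                  // stones taken by that move (0 for the root)
    int visits = 0, wins = 0;  // wins are counted from `mover`'s point of view
    NimNode* parent;
    std::vector<std::unique_ptr<NimNode>> child;
    NimNode(int s, int m, int t, NimNode* p)
        : stones(s), mover(m), take(t), parent(p) {}
    int moveCount() const { return stones < 3 ? stones : 3; }
};

// Runs `iterations` UCT searches from a pile of `stones` (at least 1)
// and returns how many stones player 0 should take.
int uctNim(int stones, int iterations, double c, unsigned seed) {
    std::mt19937 rng(seed);
    NimNode root(stones, 1, 0, nullptr);  // player 0 moves first from the root
    for (int it = 0; it < iterations; ++it) {
        NimNode* node = &root;
        // 1. Selection: descend while the node is fully expanded.
        while (node->stones > 0 &&
               (int)node->child.size() == node->moveCount()) {
            NimNode* best = nullptr;
            double bestUcb = -1.0;
            for (auto& ch : node->child) {
                double ucb = 1.0 * ch->wins / ch->visits +
                             c * std::sqrt(std::log((double)node->visits) / ch->visits);
                if (ucb > bestUcb) { bestUcb = ucb; best = ch.get(); }
            }
            node = best;
        }
        // 2. Expansion: add the next untried move as a fresh 0/0 node.
        if (node->stones > 0) {
            int take = (int)node->child.size() + 1;
            node->child.push_back(std::unique_ptr<NimNode>(
                new NimNode(node->stones - take, 1 - node->mover, take, node)));
            node = node->child.back().get();
        }
        // 3. Simulation: random moves until the pile is empty; nothing is stored.
        int left = node->stones, last = node->mover;
        while (left > 0) {
            int maxTake = left < 3 ? left : 3;
            left -= 1 + (int)(rng() % maxTake);
            last = 1 - last;
        }
        // 4. Backup: `last` took the final stone and therefore won.
        for (; node != nullptr; node = node->parent) {
            ++node->visits;
            if (node->mover == last) ++node->wins;
        }
    }
    // Final choice: pure exploitation, i.e. the child with the best win rate.
    int bestTake = 1;
    double bestRate = -1.0;
    for (auto& ch : root.child) {
        double rate = 1.0 * ch->wins / ch->visits;
        if (rate > bestRate) { bestRate = rate; bestTake = ch->take; }
    }
    return bestTake;
}
```

With a pile of 5, `uctNim(5, 20000, 1.38, 1)` should settle on taking 1 stone, which leaves the opponent a pile of 4, a lost position in this game.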
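
    class Node {
    public:
        chesspos pos;               // the move that produced this state; (-1,-1) if no piece was placed in that turn
        int total, score;           // win-rate information of the node
        int color;                  // color of the piece that was placed
        Node* parent;
        vector<Node*> child;
        vector<chesspos> validPos;  // every node stores its playable positions when it is created, which makes it easy
                                    // to tell whether it is fully expanded and to find an expandable move quickly
        Node(chesspos p, int c, Node* par, vector<chesspos> v);
    };
    class Tree {
        Node *subroot, *tail;  // I originally wanted to reuse the search tree, so I also kept a root for the opening node;
                               // that is not actually needed, because this algorithm does not reuse search results
        int ownColor;          // our own color, used when recording win rates
    public:
        // these are explained in detail below
        Tree(int ownc);
        Node* expend(Board board);                  // expand the tail node
        void nextnode(chesspos nextp, Board board); // includes construction of a node that does not exist yet
        Node* bestChild(Node * tarRoot, double cof);
        Board getTail(Board board);                 // tree policy
        void backup(int result);
        // the next two can be ignored; they are functions required by the assignment
        void printInfo();
        void newTurn();
    };
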
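
    // it is our turn...
    // the tree is Tree UCTtree
    // the current board is Board b
    s = clock();
    n = clock();
    // search for 4.75 seconds; dividing by CLOCKS_PER_SEC keeps the budget
    // the same on platforms where clock() does not tick in milliseconds
    while ((double)(n - s) / CLOCKS_PER_SEC < 4.75) {
        UCTtree.backup(UCTtree.getTail(b).simulate());
        n = clock();
    }
    // place a piece according to the search result...
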

    Board Tree::getTail(Board board) {
        tail = subroot;
        while (!board.isEnd()) {
            int vs = tail->validPos.size(), cs = tail->child.size();
            if (vs != cs) {
                tail = expend(board);
                board.oneMove(tail->pos);
                break;
            }
            else {
                tail = bestChild(tail);
                board.oneMove(tail->pos);
            }
        }
        return board;
    }

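As you can see, if the main path leads straight to the end of the game, the while loop exits and a finished board is returned. Otherwise, that is, when vs > cs, a node is expanded from the current board, the move of that node is played, and the resulting board is returned.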

    Node* Tree::expend(Board board) {
        Node* newNode;
        vector<chesspos> possiblePos;
        bool matched;
        // the loop below collects the positions in validPos that are not yet among the children
        for (auto v : tail->validPos) {
            matched = false;
            for (auto c : tail->child) {
                if (v == c->pos) {
                    matched = true;
                    break;
                }
            }
            if (!matched) possiblePos.push_back(v);
        }
        int index = rand() % possiblePos.size();
        board.oneMove(possiblePos[index]);
        // as you can see, the node keeps its playable positions from the moment it is created
        newNode = new Node(possiblePos[index], !(bool)tail->color, tail, board.getValidPos());
        // put the new node among the children of tail. In fact, the tail = expend(board) in getTail
        // could be folded into expend; that is just an implementation detail.
        tail->child.push_back(newNode);
        return newNode;
    }


    Node* Tree::bestChild(Node *tarRoot = NULL, double cof = 150) {
        double argmax = -99999999, ucb;
        Node* best = NULL;
        if (tarRoot == NULL) tarRoot = subroot;
        for (auto c : tarRoot->child) {
            ucb = 1.0 * c->score / c->total + cof * sqrt(log(tarRoot->total) / c->total);
            if (ucb > argmax) {
                argmax = ucb;
                best = c;
            }
        }
        return best;
    }


    int Board::simulate() {
        int score, tmpcolor = latestColor, tmpBoard[8][8];
        chesspos tmpStep = latestStep;
        memcpy(tmpBoard, chessboard, sizeof(chessboard));
        // the lines above back up the current board. The backup exists for debugging; in practice
        // simulate is never called directly on the live board, so it could probably be skipped.
        while (!isEnd()) {
            randomMove();
        }
        score = calScore(); // save the score
        // the lines below restore the board
        memcpy(chessboard, tmpBoard, sizeof(chessboard));
        latestColor = tmpcolor;
        latestStep = tmpStep;
        return score; // return the simulation result
    }

    chesspos Board::randomMove()
    {
        vector<chesspos> aps;
        int index;
        aps = getValidPos();
        index = rand() % aps.size();
        if (aps[index].first != -1) oneMove(aps[index]); // oneMove already switches the color
        else latestColor = !(bool)latestColor;
        // note: getValidPos returns a single (-1,-1) position when there is no playable move
        latestStep = aps[index];
        return latestStep; // the return value is not strictly needed; it is here for debugging
    }

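
    void Tree::backup(int result) {
        // the sign of simulate()'s result records whether black or white won
        int mod = result > 0 ? BLACK_WINS : WHITE_WINS;
        result = abs(result);
        while (tail != subroot) {
            tail->total += 64;
            if (!(tail->color ^ mod)) tail->score += result;
            tail = tail->parent;
        }
        // because of an earlier design decision, subroot still has to be handled separately here.
        // If subroot's parent were cleared every time the root of the search tree moves,
        // while (tail) would do everything in one go.
        tail->total += 64;
        // compare against mod (the winning color), as in the loop above. This line can probably
        // be dropped, because the win rate of the root is never used in the computation.
        if (!(tail->color ^ mod)) tail->score += result;
    }
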
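- When scoring by win or loss, every node on the main path gets Cv+1 and the nodes of the winning color get Cw+1; the losing side loses no points.
- When scoring by pieces, every node gets Cv+64 and the winning side adds its own number of pieces on the board to Cw; again, the losing side loses no points. Note that you must not use Cv+32 and then deduct points from the losing side in Cw; that leads to strange behavior.
- Scoring by win or loss shows the win rate directly, but it only aims at winning and cannot take the margin of victory into account. In this case the coefficient C keeps its usual value of 1.38.
- Scoring by pieces refines the win rate according to the number of pieces won, so it can pursue larger victories. However, Cv+64 makes the visit count grow too fast, and a coefficient of 1.38 leads to a search that is heavily skewed toward exploitation, so C has to be increased. I tried coefficients from 88.32 to 180, but because of time constraints I could not show clearly how they differ. In the end I used 150; a somewhat smaller value works as well.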
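A little arithmetic shows why Cv+64 changes the balance. The sketch below evaluates only the exploration part of the UCB value under both scoring schemes; `explorationTerm` and the child count of 40 searches are assumptions for illustration, while 800 searches, 1.38 and 64 are the figures used in this post.

```cpp
#include <cmath>

// Exploration part of the UCB value: c * sqrt(ln(parentVisits) / childVisits).
// `unit` is what one search adds to a visit counter:
// 1 for win/loss scoring, 64 for piece scoring.
double explorationTerm(double c, int parentSearches, int childSearches, int unit) {
    double parentVisits = 1.0 * parentSearches * unit;
    double childVisits = 1.0 * childSearches * unit;
    return c * std::sqrt(std::log(parentVisits) / childVisits);
}
```

With 800 searches at the parent and 40 at a child, `explorationTerm(1.38, 800, 40, 1)` is about 0.56, while `explorationTerm(1.38, 800, 40, 64)` is only about 0.09, roughly six times smaller. The exploitation part `score / total` stays between 0 and 1 in both schemes, so with an unchanged coefficient the exploration bonus carries far less weight under piece scoring. This only illustrates the direction of the effect; the value of 150 in the post was chosen by experiment.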
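
    // ...search finished
    // to get the best node, set the coefficient to 0: pure exploitation, looking only at the win rate
    Node* best = UCTtree.bestChild(NULL, 0);
    UCTtree.nextnode(best->pos, b);
    // on to the next turn; it is the opponent's move...
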

    void Tree::nextnode(chesspos nextp, Board board) {
        for (auto c : subroot->child) {
            if (c->pos == nextp) {
                subroot = c;
                subroot->score = 0;
                subroot->total = 0;
                subroot->child.clear();
                return;
            }
        }
    }

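- Can the target node be one that has not been expanded yet? In practice there is no need to worry. A position has at most thirty-odd playable moves, while 5 seconds are enough for more than 800 UCT searches, so there is no need to create new tree nodes for moves that have not been expanded. Besides, bestChild only ever chooses among expanded nodes.
- Note subroot->child.clear(): each time the root moves, the earlier search results are deliberately not kept, because they may interfere with judging the best child. In addition, reusing search results is not very effective in practice; keeping them would not improve playing strength much.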
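One side effect worth knowing: `child.clear()` only empties the vector of pointers, so the nodes that were allocated with `new` stay in memory. If that matters, the discarded subtree can be released first. The sketch below uses a stripped-down stand-in for the post's `Node`; `TreeNode` and `freeChildren` are hypothetical names.

```cpp
#include <vector>

// Hypothetical stand-in for the post's Node, reduced to the links to its children.
struct TreeNode {
    std::vector<TreeNode*> child;
};

// Deletes every node below `node`, empties its child list,
// and returns how many nodes were freed.
int freeChildren(TreeNode* node) {
    int freed = 0;
    for (TreeNode* c : node->child) {
        freed += freeChildren(c);  // release the grandchildren first
        delete c;
        ++freed;
    }
    node->child.clear();
    return freed;
}
```

Calling such a helper in place of the bare `clear()` keeps the "start from a fresh root" behavior while returning the memory of the old subtree.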
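- Although UCT relies on random simulation, with enough simulations and the UCB policy it can perform quite well.
- UCT is independent of the game itself. Given a suitable interface, most similar games can use it, for example Gomoku or Chinese chess (xiangqi).
- Alpha-beta pruning is a commonly used algorithm, but it needs a carefully crafted evaluation function for each game. By comparison, although UCT may not beat a finely tuned pruning search, it only needs one coefficient to be tuned, which is very convenient and efficient.
- The number of searches also limits the strength of UCT. In the opening only about 800 searches are possible; only late in the game does the count reach thousands or tens of thousands. If the opening goes badly, UCT may fail to set up a good position for itself and report a low win rate early on. Of course, it is still more than enough to beat the monkey (the random-move opponent).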

    #include <cmath>
    #include <cstdio>
    #include <cstdlib>
    #include <cstring>
    #include <ctime>
    #include <utility>
    #include <vector>
    #define BLACK_WINS 0
    #define WHITE_WINS 1
    //#define TESTING
    using namespace std;
    typedef pair<int, int> chesspos;

    class Node {
    public:
        chesspos pos;
        int total, score; // long long?
        int color;
        Node* parent;
        vector<Node*> child;
        vector<pair<int, int> > validPos;
        Node(chesspos p, int c, Node* par, vector<chesspos> v);
    };
    class Board {
        int chessboard[8][8], latestColor;
        chesspos latestStep;
    public:
        Board();
        int calScore();
        bool isEnd();
        bool notConj(chesspos a, chesspos b);
        bool search(chesspos p, int color, int d);
        void rev(chesspos p, int color, int d);
        void oneMove(chesspos p);
        vector<chesspos> getValidPos(int, chesspos);
        chesspos randomMove();
        int simulate();
        void graphBoard();
        void printScore();
    };

    class Tree {
        Node *root, *subroot, *tail;
        int ownColor;
    public:
        Tree(int ownc, Board board);
        Node* expend(Board board); //expand tail
        void nextnode(chesspos nextp, Board board); //includes construction of a nonexistent node
        Node* bestChild(Node * tarRoot, double cof);
        Board getTail(Board board); //tree policy
        void backup(int result);
        void printInfo();
        void newTurn();
    };
    int dr[8] = { 0,0,1,1,1,-1,-1,-1 };
    int dc[8] = { 1,-1,1,0,-1,1,0,-1 };

    int main() {
        srand(time(NULL));
        int x, y;
        clock_t s, n;
        Board b = Board();
        Tree UCTtree(0, b);
        Node* best;
        int res, searchCount, total = 0;
        chesspos r;
        while (!b.isEnd()) {
            s = clock();
            n = clock();
            searchCount = 0;
            //4.75 s budget; CLOCKS_PER_SEC keeps it the same where clock() does not tick in milliseconds
            while ((double)(n - s) / CLOCKS_PER_SEC < 4.75) {
                //Board t = UCTtree.getTail(b);
                //res = t.simulate();
                UCTtree.backup(UCTtree.getTail(b).simulate());
                //printf("%d\n", i);
                n = clock();
                searchCount++;
            }
            n = clock();
            printf("time use:%d\nSearch times:%d\n", (int)(1000.0 * (n - s) / CLOCKS_PER_SEC), searchCount);
            UCTtree.printInfo();
            best = UCTtree.bestChild(NULL, 0);
            printf("win rate:%lf\n", 1.0 * best->score / best->total * 64);
            b.oneMove(best->pos);
            total++;
            printf("total:%d\n", total);
            b.graphBoard();
            if (!b.isEnd()) {
                UCTtree.nextnode(best->pos, b);
                UCTtree.printInfo();
                best = UCTtree.bestChild(NULL, 0);
                //the new subroot may not have any expanded child yet
                if (best != NULL) printf("win rate:%lf\n", 1.0 * best->score / best->total * 64);
                //cin >> x >> y;
                //b.oneMove(r); //if you want to see how monkey moves, delete this two lines and use the next line
                r = b.randomMove();
                total++;
                printf("The monkey choose to move in (%d,%d)\n", r.first, r.second);
                printf("total:%d\n", total);
                UCTtree.nextnode(r, b);
                b.graphBoard();
            }
            system("pause");
        }
        b.printScore();
        system("pause");
        //take white as owncolor, monkey mode only
        total = 0;
        b = Board();
        UCTtree.newTurn();
        while (!b.isEnd()) {
            UCTtree.printInfo();
            best = UCTtree.bestChild(NULL, 0);
            if (best != NULL) printf("win rate:%lf\n", 1.0 * best->score / best->total * 64);
            r = b.randomMove();
            total++;
            printf("The monkey choose to move in (%d,%d)\n", r.first, r.second);
            printf("total:%d\n", total);
            UCTtree.nextnode(r, b);
            b.graphBoard();
            if (!b.isEnd()) {
                s = clock();
                n = clock();
                searchCount = 0;
                while ((double)(n - s) / CLOCKS_PER_SEC < 4.75) {
                    res = UCTtree.getTail(b).simulate();
                    UCTtree.backup(res);
                    //printf("%d\n", i);
                    n = clock();
                    searchCount++;
                }
                n = clock();
                printf("time use:%d\nSearch times:%d\n", (int)(1000.0 * (n - s) / CLOCKS_PER_SEC), searchCount);
                UCTtree.printInfo();
                best = UCTtree.bestChild(NULL, 0);
                printf("win rate:%lf\n", 1.0 * best->score / best->total * 64);
                b.oneMove(best->pos);
                UCTtree.nextnode(best->pos, b);
                total++;
                printf("total:%d\n", total);
                b.graphBoard();
                system("pause");
            }
        }
        b.printScore();
        system("pause");
        return 0;
    }

    Board::Board()
    {
        for (int i = 0; i < 8; i++)
            for (int j = 0; j < 8; j++)
                chessboard[i][j] = -1;
        chessboard[3][3] = 0;
        chessboard[4][4] = 0;
        chessboard[3][4] = 1;
        chessboard[4][3] = 1;
        latestColor = 1; //gameStartRoot is white, next and first move is black
        latestStep = make_pair(-1, -1);
    }

    int Board::calScore() {
        int c[2] = { 0 };
        for (int i = 0; i < 8; i++)
            for (int j = 0; j < 8; j++)
                if (chessboard[i][j] >= 0) c[chessboard[i][j]]++;
    #ifdef TESTING
        return c[0] - c[1];
    #else
        //positive: black wins with c[0] pieces; negative: white wins with c[1] pieces; 0: draw
        if (c[0] > c[1]) return c[0];
        if (c[0] < c[1]) return -1 * c[1];
        return 0;
        //return c[0] > c[1] ? BLACK_WINS : WHITE_WINS;
    #endif // TESTING
    }

    bool Board::isEnd() {
        if (getValidPos(1, make_pair(-1, -1))[0].first == -1 && getValidPos(0, make_pair(-1, -1))[0].first == -1) return true;
        return false;
    }

    bool Board::notConj(chesspos a, chesspos b) {
        if (abs(a.first - b.first) + abs(a.second - b.second) == 1) return false;
        else return true;
    }

    bool Board::search(chesspos p, int color, int d)
    {
        int r = p.first + dr[d], c = p.second + dc[d];
        //the neighbor must be on the board before it is read
        if (r < 0 || r > 7 || c < 0 || c > 7) return false;
        if (chessboard[r][c] == color) return false; //diff color should be in the middle
        while (0 <= r && r <= 7 && 0 <= c && c <= 7) {
            if (chessboard[r][c] == -1) return false;
            else if (chessboard[r][c] == color) return true;
            else {
                r += dr[d];
                c += dc[d];
            }
        }
        return false;
    }

    void Board::rev(chesspos p, int color, int d)
    {
        int r = p.first + dr[d], c = p.second + dc[d], oppcolor = !(bool)color;
        while (0 <= r && r <= 7 && 0 <= c && c <= 7 && chessboard[r][c] == oppcolor) {
            chessboard[r][c] = color;
            r += dr[d];
            c += dc[d];
        }
    }

    void Board::oneMove(chesspos p)
    {
        latestColor = !(bool)latestColor;
        if (p.first != -1) {
            chessboard[p.first][p.second] = latestColor;
            for (int d = 0; d < 8; d++) { //flip in 8 direction
                if (search(p, latestColor, d)) {
                    rev(p, latestColor, d);
                }
            }
        }
        latestStep = p;
    }

    vector<chesspos> Board::getValidPos(int targetColor = -1, chesspos lstep = make_pair(233, 233))
    {
        vector<chesspos> result;
        chesspos pos;
        if (targetColor == -1) targetColor = !(bool)latestColor; //next step is for the opp
        if (lstep == make_pair(233, 233)) lstep = latestStep;
        //plain sequential loop: result and pos are shared, so running it with
        //"#pragma omp parallel for" would race on push_back
        for (int k = 0; k < 64; k++) {
            int i = k / 8;
            int j = k % 8;
            if (chessboard[i][j] == -1) {
                pos = make_pair(i, j);
                for (int d = 0; d < 8; d++) {
                    if (notConj(pos, lstep) && search(pos, targetColor, d)) {
                        result.push_back(pos);
                        break;
                    }
                }
            }
        }
        if (result.size() == 0) result.push_back(make_pair(-1, -1));
        return result;
    }

    chesspos Board::randomMove()
    {
        vector<pair<int, int> > aps;
        int index;
        aps = getValidPos();
        index = rand() % aps.size();
        if (aps[index].first != -1) oneMove(aps[index]); //in this func color has been flipped
        else latestColor = !(bool)latestColor;
        latestStep = aps[index];
        return latestStep;
    }

    int Board::simulate() {
        int index, tmpcolor = latestColor, tmpBoard[8][8];
        chesspos tmpStep = latestStep;
        memcpy(tmpBoard, chessboard, sizeof(chessboard)); //backup
        while (!isEnd()) {
            randomMove();
    #ifdef TESTING
            graphBoard();
    #endif // TESTING
        }
        index = calScore(); //for temp use
        memcpy(chessboard, tmpBoard, sizeof(chessboard)); // reset to initial state
        latestColor = tmpcolor;
        latestStep = tmpStep;
        return index;
    }

    void Board::graphBoard() {
        int markboard[8][8];
        vector<chesspos> aps = getValidPos();
        memcpy(markboard, chessboard, sizeof(chessboard));
        if (latestStep.first != -1) markboard[latestStep.first][latestStep.second] = 2;
        //skip the (-1,-1) placeholder that getValidPos returns when there is no playable move
        for (auto p : aps) if (p.first != -1) markboard[p.first][p.second] = 3;
        printf(" 0 1 2 3 4 5 6 7\n");
        for (int i = 0; i < 8; i++) {
            printf(" %d", i);
            for (int j = 0; j < 8; j++) {
                switch (markboard[i][j]) {
                case 0:printf("●"); break;
                case 1:printf("○"); break;
                case 2: {
                    if (latestColor == 0) printf("★");
                    else printf("☆");
                    break;
                }
                case 3:printf("♂"); break;
                case -1: {
                    if (notConj(latestStep, make_pair(i, j))) printf(" ");
                    else printf("×");
                    break;
                }
                }
            }
            printf("||\n");
        }
        printf("=======================\n");
    }

    void Board::printScore() {
        int c[2] = { 0 };
        for (int i = 0; i < 8; i++)
            for (int j = 0; j < 8; j++)
                if (chessboard[i][j] != -1) c[chessboard[i][j]]++;
        printf("BLACK %d : %d WHITE\n", c[0], c[1]);
    }

    Node::Node(chesspos p, int c, Node * par, vector<chesspos> v)
    {
        pos = p;
        color = c;
        parent = par;
        total = 0;
        score = 0;
        validPos = v;
    }

    Tree::Tree(int ownc, Board board)
    {
        root = new Node(make_pair(-1, -1), 1, NULL, board.getValidPos());
        subroot = root;
        tail = root;
        ownColor = ownc;
    }

    Node* Tree::expend(Board board) {
        Node* newNode;
        vector<chesspos> possiblePos;
        bool matched;
        for (auto v : tail->validPos) {
            matched = false;
            for (auto c : tail->child) {
                if (v == c->pos) {
                    matched = true;
                    break;
                }
            }
            if (!matched) possiblePos.push_back(v);
        }
        int index = rand() % possiblePos.size();
        board.oneMove(possiblePos[index]);
        newNode = new Node(possiblePos[index], !(bool)tail->color, tail, board.getValidPos());
        tail->child.push_back(newNode);
        return newNode;
    }

    void Tree::nextnode(chesspos nextp, Board board) {
        for (auto c : subroot->child) {
            if (c->pos == nextp) {
                subroot = c;
                return;
            }
        }
        //no child matched
        //main() always passes a board on which nextp has already been played,
        //so the move must not be applied to it a second time here
        Node *newNode = new Node(nextp, !(bool)subroot->color, subroot, board.getValidPos());
        subroot = newNode;
    }

    Node* Tree::bestChild(Node *tarRoot = NULL, double cof = 150) {
        double argmax = -99999999, ucb;
        Node* best = NULL;
        if (tarRoot == NULL) tarRoot = subroot;
        for (auto c : tarRoot->child) {
            ucb = 1.0 * c->score / c->total + cof * sqrt(log(tarRoot->total) / c->total);
            if (ucb > argmax) {
                argmax = ucb;
                best = c;
            }
        }
        return best;
    }

    Board Tree::getTail(Board board) {
        tail = subroot;
        while (!board.isEnd()) {
            int vs = tail->validPos.size(), cs = tail->child.size();
            if (vs != cs) {
                tail = expend(board);
                board.oneMove(tail->pos);
                break;
            }
            else {
                tail = bestChild(tail);
                board.oneMove(tail->pos);
            }
        }
        return board;
    }

    void Tree::backup(int result) {
        //if a subroot is avoidable, then use while(tail) for root node's parent is NULL
        int mod = result > 0 ? BLACK_WINS : WHITE_WINS;
        result = abs(result);
        while (tail != subroot) {
            tail->total += 64;
            if (!(tail->color ^ mod)) tail->score += result;
            //tail->score += result * mod;
            //mod *= -1;
            tail = tail->parent;
        }
        tail->total += 64;
        //tail->score += result * mod;
        //compare against mod (the winning color), as in the loop above
        if (!(tail->color ^ mod)) tail->score += result;
    }

    void Tree::printInfo() {
        printf("subroot:(%d,%d)\n", subroot->pos.first, subroot->pos.second);
        for (auto c : subroot->child) {
            printf("-child:(%d,%d), score:%d, total:%d\n", c->pos.first, c->pos.second, c->score, c->total);
        }
        Node* n = bestChild(subroot, 0);
        chesspos p = n == NULL ? make_pair(8, 8) : n->pos;
        printf("bestchild:(%d,%d)\n", p.first, p.second);
    }

    void Tree::newTurn() {
        subroot = root;
        ownColor = !(bool)ownColor;
    }
