Earlier this week, Google’s DeepMind team published a paper describing AlphaZero, a new general-purpose reinforcement learning algorithm that has done some remarkable things. First, in about eight hours, it taught itself to beat AlphaGo, a human-trained AI system that beat the best human Go players in the world. It also taught itself chess and Shogi (Japanese chess) in about four hours and beat the best human-trained AI systems at those games.
How did AlphaZero teach itself? The rules of the games were programmed into the system. Then AlphaZero started to play against itself, and the more it played, the better it became. Within a few hours, it learned to play the games better than they had ever been played.
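To make the self-play idea concrete, here is a minimal sketch in Python: a toy agent that learns a value table for tic-tac-toe purely by playing against itself, nudging the value of each visited position toward the game’s final outcome. This is only an illustration of learning from self-play, not DeepMind’s method; AlphaZero itself combines a deep neural network with Monte Carlo tree search, and every name below (play_self_play_game, VALUES, and so on) is invented for this example.

```python
import random
from collections import defaultdict

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

VALUES = defaultdict(float)   # estimated value of each position, from X's point of view
ALPHA = 0.1                   # learning rate
EPSILON = 0.1                 # exploration rate


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


def legal_moves(board):
    return [i for i, cell in enumerate(board) if cell == "."]


def choose_move(board, player):
    """Pick the move whose resulting position looks best for this player,
    with occasional random exploration."""
    moves = legal_moves(board)
    if random.random() < EPSILON:
        return random.choice(moves)
    sign = 1.0 if player == "X" else -1.0   # O prefers positions that are bad for X
    return max(moves,
               key=lambda m: sign * VALUES["".join(board[:m] + [player] + board[m + 1:])])


def play_self_play_game():
    """Play one game of the agent against itself; return visited positions and the result."""
    board, player, states = ["."] * 9, "X", []
    while winner(board) is None and legal_moves(board):
        board[choose_move(board, player)] = player
        states.append("".join(board))
        player = "O" if player == "X" else "X"
    w = winner(board)
    return states, (1.0 if w == "X" else -1.0 if w == "O" else 0.0)


# The self-play loop: play a game, then nudge every visited position's value
# toward the final outcome. Over many games the table encodes which positions
# tend to lead to wins, which is the same feedback loop, in miniature, that
# lets a system improve simply by playing itself.
for game in range(20000):
    states, outcome = play_self_play_game()
    for s in states:
        VALUES[s] += ALPHA * (outcome - VALUES[s])

print(f"learned values for {len(VALUES):,} positions")
```

The real system replaces the simple value table with a deep neural network and guides move selection with Monte Carlo tree search, which is what lets the same basic loop scale to games as large as Go, chess, and Shogi.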
Perfect Information
It is extremely important to understand that AlphaZero does not think the way humans think, and it is not capable of general-knowledge decision-making. AlphaZero has performed brilliantly at learning to play and win what is known as a “perfect information” game. In a board game such as Go, chess, or Shogi, both players know all of the rules, can see the entire board and all of the game pieces, know the starting position of each piece, and know every move that has been made. This is dramatically different from games of “incomplete information,” such as bidding in a programmatic ad auction. It is also different from games where players have “imperfect but complete information,” such as poker, contract bridge, or negotiating rates at the upfronts.
Sourced through Scoop.it from: www.shellypalmer.com