Google's "Learning Computer" AlphaZero: Real-Life Skynet?

jaywheel

Active member
It took AlphaZero only four hours to master chess from scratch and crush Stockfish, the top brute-force engine.

For anyone who doesn't know, top chess computers are now so far ahead of humans that they no longer play each other; it's a non-competitive joke. Chess computers dominate humans by using brute-force calculation to find the least imperfect move in each position. Humans could still out-think computers into the nineties, but that was it.

Now, Google has developed a "learning" machine which was given the rules of chess and then four hours to play itself over and over again so it could learn the game. After those four hours it was matched against Stockfish.

Stockfish was the 2016 machine chess champion, so basically the strongest chess entity in the world.

When it was matched with AlphaZero, the learning computer, it was utterly destroyed. They played 100 games and AlphaZero didn't lose a single one. About seventy were drawn; the new machine won the rest.

I don't know how many chess guys there are here, but what was really spooky was that it was armed with no opening theory at all, yet it "learned" the Ruy Lopez, the white side of the Berlin Defence to be precise. This is an opening that has been evolving since the 1600s and is now seen as having reached a kind of pinnacle with the Berlin. It's spooky that it played that variation, of all the variations it could have played. It took humans millions of games over 400 years to decide that this was maybe the "toughest" line to play; the computer got there in four hours.

Garry Kasparov, the former world chess champion, said the computer was using a "human-like" approach, so it perhaps represents an evolution from brute-force calculation to a broader understanding of strategy.

Google aren't saying much.

Anyway, anyone who got this far through this mad ramble will probably be interested in reading a more coherent account here:

https://www.chess.com/news/view/google-s-alphazero-destroys-stockfish-in-100-game-match

And the detailed paper here:

https://arxiv.org/pdf/1712.01815.pdf
 
Two articles from chess.com shedding light on this:

Part One:

By now you've heard about the new kid on the chess-engine block, AlphaZero, and its crushing match win vs Stockfish, the strongest open-source chess engine.

The reactions from the chess community to this match ranged from admiration to utter disbelief.

But how does AlphaZero actually work?

How is it different from other engines and why is it so much better? In this two-part article I’ll try to explain a bit of what goes on under AlphaZero’s hood.

First, let’s reflect on what happened. AlphaZero was developed by DeepMind (a Google-owned company) to specialize in learning how to play two-player, alternate-move games. It was primed with the rules of chess, and nothing else.

It then started learning chess by playing games against itself. Game one would have involved totally random moves. At the end of this game, AlphaZero had learned that the losing side had done stuff that wasn't all that smart, and that the winning side had played better. AlphaZero had taught itself its first chess lesson. The quality of chess in game two was just a tiny bit better than in the first.
Millions of games later, it had become phenomenally strong. How? It didn’t calculate more variations than Stockfish.

Quite the opposite in fact: Stockfish examined 70 million positions per second while AlphaZero contented itself with about 0.1 percent of that: 80,000 per second. This brings to mind a remark made by Jonathan Rowson after Michael Adams crushed him in a match in 1998: “I was amazed at how little he saw.”

Stronger players tend to calculate fewer variations than weaker ones. Instead, their highly honed intuition guides them to focus their calculation on the most relevant lines. This is exactly what AlphaZero did. It taught itself chess in quite a human-like way, developing an “intuition” like no other chess machine has ever done, and it combined this with a measure of cold calculation.
Nine hours and 44 million games of split-personality chess later, AlphaZero had (very possibly) taught itself enough to become the greatest chess player, silicon- or carbon-based, of all time.

How on earth did it do it?

The Analysis Tree

Chess engines use a tree-like structure to calculate variations, and use an evaluation function to assign the position at the end of a variation a value like +1.5 (White’s advantage is worth a pawn and a half) or -9.0 (Black’s advantage is worth a queen). AlphaZero’s approach to both calculating variations and evaluating positions is radically different to what other engines do.

All popular chess engines are based on the minimax algorithm, which is a fancy name that simply means you pick the move that guarantees you the best outcome no matter what the opponent plays. Minimax is invariably enhanced with alpha-beta pruning, which is used to reduce the size of the tree of variations to be examined. Here’s an extreme example of how this pruning works: Say an engine is considering a move and sees its opponent has 20 feasible replies. One of those replies leads to a forced checkmate. Then the engine can abandon (or “cut off”) the move it was considering, no matter how well it would stand after any of the other 19 replies.
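To make that concrete, here's a rough sketch of minimax with alpha-beta pruning, reduced to a toy game tree (nested lists with static evaluations at the leaves). It's only an illustration of the algorithm; a real engine like Stockfish wraps this core in vastly more machinery.

```python
# Minimal sketch of minimax with alpha-beta pruning on a toy game tree:
# internal nodes are lists of children, leaves are static evaluations
# (positive = good for the maximizing side, e.g. +1.5 = up a pawn and a half).

def alphabeta(node, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    if not isinstance(node, list):            # leaf: return its evaluation
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:                 # cutoff: opponent has a refutation,
                break                         # so remaining siblings are skipped
        return best
    else:
        best = float("inf")
        for child in node:
            best = min(best, alphabeta(child, alpha, beta, True))
            beta = min(beta, best)
            if alpha >= beta:
                break
        return best

# Once the opponent can force -9.0 in the second branch, that branch can't beat
# the 3.0 already guaranteed by the first, so the 7.0 leaf is never examined.
tree = [[3.0, 5.0], [-9.0, 7.0], [1.0, 2.0]]
print(alphabeta(tree))                        # 3.0
```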

Another issue is that if an engine prunes away moves that only seem bad, e.g. those that lose material, it will fail to consider any kind of sacrifice, which is partly why early engines were so materialistic. In current engines like Stockfish, alpha-beta pruning is combined with a range of other chess-specific enhancements such as the killer-move heuristic (a strong move in another similar variation is likely to be strong here), the counter-move heuristic (some moves have natural responses regardless of position; I bet you’ve often met axb5 with axb5, right?) and many others.

AlphaZero, in contrast, uses Monte Carlo Tree Search, or MCTS for short. Monte Carlo is famous for its casinos, so when you see this term in a computing context it means there’s something random going on. An engine using pure MCTS would evaluate a position by generating a number of move sequences (called “playouts”) from that position randomly, and averaging the final scores (win/draw/loss) that they yield. This approach may seem altogether too simple, but if you think about it you’ll realize it’s actually quite a plausible way of evaluating a position.

[Photo: The Monte Carlo Casino.]
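Here's a rough sketch of the pure-MCTS idea in Python: evaluate a position by averaging the results of random playouts. It assumes the third-party python-chess package, and the playout cap and playout count are arbitrary choices for illustration.

```python
import random
import chess  # third-party package: pip install python-chess

def random_playout_value(board, max_plies=200):
    """Play random legal moves to the end of the game; return the score
    from the point of view of the side to move in `board`."""
    side = board.turn
    playout = board.copy()
    for _ in range(max_plies):
        if playout.is_game_over():
            break
        playout.push(random.choice(list(playout.legal_moves)))
    result = playout.result(claim_draw=True)  # "1-0", "0-1", "1/2-1/2" or "*"
    if result == "1-0":
        return 1.0 if side == chess.WHITE else 0.0
    if result == "0-1":
        return 0.0 if side == chess.WHITE else 1.0
    return 0.5                                # draw, or playout cap reached

def mcts_evaluate(board, playouts=100):
    """Pure Monte Carlo evaluation: the average score over random playouts."""
    return sum(random_playout_value(board) for _ in range(playouts)) / playouts

print(mcts_evaluate(chess.Board(), playouts=20))  # near 0.5 from the start position
```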

AlphaZero creates a number of playouts on each move (800 during its training). It also augments pure MCTS by preferring moves that it has not tried (much) already, that seem probable and that seem to lead to “good” positions, where “good” means that the evaluation function (more on this next article) gives them a high value. It’s really creating semi-random playouts, lines that seem appropriate to its ever-improving evaluation function. Isn’t this quite like how you calculate? By focussing on plausible lines of play?
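The published AlphaGo Zero paper formalizes that move-preference rule as the "PUCT" formula, and AlphaZero reportedly uses the same scheme. Here's a sketch of it; the constant and the numbers in the example are made up for illustration.

```python
import math

def puct_select(children, c_puct=1.5):
    """Pick the move balancing its average playout value (q), the network's
    prior probability (p), and how little it has been visited (n)."""
    total_visits = sum(child["n"] for child in children)

    def score(child):
        exploration = c_puct * child["p"] * math.sqrt(total_visits) / (1 + child["n"])
        return child["q"] + exploration   # exploration bonus shrinks with visits

    return max(children, key=score)

# Three candidate moves with made-up priors, visit counts and average values.
children = [
    {"move": "e2e4", "p": 0.50, "n": 10, "q": 0.55},
    {"move": "d2d4", "p": 0.30, "n": 2,  "q": 0.50},
    {"move": "h2h4", "p": 0.05, "n": 0,  "q": 0.00},
]
print(puct_select(children)["move"])  # d2d4: plausible but under-explored, so tried next
```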

Notice that so far there’s absolutely nothing chess-specific in what AlphaZero is doing. In my next article, when we look at how AlphaZero learns to evaluate chess positions, we’ll see there’s absolutely nothing chess-specific there either!

Like a newborn baby, AlphaZero came into the world with little knowledge, but is massively geared to learn. One weakness of MCTS is that since it’s based on creating semi-random playouts, it can get it completely wrong in tense positions where there is one precise line of optimal play. If it doesn’t randomly select this line, it is likely to blunder. This blindness was probably what caused AlphaZero’s Go-playing predecessor, AlphaGo, to lose a game to 18-time world Go champion Lee Sedol. It seems not to have been an issue in the match with Stockfish, however.

MCTS has been used previously for two-player gameplay, but was found to perform much worse than the well-established minimax plus alpha-beta approach. In AlphaZero, MCTS combines really well with the employed neural network-based evaluation function.

In my next article, I’ll explain more about this neural network and especially the fascinating way it learns, on its own, how to evaluate chess positions. I’ll also describe the hardware AlphaZero runs on, and make some predictions about how all this will impact chess as we know it.
 
Part two:

In the first part of this article I described how AlphaZero calculates variations. In this part I’ll cover how it learns, by itself, to play chess.

I’ll have to gloss over some details, but hopefully there’s enough to give you a better understanding of how AlphaZero works.

Inside AlphaZero

Let’s jump right into the middle of this. AlphaZero’s learning happens using a neural network.

A neural network is our attempt at making a computer system more like the human brain and less like, well, a computer. The input, i.e., the current position on the chessboard, arrives at the first layer of neurons, each of which then sends its output to each neuron in the next layer, and so on, until the final layer of neurons do their thing and produce the output. In AlphaZero, this output has two parts:

1. An evaluation of the chess position it was given.
2. An evaluation of each legal move in the position.

Hey, AlphaZero sounds like a chess player already: “White’s a bit better here, and Bg5 or h4 look like good moves!”

So these neurons must be smart little devils, right? A neuron is actually a very simple processing unit (it can be in software or hardware) that accepts a number of inputs, multiplies each one by a particular weight, sums the answers and then applies a so-called activation function that gives an output, typically in the range of 0 to 1. One thing to notice is that what a neuron outputs potentially depends on every other neuron in the network before it, which allows the network to capture subtleties, like in chess where White’s castled king is safe, but after h3 the assessment changes as Black can open the g-file with g7-g5-g4.
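A single neuron is simple enough to fit in a few lines. Here's a sketch; the sigmoid activation and the "king safety" framing are just for illustration (modern networks, AlphaZero's likely included, mostly use other activation functions such as ReLU).

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of its inputs, passed through
    an activation function that squashes the result into the range 0..1."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))    # sigmoid activation

# A made-up "king safety" neuron weighing three features from earlier layers.
pawn_shield = 0.9        # hypothetical feature values in 0..1
open_g_file = 0.0
enemy_queen_near = 0.2
print(neuron([pawn_shield, open_g_file, enemy_queen_near],
             weights=[2.0, -3.0, -1.5], bias=-0.5))   # ~0.73: king fairly safe
```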

Based on the data published for AlphaGo Zero (AlphaZero’s Go-playing predecessor) AlphaZero’s neural network probably has up to 80 layers, and hundreds of thousands of neurons. Do the math and realize that this means hundreds of millions of weights. Weights are important because training the network (also called learning) is a matter of giving the weights values so that the network plays chess well. Imagine there’s a neuron that during training has taken on the role of assessing king safety. It takes input from all preceding neurons in the network and learns what weights to give them. If AlphaZero gets mated after moving all its pawns in front of its king, it will adjust its weights to reduce the possibility of making this error again.

How AlphaZero Learns

AlphaZero starts out as a blank slate, a big neural network with random weights. It has been engineered to learn how to play two-player, alternate-move games, but knows absolutely nothing about any particular game at all, much as we are born with a vast capacity to learn language, but with no knowledge of any particular language.

The first step was to give AlphaZero the rules of chess. This meant it could now play random, but at least legal, moves. The natural next step would seem to be to give it master games to learn from, a technique called supervised learning. However, this would have resulted in AlphaZero only learning how we play chess, with all its flaws, so the Google team chose instead to use a more ambitious approach called reinforcement learning. This means that AlphaZero was left to play millions of games against itself. After each game it would tweak some of its weights to try to encode (i.e., remember) what worked well and what didn’t.
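In outline, that loop looks something like the skeleton below. Every function here is a stub standing in for DeepMind's real components (the search, the network, the gradient-descent update), so treat it as a shape, not an implementation.

```python
import random

def play_self_play_game(network):
    """Stub: play one game with the network on both sides; return the positions
    visited, the search's move probabilities, and the final game result
    (+1 White wins, 0 draw, -1 Black wins)."""
    return [], [], random.choice([1, 0, -1])

def update_weights(network, positions, search_probs, result):
    """Stub: nudge the weights so the value output predicts `result` better
    and the move-probability output matches the search more closely."""
    return network

network = {"weights": "random"}              # the blank slate
for game in range(1000):                     # ~44 million games in the real run
    positions, search_probs, result = play_self_play_game(network)
    network = update_weights(network, positions, search_probs, result)
```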

When it started this learning process, AlphaZero could only play random moves, and all it knew was that checkmate is the goal of the game. Imagine trying to learn principles like central control or the minority attack simply from who checkmated whom at the end of the game! During this learning period, AlphaZero’s progress was measured by playing second-a-move tournaments against Stockfish and against previous versions of itself. It seems utterly incredible, but after four hours of self-play AlphaZero had learned enough about chess to exceed Stockfish’s rating, while examining only about 0.1 percent of the number of positions Stockfish examined.

While this is pretty mind-blowing, remember that humankind learned chess in a similar way. For centuries, millions of humans have been playing chess, using our brains to learn more about the game, like a giant multi-processor carbon-based computer. We learned the hard way to play in the center, put rooks on open files, attack pawn chains at the base, etc. This is what AlphaZero had to do too. It would be fascinating to see its 44 million games of self-play. I wonder in which game it discovered the minority attack.

How AlphaZero Plays Chess

So far we’ve seen how AlphaZero trains its neural network so it can evaluate a given chess position and assess which moves are likely to be good (without calculating anything).

Here’s some more terminology: the part of the network that evaluates positions is called the value network, while the "move recommender" part is called the policy network. Now let’s see how these networks help AlphaZero in actually playing chess.

Recall that the big problem in chess is the explosion of variations. Just to calculate two moves ahead from the opening position involves looking at about 150,000 positions, and this number grows exponentially for every move deeper you go. AlphaZero reduces the number of variations to look at by only considering those moves that its policy network recommends. It also uses its value network to stop looking further down lines whose evaluation suggests that they are clearly decided (won/lost).

Say there’s an average of three decent possible moves available, according to the policy network. Then at the very modest rate of 70,000 positions per second employed by AlphaZero, it could look about seven full moves ahead in a minute. Couple this with applying its instinctual evaluation provided by its value network to the leaf positions at the end of the variations it looks at, and you have a very powerful chess-playing machine indeed.
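That "seven full moves" figure checks out with some back-of-the-envelope arithmetic, using the article's own numbers (the branching factor of three is the stated assumption):

```python
import math

branching = 3                         # decent moves per position, per the policy network
positions_per_second = 70_000
budget = positions_per_second * 60    # one minute of thinking: 4,200,000 positions

plies = math.log(budget, branching)   # depth d such that branching**d ~ budget
print(plies / 2)                      # ~6.9 full moves (a full move = two plies)
```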

The Hardware AlphaZero Runs On

Unsurprisingly, AlphaZero’s neural network runs on specialist hardware, namely Google’s tensor processing units (TPUs). AlphaZero used 5,000 first-generation TPUs to generate the self-play games, which are used to train the network, and 64 second-generation TPUs to do the actual training. This is a gigantic amount of computing power. For actually playing chess, only four TPUs were used.

Why didn’t Google use the other 5,060 TPUs as well? Probably to show that AlphaZero doesn’t need massive hardware to run effectively.

[Photo: Google’s tensor processing units (TPUs).]

As a neural network propagates values between its layers, each input to each neuron is multiplied by a certain weight, in what is essentially matrix multiplication (remember that from school?). The TPU was designed by Google purely for training and running neural networks, and so it specializes in doing matrix multiplication. A matrix multiplication that would take the regular CPU in your laptop a long series of calculations, a TPU can do in a single clock cycle (and the first-generation TPU does 700 million cycles per second). Think of a machine in a factory that can put caps on 100 bottles of soda at once, vs. some poor soul putting them on one by one.
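You can see the "one layer = one matrix multiplication" idea in a couple of lines of numpy. The shapes here are arbitrary; AlphaZero's actual layer sizes aren't spelled out at this level of detail.

```python
import numpy as np

inputs = np.random.rand(1, 256)     # activations coming from the previous layer
weights = np.random.rand(256, 256)  # one weight per input/neuron pair

# The whole layer in one matrix multiplication, then a ReLU activation.
outputs = np.maximum(inputs @ weights, 0)
print(outputs.shape)                # (1, 256): one output value per neuron
```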

AlphaZero vs Stockfish

After four hours of training, AlphaZero’s rating had exceeded that of Stockfish. It trained for five more hours, but made little or no improvement in that time (yes, this is interesting in itself, but the researchers didn’t provide enough information to enable an interpretation). At this point the two played a 100-game match, one minute a move, which AlphaZero won 64-36 without losing a game. This seems like a hammering, but in fact it corresponds to only about a 100-point difference in Elo rating. Much has been made of the fairness or otherwise of this match. Stockfish was denied its opening book, which is typical of computer matches. Stockfish used 64 threads, which suggests it was running on a very powerful PC, but only used a modest hash size of 1 GB. As against that, AlphaZero had 5,064 TPUs at its disposal, but used only four of them in the match.
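If you're wondering where that 100-point figure comes from, it drops out of the standard Elo expected-score formula, E = 1 / (1 + 10^(-D/400)), solved for the rating difference D at a 64 percent score:

```python
import math

score = 64 / 100                              # AlphaZero's match score
diff = 400 * math.log10(score / (1 - score))  # Elo formula inverted for D
print(round(diff))                            # ~100 Elo points
```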

Many people have proposed how to make a fair match between the two, but this is not really possible, as they rely on radically different hardware. A race between a person and a horse would not be made “fair” by only permitting the use of two legs.

What is undeniable, and amazing, is that AlphaZero’s combination of hardware and software could learn, in four hours, how to evaluate chess positions and moves better than the highly-refined Stockfish.

If you’re still thinking about the size of Stockfish’s hash table, you’re really missing the point of what’s happened. Put it this way: AlphaZero’s achievement would have been only a shade less amazing had it instead lost to Stockfish by a similar score.


I Want AlphaZero On My Laptop!


Oh no you don’t! AlphaZero trains neural networks. What you want is the neural network that AlphaZero trained to play chess. The same way you want a doctor to look at your swollen finger, not a medical university. No doubt this neural network was dumped to disk while the TPUs that trained it were assigned other duties. If the network structure and weights were published, it would, in theory at least, be possible to recreate AlphaZero’s chess-playing network on a laptop, but its performance would not be up to what can be achieved with Google’s specialist hardware.

How close would it come? Let’s do some very rough sums. The part of your computer best suited to the calculations AlphaZero performs is the GPU, or graphics processing unit. That may seem odd, but graphics are all about matrix multiplication, which is just what a neural network needs too. Google estimates its TPU to be about 20 times faster than a contemporary GPU, so the 4-TPU machine that defeated Stockfish is relying on roughly 80 times more oomph than a regular PC. So for the moment, this will be beyond the home user.

A revolution is taking place in the artificial intelligence field, where neural networks are being used to tackle problems that were previously seen as too complex for a computational approach. AlphaZero’s general-purpose approach enabled it to teach itself to understand chess (not just calculate variations) far better than any chess-specific approach has been able to. Oh, and it did this for Go and Shogi as well, board games with higher computational complexity than chess. It’s very unlikely that Google will be interested in progressing the chess project any further—it will be setting its sights on more challenging and worthwhile problems.

So what does this all mean for chess as we know it? It is a gigantic step that a computer can now teach itself chess to such a high level, relying more on human-like learning than on traditional, brute-force calculation. This creates a dent in our notion of human supremacy. It will undoubtedly have an impact on how chess engines will evolve in the future, and we may have to accept reluctantly that the most insightful chess is played by machines.

The hardware AlphaZero runs on won’t be available to the chess-playing public anytime soon, but don’t forget that when the custom-built Deep Thought (predecessor to Deep Blue) beat Bent Larsen in 1988, its creators hardly imagined that future school kids would carry that kind of computing power in their pocket. Watch this space, as they say.
 
Goes to show that given a certain set of limitations (rules) and parameters (playing board), computers will always win, will always be more efficient, and will almost never make a mistake. Hence why robots/AI are going to replace a lot of jobs in 10 years, from cashiers to flipping burgers and stocking aisles in the supermarket.

The difference still comes when things don't go as expected, as we see in most sci-fi movies. However, the mitigation for those cases is getting better and better as we go.
 
Goes to show that given a certain set of limitations (rules) and parameters (playing board), computers will always win, will always be more efficient, and will almost never make a mistake. Hence why robots/AI are going to replace a lot of jobs in 10 years, from cashiers to flipping burgers and stocking aisles in the supermarket.

The difference still comes when things don't go as expected, as we see in most sci-fi movies. However, the mitigation for those cases is getting better and better as we go.

For medical diagnostics, IBM's Watson is already outperforming doctors. Imagine the savings if we replace doctors with AIs.
 
For medical diagnostics, IBM's Watson is already outperforming doctors. Imagine the savings if we replace doctors with AIs.

Yeah, nice savings. But how will we all find a way to earn money? Will we live in some kind of communistic utopia, all on the same universal basic income, living in tiny-as-shit apartments with no joy?
 
 
Yeah, nice savings. But how will we all find a way to earn money? Will we live in some kind of communistic utopia, all on the same universal basic income, living in tiny-as-shit apartments with no joy?
Better yet, imagine if we only needed a small fraction of the humans we have today to run tomorrow's world. I'm not saying castrate EVERYONE... but I mean, imagine if the world's population was 300-500 million. No more food or pollution problems, consumption goes way down, yet all the robots are doing the dirty repetitive jobs and we get to focus on being creative, discovering and learning new things. It makes us more intelligent beings and better problem solvers if we don't have to burn so much time, money and resources on things that can be done by a machine.

It would work regardless of socialism or capitalism.
 
Better yet, imagine if we only needed a small fraction of the humans we have today to run tomorrow's world. I'm not saying castrate EVERYONE... but I mean, imagine if the world's population was 300-500 million. No more food or pollution problems, consumption goes way down, yet all the robots are doing the dirty repetitive jobs and we get to focus on being creative, discovering and learning new things. It makes us more intelligent beings and better problem solvers if we don't have to burn so much time, money and resources on things that can be done by a machine.

It would work regardless of socialism or capitalism.

Not sure if serious... This is exactly what the elites want you to think. They want you to think that eugenics is OK, that sterilizing the general population is the only way not to destroy the planet. They want you to think that you are on the "good side" of the gun, that if you get aboard their movement they wouldn't kill you, your family and your potential offspring.

The reality is that you are (or would be) a useful but dispensable idiot to them. They'd use you and everybody else until they didn't need you, then dispose of you without even batting an eye.
 
They call this deep learning. In simple terms: people program computers to do a specific task, give them billions of examples, and then the computer is able to make choices or recognize things.

Example: we tell the computer

"this is an X-ray image"
"this is a lung on an X-ray"
"this is lung cancer"

We do that 10^1000 times and afterwards the computer can recognize lung cancer.

It's the same thing here.


TLDR: To the uninitiated, this LOOKS like intelligence, but it isn't at all.

AI has made great strides these past few years, but we're still 100 years away from Skynet.
 
Tuning in, we have some MR Deep Learning specialists??? I can't wait to see what gets posted...
 
Tuning in, we have some MR Deep Learning specialists??? I can't wait to see what gets posted...

I did a lot of probability and stochastic calculus in my master's degree, coupled with simulation techniques such as the ones mentioned in the paper (Monte Carlo, trees, etc.). At the end of the day, all this stuff is Bayesian calculus.

I think what's fantastic about this piece is that the machine was able to build its own "prior probability distribution" by playing against itself.

The sheer computing power needed to canvass all these probability distributions is incredible: 5,000+ TPUs for the learning stage vs. 4 for playing.
 
It's ok because the next generation is eating tide pods so we need something intelligent to run everything in the future.
 
It's ok because the next generation is eating tide pods so we need something intelligent to run everything in the future.
we can build, today, a computer that will identify and eat a tide pod while filming itself.

we've already replaced the newest generation of humans.
 