SuperMarI/O is a project aimed at developing the best possible Super Mario Bros. player using machine learning. During 2016, we tried two methods of creating players (also called "agents") using neural networks.

What's a Neural Network?

A Math-y Formula

While some people may tell you that neural networks simulate the human brain (they really don't), a neural network is basically just a mathematical formula. It simply takes some numbers in and spits some numbers out.
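To make that concrete, here's a minimal sketch (in Python with NumPy, using made-up random numbers) of what one of these formulas looks like under the hood:

```python
import numpy as np

# A tiny "neural network": just a formula. Numbers go in, numbers come out.
# The weights and biases are random, so the outputs are meaningless for now.
def tiny_network(inputs, weights, biases):
    hidden = np.tanh(inputs @ weights[0] + biases[0])  # squash the first layer
    return hidden @ weights[1] + biases[1]             # combine into the output

rng = np.random.default_rng(0)
weights = [rng.normal(size=(2, 3)), rng.normal(size=(3, 1))]
biases = [rng.normal(size=3), rng.normal(size=1)]

print(tiny_network(np.array([0.5, -1.2]), weights, biases))  # two numbers in, one out
```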

But wait a minute, aren't neural networks machine learning? Where's the learning?

Supervised Learning using a Neural Network

Usually, you start out by making a random neural network (a formula with random numbers). It'll take some inputs and produce some outputs (that are almost certainly very, very wrong). Then you compare these against the "right answers" to get the error (how far off the network's guesses are). Then, using a method like backpropagation, you adjust your neural network's formula so that it'll guess closer to the right answer. It's kind of like someone playing a game for the first time: you just have to let the network try a few things out and lightly nudge it in the right direction.

You need to do this for tons of samples so that the network will be able to guess accurately in all kinds of situations. Each time, you just nudge it a little bit closer to the right answer. If you tune the formula too tightly to the examples it has already seen, it'll only guess the right answers for those inputs and nothing else (this is called "overfitting").
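Here's a toy sketch of that nudging loop. It trains a one-number-in, one-number-out "formula" rather than a real network, and it isn't the project's actual code, but the idea is the same: guess, measure the error, nudge a little, repeat.

```python
import numpy as np

rng = np.random.default_rng(0)
w, b = rng.normal(), rng.normal()      # a random "formula": guess = w * x + b
xs = rng.uniform(-1, 1, size=100)      # training inputs
ys = 3.0 * xs + 1.0                    # the "right answers" we want to learn

learning_rate = 0.1                    # how hard we nudge each time
for _ in range(200):
    guesses = w * xs + b               # the formula's guesses
    error = guesses - ys               # how far off the guesses are
    w -= learning_rate * np.mean(error * xs)   # nudge w toward better guesses
    b -= learning_rate * np.mean(error)        # nudge b toward better guesses

print(w, b)  # should end up close to 3.0 and 1.0
```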

So, what kinds of things can you learn with a neural network? Well, you can use it in any scenario where you have some inputs and you want to know some outputs. A good example is the weather. Let's say that you know a bunch of things (inputs) about today's weather (such as current temperature, humidity, etc.) and you want to know what the temperature will be like tomorrow (an output). Well, you can go back into historical weather data and start "training" a neural network to guess correctly at the next day's temperature. Once you're satisfied with how well the network is performing, you can run it on today's inputs and find out what the weather might be tomorrow!
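Here's a sketch of that weather example, using scikit-learn (which the project doesn't necessarily use) and entirely made-up "historical" data:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Made-up history: each row is [today's temperature, today's humidity],
# and the target is tomorrow's temperature.
rng = np.random.default_rng(0)
today = rng.uniform([0.0, 20.0], [35.0, 100.0], size=(500, 2))
tomorrow_temp = today[:, 0] + rng.normal(0.0, 2.0, size=500)

net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000)
net.fit(today, tomorrow_temp)                 # "training" on historical data

print(net.predict([[22.0, 65.0]]))            # a guess at tomorrow's temperature
```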

So how does that apply to Mario? The inputs could be things like Mario's position, the positions of enemies, the positions of coins, etc. Or maybe the inputs are just the color of each pixel on the screen (that's a lot of inputs). Easy enough! So what about the outputs? Well, that's which buttons Mario should press (A, B, Left, or Right). Swell! We have all that we need to make a neural network that can play a whole game of Mario! We're going to be rich!
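Here's a sketch of what that could look like in code. The specific inputs and the 0.5 threshold are just illustrative choices, not the project's actual encoding:

```python
import numpy as np

BUTTONS = ["A", "B", "Left", "Right"]

def encode_inputs(mario_x, mario_y, enemy_dx, enemy_dy):
    # Turn what's happening in the game into a list of numbers for the network.
    return np.array([mario_x, mario_y, enemy_dx, enemy_dy], dtype=float)

def decode_outputs(outputs):
    # Treat each output as "how much the network wants to press that button."
    return [button for button, value in zip(BUTTONS, outputs) if value > 0.5]

# With some network (any formula with four inputs and four outputs):
#   presses = decode_outputs(network(encode_inputs(...)))
print(decode_outputs([0.9, 0.1, 0.0, 0.8]))  # -> ['A', 'Right']
```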

Let's Play Mario!

But wait... we need to know the "right answers" to be able to calculate error and train our network. What are these "right answers"? We could train it off of a human player by saying that the right answer is whatever buttons the human presses. But if we do that, our agent will never be better than the human player (and that's no fun).

Neural networks can play Mario, but we just need a better way to train them. Instead of giving our player the right answers for some inputs (called "supervised learning"), we need him to figure out the right answers for himself with a little bit of guidance ("reinforcement learning").

Reinforcement Learning with NEAT

NeuroEvolution of Augmenting Topologies (NEAT for short) is a genetic algorithm that tests and "breeds" neural networks.

We start with a tiny network that basically does nothing and we generate a bunch of new networks from it; a whole batch of these new networks is called a "generation." Then we mutate them (muahaha!) by making the formula a little more complex.

Once we have a bunch of little players, we make them all play Super Mario Bros. We select the best few networks (called the "elite") and we cross them over. This is kind of like breeding in the sense that we take parts from the parent networks to make the offspring. Again, we mutate all of the new networks slightly, just to keep it interesting.
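In code, the generation loop looks roughly like the sketch below. Here a "genome" is just a list of numbers standing in for a network's formula; real NEAT also grows the structure of the network over time and groups agents into species, which this sketch skips entirely.

```python
import random

def crossover(mom, dad):
    # Take each number from one parent or the other.
    return [random.choice(pair) for pair in zip(mom, dad)]

def mutate(genome, rate=0.1):
    # Randomly nudge some of the numbers, just to keep it interesting.
    return [g + random.gauss(0, 0.5) if random.random() < rate else g for g in genome]

def evolve(population, fitness, generations=50, elite_size=4):
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)  # best players first
        elite = ranked[:elite_size]                             # keep the "elite"
        offspring = [mutate(crossover(*random.sample(elite, 2)))
                     for _ in range(len(population) - elite_size)]
        population = elite + offspring                          # the next generation
    return max(population, key=fitness)
```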

"Better Agents"

So how do you know which networks play "better" than others? In order to rank these players, we had to come up with a formula that assigns each playthrough a score! This "fitness function" is then used to select the elite agents from the generation.

Our fitness function takes into account:

Any player can be scored using our little function, even a human player!
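Just to show the shape of the idea, here's a purely hypothetical fitness function; the factors and weights below are illustrative and aren't the project's actual formula.

```python
def fitness(distance_traveled, time_taken, died):
    # Hypothetical scoring: reward progress, penalize dawdling and dying.
    score = distance_traveled - 0.5 * time_taken
    if died:
        score -= 100
    return score

print(fitness(distance_traveled=800, time_taken=120, died=False))  # -> 740.0
```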

The idea behind NEAT is that, over time, we should get better and better players because they fight to the death in Battle Royale. One neat (no pun intended) thing about this algorithm is that most of the work is letting all of the agents play their levels. Because all of these plays are independent, we can run this computation on a bunch of different computers to get crazy-good speeds.

Reinforcement Learning with DQN

A little while ago, a company called DeepMind came out with a new twist on an older algorithm called Q-learning: "deep Q-learning," which uses "deep Q-networks" (DQN). Using this method, they were able to train a player to play Breakout on a simulated Atari! The idea behind the algorithm is a bit different from NEAT.

The Q Function

Pretend that there's a magical function, Q. You give it the state of a game (what you see on the screen right now) and a game action (a button to press), and it spits out a number: the total reward you can expect to rack up if you press that button now and keep playing well afterward (in other words, how good that action was).

If you had a magical function like this, you could just look at all of your possible actions, say "Q, how good is each of these actions?" and pick the best one.
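If such a Q existed, using it would be this simple (Q here is a stand-in for the magical function, not a real implementation):

```python
ACTIONS = ["A", "B", "Left", "Right"]

def best_action(Q, state):
    # Ask Q how good each button press is in this state, and pick the best one.
    return max(ACTIONS, key=lambda action: Q(state, action))
```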