Welcome to the Machine Learning (ML) track. This track could also be called the Artificial Intelligence (AI) track or the Deep Learning track. The end goal of this track is to be able to design, build and test a self-made machine learning algorithm using a deep learning framework. Now that we know what the goal is, let us take a look at how we get there. The road might be long, but you probably know the saying.
“A Journey of a Thousand Miles Begins with a Single Step” (Lao Tzu)
“The journey of a thousand lines begins with a single character”
The chapters are written in the following way:
The whole track is currently based on Python, since it is one of the easier languages for getting the machine learning packages up and running. In case you have no experience with it yet, there is material explaining the basic syntax of Python further on in this tutorial.
To get you started, make sure that you have everything ready. This means that you have installed Python, PyCharm and possibly Git. You can use the following guides to help you install everything:
If you are not yet familiar with Python, you can take a detour by going over the Rule Based section first. There is a track about Pommerman that can teach you Python at the same time. For specific reminders or complete externally provided lessons, these materials can help you get familiar with Python:
With this new setup we are going to introduce you to a new package called gym. This package is a good starting point for machine learning, since it provides a unified interface to a lot of different games. What gym is and how to work with it is shown on the following page:
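Before you head over there, here is a minimal sketch of what working with gym typically looks like: create an environment, reset it, and step through it with actions. It assumes a classic gym version where reset() returns only the observation and step() returns a 4-tuple; newer versions use slightly different signatures.

```python
import gym

# Create the environment; every gym game exposes this same interface.
env = gym.make("CartPole-v1")

for episode in range(3):
    observation = env.reset()                   # start a new game
    done, score = False, 0
    while not done:
        action = env.action_space.sample()      # a random action for now
        observation, reward, done, info = env.step(action)
        score += reward
    print(f"Episode {episode} finished with a score of {score}")

env.close()
```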
With gym installed it is time to build an AI algorithm for the game CartPole. It is going to be an imitation network using the deep learning framework TensorFlow. CartPole is considered solved when the average score over the last 100 games is higher than 195 points. So let's dive in!
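To give you a feel for where that tutorial is heading, below is a hedged sketch of one possible imitation approach: play random games, keep only the (observation, action) pairs from episodes that scored reasonably well, and train a small Keras network to imitate them. The helper name gather_data, the score threshold and the network layout are illustrative assumptions, not the tutorial's exact code.

```python
import gym
import numpy as np
from tensorflow import keras

def gather_data(env, games=5000, min_score=50):
    """Play random games and keep the (observation, action) pairs of the better episodes."""
    observations, actions = [], []
    for _ in range(games):
        observation = env.reset()
        memory, score, done = [], 0, False
        while not done:
            action = env.action_space.sample()
            memory.append((observation, action))
            observation, reward, done, _ = env.step(action)
            score += reward
        if score >= min_score:                  # only imitate games that went reasonably well
            observations += [obs for obs, _ in memory]
            actions += [act for _, act in memory]
    return np.array(observations), np.array(actions)

env = gym.make("CartPole-v1")
x_train, y_train = gather_data(env)

# A small dense network that maps an observation (4 numbers) to an action (0 or 1).
model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(4,)),
    keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(x_train, y_train, epochs=5)
```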
Now let's try to do the same thing on the gym game Taxi-v3 and see how that works out. Check the following tutorial:
Now you might have noticed how this technique that worked great for one game completely broke down for another. This is one of the reasons why ML can have such a steep learning curve: we often have to rethink the model based on the problem at hand. It is important to learn about your environment and see which kinds of techniques can be applied. We have seen that for continuous problems the imitation method works well. However, for discrete states represented by a single number, such as in Taxi, it is unable to solve the game. In the next chapter we are going to take a look at the concept of a Q-table to solve Taxi-v3.
In order to move on to the next part it is useful to understand classes and inheritance. As you have seen from the first two examples, the code to run Taxi was very similar to the code to run CartPole. To prevent having to redo this work every time, we will sometimes provide you with a starting template or testing framework. To make sure your code is compatible, we will provide you with an abstract base class that you have to inherit from and implement. This is explained in more depth in the following tutorial.
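As a taste of what such a template could look like, here is a minimal sketch using Python's abc module. The class and method names (AbstractAgent, act) are hypothetical; the actual base class provided in the tutorial may look different.

```python
from abc import ABC, abstractmethod

class AbstractAgent(ABC):
    """Hypothetical base class; the template provided in the tutorial may differ."""

    @abstractmethod
    def act(self, observation):
        """Return an action for the given observation."""

class RandomAgent(AbstractAgent):
    """Inherits from the base class and implements the required method."""

    def __init__(self, action_space):
        self.action_space = action_space

    def act(self, observation):
        return self.action_space.sample()
```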
The first network we built was an imitation network; now we are going to take a look at Q-tables. This is the easier variant of Deep Q-Learning Networks (DQN), a famous technique that helped Google DeepMind solve a lot of Atari games in their papers [1][2]. The following tutorial helps you understand what a Q-table is and how you can use it to solve Taxi-v3.
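To give you an idea of the core mechanic before you start, here is a rough numpy sketch of tabular Q-learning on Taxi-v3. The hyperparameter values are illustrative assumptions and the tutorial's own implementation may differ.

```python
import gym
import numpy as np

env = gym.make("Taxi-v3")
q_table = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1      # learning rate, discount factor, exploration rate

for episode in range(10000):
    state, done = env.reset(), False
    while not done:
        # Epsilon-greedy: mostly exploit the table, sometimes explore.
        if np.random.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))
        next_state, reward, done, _ = env.step(action)
        # The Q-learning update rule.
        q_table[state, action] += alpha * (
            reward + gamma * np.max(q_table[next_state]) - q_table[state, action]
        )
        state = next_state
```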
Now that we have learned how to solve it using numpy, we are also going to solve it using Keras. The steps are very similar, but instead of keeping an exact Q-table we are going to approximate the Q-values with a model. Don't worry if you do not understand this yet; it will be explained in:
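In the meantime, here is a hedged sketch of the idea: instead of indexing a table, a small Keras model maps a one-hot encoded state to one Q-value per action, and each training step nudges the prediction for the chosen action towards the usual Q-learning target. The helper names (one_hot, train_step) and the model layout are assumptions for illustration.

```python
import numpy as np
from tensorflow import keras

n_states, n_actions = 500, 6        # the sizes of Taxi-v3
gamma = 0.99

# The model replaces the table: a one-hot encoded state goes in, one Q-value per action comes out.
model = keras.Sequential([
    keras.layers.Dense(n_actions, input_shape=(n_states,), activation="linear"),
])
model.compile(optimizer="adam", loss="mse")

def one_hot(state):
    vector = np.zeros((1, n_states))
    vector[0, state] = 1
    return vector

def train_step(state, action, reward, next_state, done):
    """Move the predicted Q-value for (state, action) towards the Q-learning target."""
    target = model.predict(one_hot(state), verbose=0)
    future = 0 if done else np.max(model.predict(one_hot(next_state), verbose=0))
    target[0, action] = reward + gamma * future
    model.fit(one_hot(state), target, verbose=0)
```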
Ta-da, solved it. Now that we have talked about Q-learning, you can apply this algorithm in a lot of different scenarios. Try to solve CartPole using this technique; for questions feel free to contact the education committee via the slack channel #ec-help-me or email education@serpentineai.nl.
In the previous lessons we have talked about imitation learning and Q-learning, which show some implementations of AI. Both of these make use of gradient descent policies. To understand what is happening inside these algorithms, we are going to implement a gradient descent policy ourselves.
Our first implementation of policy gradient descent is going to be on the game of CartPole. A good reason to start with a small problem is that it quickly indicates whether the algorithm is converging; if you have to run the program for a day to see if it works, testing slows down quite a bit.
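As a rough preview, the sketch below implements a simple REINFORCE-style policy gradient on CartPole: the policy network is trained with cross-entropy on the actions it actually took, with each sample weighted by the normalised discounted return that followed it. The network layout and hyperparameters are illustrative assumptions, not the tutorial's exact code.

```python
import gym
import numpy as np
from tensorflow import keras

env = gym.make("CartPole-v1")
gamma = 0.99

# Policy network: an observation goes in, a probability per action comes out.
policy = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(4,)),
    keras.layers.Dense(2, activation="softmax"),
])
policy.compile(optimizer=keras.optimizers.Adam(1e-2),
               loss="sparse_categorical_crossentropy")

def discounted_returns(rewards):
    """Discounted returns, normalised so better-than-average actions get a positive weight."""
    returns, running = np.zeros(len(rewards)), 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return (returns - returns.mean()) / (returns.std() + 1e-8)

for episode in range(500):
    observation, done = env.reset(), False
    states, actions, rewards = [], [], []
    while not done:
        probs = policy.predict(observation[None, :], verbose=0)[0]
        action = np.random.choice(2, p=probs / probs.sum())
        states.append(observation)
        actions.append(action)
        observation, reward, done, _ = env.step(action)
        rewards.append(reward)
    # Weight each (state, action) pair by its discounted return:
    # actions that led to high reward become more likely, the rest less likely.
    policy.fit(np.array(states), np.array(actions),
               sample_weight=discounted_returns(rewards), verbose=0)
```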
Now, in order for this technique to work on more complicated games such as Pong, we have to perform some preprocessing on the image. This preprocessing reduces the complexity of the input image. In order to make the game of Pong compatible with our CartPole code, we will have to change a few more things. In the next tutorial we are going to show you what you have to change to make the CartPole code work for a very different game, namely Pong.
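As an impression of what such preprocessing can look like, here is a sketch in the spirit of the well-known Pong preprocessing: crop away the scoreboard, downsample, and binarise the frame. The exact crop offsets and background colour values are assumptions for the default Pong frames and may need tuning for your setup.

```python
import numpy as np

def preprocess(frame):
    """Reduce a 210x160x3 Pong frame to an 80x80 binary image."""
    frame = frame[35:195]                            # crop away the scoreboard and bottom border
    frame = frame[::2, ::2, 0].astype(np.float32)    # downsample by 2 and keep one colour channel
    frame[frame == 144] = 0                          # erase one background colour
    frame[frame == 109] = 0                          # erase the other background colour
    frame[frame != 0] = 1                            # everything left (paddles, ball) becomes 1
    return frame
```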
This is an intermediate chapter, meant to provide you with effective tools for code reusability and for proving that your code is doing what it is supposed to do. We are going to implement a preprocessing wrapper for Pong and use unit testing to prove that it is indeed correct.
The wrapper is a good technique for splitting the image/observation processing off from the agent. By providing several wrappers, they can easily be reused for different gym games. A second advantage of using wrappers is that they can be made (somewhat) independent of the game, which makes them prime targets for unit testing. By writing unit tests you can show that your code does what it should under specified circumstances. If you later refactor the code, you can rerun the unit tests to check that the functionality has not changed. In larger projects this helps to track bugs across different modules more efficiently.
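Below is a minimal sketch of how a wrapper and a unit test could fit together. It assumes the preprocess function from the previous sketch is defined in the same file; the class and test names are hypothetical, and the tutorial's own wrapper may be structured differently.

```python
import unittest

import gym
import numpy as np

class PreprocessWrapper(gym.ObservationWrapper):
    """Applies the preprocess function from the previous sketch to every observation."""

    def observation(self, observation):
        return preprocess(observation)

class TestPreprocess(unittest.TestCase):
    def test_shape_and_values(self):
        # A synthetic frame is enough: the test never has to start the actual game.
        fake_frame = np.zeros((210, 160, 3), dtype=np.uint8)
        result = preprocess(fake_frame)
        self.assertEqual(result.shape, (80, 80))
        self.assertTrue(np.all((result == 0) | (result == 1)))

if __name__ == "__main__":
    unittest.main()
```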
For now this is planned to be the last introductory lesson of the AI field. The goal of this chapter is to implement one of the famous algorithms from Google DeepMind: a DQN from the paper ''Playing Atari with Deep Reinforcement Learning'' (which was mentioned in a footnote earlier [1:1]). For this implementation it is handy to have the GPU version of TensorFlow installed, if you have not installed it yet (GPU Tensorflow).
Another thing that might be handy is to run multiple instances of the game at the same time to speed up the training process. For this there is the AI-training Framework, which already has a prebuilt (and tested) multiprocessing setup. This is optional, but it also contains some out-of-the-box implementations of replay memories, base models and wrappers, which will be referenced in the next tutorial.
With all these new setups, let us get started with breaking down Breakout:
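To set expectations, here is a hedged sketch of the main ingredients of a DQN: a convolutional network that outputs one Q-value per action, a replay memory, an epsilon-greedy action choice, and a periodically synced target network. It assumes observations have already been preprocessed into 84x84x4 frame stacks; the layer sizes and hyperparameters are illustrative, not a faithful reproduction of the paper or of the AI-training Framework.

```python
import random
from collections import deque

import numpy as np
from tensorflow import keras

n_actions = 4          # Breakout has four actions
gamma = 0.99
batch_size = 32

def build_model():
    """Small convolutional network mapping a stack of frames to one Q-value per action,
    roughly in the spirit of the DeepMind paper rather than an exact reproduction."""
    return keras.Sequential([
        keras.layers.Conv2D(16, 8, strides=4, activation="relu", input_shape=(84, 84, 4)),
        keras.layers.Conv2D(32, 4, strides=2, activation="relu"),
        keras.layers.Flatten(),
        keras.layers.Dense(256, activation="relu"),
        keras.layers.Dense(n_actions, activation="linear"),
    ])

model = build_model()
target_model = build_model()                     # a frozen copy used to compute stable targets
target_model.set_weights(model.get_weights())    # sync again every few thousand steps
model.compile(optimizer=keras.optimizers.Adam(1e-4), loss="mse")

replay_memory = deque(maxlen=100_000)            # holds (state, action, reward, next_state, done)

def choose_action(state, epsilon):
    """Epsilon-greedy: explore with probability epsilon, otherwise take the best Q-value."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return int(np.argmax(model.predict(state[None], verbose=0)))

def train_on_replay_batch():
    """Sample a random batch and move Q(s, a) towards r + gamma * max_a' Q_target(s', a')."""
    if len(replay_memory) < batch_size:
        return
    batch = random.sample(replay_memory, batch_size)
    states = np.array([item[0] for item in batch])
    next_states = np.array([item[3] for item in batch])
    targets = model.predict(states, verbose=0)
    future = target_model.predict(next_states, verbose=0).max(axis=1)
    for i, (_, action, reward, _, done) in enumerate(batch):
        targets[i, action] = reward if done else reward + gamma * future[i]
    model.fit(states, targets, verbose=0)
```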
That is all for now. You have finished the Machine Learning introduction track. Thank you for bearing with us, and hopefully you have learned a lot about using machine learning in games. In order to keep improving our lessons, please tell us what you thought of them by contacting the education committee via the slack channel #ec-helpme or email education@serpentineai.nl.
Welcome to the miscellaneous reading section. If you are reading this, it probably means that you cannot believe that the tutorial series ended there, and you are absolutely right. There is so much more to explore in machine learning, but we wanted to give you a small basis on which you can expand your knowledge.
From here on out we need your help to create new content! These sections are constantly under development and are updated based on the suggestions you make to us. Pick a subject you want to know more about and ask us for material; there is a big chance there is already a tutorial about it. And if anything is unclear, feel free to contact the education committee.
To give you a selective view of topics for deepening your knowledge, we have put together this short list:
In case you have questions or lesson ideas, please contact the education committee via the slack channel #ec-helpme or email education@serpentineai.nl.