Welcome to the first lesson of Pommerman. If you had any problems setting up Python, PyCharm, or the Pommerman environment, please inform the education committee so we can adjust the guides.
Remarks or questions about these tutorials are always welcome, preferably via the Slack channel wiki-content-feedback. (You can notify the education committee specifically by adding @educo to your message.) You may also send us an email at education@serpentineai.nl.
In this first lesson we are going to talk about the game itself, how it is run in serpentine/run.py, and how to add different bots.
All the code examples presented in this tutorial are hosted on a GitHub repository. Downloading the code from GitHub can save you a lot of typing, but I strongly recommend that you type the code yourself.
At the beginning of each lesson, there will be three GitHub links that can be useful while you work through the lessons. The Browse link will open the GitHub repository for Pommerman at the place where the changes for the lesson you are reading were added, without including any changes introduced in future chapters. The Zip link is a download link for a zip file including the entire game up to and including the changes in this lesson. The Diff link will open a graphical view of all the changes that were made in the lesson you are about to read.
The GitHub links for this lesson are: Browse, Zip, Diff.
The links are currently only available for members.
Pommerman is stylistically similar to Bomberman, the famous game series created by Hudson Soft and well known from Nintendo consoles. Every battle starts on a randomly drawn symmetric 11x11 grid (or 8x8 for 1 vs 1). There are four agents, one in each corner. If applicable, the agent's teammate starts in the kitty-corner (diagonally opposite) position.
Besides the agents, the board contains wooden walls and rigid walls. The agents will have accessible paths to each other. Rigid walls are indestructible and impassable. Wooden walls can be destroyed by bombs. Until they are destroyed, they are impassable. After they are destroyed, they become either a passage or a power-up.
The agent starts with one bomb. Every time it lays a bomb, its count decreases by one. After that bomb explodes, its count increases by one. The agent also has a blast strength that starts at three. Every bomb it lays is imbued with the current blast strength, which is how far in the vertical and horizontal directions that bomb will affect.
A bomb has a life of 10 time steps. After its life expires, it explodes and any wooden walls, agents, power-ups or other bombs in its range (given by the blast strength) are destroyed. Half of the wooden walls have hidden power-ups that are revealed when the wall is destroyed; the available power-ups are listed in the table further below.
The game ends when only players from one team remain. Ties can happen when the game does not end before the max steps or if both teams' last agents are destroyed on the same turn.
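The ammo and bomb-life rules described above can be sketched in a few lines. This is a toy model and not the Pommerman implementation: the class `BombCounter` and its methods are made up for illustration, and the sketch only tracks the ammo of a single agent, assuming the starting ammo of one and the bomb life of 10 time steps mentioned in the text.

```python
from dataclasses import dataclass, field
from typing import List

BOMB_LIFE = 10  # time steps until a bomb explodes, as stated in the text


@dataclass
class BombCounter:
    """Hypothetical helper that tracks one agent's ammo over time."""
    ammo: int = 1  # the agent starts with one bomb
    timers: List[int] = field(default_factory=list)  # remaining life per active bomb

    def lay_bomb(self) -> bool:
        """Lay a bomb if ammo is left; return True when a bomb was placed."""
        if self.ammo == 0:
            return False
        self.ammo -= 1
        self.timers.append(BOMB_LIFE)
        return True

    def tick(self) -> int:
        """Advance one time step; return how many bombs exploded this step."""
        self.timers = [t - 1 for t in self.timers]
        exploded = sum(1 for t in self.timers if t == 0)
        self.timers = [t for t in self.timers if t > 0]
        self.ammo += exploded  # every explosion gives one bomb back
        return exploded


counter = BombCounter()
first = counter.lay_bomb()    # ammo drops from 1 to 0
second = counter.lay_bomb()   # no ammo left, so nothing is placed
for _ in range(BOMB_LIFE):
    counter.tick()            # on the 10th step the bomb explodes
print(first, second, counter.ammo)
```

The second call to `lay_bomb` fails because the only bomb is still on the board; once it explodes the ammo is back at one.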
In any given turn, an agent can choose from one of six actions:
Action | Info | Int |
---|---|---|
Stop | This action is a pass. | 0 |
Up | Move up on the board. | 1 |
Down | Move down on the board. | 2 |
Left | Move left on the board. | 3 |
Right | Move right on the board. | 4 |
Bomb | Lay a bomb. | 5 |

The power-ups that can be hidden in wooden walls are:
Power-up | Info |
---|---|
Extra Bomb | Picking this up increases the agent's ammo by one. |
Increase Range | Picking this up increases the agent's blast strength by one. |
Can Kick | Picking this up allows an agent to kick bombs. It does this by running into them. The bomb then travels in the direction that the agent was moving at a speed of one unit per time step until it is impeded by a player, a bomb, or a wall. |
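
The effect of the three power-ups in the table above can be sketched as a small function. Everything here is hypothetical and only meant as an illustration: the class `AgentStats`, the function `pick_up`, and the string labels for the power-ups are not part of Pommerman. The starting values follow the game description (one bomb, blast strength three, no kick).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentStats:
    """Hypothetical container for the three values that power-ups change."""
    ammo: int = 1            # the agent starts with one bomb
    blast_strength: int = 3  # starting blast strength from the game description
    can_kick: bool = False


def pick_up(stats: AgentStats, power_up: str) -> AgentStats:
    """Return a copy of stats with one power-up applied (labels are made up)."""
    if power_up == "extra_bomb":
        return replace(stats, ammo=stats.ammo + 1)
    if power_up == "increase_range":
        return replace(stats, blast_strength=stats.blast_strength + 1)
    if power_up == "can_kick":
        return replace(stats, can_kick=True)
    raise ValueError(f"unknown power-up: {power_up}")


stats = AgentStats()
for item in ["extra_bomb", "increase_range", "can_kick"]:
    stats = pick_up(stats, item)
print(stats)
```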
The observation that you get from the game is a dictionary, a structure in which each key stores a value. Every observation an agent receives includes at least:
Key | Type | Info |
---|---|---|
board | np.array(11, 11) | The game board as a two-dimensional array (8 by 8 in the 1 vs 1 mode) |
position | [int, int] | The agent's (row, col) position on the grid; on the 11x11 board each value ranges from 0 up to and including 10. |
ammo | int | The agent's current ammo (number of bombs) |
blast_strength | int | The range of the bomb fire (horizontally and vertically) |
can_kick | int (0 or 1) | Whether the agent can kick or not. |
teammate | int | Which agent is the teammate; if there is no teammate the value is -1 |
enemies | [int, int, int] | Which agents are the enemies; if there is a teammate, the last value is -1 |
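
To get a feeling for how such an observation dictionary can be used, here is a minimal sketch that lists the moves that stay on the board and end on a free cell. It rests on a few assumptions that are not stated in the table: the function `free_moves` is made up, "Up" is taken to decrease the row index, and the board value 0 is taken to mean a free passage. The observation is written by hand as a nested list; in the game the board is a NumPy array, for which `board[r][c]` works as well.

```python
from typing import Dict, List, Tuple

# Row and column offsets per move; "Up" is assumed to decrease the row index.
MOVES: Dict[str, Tuple[int, int]] = {
    "Up": (-1, 0),
    "Down": (1, 0),
    "Left": (0, -1),
    "Right": (0, 1),
}
PASSAGE = 0  # assumed board value for a free cell


def free_moves(obs: dict) -> List[str]:
    """Return the names of the moves that end on a free cell inside the board."""
    board = obs["board"]
    row, col = obs["position"]
    size = len(board)
    moves = []
    for name, (d_row, d_col) in MOVES.items():
        r, c = row + d_row, col + d_col
        if 0 <= r < size and 0 <= c < size and board[r][c] == PASSAGE:
            moves.append(name)
    return moves


# A hand-written 3x3 observation for illustration (1 marks a wall here).
obs = {
    "board": [
        [0, 0, 1],
        [0, 0, 1],
        [1, 1, 1],
    ],
    "position": (0, 0),
    "ammo": 1,
    "blast_strength": 3,
    "can_kick": 0,
}
print(free_moves(obs))
```

From the top-left corner only the moves down and to the right remain, because up and left would leave the board.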
For now that is enough information to start with, the complete list can be found here:
Before we go on programming our bot it might be nice to understand how the game is run.
The next tabs take a look at serpentine/run.py and walk through the main function.
TL;DR:
import pommerman
from pommerman import agents

# Print all possible environments in the Pommerman registry
print("Possible game modes:\n\t- " + '\n\t- '.join(pommerman.REGISTRY), end='\n\n')

# Create a set of agents (exactly 2); MyAgent is the class defined in serpentine/my_agent.py
agent_list = [
    MyAgent(),
    agents.SimpleAgent(),
]

# Make the "One vs One" environment using the agent list
env = pommerman.make('OneVsOne-v0', agent_list)
- BaseAgent(): This is the class that all agents inherit from.
- RandomAgent(): This randomly selects an action and plays it out.
- SimpleAgent(): This is an agent based on a non-ML approach (this agent is prone to killing itself).
- PlayerAgent: This is an agent controlled by a keyboard.
  - PlayerAgent(): Arrows = Move and Space = Bomb.
  - PlayerAgent(agent_control="wasd"): W, A, S, D = Move and E = Bomb.

# Run the episodes just like OpenAI Gym
for episode in range(1):
    state = env.reset()
    done = False
    while not done:
        # This renders the game
        env.render(do_sleep=False)

        # This is where we give an action to the environment
        actions = env.act(state)

        # This performs the step and gives back the new information
        state, reward, done, info = env.step(actions)

    print(f"Episode: {episode + 1:2d} finished, result: {'Win' if 0 in info.get('winners', []) else 'Lose'}")
env.close()
- env.render() shows the environment on the screen, and do_sleep=False speeds up the game; otherwise there is a delay between steps (if do_sleep=True). If the entire line is removed, the game plays (silently) in the background.
- During env.act(state), the act method of MyAgent in my_agent.py is asked to give back an action. See the actions above for valid replies.

Note that indentation (4 spaces per level) is an integral part of the Python language. Anything that is indented further to the right is part of the preceding for, while, if, class, def, etc. statement. For example, the for loop on line 2 ends after line 15, and the while loop on line 5 ends after line 13.
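The reset, act, step cycle of the game loop can also be tried without Pommerman. The sketch below uses a made-up stand-in environment, `ToyEnv`, that only counts steps and ends the episode after a fixed number of them. It is not the Pommerman API, only an assumption-laden illustration of the same loop shape and of how indentation decides what belongs to the while loop.

```python
class ToyEnv:
    """Hypothetical stand-in environment that ends after max_steps steps."""

    def __init__(self, max_steps: int = 5):
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        """Start a new episode and return the first state."""
        self.steps = 0
        return {"step": self.steps}

    def act(self, state):
        """A real environment asks every agent for an action; here it is always 0."""
        return [0]

    def step(self, actions):
        """Apply the actions and return the new state, reward, done flag and info."""
        self.steps += 1
        state = {"step": self.steps}
        done = self.steps >= self.max_steps
        reward = [1 if done else 0]
        info = {"winners": [0]} if done else {}
        return state, reward, done, info


env = ToyEnv(max_steps=5)
state = env.reset()
done = False
steps_taken = 0
while not done:
    # Everything indented under the while line is repeated every step
    actions = env.act(state)
    state, reward, done, info = env.step(actions)
    steps_taken += 1
# This line is no longer indented, so it runs once after the loop has ended
print(steps_taken, info)
```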
That was a lot of information about the environment, so now let's go back to our bot in serpentine/my_agent.py and start testing some actions with it. In the act method there is a return statement that returns Action.Stop; this is the action that the agent executes on every turn. Looking at the action table above, the agent is performing a pass, that is, not doing anything. This might win us the game against random bots, but not against a simple bot. So let's change it to a different action, for example moving to the right:
def act(self, obs, action_space):
    # Main event that is being called on every turn.
    return Action.Right
To test it, run the serpentine/run.py file, either by right-clicking it and pressing Run in PyCharm or from the terminal (in the project root):
python -m serpentine.run
The agent is now at least moving, but going in only one direction is still not a good strategy. In order to chain our actions we are going to use an event queue.
The idea is as follows: we add our movement actions to a queue and return the first item in the queue on each turn. As long as there is still an action in the queue, we keep returning items from it.
To initialize our queue we are going to create it in the __init__ function. If we did this in the act function, the value would be reset on every call. For a detailed explanation you can click here to check the python tutorial about classes.
To make the queue available to all methods inside the class we prefix it with self., so that we can access the queue as self.queue. A simple way to implement a queue in Python is a list, so let's create an empty list with the name queue below the super statement (don't forget the prefix).
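The difference between creating a value in __init__ and creating it inside act can be seen in a small, self-contained sketch. The two classes below are made up for illustration and use a plain call counter instead of the queue, so they do not give away the exercise.

```python
class ForgetfulAgent:
    """The counter is a local variable, so it is recreated on every call."""

    def act(self):
        calls = 0
        calls += 1
        return calls


class RememberingAgent:
    """The counter is stored on self, so it survives between calls."""

    def __init__(self):
        self.calls = 0  # created once, when the agent object is made

    def act(self):
        self.calls += 1
        return self.calls


forgetful = ForgetfulAgent()
remembering = RememberingAgent()
forgetful_counts = [forgetful.act() for _ in range(3)]
remembering_counts = [remembering.act() for _ in range(3)]
print(forgetful_counts)    # [1, 1, 1]
print(remembering_counts)  # [1, 2, 3]
```

The same holds for a queue: only a list stored on self keeps its contents from one turn to the next.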
Now that there is a queue we can move to the act method, where we first need to check if the queue is empty. If the queue is empty we append the movements to the right and to the left. If you then run the program you should see the agent move back and forth.
For the final step, try to let the agent move in a square, with the top left being its start position.
You can compare your answer with ours using the specific code tab.
def __init__(self, character=characters.Bomber):
    super().__init__(character)
    self.queue = []

def act(self, obs, action_space):
    # Main event that is being called on every turn.
    if not self.queue:
        self.queue.append(Action.Right)
        self.queue.append(Action.Left)
    # pop(0) returns the first (oldest) item; pop() would return the last one
    return self.queue.pop(0)
def act(self, obs, action_space):
    # Main event that is being called on every turn.
    if not self.queue:
        self.queue.append(Action.Right)
        self.queue.append(Action.Down)
        self.queue.append(Action.Left)
        self.queue.append(Action.Up)
    # pop(0) returns the first (oldest) item, so the moves run in the order they were added
    return self.queue.pop(0)
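The order in which a queue hands back its items is easy to check outside the game. The sketch below is a stand-alone illustration under a few assumptions: the `Action` enum is a stand-in with placeholder values (not the one used by the bot), `SquareWalker` is a made-up class, and `collections.deque` is used as an alternative to a plain list because its `popleft` removes the oldest item.

```python
from collections import deque
from enum import Enum


class Action(Enum):
    """Stand-in for the Action enum used by the bot; the values are placeholders."""
    Right = "right"
    Down = "down"
    Left = "left"
    Up = "up"


class SquareWalker:
    """Hypothetical agent that returns its queued moves first in, first out."""

    def __init__(self):
        self.queue = deque()

    def act(self):
        if not self.queue:
            # Refill the queue with one lap around the square
            self.queue.extend([Action.Right, Action.Down, Action.Left, Action.Up])
        return self.queue.popleft()  # oldest item first


walker = SquareWalker()
two_laps = [walker.act().name for _ in range(8)]
print(two_laps[:4])

# For comparison: pop() without an index removes the LAST item of a list
as_stack = [Action.Right, Action.Left]
last_item = as_stack.pop()
print(last_item.name)
```

With a plain list, `pop(0)` gives the same first-in, first-out order as `popleft`, while `pop()` hands the moves back in reverse.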
Now that we have finished implementing this, we are going to commit it to the version control system. In the terminal (which you can find within PyCharm on the lower left) we are going to type the following commands:
git add *
git commit -m "I made my first changes to the code! :)"
git push
git push only applies to users who have created a repository on GitHub. If you now check your own repository on GitHub.com, you will see that the file serpentine/my_agent.py has been updated with the above code.
In this lesson we have taken a look at the game description and at how our agent interacts with the game. After that we walked through the main function in serpentine/run.py and saw that our agent receives an observation and returns an action (int).