Tuesday, October 13, 2015

Reinforcement Learning - Flappy Bird. Brian Gatt. + 3D Flappy Bird with Reinforcement Learning

source: http://www.brian-gatt.com/portfolio/flappy/documentation.pdf


Department of Computing - MSc. Computer Games and Entertainment
IS71027B – AI for Games
Dr. Jeremy Gow
Term 2 - Assignment 1 (2013-2014)
Reinforcement Learning - Flappy Bird
Brian Gatt (ma301bg)
28 March 2014
Introduction
The following is the documentation for an application of reinforcement learning to Dong Nguyen's popular game 'Flappy Bird'. An overview of the attached deliverable is provided, explaining the aims, background and implementation details. The appendices section contains evidence of the resulting program.
Aims and Objectives
The main aim of this project is to apply and implement the AI technique of reinforcement learning. Dong Nguyen’s popular ‘Flappy Bird’ game was chosen as a test bed to implement reinforcement learning in a simple game context. The following quote is the original project proposal which summarises the intent of the project:
The project will consist of a ‘Flappy Bird’ implementation using the Unity game engine and C#. The bird will be automatically controlled using reinforcement learning, in particular Q-Learning. The deliverable will provide a visualization of the learning process (on-line learning) with accompanying statistics in a text file format. Certain attributes will be made available for modification via the Unity editor interface.
In the end, all of the intended objectives were achieved while also allowing for on-line learning from player input.
Background
Reinforcement Learning
Reinforcement learning is a machine learning technique which is prevalent in modern games.
Agents learn from their past experience which in turn allows them to better judge future actions. This is achieved by providing the agent with a reward for their actions in relation to the current world state.
Reinforcement learning is based on three important factors:
1. The reward function – A function which rewards or punishes (perceived as a negative reward) the agent for the action it just performed.
2. The learning rule – A rule which reinforces the agent's memory based on the current experience and reward.
3. The exploration strategy – The strategy which the agent employs in order to select actions.
The reward function, learning rule and exploration strategy generally depend on the current world state; a set of variables which define what the learning rule will take into consideration to adapt.
One popular implementation of reinforcement learning is Q-learning. It is a learning rule which evaluates Q-values, values which represent the quality score of a state-action tuple. The learning rule returns a Q-value by blending the prior state-action Q-value and the best Q-value of the current state. This is represented as:
Q(s, a) = (1 − α) Q(s, a) + α (r + γ max_{a'} Q(s', a'))
The variable α is the learning rate which linearly blends the Q-values, r is the perceived reward and γ is the discount factor which defines "how much an action's Q-value depends on the Q-value at the state (or states) it leads to" (Millington & Funge, 2009). Both α and γ are bound between 0 and 1 inclusive.
Reinforcement learning requires time in order to train to a state where agents can react in reasonable manners. This all depends on the parameters chosen for the learning rule, the rewards and how the world is encoded (discretizing the world state in a reasonable manner). The data structures used to store this information can also affect performance.
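As an illustration of the learning rule above, below is a minimal sketch of the blended Q-value update (written here in Java for illustration rather than the project's Unity/C# scripts; the class and method names are hypothetical):

    import java.util.HashMap;
    import java.util.Map;

    /** Minimal sketch of the blended Q-learning update described above (hypothetical names). */
    public class QLearningSketch {
        private final Map<String, Double> qValues = new HashMap<>();
        private final double alpha;  // learning rate, between 0 and 1
        private final double gamma;  // discount factor, between 0 and 1

        public QLearningSketch(double alpha, double gamma) {
            this.alpha = alpha;
            this.gamma = gamma;
        }

        private double q(String state, String action) {
            return qValues.getOrDefault(state + "|" + action, 0.0);
        }

        /** Q(s,a) = (1 - alpha) * Q(s,a) + alpha * (r + gamma * max_a' Q(s',a')). */
        public void update(String state, String action, double reward,
                           String nextState, String[] actions) {
            double best = Double.NEGATIVE_INFINITY;
            for (String a : actions) {
                best = Math.max(best, q(nextState, a));
            }
            double blended = (1 - alpha) * q(state, action) + alpha * (reward + gamma * best);
            qValues.put(state + "|" + action, blended);
        }
    }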
Design
The design for this implementation closely followed GitHub user SarvagyaVaish's implementation (Vaish, 2014).
Vaish uses Q-Learning in order to train the ‘Flappy Bird’ character using a reward function which rewards the bird with a score of 1 on each frame the bird is alive. Once the bird dies, the reward function heavily punishes with a score of -1000. The learning state is defined as the horizontal and vertical proximity between the bird and the upcoming pipe hazard. The bird is allowed to do any of the available actions (Do Nothing, Jump) at any time during the simulation.
We took some liberties in relation to Vaish's implementation in order to simplify it and adapt it to the Unity engine. One clear example is the data structure used to store the character's experience. Vaish uses a multi-dimensional fixed-size array based on the maximum horizontal and vertical distances between the bird and the pipes. He then creates a basic hash mechanism and stores Q-values in this data structure, with values which exceed the minimum or maximum being stored in the lowest or greatest element respectively. In our case, we use a simple map or dictionary and store the experiences accordingly.
Implementation
Following are some details on how to configure the AI parameters and how the implementation is defined in terms of the major Unity game objects, components, and behaviours.
Scene
The scene mainly consists of the main camera, the 'Flappy Bird' character and hazard spawn points. The 'GameController' game object is an empty game object which is used to control the overall game. The 'Destroyer' object is used to destroy previously instantiated hazards in order to keep the game object instance count to a minimum. The 'Cover Up' game object is essentially a cover layer which hides what is occurring behind the scenes so that users are not able to see hazard spawn and destruction points. Finally, the 'ceiling' game object was necessary to avoid cheating: during implementation, there were cases where the bird explored areas beyond its intended space, allowing it to stay alive without achieving a better score since it was, literally, jumping over the pipes.
Configuration
AI parameter configuration is achieved by modifying the variables within the ‘Player Character’ script component attached to the ‘Flappy Bird’ game object. Below is a screenshot of the mentioned component.
Figure 1: ‘Flappy Bird’ Configuration
The ‘Controller Type’ variable lets the user choose between ‘InputController’, ‘AIController’ and ‘HybridController’. The ‘InputController’ is an implementation artifact which was used to test the game mechanics. It does not learn and only reacts to user input. The ‘AIController’ is the reinforcement learning controller strategy which learns and acts on its own. The ‘HybridController’ is an extension of the ‘AIController’ which learns but also allows the user to provide input so that the learning algorithm can learn from a different source.
The Intelligence properties expose the learning parameters previously mentioned in the ‘Background’ and ‘Design’ sections. Note that the ‘Saved State’ field is an implementation artifact. It was intended to allow learning to be resumed across sessions; alas, encoding the state of the learning algorithm within the log file became unwieldy due to excessive file sizes, so it was abandoned. The ‘Precision’ field specifies how many decimal places of the bird-pipe proximity are taken into consideration.
The ‘GameController’ game object exists solely to host the ‘Game Controller’ script component, which hosts minor configuration parameters for the overall game. Below is a screenshot of the mentioned component:
Figure 2: ’GameController’ Overall Configuration
Please note that the ‘Best Score’ field does not affect the game per se; it is only there as a visual cue to keep track of the best score recorded.
To speed up the simulation, go to ‘Edit’ – ‘Project Settings’ – ‘Time’ in the Unity editor and modify the ‘Time Scale’ parameter.
Classes and Interfaces
Following is a brief overview of the major classes and interfaces which compose the overall implementation.
GameController
Manages the overall game flow by managing the state of the game and restarting accordingly. It is based on the singleton pattern and is persisted across scene loads. It stores the state of the AI algorithms on death of the player and restores them once the level is initiated.
PlayerCharacter
Contains the behaviour of the player character. The strategy pattern is used to represent the controller implementations and is switched on scene load accordingly. The update event delegates to the underlying controller implementation.
AIController
A Controller implementation which uses the Q-Learning algorithm to store experience and select the player character's actions.
CoalescedQValueStore
An IQValueStore implementation. This follows Millington and Funge's (2009) recommendation of coupling the state and the action as one entity. This implementation supersedes the QValueStore implementation which was used in earlier stages of development. We were initially concerned that the original version suffered from hash collisions, so we implemented the coalesced version, which provided better, more stable results.
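To illustrate the idea of coupling the state and the action into a single key, here is a minimal Java sketch (the type and field names are hypothetical and do not correspond to the actual Unity/C# components):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Objects;

    /** Sketch of a Q-value store keyed on a coalesced (state, action) pair. */
    public class CoalescedStoreSketch {

        /** Immutable state-action key: discretized bird-pipe proximity plus the chosen action. */
        static final class StateAction {
            final int dx, dy;   // quantized horizontal/vertical distance to the next pipe
            final int action;   // 0 = do nothing, 1 = jump

            StateAction(int dx, int dy, int action) {
                this.dx = dx; this.dy = dy; this.action = action;
            }

            @Override public boolean equals(Object o) {
                if (!(o instanceof StateAction)) return false;
                StateAction s = (StateAction) o;
                return dx == s.dx && dy == s.dy && action == s.action;
            }

            @Override public int hashCode() { return Objects.hash(dx, dy, action); }
        }

        private final Map<StateAction, Double> store = new HashMap<>();

        double get(StateAction key) { return store.getOrDefault(key, 0.0); }

        void put(StateAction key, double qValue) { store.put(key, qValue); }
    }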
QLearning
The implementation of the Q-Learning algorithm according to Millington and Funge.
User Manual
Prerequisites
Please ensure that Unity is installed on your system. This project was developed and tested using Unity version 4.3.
Launching the Project
In order to launch the project, double-click on the ‘MainScene’ scene file located in ‘Assets/Scenes/’. Alternatively, open Unity and via the ‘Open Project…’ menu item, navigate to the top-level directory of the project and launch it from there.
Initiating the Simulation
In order to start the simulation, simply click the ‘play’ button in the Unity editor. Ensure that prior to starting the simulation, the parameters are set up correctly. Certain parameters can also be modified mid-run. It is recommended that for mid-run parameter modification, the simulation is paused via the ‘pause’ button in the Unity editor so that it is easier to modify AI parameters accordingly.
Evaluation
Based on our implementation and the generated log files (refer to ‘Appendices’), it takes time for the character to learn the problem. Important elements which define the learning algorithm are the space quantization parameters and the underlying data structures used to store the experiences. Coarse space quantization can lead to faster results, but the character starts to generalize quickly; on the other hand, precise values lead to longer training but more fine-tuned results. Following is a chart which shows the best run we managed to achieve (refer to the attached ‘FlappyBirdRL.best.1x.csv’ for a detailed overview):


[Chart: ‘Final Scores VS Run Instances’ for the best run; final scores on the vertical axis range from 0 to 200 and run instances on the horizontal axis range from 0 to 1800.]
Whether or not reinforcement learning is useful for this type of application is debatable. According to one of the comments on the SarvagyaVaish repository, ‘Flappy Bird’ uses a deterministic physics model in which both the jump height and width can easily be computed.
Figure 3: Flappy Bird's deterministic physics model (Vaish, 2014)
Based on these two values, a simpler scheme can be used. Another project (Jou, 2014) shows a different implementation of this concept, focusing on computer vision. We believe that it exploits the deterministic physics model in order to achieve its results.
Conclusion
Reinforcement learning is an AI technique which allows agents to learn and adapt via a reward-punishment system. This technique was implemented and applied to Dong Nguyen’s ‘Flappy Bird’ and its implications were evaluated. During development, testing was performed continuously in order to ensure that the requirements were met and the deliverable is of high quality. The appendices section shows a running demonstration of the artefact.
References
Jou, E. (2014, February 24). Chinese Robot Will Decimate Your Flappy Bird Score. Retrieved from Kotaku: http://kotaku.com/chinese-robot-will-decimate-your-flappy-bird-score-1529530681
Millington, I., & Funge, J. (2009). Artificial Intelligence for Games. Morgan Kaufmann.
Vaish, S. (2014, February 15). Flappy Bird RL. Retrieved February 21, 2014, from Github Pages: http://sarvagyavaish.github.io/FlappyBirdRL/
Appendices
List of Included Log files
FlappyBirdRL.3.x1 – Sample log file (using non-coalesced QValueStore).
FlappyBirdRL.4.x1 – Sample log file using different parameters (using non-coalesced QValueStore).
FlappyBirdRL.coalesced.x4 – The log file generated from a 4x speed-up when using the CoalescedQValueStore.
FlappyBirdRL.video.x2 – The log file generated by the run which is shown in the video linked below.
FlappyBirdRL.best.x1 – The log file generated by what we consider the best (and longest) run, in which 176 pipes are recorded as the best score when using the CoalescedQValueStore.
Video Demonstration
The video shows a demonstration of the attached deliverable. The game was sped up by a factor of 2 in order to keep footage short while showing the concept of reinforcement learning applied to ‘Flappy Bird’.
Screenshots
Figure 4: Screenshot
Figure 5: Screenshot
Figure 6: Screenshot



================================================================










3D Flappy Bird with Reinforcement Learning

Hang Qi
Jian Gong
Lunbo Xu

hangqi@cs.ucla.edu
jiangong@cs.ucla.edu
lunbo_xu@cs.ucla.edu



ABSTRACT

In this work, we present a reinforcement learning method that allows an artificial bird to survive in a randomly generated environment without colliding with any obstacles by deciding when to flap its wings. The setting is based on a very popular mobile game called Flappy Bird. We implemented it in 3D with WebGL, making it easy for the public to access and experiment with in a browser. The bird is given two different perception models: (1) instant fovy, (2) fovy with short-term memory. In both models, the perception state is represented by a 16 × 1 vector consisting of coordinates of the two pillars in front of the bird. A reinforcement learning algorithm, Q-learning, is used to learn a decision policy, with which the bird is able to survive by repeatedly deciding which action to perform given its current perception state. The experiments show our Q-learning algorithm allows the bird to survive for more than 200 pillars.

Keywords
Artificial Life, Reinforcement Learning, Flappy Bird

1. INTRODUCTION

Recently, as the mobile game app Flappy Bird hit the market and became popular, people complained about how hard it is to accomplish the task of preventing the bird from crashing into pillars. As humans try hard to adapt to the physics of the game, we found it promising to use reinforcement learning [8] algorithms to let the bird learn a decision rule by itself through interactions with the environment.
In this work, we model the physics in a graphics environment, simulate the perception of the bird by feeding in the relative coordinates of selected key points of the pillars, and quantize the relative position between the bird and pillars into a state space of reasonable size. A decision policy between jumping and not jumping at each state is learned using the Q-learning algorithm [10].
In this report, we briefly discuss the motivation and related work in section 2. Section 3 discusses our formulation of the world and the learning problem. Section 4 presents the learning algorithm we used to learn the decision policy. Tools and implementation details are presented in section 5. Finally, sections 6 and 7 include our experiments and discussions on future directions.

2. RELATED WORK

Recently a mobile game app called Flappy Bird hit the market and became very popular on the Android and iOS platforms. This game was originally created by Nguyen Ha Dong, a Vietnamese developer. In this game, a player can make the bird jump by tapping the touch screen. The goal is to let the bird fly as far as possible between columns of green pipes without coming into contact with them. A plethora of Flappy Bird copies and clones emerged, such as Squishy Bird [4], Clumsy Bird [5], and even an MMO version [1]. The popularity of the game not only originates from its simplicity, but also from the challenges it raises to humans' adaptiveness to its dynamics.
Although the rules of the game are simple, it is agreed to be challenging to make the bird fly past more than ten columns of pipes. It requires the player to keep sending commands to the bird precisely at the correct moments, with high concentration and fast reactions.
Some work has been done to make the bird intelligent enough to fly by itself. FlappyBirdRL [9] is one such work, which used a 2D model and reinforcement learning to teach the virtual bird to fly. It assumes that the bird fully knows the configuration of the environment, including obstacles out of reach of the bird's sight, which is obviously not true for a real bird.
Thus, in our project, we try to overcome this limitation by explicitly modeling its field of view. In our 2.5D world, we defined the field of view of the bird in the vertical direction, which means the bird can only see things that are within a certain angle of its sight. In particular, any column that is too high or too low will fall outside of the bird's field of view. The detailed modeling method is discussed in the following section.
Figure 1: Occluded points or those not in the fovy (red points) are unknown to the bird. The bird knows only what it sees (green points) with instant fovy perception. The field of view of the bird is represented by white rays. RGB lines are the axes of the world coordinate system.
3. FORMULATION
In this section, we introduce our modeling and formulation of the world.
3.1 World
Although the bird is flying in a 3D world, the space is essentially represented in 2D. In our 2D model, the bird is represented by its bounding box, a W_b × H_b rectangle. The bird is given a constant horizontal velocity v_x to fly, whereas the vertical velocity v_y is controlled by gravity by default. Whenever the bird chooses to perform a jump action, a vertical velocity increment ∆v_y is added.
Cylinder pillars are represented by rectangles as well. The y positions of the gaps between two pillars in the same column are generated randomly. The bird is dead, and therefore the game is over, when the bounding box of the bird overlaps with a pillar. To prevent the game from being too difficult to accomplish, however, the gap between pillars in the same column, the distance between two adjacent columns, and the width of the pillars are fixed to constant values (yellow lines in Figure 1).
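The following is a minimal Java sketch of this 2D flight model; the constants are purely illustrative and are not the values used in the project:

    /** Sketch of the 2D flight model: constant horizontal speed, gravity on the
     *  vertical axis, and an instantaneous vertical velocity boost on a jump.
     *  All constants below are illustrative assumptions, not the project's values. */
    public class BirdPhysicsSketch {
        static final double GRAVITY = -9.8;     // vertical acceleration
        static final double JUMP_BOOST = 4.0;   // delta v_y added on a jump
        static final double VX = 2.0;           // constant horizontal velocity

        double x, y;    // bird position
        double vy;      // vertical velocity (the horizontal velocity is fixed)

        /** Advance the simulation by dt seconds; if jump is true, apply the boost first. */
        void step(double dt, boolean jump) {
            if (jump) {
                vy += JUMP_BOOST;   // the jump action adds a vertical velocity increment
            }
            vy += GRAVITY * dt;     // gravity pulls the bird down
            x += VX * dt;           // the bird always moves forward
            y += vy * dt;
        }
    }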
3.2 Perception
To learn a decision policy, the input to the learning algorithm is, of course, the perception of the bird. However, simulating real vision as a full-size image directly would have introduced too much complexity in segmentation, triangulation, 3D reconstruction, and vision feature extraction. Given that we want to focus our work on reinforcement learning, we consider the perception in two scenarios based on different assumptions.
3.2.1 Instant Fovy
In the first scenario, we assume that the bird can only see the two nearest columns of pillars and cannot see occluded corners. Each column is abstracted as four key points (i.e. corners), as shown in Figure 1. Hence, eight coordinates relative to the bird's eye position, {(x_i, y_i)} for i = 1..8, are used to represent the vision. Occluded points or points not in the fovy (i.e. field of view in y) are explicitly marked unknown to the bird by setting their coordinates to sentinel values: one sentinel pair is used to represent occluded corners above the bird, whereas another is used for points below the bird.
Figure 2: With short-term memory, the bird not only knows what it sees at the current position (green points), but also remembers what it saw (yellow points).
Figure 3: From a given state s, the bird can perform two actions (1) jump and (2) not jump, each of which leads the bird to a new state.
3.2.2 Fovy with Short-term Memory
In the second scenario, in addition to perception, we assume the bird has a short-term memory, so that it remembers the existence of a corner once it has seen it. In this case, we can simply give all four corners to the bird without marking any occlusion explicitly, since the bird will see all the points anyway as it jumps up and down (Figure 2).
3.3 States and Actions
Given the above discussion on perception, we can represent the state of the bird by a 16 × 1 vector

    s = (x_1, y_1, x_2, y_2, ..., x_8, y_8)

consisting of the coordinates (relative to the bird's eyes) of the eight key points of the nearest two pillars. Real numbers are rounded to their nearest tenth to limit the size of the state space. It is clear that in the case of instant fovy, the state space is smaller due to the occlusion marks.
Actions lead to transitions from one state to another. In our configuration, the bird can only perform two actions: (1) jump, (2) not jump, as shown in Figure 3. Given the current state s_t, the next state s_{t+1} can easily be obtained by computing the movement of the bird according to the action a_t performed.
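As a concrete illustration, below is a minimal Java sketch of how the key-point coordinates could be rounded to the nearest tenth and turned into a hashable state key; the names are illustrative and the key format is an assumption, not the project's actual encoding:

    import java.util.Locale;

    /** Sketch of turning the key-point coordinates into a discrete state key.
     *  Rounding to one decimal place follows the paper; the key format is an
     *  assumption made only for illustration. */
    public class StateEncodingSketch {

        /** Round a coordinate (relative to the bird's eye) to the nearest tenth. */
        static double quantize(double v) {
            return Math.round(v * 10.0) / 10.0;
        }

        /** Build a hashable state key from the 16 relative coordinates. */
        static String stateKey(double[] coords) {   // coords.length == 16
            StringBuilder sb = new StringBuilder();
            for (double c : coords) {
                sb.append(String.format(Locale.US, "%.1f,", quantize(c)));
            }
            return sb.toString();
        }
    }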

4. LEARNING

To let the bird survive in the game, we want to learn a decision policy a_t = π(s_t) which decides an action a_t for each state s_t so as to maximize the total reward accumulated over time:

    Σ_{t=0}^{∞} γ^t R_{a_t}(s_t, s_{t+1}),

where γ is a discount factor and R_{a_t}(s_t, s_{t+1}) is the reward gained from action a_t, which causes the state transition from s_t to s_{t+1}.
If we know the exact transition probabilities P_a(s, s') and the values of the states V(s), we can solve for the optimal policy that maximizes the expected reward from the following equation:

    π(s) = argmax_a Σ_{s'} P_a(s, s') ( R_a(s, s') + γ V(s') ).

However, in our game environment, we assume the bird has no prior knowledge about the world, so the transition probabilities and rewards are unknown to us. This leads to the use of reinforcement learning [8].
In the framework of Q-learning [10], a specific reinforcement learning algorithm, we want to learn a policy in the form of a quality table

    Q : S × A → R,

which maps a state s ∈ S and an action a ∈ A to a real number r ∈ R that measures the reward of performing action a at state s. Once this quality table is learned, the decision can easily be made at any given state s as follows:

    a = argmax_{a ∈ A} Q(s, a).

Starting from an arbitrary Q, the Q-learning algorithm updates the table iteratively using the equation

    Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + α [ R_{t+1} + γ F − Q_t(s_t, a_t) ],

where F = max_a Q_t(s_{t+1}, a) is the maximum expected future reward at state s_{t+1}, α is the learning rate controlling the extent to which we want to forget the history, and γ is the discount factor for the expected future reward. This equation essentially says that an action a_t receives a higher value if it lands in a state s_{t+1} that has high reward and we are not expected to die quickly at that state. It has been proved that the Q-learning algorithm is guaranteed to converge to the optimal decision policy.
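A minimal sketch of this update rule and the greedy decision rule, assuming a tabular store keyed by the discretized state described earlier (written in Java for illustration; the names and parameter values are assumptions):

    import java.util.HashMap;
    import java.util.Map;

    /** Sketch of the incremental Q-learning update and greedy action selection. */
    public class QTableSketch {
        static final int NO_JUMP = 0, JUMP = 1;

        private final Map<String, double[]> table = new HashMap<>(); // state -> Q for [no-jump, jump]
        private final double alpha = 0.7;   // learning rate (illustrative)
        private final double gamma = 0.9;   // discount factor (illustrative)

        private double[] row(String state) {
            return table.computeIfAbsent(state, k -> new double[2]);
        }

        /** a = argmax_a Q(s, a). */
        int decide(String state) {
            double[] q = row(state);
            return q[JUMP] > q[NO_JUMP] ? JUMP : NO_JUMP;
        }

        /** Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). */
        void update(String state, int action, double reward, String nextState) {
            double[] q = row(state);
            double[] next = row(nextState);
            double f = Math.max(next[NO_JUMP], next[JUMP]);
            q[action] += alpha * (reward + gamma * f - q[action]);
        }
    }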

5. GRAPHICS

This project is implemented with WebGL, which enables complex graphics to be rendered on an HTML5 canvas with JavaScript. Our project runs efficiently and directly in the browser without the help of any plugins. This makes it easy for the public to access our project.
In this section, we introduce the major development tools and some graphics techniques used in this project. All the graphics-related functions and APIs are wrapped in the graphics.js file.

5.1 Tools

Three.js [2] is a lightweight cross-browser JavaScript library/API used to create and display animated 3D computer graphics in a Web browser. Three.js scripts may be used in conjunction with the HTML5 canvas element, SVG or WebGL. It greatly speeds up our development process.
Figure 4: The background scene is built with a skybox. It consists of six properly aligned images.
Blender [3] is free and open-source 3D creation software, which is used in this project for building and converting 3D models, including the Greek columns¹ and the flapping bird².
Chrome Developer Tools are used to assist the interface design and most importantly, to debug JavaScript functions and APIs.

5.2 Scene Rendering

The background scene in the game is created with the SkyBox technique [6]. Cube mapping is performed to project images onto the different faces of a cube. As shown in Figure 4, the images are properly aligned on the box. As the cube moves with the camera, it creates an illusion of distant three-dimensional surroundings. In our implementation, a separate camera is used to build the skybox³.
The animation in the scene is mainly controlled by the reinforcement learning module, which decides the exact moment the bird needs to make a jump. Whenever the bird jumps, the bird model plays a predefined flapping animation.
The point light is placed close to the setting sun in the background. We also adjusted the colors of both the point light and the ambient light to render the scene more realistically.

6. RESULTS

Under the first scenario, where we assume the bird is only aware of points in the fovy, the bird's intelligence evolves faster due to the limited size of the state space. Figure 5 plots the score against generation. Although the training speed is fast, after generation 16 it starts to converge to unpromising results. We believe this "degeneration" is due to the limitation of the features for reinforcement learning and possible contradictions in the underlying randomly generated world.
In the second scenario, where the bird is given both fovy and short-term memory, the score is plotted against generation in Figure 6. In this case, our bird is able to fly further at its best performance. However, the training process takes much more time due to the larger state space.
For the sake of easy debugging, we also implemented a quick training mode in which we were able to train without visualizing the flying bird. Quick training takes much less time since it cuts out the time spent on real-time graphics rendering: it takes only five minutes of quick training to reach generation 18,000. However, it is interesting to witness the bird's growth from scratch (generation 0). Related video clips can be found on our project webpage.
¹ The Greek column model is downloaded from http://archive3d.net/?category=555
² The bird model is created by mirada from ro.me. [7]
³ Images in the skybox are created by Jochum Skoglund.
Figure 5: Reinforcement learning result when the bird is aware only of unoccluded points in the fovy.
Figure 6: Reinforcement learning result when the bird is aware of all eight points through its sight and short-term memory.

7. FUTURE WORK

Our future work includes implementing a real 3D environment and modifying the training method. First, instead of flying up and down only, we want to enable the bird to turn left and right. Features extracted for a 3D game environment are more complicated than the ones we use for the 2D game model.
Besides, we also want to improve the feature extraction for reinforcement learning. First, we need to add the vertical velocity v_y of the bird into the state space, since it is natural that the bird is aware of its own velocity, and we strongly believe that some failed training results were due to unawareness of this parameter. In addition, we would like to set a constant time interval, e.g. one second, between two consecutive decisions to prevent the bird from flying straight up.

8. ACKNOWLEDGEMENT

We want to give special thanks to Professor Terzopoulos, who gave us a great course about artificial life and an overview of related techniques. We had a lot of freedom when doing this interesting project and have learnt a lot during the process.

9. REFERENCES

[1] Flappy Bird Massively Multiplayer Online. http://flapmmo.com/, Feb. 2014.
[2] R. Cabello. Three.js. http://www.threejs.org/, Apr. 2010.
[3] B. Foundation. Blender: Open source 3D graphics and animation software. http://www.blender.org/, 1995.
[4] R. Games. Splashy Fish. https://play.google.com/store/apps/details?id=it.junglestudios.splashyfish, Jan. 2014.
[5] E. Leão. A MelonJS port of the famous Flappy Bird Game. https://github.com/ellisonleao/clumsy-bird, Jan. 2014.
[6] E. Meiri. Tutorial 25 - SkyBox. http://ogldev.atspace.co.uk/www/tutorial25/tutorial25.html, Oct. 2010.
[7] Mirada. Dynamic Procedural Terrain Using 3D Simple Noise. http://alteredqualia.com/three/examples/webgl_terrain_dynamic.html.
[8] R. S. Sutton and A. G. Barto. Introduction to Reinforcement Learning. MIT Press, 1998.
[9] S. Vaish. Flappy Bird hack using Reinforcement Learning. https://github.com/SarvagyaVaish/FlappyBirdRL, Feb. 2014.
[10] C. J. Watkins and P. Dayan. Q-learning. Machine Learning, 8(3-4):279–292, 1992.
 

Saturday, October 10, 2015

UN delay could open door to robot wars, say experts

http://www.theguardian.com/science/2015/oct/06/autonomous-weapons-un-delay-robot-wars


Observers say UK and US are seeking to water down agreement so that any autonomous weapons deployed before talks conclude will be beyond reach of ban

A Sentry Tech remote-controlled weapon station. Photograph: Supplied
Harriet Grant
Tuesday 6 October 2015 06.00 BST Last modified on Tuesday 6 October 2015 19.20 BST

The United Nations has been warned that its protracted negotiations over the future of lethal autonomous weapons – or “killer robots” – are moving too slowly to stop robot wars becoming a reality.

Lobbying for a pre-emptive ban on the weapons is intensifying at the UN general assembly in New York, but a deal may not emerge quickly enough to prevent devices from being deployed, experts say.

“There is indeed a danger now that [the process] may get stuck,” said Christof Heyns, the UN special rapporteur on extrajudicial, summary or arbitrary executions.


Robot swarms: will they bomb us or save us? A look at the evolution and development of intelligent robots that can ‘think’ and coordinate themselves. Link to video
“A lot of money is going into development and people will want a return on their investment,” he said. “If there is not a pre-emptive ban on the high-level autonomous weapons then once the genie is out of the bottle it will be extremely difficult to get it back in.”

Observers say the UK and US are seeking to water down an agreement so that it only includes emerging technology, meaning that any weapons put into practice while discussions continue are beyond the reach of a ban.

“China wanted to discuss ‘existing and emerging technologies’ but the wording insisted on by the US and the UK is that it is only about emerging technologies,” said Noel Sharkey, a professor of artificial intelligence and robotics at the University of Sheffield and co-founder of the International Committee for Robot Arms Control, a coalition of robotics experts who are campaigning against the military use of robots.

“The UK and US are both insisting that the wording for any mandate about autonomous weapons should discuss only emerging technologies. Ostensibly this is because there is concern that … we will want to ban some of their current defensive weapons like the Phalanx or the Iron Dome.

“However, if the discussions go on for several years as they seem to be doing, many of the weapons that we are concerned about will already have been developed and potentially used.”

A rocket is launched from Israel’s Iron Dome system. Photograph: Dan Balilty/AP
Sharkey said current developments made that a likely scenario. “Governments are continuing to test autonomous weapons systems, for example with the X49B, which is a fighter jet that can fly on its own, and there are contracts already out for swarms of autonomous gun ships. So if we are tied up [discussing a ban] for a long time then the word ‘emerging’ is worrying.”

“People say that the convention on conventional weapons is a graveyard for good ideas because it’s notoriously slow moving. If we see the brakes being applied now that would take discussions into a fourth year.”

No fully autonomous weapons are yet in use, but many semi-autonomous lethal precursors are in development. One such weapon is the South Korean sentry robot SGR-1, which patrols the country’s border with North Korea and detects intruders as far as two miles away using heat and light sensors.

The robots are armed with machine guns and although currently controlled by humans from a distance, they are reportedly capable of making a decision to kill without human intervention.

Israel is deploying machine-gun turrets along its border with the Gaza Strip to target Palestinian infiltrators automatically. And the UK’s Taranis fighter jet flies autonomously and can identify and locate enemies. Although it does not yet act completely autonomously, it was described by a defence procurement minister as having “almost no need for operator input”.

Campaigners from across the fields of robotics and technology have made several high-profile pleas for a pre-emptive ban on offensive autonomous weapons in the past two years, including in a letter in July that was signed by more than 1,000 artificial intelligence researchers. The letter said offensive weapons that operate on their own would lower the threshold of going to battle and result in greater loss of human life.

Sharkey has repeatedly argued at the UN that allowing robots to make the decision to kill is wrong and fraught with risk. “We shouldn’t delegate the decision to kill to a machine full stop,” he said. “Having met with the UN, the Red Cross, roboticists and many groups across civil society, we have a general agreement that these weapons could not comply with the laws of war. There is a problem with them being able to discriminate between civilian and military targets, there is no software that can do that.

“The concern that exercises me most is that people like the US government keep talking about gaining a military edge. So the talk is of using large numbers – swarms – of robots.”

The UN convention on conventional weapons (CCW) examines whether certain weapons might be “excessively injurious or have indiscriminate effects”. This same convention was where the discussions that led to a ban on landmines and sight-blinding laser weapons began.

Only five countries have backed a ban so far, with countries such as the US, the UK and France arguing that a human will always have “meaningful control” over a robot’s decision to kill – a concept that is much debated.

There is high-level support within the UN for serious restrictions on lethal autonomous weapons. One possible route to a ban if the CCW cannot reach agreement on the next stage would be for states who want a ban to reach an agreement outside the UN, as happened with cluster bombs. But this would mean going ahead without the agreement of some of the major producers of the weapons.

Heyns believes the UN is a vital step in any global decision on banning certain weapons. “Some people say the CCW is where ideas go to die but that is not necessarily true,” he said. “They get aired there and we shouldn’t underestimate what has already been done. It’s not insignificant. At both meetings the states sent very high-level representatives. It’s not the end of the road, it’s a step and you can’t skip it.”

Wednesday, September 16, 2015

Deep Learning Machine Teaches Itself Chess in 72 Hours, Plays at International Master Level

http://www.technologyreview.com/view/541276/deep-learning-machine-teaches-itself-chess-in-72-hours-plays-at-international-master/

Tuesday, June 23, 2015

A Deep Learning Tutorial: From Perceptrons to Deep Networks.

BY IVAN VASILEV - JAVA DEVELOPER @ TOPTAL.
source link

In recent years, there’s been a resurgence in the field of Artificial Intelligence. It’s spread beyond the academic world with major players like Google, Microsoft, and Facebook creating their own research teams and making some impressive acquisitions.

Some of this can be attributed to the abundance of raw data generated by social network users, much of which needs to be analyzed, as well as to the cheap computational power available via GPGPUs.

But beyond these phenomena, this resurgence has been powered in no small part by a new trend in AI, specifically in machine learning, known as “Deep Learning”. In this tutorial, I’ll introduce you to the key concepts and algorithms behind deep learning, beginning with the simplest unit of composition and building to the concepts of machine learning in Java.

(For full disclosure: I’m also the author of a Java deep learning library, available here, and the examples in this article are implemented using the above library. If you like it, you can support it by giving it a star on GitHub, for which I would be grateful. Usage instructions are available on the homepage.)

A Thirty Second Primer on Machine Learning

In case you’re not familiar, check out this introduction to machine learning:

The general procedure is as follows:

We have some algorithm that’s given a handful of labeled examples, say 10 images of dogs with the label 1 (“Dog”) and 10 images of other things with the label 0 (“Not dog”)—note that we’re mainly sticking to supervised, binary classification for this post.
The algorithm “learns” to identify images of dogs and, when fed a new image, hopes to produce the correct label (1 if it’s an image of a dog, and 0 otherwise).
This setting is incredibly general: your data could be symptoms and your labels illnesses; or your data could be images of handwritten characters and your labels the actual characters they represent.

Perceptrons: Early Deep Learning Algorithms

One of the earliest supervised training algorithms is that of the perceptron, a basic neural network building block.

Say we have n points in the plane, labeled ‘0’ and ‘1’. We’re given a new point and we want to guess its label (this is akin to the “Dog” and “Not dog” scenario above). How do we do it?

One approach might be to look at the closest neighbor and return that point’s label. But a slightly more intelligent way of going about it would be to pick a line that best separates the labeled data and use that as your classifier.

A depiction of input data in relation to a linear classifier is a basic approach to deep learning.

In this case, each piece of input data would be represented as a vector x = (x_1, x_2) and our function would be something like “‘0’ if below the line, ‘1’ if above”.

To represent this mathematically, let our separator be defined by a vector of weights w and a vertical offset (or bias) b. Then, our function would combine the inputs and weights with a weighted sum transfer function:

The result of this transfer function would then be fed into an activation function to produce a labeling. In the example above, our activation function was a threshold cutoff (e.g., output 1 if the weighted sum is greater than some value, and 0 otherwise).

Training the Perceptron

The training of the perceptron consists of feeding it multiple training samples and calculating the output for each of them. After each sample, the weights w are adjusted in such a way as to minimize the output error, defined as the difference between the desired (target) and the actual outputs. There are other error functions, like the mean squared error, but the basic principle of training remains the same.
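To make the procedure concrete, here is a minimal Java sketch of a single perceptron with the weighted-sum transfer function, a threshold activation, and a simple error-driven weight update; it is an illustrative sketch, not code from the library linked above:

    /** Sketch of a single perceptron: weighted sum, threshold activation, and the
     *  simple error-driven weight update described above (illustrative only). */
    public class PerceptronSketch {
        private final double[] w;   // weights
        private double b;           // bias
        private final double rate;  // learning rate

        public PerceptronSketch(int inputs, double rate) {
            this.w = new double[inputs];
            this.rate = rate;
        }

        /** Threshold activation applied to the weighted-sum transfer function. */
        public int predict(double[] x) {
            double sum = b;
            for (int i = 0; i < w.length; i++) sum += w[i] * x[i];
            return sum > 0 ? 1 : 0;
        }

        /** Adjust the weights to reduce the difference between target and output. */
        public void train(double[] x, int target) {
            int error = target - predict(x);    // -1, 0, or +1
            for (int i = 0; i < w.length; i++) w[i] += rate * error * x[i];
            b += rate * error;
        }
    }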

Single Perceptron Drawbacks

The single perceptron approach to deep learning has one major drawback: it can only learn linearly separable functions. How major is this drawback? Take XOR, a relatively simple function, and notice that it can’t be classified by a linear separator (notice the failed attempt, below):

The drawback to this deep learning approach is that some functions cannot be classified by a linear separator.

To address this problem, we’ll need to use a multilayer perceptron, also known as a feedforward neural network: in effect, we’ll compose a bunch of these perceptrons together to create a more powerful mechanism for learning.

Feedforward Neural Networks for Deep Learning

A neural network is really just a composition of perceptrons, connected in different ways and operating on different activation functions.

Feedforward neural network deep learning is a more complex approach than single perceptrons.

For starters, we’ll look at the feedforward neural network, which has the following properties:

An input, output, and one or more hidden layers. The figure above shows a network with a 3-unit input layer, 4-unit hidden layer and an output layer with 2 units (the terms units and neurons are interchangeable).
Each unit is a single perceptron like the one described above.
The units of the input layer serve as inputs for the units of the hidden layer, while the hidden layer units are inputs to the output layer.
Each connection between two neurons has a weight w (similar to the perceptron weights).
Each unit of layer t is typically connected to every unit of the previous layer t - 1 (although you could disconnect them by setting their weight to 0).
To process input data, you “clamp” the input vector to the input layer, setting the values of the vector as “outputs” for each of the input units. In this particular case, the network can process a 3-dimensional input vector (because of the 3 input units). For example, if your input vector is [7, 1, 2], then you’d set the output of the top input unit to 7, the middle unit to 1, and so on. These values are then propagated forward to the hidden units using the weighted sum transfer function for each hidden unit (hence the term forward propagation), which in turn calculate their outputs (activation function).
The output layer calculates its outputs in the same way as the hidden layer. The result of the output layer is the output of the network (a sketch of this forward pass follows below).
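Here is a minimal Java sketch of the forward pass described in the list above, using a logistic activation; the layer sizes follow the 3-4-2 example in the text, while the weight values are left at zero purely for illustration:

    /** Sketch of forward propagation through fully connected layers with a
     *  logistic activation; sizes and weights here are illustrative only. */
    public class ForwardPassSketch {

        static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

        /** outputs[j] = sigmoid( bias[j] + sum_i inputs[i] * weights[i][j] ). */
        static double[] forward(double[] inputs, double[][] weights, double[] bias) {
            double[] out = new double[bias.length];
            for (int j = 0; j < out.length; j++) {
                double sum = bias[j];
                for (int i = 0; i < inputs.length; i++) sum += inputs[i] * weights[i][j];
                out[j] = sigmoid(sum);
            }
            return out;
        }

        public static void main(String[] args) {
            double[] input = {7, 1, 2};                  // the 3-dimensional input vector from the text
            double[][] inToHidden = new double[3][4];    // 3 input units -> 4 hidden units
            double[][] hiddenToOut = new double[4][2];   // 4 hidden units -> 2 output units
            double[] hidden = forward(input, inToHidden, new double[4]);
            double[] output = forward(hidden, hiddenToOut, new double[2]);
            System.out.println(java.util.Arrays.toString(output));
        }
    }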
Beyond Linearity

What if each of our perceptrons is only allowed to use a linear activation function? Then, the final output of our network will still be some linear function of the inputs, just adjusted with a ton of different weights that it’s collected throughout the network. In other words, a linear composition of a bunch of linear functions is still just a linear function. If we’re restricted to linear activation functions, then the feedforward neural network is no more powerful than the perceptron, no matter how many layers it has.

Because of this, most neural networks use non-linear activation functions like the logistic, tanh, binary or rectifier. Without them the network can only learn functions which are linear combinations of its inputs.

Training Perceptrons

The most common deep learning algorithm for supervised training of the multilayer perceptrons is known as backpropagation. The basic procedure:

A training sample is presented and propagated forward through the network.
The output error is calculated, typically the mean squared error:

Where t is the target value and y is the actual network output. Other error calculations are also acceptable, but the MSE is a good choice.
Network error is minimized using a method called stochastic gradient descent.

Gradient Descent

Gradient descent is universal, but in the case of neural networks, the picture to keep in mind is a graph of the training error as a function of the network weights. The optimal value for each weight is that at which the error achieves a global minimum. During the training phase, the weights are updated in small steps (after each training sample or a mini-batch of several samples) in such a way that they are always trying to reach the global minimum—but this is no easy task, as you often end up in local minima, like the one on the right. For example, if the weight has a value of 0.6, it needs to be changed towards 0.4.

This figure represents the simplest case, that in which error depends on a single parameter. However, network error depends on every network weight and the error function is much, much more complex.

Thankfully, backpropagation provides a method for updating each weight between two neurons with respect to the output error. The derivation itself is quite complicated, but the weight update for a given node has the following (simple) form:

Where E is the output error, and w_i is the weight of input i to the neuron.

Essentially, the goal is to move in the direction of the gradient with respect to weight i. The key term is, of course, the derivative of the error, which isn’t always easy to calculate: how would you find this derivative for a random weight of a random hidden node in the middle of a large network?

The answer: through backpropagation. The errors are first calculated at the output units where the formula is quite simple (based on the difference between the target and predicted values), and then propagated back through the network in a clever fashion, allowing us to efficiently update our weights during training and (hopefully) reach a minimum.
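As a concrete (and simplified) illustration, the sketch below performs the gradient-descent weight update for a single sigmoid output neuron, assuming a squared error E = 0.5 * (t - y)^2; this error function and activation are assumptions made for the example, not necessarily the exact choices used in the article's library:

    /** Sketch of a gradient-descent weight update for a single sigmoid output
     *  neuron with squared error E = 0.5 * (t - y)^2 (an assumption made here).
     *  For this choice, dE/dw_i = -(t - y) * y * (1 - y) * x_i. */
    public class OutputUpdateSketch {

        static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

        /** Update the weights of one output neuron in place and return its output. */
        static double updateOutputNeuron(double[] w, double[] x, double target, double rate) {
            double z = 0;
            for (int i = 0; i < w.length; i++) z += w[i] * x[i];
            double y = sigmoid(z);
            // delta combines the error derivative and the sigmoid derivative
            double delta = (target - y) * y * (1 - y);
            for (int i = 0; i < w.length; i++) {
                w[i] += rate * delta * x[i];   // move each weight against the error gradient
            }
            return y;
        }
    }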
Hidden Layer

The hidden layer is of particular interest. By the universal approximation theorem, a single-hidden-layer network with a finite number of neurons can be trained to approximate an arbitrary continuous function to any desired precision. In other words, a single hidden layer is powerful enough to learn any such function. That said, we often learn better in practice with multiple hidden layers (i.e., deeper nets).

The hidden layer is where the network stores its internal abstract representation of the training data, similar to the way that a human brain (greatly simplified analogy) has an internal representation of the real world. Going forward in the tutorial, we’ll look at different ways to play around with the hidden layer.

An Example Network

You can see a simple (4-2-3 layer) feedforward neural network that classifies the IRIS dataset implemented in Java here through the testMLPSigmoidBP method. The dataset contains three classes of iris plants with features like sepal length, petal length, etc. The network is provided 50 samples per class. The features are clamped to the input units, while each output unit corresponds to a single class of the dataset: “1/0/0” indicates that the plant is of class Setosa, “0/1/0” indicates Versicolour, and “0/0/1” indicates Virginica. The classification error is 2/150 (i.e., it misclassifies 2 samples out of 150).

The Problem with Large Networks

A neural network can have more than one hidden layer: in that case, the higher layers are “building” new abstractions on top of previous layers. And as we mentioned before, you can often learn better in practice with larger networks.

However, increasing the number of hidden layers leads to two known issues:

Vanishing gradients: as we add more and more hidden layers, backpropagation becomes less and less useful in passing information to the lower layers. In effect, as information is passed back, the gradients begin to vanish and become small relative to the weights of the networks.
Overfitting: perhaps the central problem in machine learning. Briefly, overfitting describes the phenomenon of fitting the training data too closely, maybe with hypotheses that are too complex. In such a case, your learner ends up fitting the training data really well, but will perform much, much more poorly on real examples.
Let’s look at some deep learning algorithms to address these issues.

Autoencoders

Most introductory machine learning classes tend to stop with feedforward neural networks. But the space of possible nets is far richer—so let’s continue.

An autoencoder is typically a feedforward neural network which aims to learn a compressed, distributed representation (encoding) of a dataset.

An autoencoder is a neural deep learning network that aims to learn a certain representation of a dataset.

Conceptually, the network is trained to “recreate” the input, i.e., the input and the target data are the same. In other words: you’re trying to output the same thing you were given as input, but compressed in some way. This is a confusing approach, so let’s look at an example.

Compressing the Input: Grayscale Images

Say that the training data consists of 28x28 grayscale images and the value of each pixel is clamped to one input layer neuron (i.e., the input layer will have 784 neurons). Then, the output layer would have the same number of units (784) as the input layer and the target value for each output unit would be the grayscale value of one pixel of the image.

The intuition behind this architecture is that the network will not learn a “mapping” between the training data and its labels, but will instead learn the internal structure and features of the data itself. (Because of this, the hidden layer is also called a feature detector.) Usually, the number of hidden units is smaller than the input/output layers, which forces the network to learn only the most important features and achieves a dimensionality reduction.

In effect, we want a few small nodes in the middle to really learn the data at a conceptual level, producing a compact representation that in some way captures the core features of our input.

Flu Illness

To further demonstrate autoencoders, let’s look at one more application.

In this case, we’ll use a simple dataset consisting of flu symptoms (credit to this blog post for the idea). If you’re interested, the code for this example can be found in the testAEBackpropagation method.

Here’s how the data set breaks down:

There are six binary input features.
The first three are symptoms of the illness. For example, 1 0 0 0 0 0 indicates that this patient has a high temperature, while 0 1 0 0 0 0 indicates coughing, 1 1 0 0 0 0 indicates coughing and high temperature, etc.
The final three features are “counter” symptoms; when a patient has one of these, it’s less likely that he or she is sick. For example, 0 0 0 1 0 0 indicates that this patient has a flu vaccine. It’s possible to have combinations of the two sets of features: 0 1 0 1 0 0 indicates a vaccinated patient with a cough, and so forth.
We’ll consider a patient to be sick when he or she has at least two of the first three features and healthy if he or she has at least two of the second three (with ties breaking in favor of the healthy patients), e.g.:

111000, 101000, 110000, 011000, 011100 = sick
000111, 001110, 000101, 000011, 000110 = healthy
We’ll train an autoencoder (using backpropagation) with six input and six output units, but only two hidden units.
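For concreteness, here is a small Java sketch that builds a few of the six-bit samples listed above and applies the sick/healthy rule as described; it only illustrates the data encoding, not the autoencoder training itself:

    /** Sketch of the six-bit flu samples and the sick/healthy rule described above. */
    public class FluDataSketch {

        /** Sick if at least two of the first three symptom bits are set and fewer
         *  than two of the last three "counter" bits are set (ties break in
         *  favour of healthy, as in the text). */
        static boolean isSick(int[] sample) {
            int symptoms = sample[0] + sample[1] + sample[2];
            int counters = sample[3] + sample[4] + sample[5];
            if (counters >= 2) return false;
            return symptoms >= 2;
        }

        public static void main(String[] args) {
            int[][] samples = {
                {1, 1, 1, 0, 0, 0}, {1, 0, 1, 0, 0, 0}, {0, 1, 1, 1, 0, 0},   // sick examples
                {0, 0, 0, 1, 1, 1}, {0, 0, 1, 1, 1, 0}, {0, 0, 0, 1, 0, 1}    // healthy examples
            };
            for (int[] s : samples) {
                System.out.println(java.util.Arrays.toString(s) + " -> " + (isSick(s) ? "sick" : "healthy"));
            }
            // For the autoencoder itself, each sample serves as both the input and the target.
        }
    }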

After several hundred iterations, we observe that when each of the “sick” samples is presented to the machine learning network, one of the two hidden units (the same unit for each “sick” sample) always exhibits a higher activation value than the other. On the contrary, when a “healthy” sample is presented, the other hidden unit has a higher activation.

Going Back to Machine Learning

Essentially, our two hidden units have learned a compact representation of the flu symptom data set. To see how this relates to learning, we return to the problem of overfitting. By training our net to learn a compact representation of the data, we’re favoring a simpler representation rather than a highly complex hypothesis that overfits the training data.

In a way, by favoring these simpler representations, we’re attempting to learn the data in a truer sense.

Restricted Boltzmann Machines

The next logical step is to look at a Restricted Boltzmann machine (RBM), a generative stochastic neural network that can learn a probability distribution over its set of inputs.

In machine learning, Restricted Boltzmann Machines are composed of visible and hidden units.

RBMs are composed of a hidden, visible, and bias layer. Unlike the feedforward networks, the connections between the visible and hidden layers are undirected (the values can be propagated in both the visible-to-hidden and hidden-to-visible directions) and fully connected (each unit from a given layer is connected to each unit in the next—if we allowed any unit in any layer to connect to any other layer, then we’d have a Boltzmann (rather than a restricted Boltzmann) machine).

The standard RBM has binary hidden and visible units: that is, the unit activation is 0 or 1 under a Bernoulli distribution, but there are variants with other non-linearities.

While researchers have known about RBMs for some time now, the recent introduction of the contrastive divergence unsupervised training algorithm has renewed interest.

Contrastive Divergence

The single-step contrastive divergence algorithm (CD-1) works like this:

Positive phase:
An input sample v is clamped to the input layer.
v is propagated to the hidden layer in a similar manner to the feedforward networks. The result of the hidden layer activations is h.
Negative phase:
Propagate h back to the visible layer with result v’ (the connections between the visible and hidden layers are undirected and thus allow movement in both directions).
Propagate the new v’ back to the hidden layer with activations result h’.
Weight update:

Where a is the learning rate and v, v’, h, h’, and w are vectors.
The intuition behind the algorithm is that the positive phase (h given v) reflects the network’s internal representation of the real world data. Meanwhile, the negative phase represents an attempt to recreate the data based on this internal representation (v’ given h). The main goal is for the generated data to be as close as possible to the real world and this is reflected in the weight update formula.

In other words, the net has some perception of how the input data can be represented, so it tries to reproduce the data based on this perception. If its reproduction isn’t close enough to reality, it makes an adjustment and tries again.
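Below is a minimal Java sketch of one CD-1 step for an RBM with binary units; biases are omitted and the particular sampling choices are a common simplification, not necessarily the exact variant used in the article's library:

    import java.util.Random;

    /** Sketch of one CD-1 step for an RBM with binary units (biases omitted). */
    public class ContrastiveDivergenceSketch {
        static final Random RNG = new Random(42);

        static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

        /** Probability that each hidden unit is on, given the visible vector. */
        static double[] hiddenProbs(double[][] w, double[] v) {
            double[] h = new double[w[0].length];
            for (int j = 0; j < h.length; j++) {
                double z = 0;
                for (int i = 0; i < v.length; i++) z += v[i] * w[i][j];
                h[j] = sigmoid(z);
            }
            return h;
        }

        /** Probability that each visible unit is on, given the hidden vector. */
        static double[] visibleProbs(double[][] w, double[] h) {
            double[] v = new double[w.length];
            for (int i = 0; i < v.length; i++) {
                double z = 0;
                for (int j = 0; j < h.length; j++) z += h[j] * w[i][j];
                v[i] = sigmoid(z);
            }
            return v;
        }

        static double[] sample(double[] probs) {
            double[] s = new double[probs.length];
            for (int i = 0; i < s.length; i++) s[i] = RNG.nextDouble() < probs[i] ? 1 : 0;
            return s;
        }

        /** One CD-1 update: w += rate * (v h^T - v' h'^T). */
        static void cd1(double[][] w, double[] v, double rate) {
            double[] h = hiddenProbs(w, v);                  // positive phase
            double[] vPrime = visibleProbs(w, sample(h));    // negative phase: reconstruct the data
            double[] hPrime = hiddenProbs(w, vPrime);
            for (int i = 0; i < w.length; i++) {
                for (int j = 0; j < w[i].length; j++) {
                    w[i][j] += rate * (v[i] * h[j] - vPrime[i] * hPrime[j]);
                }
            }
        }
    }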

Returning to the Flu

To demonstrate contrastive divergence, we’ll use the same symptoms data set as before. The test network is an RBM with six visible and two hidden units. We’ll train the network using contrastive divergence with the symptoms v clamped to the visible layer. During testing, the symptoms are again presented to the visible layer; then, the data is propagated to the hidden layer. The hidden units represent the sick/healthy state, a very similar architecture to the autoencoder (propagating data from the visible to the hidden layer).

After several hundred iterations, we can observe the same result as with autoencoders: one of the hidden units has a higher activation value when any of the “sick” samples is presented, while the other is always more active for the “healthy” samples.

You can see this example in action in the testContrastiveDivergence method.

Deep Networks

We’ve now demonstrated that the hidden layers of autoencoders and RBMs act as effective feature detectors; but it’s rare that we can use these features directly. In fact, the data set above is more an exception than a rule. Instead, we need to find some way to use these detected features indirectly.

Luckily, it was discovered that these structures can be stacked to form deep networks. These networks can be trained greedily, one layer at a time, to help to overcome the vanishing gradient and overfitting problems associated with classic backpropagation.

The resulting structures are often quite powerful, producing impressive results. Take, for example, Google’s famous “cat” paper, in which they use a special kind of deep autoencoder to “learn” human and cat face detection based on unlabeled data.

Let’s take a closer look.

Stacked Autoencoders

As the name suggests, this network consists of multiple stacked autoencoders.

A stacked autoencoder is built from a series of autoencoders, each with its own input, hidden, and output layer.

The hidden layer of autoencoder t acts as an input layer to autoencoder t + 1. The input layer of the first autoencoder is the input layer for the whole network. The greedy layer-wise training procedure works like this:

Train the first autoencoder (t=1, or the red connections in the figure above, but with an additional output layer) individually using the backpropagation method with all available training data.
Train the second autoencoder t=2 (green connections). Since the input layer for t=2 is the hidden layer of t=1, we are no longer interested in the output layer of t=1, and we remove it from the network. Training begins by clamping an input sample to the input layer of t=1, which is propagated forward to the output layer of t=2. Next, the weights (input-hidden and hidden-output) of t=2 are updated using backpropagation. t=2 uses all the training samples, just like t=1.
Repeat the previous procedure for all the layers (i.e., remove the output layer of the previous autoencoder, replace it with yet another autoencoder, and train with backpropagation).
Steps 1-3 are called pre-training and leave the weights properly initialized. However, there’s no mapping between the input data and the output labels. For example, if the network is trained to recognize images of handwritten digits it’s still not possible to map the units from the last feature detector (i.e., the hidden layer of the last autoencoder) to the digit type of the image. In that case, the most common solution is to add one or more fully connected layer(s) to the last layer (blue connections). The whole network can now be viewed as a multilayer perceptron and is trained using backpropagation (this step is also called fine-tuning).
Stacked autoencoders, then, are all about providing an effective pre-training method for initializing the weights of a network, leaving you with a complex multilayer perceptron that’s ready to train (or fine-tune). A schematic sketch of the pre-training loop follows.
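The greedy layer-wise loop itself is simple. Below is a schematic Java sketch of the pre-training procedure; the Autoencoder interface and its methods are hypothetical placeholders standing in for whatever per-layer training code you use, and fine-tuning is only indicated in a comment.

import java.util.ArrayList;
import java.util.List;

// A minimal sketch of greedy layer-wise pre-training for stacked autoencoders.
// The Autoencoder type and its methods are hypothetical placeholders.
public class StackedAutoencoderSketch {

    interface Autoencoder {
        void trainWithBackprop(List<float[]> inputs);   // train input -> hidden -> output
        float[] encode(float[] input);                  // propagate input to the hidden layer
    }

    // Pre-train each autoencoder on the hidden-layer codes of the previous one.
    static void pretrain(List<Autoencoder> stack, List<float[]> data) {
        List<float[]> current = data;
        for (Autoencoder ae : stack) {
            ae.trainWithBackprop(current);              // train this layer on its own

            List<float[]> next = new ArrayList<>();     // its hidden codes become the
            for (float[] sample : current) {            // training data for the next layer
                next.add(ae.encode(sample));
            }
            current = next;
        }
        // After pre-training, attach fully connected output layer(s) and
        // fine-tune the whole network with backpropagation (not shown here).
    }
}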

Deep Belief Networks

As with autoencoders, we can also stack RBMs to create a class of networks known as deep belief networks (DBNs).

Deep belief networks are composed of a stack of restricted Boltzmann machines.

In this case, the hidden layer of RBM t acts as a visible layer for RBM t+1. The input layer of the first RBM is the input layer for the whole network, and the greedy layer-wise pre-training works like this:

Train the first RBM t=1 using contrastive divergence with all the training samples.
Train the second RBM t=2. Since the visible layer for t=2 is the hidden layer of t=1, training begins by clamping the input sample to the visible layer of t=1, which is propagated forward to the hidden layer of t=1. This data then serves to initiate contrastive divergence training for t=2.
Repeat the previous procedure for all the layers.
Similar to the stacked autoencoders, after pre-training the network can be extended by connecting one or more fully connected layers to the final RBM hidden layer. This forms a multilayer perceptron which can then be fine-tuned using backpropagation.
This procedure is akin to that of stacked autoencoders, but with the autoencoders replaced by RBMs and backpropagation replaced with the contrastive divergence algorithm.
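The corresponding pre-training loop for a DBN looks almost identical; only the per-layer training step changes from backpropagation to contrastive divergence. Again, the Rbm type and its methods below are hypothetical placeholders, not the library's API.

import java.util.ArrayList;
import java.util.List;

// Greedy layer-wise pre-training of a DBN; Rbm and its methods are hypothetical.
public class DbnPretrainingSketch {

    interface Rbm {
        void trainWithContrastiveDivergence(List<float[]> visibleSamples);
        float[] sampleHiddenGivenVisible(float[] visible);
    }

    static void pretrain(List<Rbm> stack, List<float[]> data) {
        List<float[]> current = data;
        for (Rbm rbm : stack) {
            rbm.trainWithContrastiveDivergence(current);

            List<float[]> next = new ArrayList<>();
            for (float[] sample : current) {
                next.add(rbm.sampleHiddenGivenVisible(sample)); // hidden layer of t feeds t+1
            }
            current = next;
        }
    }
}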

(Note: for more on constructing and training stacked autoencoders or deep belief networks, check out the sample code here.)

Convolutional Networks

As a final deep learning architecture, let’s take a look at convolutional networks, a particularly interesting and special class of feedforward networks that are very well-suited to image recognition.

Convolutional networks are a special class of deep learning feedforward networks. (Image via DeepLearning.net)

Before we look at the actual structure of convolutional networks, we first define an image filter: a square region with associated weights. A filter is applied across an entire input image, and you will often apply multiple filters. For example, you could apply four 6x6 filters to a given input image. Then, the output pixel with coordinates (1,1) is the weighted sum of the 6x6 square of input pixels whose top-left corner is at (1,1), with the weights given by the filter (which is itself a 6x6 square). Output pixel (2,1) is computed the same way from the input square with top-left corner (2,1), and so on.
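As an illustration of this definition, here is a minimal Java sketch that applies a single square filter across a grayscale image with stride 1 and no padding; the names and indexing convention are assumptions made for the example.

// A minimal sketch of applying one square filter across a grayscale image.
public class ConvolutionSketch {

    // image[y][x], filter[fy][fx]; returns the resulting feature map.
    static double[][] applyFilter(double[][] image, double[][] filter) {
        int f = filter.length;                              // e.g. 6 for a 6x6 filter
        int outH = image.length - f + 1;
        int outW = image[0].length - f + 1;
        double[][] out = new double[outH][outW];

        for (int y = 0; y < outH; y++) {
            for (int x = 0; x < outW; x++) {
                double sum = 0;
                for (int fy = 0; fy < f; fy++) {
                    for (int fx = 0; fx < f; fx++) {
                        // output (y, x) is the weighted sum of the f x f input square
                        // whose top-left corner is (y, x)
                        sum += image[y + fy][x + fx] * filter[fy][fx];
                    }
                }
                out[y][x] = sum;
            }
        }
        return out;
    }
}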

With that covered, these networks are defined by the following properties:

Convolutional layers apply a number of filters to the input. For example, the first convolutional layer of the image could have four 6x6 filters. The result of one filter applied across the image is called a feature map (FM), and the number of feature maps is equal to the number of filters. If the previous layer is also convolutional, the filters are applied across all of its FMs with different weights, so each input FM is connected to each output FM. The intuition behind the weights being shared across the image is that features will be detected regardless of their location, while the multiplicity of filters allows each of them to detect a different set of features.
Subsampling layers reduce the size of the input. For example, if the input consists of a 32x32 image and the layer has a subsampling region of 2x2, the output would be a 16x16 image, which means that 4 pixels (each 2x2 square) of the input image are combined into a single output pixel. There are multiple ways to subsample, but the most popular are max pooling, average pooling, and stochastic pooling (see the pooling sketch after this list).
The last subsampling (or convolutional) layer is usually connected to one or more fully connected layers, the last of which represents the target data.
Training is performed using modified backpropagation that takes the subsampling layers into account and updates the convolutional filter weights based on all values to which that filter is applied.
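As referenced in the subsampling item above, here is a minimal Java sketch of 2x2 max pooling, the most common subsampling variant; the names are illustrative only.

// A minimal sketch of 2x2 max pooling: every 2x2 square of the input feature
// map is reduced to a single output pixel (its maximum value).
public class MaxPoolingSketch {

    static double[][] maxPool2x2(double[][] in) {
        int outH = in.length / 2, outW = in[0].length / 2;  // e.g. 32x32 -> 16x16
        double[][] out = new double[outH][outW];
        for (int y = 0; y < outH; y++) {
            for (int x = 0; x < outW; x++) {
                out[y][x] = Math.max(
                        Math.max(in[2 * y][2 * x],     in[2 * y][2 * x + 1]),
                        Math.max(in[2 * y + 1][2 * x], in[2 * y + 1][2 * x + 1]));
            }
        }
        return out;
    }
}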
You can see several examples of convolutional networks trained (with backpropagation) on the MNIST data set (grayscale images of handwritten digits) here, specifically in the testLeNet* methods (I would recommend testLeNetTiny2 as it achieves a low error rate of about 2% in a relatively short period of time). There’s also a nice JavaScript visualization of a similar network here.

Implementation

Now that we’ve covered the most common neural network variants, I thought I’d write a bit about the challenges posed during implementation of these deep learning structures.

Broadly speaking, my goal in creating a Deep Learning library was (and still is) to build a neural network-based framework that satisfied the following criteria:

A common architecture that is able to represent diverse models (all the variants on neural networks that we’ve seen above, for example).
The ability to use diverse training algorithms (back propagation, contrastive divergence, etc.).
Decent performance.
To satisfy these requirements, I took a tiered (or modular) approach to the design of the software.

Structure

Let’s start with the basics:

NeuralNetworkImpl is the base class for all neural network models.
Each network contains a set of layers.
Each layer has a list of connections, where a connection is a link between two layers such that the network is a directed acyclic graph.
This structure is flexible enough to be used for classic feedforward networks, as well as for RBMs and more complex architectures like the deep convolutional networks used for ImageNet classification.

It also allows a layer to be part of more than one network. For example, the layers in a Deep Belief Network are also layers in their corresponding RBMs.

In addition, this architecture allows a DBN to be viewed as a list of stacked RBMs during the pre-training phase and a feedforward network during the fine-tuning phase, which is both intuitively nice and programmatically convenient.
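To illustrate this structure, here is a heavily simplified Java sketch of networks, layers, and connections. These types are stand-ins invented for the example (the real library's classes, such as NeuralNetworkImpl, have richer APIs); the point is only to show layers as graph nodes, connections as edges, and layer sharing between networks.

import java.util.ArrayList;
import java.util.List;

// A simplified, hypothetical sketch of the network/layer/connection structure.
public class StructureSketch {

    static class Layer {
        final List<Connection> connections = new ArrayList<>(); // links to other layers
    }

    static class Connection {                                   // directed edge between two layers
        final Layer input, output;
        Connection(Layer input, Layer output) {
            this.input = input;
            this.output = output;
            input.connections.add(this);
            output.connections.add(this);
        }
    }

    static class NeuralNetwork {                                // a network is a set of layers
        final List<Layer> layers = new ArrayList<>();
    }

    public static void main(String[] args) {
        // A layer can belong to more than one network: here "hidden" is the hidden
        // layer of an RBM and, at the same time, a layer of the enclosing DBN.
        Layer visible = new Layer(), hidden = new Layer();
        new Connection(visible, hidden);

        NeuralNetwork rbm = new NeuralNetwork();
        rbm.layers.add(visible);
        rbm.layers.add(hidden);

        NeuralNetwork dbn = new NeuralNetwork();
        dbn.layers.addAll(rbm.layers);                          // shared layers
    }
}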

Data Propagation

The next module takes care of propagating data through the network, a two-step process:

Determine the order of the layers. For example, to get the results from a multilayer perceptron, the data is “clamped” to the input layer (hence, this is the first layer to be calculated) and propagated all the way to the output layer. In order to update the weights during backpropagation, the output error has to be propagated through every layer in breadth-first order, starting from the output layer. This is achieved using various implementations of LayerOrderStrategy, which takes advantage of the graph structure of the network, employing different graph traversal methods. Some examples include the breadth-first strategy and the targeting of a specific layer. The order is actually determined by the connections between the layers, so the strategies return an ordered list of connections.
Calculate the activation value. Each layer has an associated ConnectionCalculator, which takes its list of connections (from the previous step) and input values (from other layers) and calculates the resulting activation. For example, in a simple sigmoidal feedforward network, the hidden layer’s ConnectionCalculator takes the values of the input and bias layers (which are, respectively, the input data and an array of 1s) and the weights between the units (in the case of fully connected layers, the weights are actually stored in a FullyConnected connection as a Matrix), calculates the weighted sum, and feeds the result into the sigmoid function. The connection calculators implement a variety of transfer (e.g., weighted sum, convolutional) and activation (e.g., logistic and tanh for the multilayer perceptron, binary for the RBM) functions. Most of them can be executed on a GPU using Aparapi and can be used with mini-batch training. A minimal sketch of the weighted-sum calculation follows this list.
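Here is a minimal Java sketch of what such a calculator does for a fully connected sigmoidal layer: a weighted sum of the inputs plus the bias contribution, passed through the logistic function. The signature and array layout are assumptions for the example, not the library's ConnectionCalculator API.

// A simplified stand-in for a fully connected sigmoidal "connection calculator".
public class SigmoidLayerSketch {

    // weights[j][i] connects input unit i to target unit j; bias[j] is the bias weight.
    static float[] calculate(float[][] weights, float[] bias, float[] input) {
        float[] output = new float[weights.length];
        for (int j = 0; j < weights.length; j++) {
            float sum = bias[j];                      // the bias layer contributes an input of 1
            for (int i = 0; i < input.length; i++) {
                sum += weights[j][i] * input[i];      // weighted sum of the inputs
            }
            output[j] = (float) (1.0 / (1.0 + Math.exp(-sum)));   // logistic activation
        }
        return output;
    }
}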
GPU Computation with Aparapi

As I mentioned earlier, one of the reasons that neural networks have made a resurgence in recent years is that their training methods are highly conducive to parallelism, allowing you to speed up training significantly with the use of a GPGPU. In this case, I chose to work with the Aparapi library to add GPU support.

Aparapi imposes some important restrictions on the connection calculators:

Only one-dimensional arrays (and variables) of primitive data types are allowed.
Only member-methods of the Aparapi Kernel class itself are allowed to be called from the GPU executable code.
As such, most of the data (weights, input, and output arrays) is stored in Matrix instances, which use one-dimensional float arrays internally. All Aparapi connection calculators use either AparapiWeightedSum (for fully connected layers and weighted sum input functions), AparapiSubsampling2D (for subsampling layers), or AparapiConv2D (for convolutional layers). Some of these limitations can be overcome with the introduction of Heterogeneous System Architecture. Aparapi also allows the same code to run on both the CPU and the GPU.
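To show what these restrictions look like in practice, here is a minimal Aparapi kernel sketch in the spirit of the weighted-sum calculators: one work item per output unit, and only flat one-dimensional float arrays. This is an illustration rather than the library's AparapiWeightedSum, and depending on the Aparapi version the Kernel class lives in the com.amd.aparapi or com.aparapi package.

import com.amd.aparapi.Kernel;   // com.aparapi.Kernel in newer Aparapi releases

public class WeightedSumKernelSketch {

    public static float[] weightedSum(final float[] weights, final float[] input,
                                      final int outputCount) {
        final int inputCount = input.length;
        final float[] output = new float[outputCount];

        Kernel kernel = new Kernel() {
            @Override
            public void run() {
                int j = getGlobalId();                 // one work item per output unit
                float sum = 0f;
                for (int i = 0; i < inputCount; i++) {
                    // weights stored as a flat one-dimensional array (row j, column i)
                    sum += weights[j * inputCount + i] * input[i];
                }
                output[j] = sum;
            }
        };

        kernel.execute(outputCount);                   // runs on the GPU when available
        kernel.dispose();
        return output;
    }
}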

Training

The training module implements various training algorithms. It relies on the previous two modules. For example, BackPropagationTrainer (all the trainers use the Trainer base class) uses the feedforward layer calculator for the feedforward phase and a special breadth-first layer calculator for propagating the error and updating the weights.
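Schematically, a backpropagation trainer alternates the two phases described above for each training sample. The sketch below uses hypothetical interfaces in place of the library's layer calculators and Trainer classes, purely to show the shape of the loop.

import java.util.List;

// A schematic sketch of the two phases a backpropagation trainer performs per sample.
// The types and method names are hypothetical placeholders, not the library's API.
public class BackpropTrainerSketch {

    interface ForwardCalculator  { float[] propagate(float[] input); }            // feedforward phase
    interface BackwardCalculator { void propagateErrorAndUpdate(float[] error); } // breadth-first, from the output layer

    static void train(ForwardCalculator forward, BackwardCalculator backward,
                      List<float[]> inputs, List<float[]> targets) {
        for (int n = 0; n < inputs.size(); n++) {
            float[] predicted = forward.propagate(inputs.get(n));   // phase 1: forward pass
            float[] error = new float[predicted.length];
            for (int i = 0; i < error.length; i++) {
                error[i] = targets.get(n)[i] - predicted[i];        // output error
            }
            backward.propagateErrorAndUpdate(error);                // phase 2: backward pass
        }
    }
}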

My latest work is on Java 8 support and some other improvements, which are available in this branch and will soon be merged into master.

Conclusion

The aim of this Java deep learning tutorial was to give you a brief introduction to the field of deep learning algorithms, beginning with the most basic unit of composition (the perceptron) and progressing through various effective and popular architectures, like that of the restricted Boltzmann machine.

The ideas behind neural networks have been around for a long time; but today, you can’t set foot in the machine learning community without hearing about deep networks or some other take on deep learning. Hype shouldn’t be mistaken for justification, but with the advances of GPGPU computing and the impressive progress made by researchers like Geoffrey Hinton, Yoshua Bengio, Yann LeCun and Andrew Ng, the field certainly shows a lot of promise. There’s no better time to get familiar and get involved than the present.

Appendix: Resources

If you’re interested in learning more, I found the following resources quite helpful during my work:

DeepLearning.net: a portal for all things deep learning. It has some nice tutorials, a software library, and a great reading list.
An active Google+ community.
Two very good courses: Machine Learning and Neural Networks for Machine Learning, both offered on Coursera.
The Stanford neural networks tutorial.