Reinforcement Learning


There are three basic machine learning paradigms: supervised learning, unsupervised learning, and reinforcement learning (RL). We talked about supervised and unsupervised learning in our previous article. Let’s now explore reinforcement learning!

What is reinforcement learning (RL)?

In short, RL is a specialized branch of machine/deep learning designed to solve problems in a particular way. Unlike supervised and unsupervised learning, reinforcement learning is based on interaction with an environment: the algorithm learns how to react to the environment on its own. Trial and error is therefore at the heart of RL.

Reinforcement learning is learning what to do—how to map situations to actions—so as to maximize a numerical reward signal. The learner is not told which actions to take, but instead must discover which actions yield the most reward by trying them.

— Sutton and Barto, Reinforcement Learning: An Introduction

RL models are built around algorithms that use estimated errors as rewards or penalties: if the error is big, the penalty is high and the reward low; if the error is small, the penalty is low and the reward high. Figure 1 is a simple illustration of RL. Reinforcement learning solves problems by allowing a piece of software called an “agent” to explore, interact with, and learn from the environment.

Figure 1: Pipeline of reinforcement learning
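For illustration, one possible mapping from an estimated error to a reward signal might look like the sketch below. The function name and the particular formula are our own hypothetical choices; the exact shape of the mapping is a design decision, not a fixed rule.

```python
def reward_from_error(error):
    """Illustrative error-to-reward mapping: a large error yields a low
    reward (i.e. a high penalty), a small error yields a high reward."""
    return 1.0 / (1.0 + abs(error))

# reward_from_error(0.0) -> 1.0 (small error, high reward)
# reward_from_error(9.0) -> 0.1 (large error, low reward)
```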

In RL, an agent takes actions. For example, the agent tries to manipulate the environment by moving from one state to another (making trials). The agent receives a reward for successful trials but receives nothing when a trial fails. This is a simplified description of an RL problem; the mathematical details are involved and we won’t cover them here. If you are interested, please check Sutton and Barto’s textbook quoted above.
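To make the idea of states, trials, and rewards concrete, here is a toy sketch of such a setup. The `ChainEnv` environment (a one-dimensional chain of states where only reaching the goal is rewarded) is entirely our own hypothetical invention, not from any specific library:

```python
import random

class ChainEnv:
    """Hypothetical toy environment: a 1-D chain of 5 states (0..4).
    The agent starts in state 0 and tries to reach goal state 4.
    Reaching the goal yields reward 1; every other trial yields 0."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is +1 (move right) or -1 (move left); walls clip the move
        self.state = max(0, min(self.length - 1, self.state + action))
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done

# A purely random trial-and-error episode
env = ChainEnv()
state, done, steps = env.reset(), False, 0
while not done and steps < 100:
    action = random.choice([-1, 1])
    state, reward, done = env.step(action)
    steps += 1
```

Each call to `step` is one trial: the agent moves between states and is rewarded only when a trial succeeds in reaching the goal.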

Reinforcement learning workflow overview

To better illustrate how RL works, Figure 2 provides a general RL workflow overview.

Figure 2: A general RL workflow

First, the agent observes the current state of the environment. Based on the observed state, the agent decides which action to take. The environment then changes state and produces a reward for that action. Both the new state and the reward are returned to the agent. Finally, using this new information, the agent can determine whether the action was good and should be repeated, or bad and should be avoided. This observation-action-reward cycle (a trial-and-error process) continues until learning is complete.
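The observation-action-reward cycle above can be sketched in a few lines of Python. The two-state transition table, the state names, and the random policy below are purely illustrative assumptions:

```python
import random

# Hypothetical environment written as a transition table:
# (state, action) -> (next state, reward). All names are illustrative.
TRANSITIONS = {
    ("A", "stay"): ("A", 0.0),
    ("A", "move"): ("B", 1.0),
    ("B", "stay"): ("B", 0.0),
    ("B", "move"): ("A", 0.0),
}

def run_episode(steps=10, seed=0):
    """Repeat the observation-action-reward cycle `steps` times."""
    rng = random.Random(seed)
    state, total_reward = "A", 0.0
    for _ in range(steps):
        action = rng.choice(["stay", "move"])         # agent picks an action
        state, reward = TRANSITIONS[(state, action)]  # environment reacts
        total_reward += reward                        # agent receives feedback
    return total_reward

total = run_episode()
```

A learning agent would replace the random `rng.choice` with a policy that uses the accumulated reward feedback to decide which action to repeat and which to avoid.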

Within the RL workflow, five different areas need to be addressed: environment, reward, policy, training algorithm, and the deployment of RL in the targeted field.

  • Environment: decide whether the environment is simulated or physical, and what should exist within it.
  • Reward: based on what the agent ultimately needs to do, craft a reward function that incentivizes the agent to do just that.
  • Policy (or mapping): specify a way to represent the policy, such as the structured parameters and logic that form the decision-making part of the agent.
  • Training algorithm: the agent needs to be trained by an algorithm in order to find the optimal policy parameters/logic.
  • Deployment: to verify the results and evaluate how RL performs, the policy needs to be exploited by deploying it in the targeted field.
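As a minimal sketch of how the environment, reward, policy, and training algorithm fit together, the code below trains a tabular Q-learning agent (one classic RL training algorithm) on a hypothetical five-state chain. The Q-table plays the role of the policy's parameters, and the update rule is the training algorithm; all names and parameter values are illustrative assumptions, not a definitive implementation:

```python
import random

def q_learning(n_states=5, n_episodes=200, alpha=0.5, gamma=0.9,
               eps=0.1, seed=0):
    """Tabular Q-learning on a toy chain: start in state 0, reach the
    goal state n_states-1. Actions: 0 = left, 1 = right. The reward is
    1 only on the transition into the goal."""
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]   # the policy's parameters
    for _ in range(n_episodes):
        s = 0
        while s != goal:
            # epsilon-greedy policy (ties broken randomly): mostly
            # exploit the current Q-values, occasionally explore
            if rng.random() < eps or Q[s][0] == Q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2 = max(0, min(goal, s + (1 if a == 1 else -1)))
            r = 1.0 if s2 == goal else 0.0
            # training update: move Q toward the observed reward plus
            # the discounted value of the next state
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning()
# With these settings the learned policy should prefer "right" everywhere
policy = ["right" if Q[s][1] > Q[s][0] else "left" for s in range(4)]
```

Deployment then amounts to exploiting the trained table: in each state, simply take the action with the higher Q-value.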

Characteristics and challenges

  • Unlike supervised and unsupervised learning, “time” plays a crucial role in RL problems.
  • There is no supervisor in RL; the agent improves using only the reward signal.
  • Feedback is not instantaneous. This delay may cause difficulties in realistic, non-stationary environments.
  • Decisions are made sequentially (sequential decision making): the output depends on the state of the current input, and the next input depends on the previous output, not on the input given at the start.
  • RL is often time-consuming and computationally heavy; the choice of parameters may also affect the speed of learning.
  • When facing realistic environments, partial observability might be a crucial challenge.

Reinforcement learning applications

Due to its generality, RL is widely studied in the following disciplines:

  • Game theory
  • Control theory
  • Operations research
  • Information theory
  • Simulation-based optimization
  • Multi-agent systems
  • Swarm intelligence
  • Statistics
  • … and more

Besides these, RL is also a popular technique for building AI that plays computer games. Google DeepMind’s work on learning Atari games drew wide attention to deep reinforcement learning, which extends RL with deep neural networks.

RL is most applicable in domains where simulated data is readily available, such as gameplay and robotics. Examples of RL include robots that learn efficient adaptive control on their own, driverless cars, autonomous helicopter stunts, learning optimal treatment policies in healthcare, and more.

In a nutshell …

Together with our previous blog article, we have now gone through the three basic categories of machine learning. To sum up: a supervised learning model learns from a labeled dataset with guidance, while unsupervised learning trains the machine on unlabeled data without any guidance. Reinforcement learning, on the other hand, is when a machine or agent interacts with its environment, performs actions, and learns by trial and error.

We hope you liked our articles; more articles about machine learning applications are on the way!



Editor: Chieh-Feng Cheng
Ph.D. in ECE, Georgia Tech
Technical Writer, inwinSTACK

