Introduction to Deep Reinforcement Learning (DRL)

Table of Contents

  • Background: Machine Learning Method
  • Methodology
  • Example: DQL Implementation
  • Comment
  • Reference

Background: Machine Learning Method

There are three typical types of machine learning methods:

  1. Supervised Learning: given labeled data, train the model to predict the correct result.

*(Figure: Supervised Learning)*

  2. Unsupervised Learning: given unlabeled data, train the model to find underlying patterns in the data.

*(Figure: Unsupervised Learning)*

  3. Reinforcement Learning: get feedback (state and reward) from interacting with the environment, and adjust the next action to the environment to maximize the expected reward.

*(Figure: Reinforcement Learning)*

Method comparison:

| Method | Input Data | Output Result | Types of Problem | Application |
| --- | --- | --- | --- | --- |
| Supervised Learning | Labeled data | Prediction result | Classification; Regression | Risk evaluation; Forecasting |
| Unsupervised Learning | Unlabeled data | Underlying pattern | Clustering | Recommendation; Anomaly detection |
| Reinforcement Learning | Feedback from the environment | Action to the environment | Exploration and exploitation | Self-driving cars; Gaming |

Methodology

Reinforcement Learning

As mentioned previously, Reinforcement Learning gets feedback from interacting with the environment, without requiring predefined data. It is a goal-oriented method in which an agent tries to come up with the best action given a state. One of the most important issues in Reinforcement Learning is the design of the reward function, which influences how fast the agent learns from interacting with the environment.

For example, the ultimate goal for a dog (agent) is to catch a frisbee thrown by a kid. The closer the dog is to the frisbee, the more reward it gets. This reward function affects the dog's subsequent actions. The dog knows where it is (state) and how much reward it got for its previous action. All of these results are saved as the dog's experience for deciding the next action.
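
Formally, the goal the agent optimizes is usually written as the expected discounted return. This formula is not spelled out in the original text, but it is the standard formulation and is consistent with the discount factor gamma used in the code later:

G_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}, \qquad 0 \le \gamma < 1

where gamma is the reward discount factor: the closer gamma is to 1, the more the agent cares about long-term reward.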

Q-Learning

Q-learning is a model-free Reinforcement Learning algorithm. In Reinforcement Learning, the agent learns from experience. In Q-Learning, each state and action are viewed as inputs to a Q-function, which outputs the corresponding Q-value (the expected future reward). Besides, these experiences are saved to a Q-table as a reference for the agent to decide the best action.

*(Figure: Q-table Mechanism)*
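
As a concrete illustration, the classic tabular update rule, Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)), takes only a few lines of Python. The state/action counts and hyperparameters below are made up for the example:

import numpy as np

n_states, n_actions = 10, 3          # hypothetical small, discrete problem
alpha, gamma = 0.1, 0.9              # learning rate, reward discount factor
Q = np.zeros((n_states, n_actions))  # the Q-table

def q_update(state, action, reward, next_state):
    # Move Q(s, a) toward the bootstrapped target r + gamma * max_a' Q(s', a')
    td_target = reward + gamma * Q[next_state].max()
    Q[state, action] += alpha * (td_target - Q[state, action])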

Deep Q-Learning (DQL)

In Q-Learning, the experience learned by the agent is saved to the Q-table; however, when the problem scale grows, the Q-table becomes inefficient. Take playing games as an example: the action space and the state space are too large to handle. To deal with this problem, a neural network is used to approximate the Q-value for each action given a state.

*(Figure: Deep Q-Learning)*

Example: DQL Implementation

Environment: OpenAI Gym: MountainCar-v0
Description: The agent (a car) starts at the bottom of a valley. For any given state, the agent may choose to accelerate to the left, accelerate to the right, or cease any acceleration.

*(Figure: Mountain Car)*
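
Before training, it helps to inspect the environment's spaces. The comments below describe what the spaces contain in MountainCar-v0; the exact printed representation depends on your gym version:

import gym

env = gym.make('MountainCar-v0')
print(env.observation_space)  # 2-dimensional Box: [car position, car velocity]
print(env.action_space)       # Discrete(3): 0 = push left, 1 = no push, 2 = push right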

The code shown below is adapted from Reinforcement Learning 進階篇:Deep Q-Learning.

Import modules: PyTorch is used to build the neural network.

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import gym
import matplotlib.pyplot as plt

Neural Network Structure:

class Net(nn.Module):
    def __init__(self, n_states, n_actions, n_hidden):
        super(Net, self).__init__()

        # fc1 maps the input state to the hidden layer; out maps the hidden layer to action values
        self.fc1 = nn.Linear(n_states, n_hidden)
        self.out = nn.Linear(n_hidden, n_actions)

    def forward(self, x):
        x = self.fc1(x)
        x = F.relu(x) # ReLU activation
        actions_value = self.out(x)
        return actions_value
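
A quick sanity check of this network, reusing the torch import above and the same dimensions as the MountainCar setting below (2 state variables, 3 actions, 20 hidden units):

net = Net(n_states=2, n_actions=3, n_hidden=20)
state = torch.randn(1, 2)   # a batch containing one 2-dimensional state
print(net(state).shape)     # torch.Size([1, 3]): one Q-value per action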

Deep Q-Network module (skeleton version); for more detail, see Deep Q-Learning:

class DQN(object):
    def __init__(self, n_states, n_actions, n_hidden, batch_size, lr,
                 epsilon, gamma, target_replace_iter, memory_capacity):
        # Create the evaluation network, the target network, and the replay memory
        pass

    def choose_action(self, state):
        # Choose an action (epsilon-greedy) according to the state
        pass

    def store_transition(self, state, action, reward, next_state):
        # Store the experience (state, action, reward, next_state) to memory
        pass

    def learn(self):
        # Sample a minibatch from memory, update the evaluation network,
        # and periodically sync the target network
        pass
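
For completeness, here is a minimal sketch of how these four methods could be filled in, following the general structure of the referenced tutorial and reusing the Net class and imports above. Treat it as one possible implementation, not the original author's exact code:

class DQN(object):
    def __init__(self, n_states, n_actions, n_hidden, batch_size, lr,
                 epsilon, gamma, target_replace_iter, memory_capacity):
        self.eval_net = Net(n_states, n_actions, n_hidden)
        self.target_net = Net(n_states, n_actions, n_hidden)
        # Each memory row stores one transition: (state, action, reward, next_state)
        self.memory = np.zeros((memory_capacity, n_states * 2 + 2))
        self.memory_counter = 0
        self.learn_step_counter = 0
        self.optimizer = torch.optim.Adam(self.eval_net.parameters(), lr=lr)
        self.loss_func = nn.MSELoss()
        self.n_states, self.n_actions = n_states, n_actions
        self.batch_size, self.epsilon, self.gamma = batch_size, epsilon, gamma
        self.target_replace_iter = target_replace_iter
        self.memory_capacity = memory_capacity

    def choose_action(self, state):
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit
        if np.random.uniform() < self.epsilon:
            return np.random.randint(0, self.n_actions)
        x = torch.unsqueeze(torch.FloatTensor(state), 0)
        return self.eval_net(x).argmax(dim=1).item()

    def store_transition(self, state, action, reward, next_state):
        transition = np.hstack((state, [action, reward], next_state))
        index = self.memory_counter % self.memory_capacity  # overwrite the oldest
        self.memory[index, :] = transition
        self.memory_counter += 1

    def learn(self):
        # Sample a random minibatch of stored transitions
        sample_index = np.random.choice(self.memory_capacity, self.batch_size)
        b_memory = self.memory[sample_index, :]
        b_state = torch.FloatTensor(b_memory[:, :self.n_states])
        b_action = torch.LongTensor(b_memory[:, self.n_states:self.n_states + 1].astype(int))
        b_reward = torch.FloatTensor(b_memory[:, self.n_states + 1:self.n_states + 2])
        b_next_state = torch.FloatTensor(b_memory[:, -self.n_states:])

        # Q(s, a) from the evaluation network vs. the bootstrapped target
        # r + gamma * max_a' Q_target(s', a') from the (frozen) target network
        q_eval = self.eval_net(b_state).gather(1, b_action)
        q_next = self.target_net(b_next_state).detach()
        q_target = b_reward + self.gamma * q_next.max(1)[0].view(self.batch_size, 1)
        loss = self.loss_func(q_eval, q_target)

        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

        # Periodically copy evaluation-network weights into the target network
        self.learn_step_counter += 1
        if self.learn_step_counter % self.target_replace_iter == 0:
            self.target_net.load_state_dict(self.eval_net.state_dict())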

Since the default reward is too sparse to guide learning, I revise it to accelerate the training process.

  • Default reward:
    Agent reached the flag (position = 0.5): 0
    Position of the agent is less than 0.5: -1
  • Adjusted reward:
    Distance between the agent and the flag = pos - 0.5 (negative)
    Velocity of the agent = vel (positive if the agent moves toward the flag)
    Reward = (pos - 0.5) + vel

if __name__ == '__main__':
    env = gym.make('MountainCar-v0')
    
    # Environment parameters
    n_actions = env.action_space.n
    n_states = env.observation_space.shape[0]
    
    # Hyper parameters
    n_hidden = 20
    batch_size = 32
    lr = 0.1                 # learning rate
    epsilon = 0.1             # epsilon-greedy
    gamma = 0.9               # reward discount factor
    target_replace_iter = 100 # target network update frequency
    memory_capacity = 2000
    n_episodes = 200
    
    # create DQN
    dqn = DQN(n_states, n_actions, n_hidden, batch_size, lr, epsilon, gamma, target_replace_iter, memory_capacity)
    pos_his, reward_his = [], []
    
    # train DQN
    for i_episode in range(n_episodes):
        t = 0
        rewards = 0
        best_pos = -1.2 # min position defined in 'MountainCar-v0'
        state = env.reset()
        while True:
            env.render()
    
            # choose action
            action = dqn.choose_action(state)
            next_state, reward, done, info = env.step(action)
            
            # revise reward to accelerate training process
            pos, vel = next_state
            r1 = pos-0.5 # better to make the car closer to the flag
            r2 = vel
            reward = r1+r2
            
            # save experience
            dqn.store_transition(state, action, reward, next_state)
            
            # record best position happened during steps
            best_pos = pos if (pos > best_pos) else best_pos
    
            # accumulate reward
            rewards += reward
    
            # train the model after gathering enough experience
            if dqn.memory_counter > memory_capacity:
                dqn.learn()
    
            # go to next state
            state = next_state
    
            if done:
                pos_his.append(best_pos)
                reward_his.append(rewards)
                print(f'{i_episode+1} Episode finished after {t+1} timesteps, total rewards {rewards}')
                break
    
            t += 1
            
    env.close()
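
Note that this script targets the classic gym API (pre-0.26), matching the referenced tutorial. If you run it on a newer gym release or on gymnasium, the reset/step signatures differ; roughly:

# Classic gym (used above):
#   state = env.reset()
#   next_state, reward, done, info = env.step(action)
#
# gymnasium / gym >= 0.26:
#   state, info = env.reset()
#   next_state, reward, terminated, truncated, info = env.step(action)
#   done = terminated or truncated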

Result Plots:

  • Default Reward

*(Figure: Best Car Position in each Episode, Default Reward)*

Since the reward takes only two values (-1 and 0), the total reward in each episode is the same whenever the car fails to reach the flag.

*(Figure: Total Reward in each Episode, Default Reward)*

  • Adjusted Reward

*(Figure: Best Car Position in each Episode, Adjusted Reward)*

*(Figure: Total Reward in each Episode, Adjusted Reward)*

Notice that because the reward functions differ between the two cases, their total reward values are not directly comparable.

Comment

  1. According to the results of the DQL implementation, it is clear that the reward function has a great effect on the agent's actions. A well-designed reward function lets an agent learn from the environment more efficiently.
  2. In Deep Q-Learning, a neural network is used to deal with large-scale problems. However, the model becomes more complex, which reduces its interpretability. Besides, there are more hyperparameters to tune in a Deep Q-Network model.

Reference

  • Machine Learning Method: Supervised vs. Unsupervised vs. Reinforcement
  • Reinforcement Learning: Reinforcement Learning 健身房:OpenAI Gym
  • Deep Reinforcement Learning: A Beginner's Guide to Deep Reinforcement Learning
  • Deep Q-Learning: Reinforcement Learning 進階篇:Deep Q-Learning
  • OpenAI Gym: MountainCarEnv