Smart-Cab-with-Qlearning-and-SARSA


This project was associated with "[MAT-DSAM3A] Advanced Data Assimilation and Modeling A: Reinforcement Learning", Summer Semester 2021, for my Master of Science in Data Science, University of Potsdam, Germany.

The folder, "Smart Cab RL" contains 4 subfolder: "Smart Cab Q-learning", "Smart Cab Q-Learning REWARD=0", "Smart Cab SARSA", "Smart Cab SARSA REWARD=0". Each of these folder contains the file for running or training the smart cab using Q-learning and SARSA algorithm with different reward function.

To run/train the smart cab:

  1. Open the command prompt (cmd).
  2. Change the directory to whichever smart cab variant you want to run/train.
  3. Type "python RL.py" and hit Enter.
  4. It will then ask for your input: "Enter 'TRAIN' to train or 'RUN' to run the game:"
  5. Type 'TRAIN' or 'RUN' and it will perform the requested task (a minimal sketch of this prompt is shown below).
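
As a rough illustration of what that prompt does (the function names below are placeholders, not the actual code in RL.py):

```python
# Minimal sketch of the TRAIN/RUN prompt; train_agent() and run_game()
# are illustrative placeholders, not the real functions in RL.py.
def train_agent():
    print("training the Q-table over many episodes...")

def run_game():
    print("running the game with the learned Q-table...")

mode = input("Enter 'TRAIN' to train or 'RUN' to run the game: ").strip().upper()
if mode == "TRAIN":
    train_agent()
elif mode == "RUN":
    run_game()
else:
    print("Unrecognised input; expected 'TRAIN' or 'RUN'.")
```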

MOTIVATION

• To study and compare the Q-Learning and SARSA algorithms.
• To understand the distinction between model-free and model-based RL algorithms.
• Approach: temporal-difference (TD) learning, which learns from experience how to predict a quantity that depends on future values of a given signal.
• Temporal difference update step (a minimal code sketch follows):

  NewEstimate ← OldEstimate + StepSize × [Target − OldEstimate]
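
A minimal sketch of this update in Python (the names are illustrative, not taken from RL.py):

```python
def td_update(old_estimate, target, step_size):
    """Generic temporal-difference update:
    new_estimate = old_estimate + step_size * (target - old_estimate)."""
    return old_estimate + step_size * (target - old_estimate)

# Example: current estimate 2.0, observed target 5.0, step size 0.1 -> 2.3
print(td_update(2.0, 5.0, 0.1))
```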

SMART CAB GAME

• Inspired by the OpenAI Gym environments.
• 2D grid of 5×5 cells.
• Agent: the cab.
• Drop-off and pick-up locations.
• Objective of the game:

  1. Pick up the passenger.
  2. Drop off the passenger at the right location.
  3. Take as little time as possible.

• Coordinate System: See Screenshot.
• Pickup positions: [0, 0], [0, 4], [4, 0] and [4, 3].
• Dropoff positions: [0, 0], [0, 4], [4, 0] and [4, 3].
• Rules:

  1. The drop-off location should not be equal to the pickup location within one episode.
  2. The cab cannot go through walls.
  3. The cab can move "UP", "DOWN", "LEFT", and "RIGHT". No diagonal moves.
  4. The cab cannot go beyond the extreme rows and columns (the movement rules are sketched below).
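
A minimal sketch of these movement rules in Python; the coordinate convention and the wall layout below are placeholders, not the exact ones used in RL.py:

```python
GRID_SIZE = 5
ACTIONS = {"UP": (-1, 0), "DOWN": (1, 0), "LEFT": (0, -1), "RIGHT": (0, 1)}
# Pairs of adjacent cells the cab cannot move between (placeholder layout).
WALLS = {((0, 1), (0, 2)), ((0, 2), (0, 1))}

def move(row, col, action):
    """Apply one action, respecting grid boundaries and walls."""
    d_row, d_col = ACTIONS[action]
    new_row, new_col = row + d_row, col + d_col
    # Rule 4: cannot go beyond the extreme rows and columns.
    if not (0 <= new_row < GRID_SIZE and 0 <= new_col < GRID_SIZE):
        return row, col
    # Rule 2: cannot go through walls.
    if ((row, col), (new_row, new_col)) in WALLS:
        return row, col
    return new_row, new_col
```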

Grid (see screenshots).

STATE SPACES

• State Space 1: 13 × 4 = 52 states
• State Space 2: 21 × 16 = 336 states
• Total number of states: 52 + 336 = 388 (one possible Q-table layout is sketched below)
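
One way to hold a composite state space like this is a dictionary keyed by the state encoding, so the 388 states never need to be enumerated by hand. A hedged sketch follows; the actual state encoding and action set in RL.py may differ:

```python
from collections import defaultdict

N_ACTIONS = 4  # UP, DOWN, LEFT, RIGHT (pickup/drop-off handling may differ in RL.py)

# Q-table keyed by whatever tuple encodes the state, e.g.
# (cab_row, cab_col, passenger_picked_up, destination_index);
# unseen states start with all-zero action values.
Q = defaultdict(lambda: [0.0] * N_ACTIONS)

state = (0, 0, False, 2)  # hypothetical example state
best_action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
```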

SARSA

• On-policy learning.
• Learning rate, α = 0.1.
• Discount factor, γ = 1.
• ε-greedy policy, ε = 0.4 (slightly more chance of exploitation than exploration; see the sketch after this list).
• Balances exploitation and exploration.
• Tries to visit every state.
• Trained for 500,000 episodes.
• Total average cumulative reward: 25.12, with reward = 0 for picking up from the right location.
• Total average cumulative reward: 5.72, with reward = 30 for picking up from the right location.
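
A hedged sketch of the on-policy update with the parameters above (α = 0.1, γ = 1, ε = 0.4); the environment interface (env.reset / env.step) and the Q-table layout are assumptions, not the exact code in RL.py:

```python
import random

ALPHA, GAMMA, EPSILON = 0.1, 1.0, 0.4

def epsilon_greedy(Q, state, n_actions):
    """Explore with probability EPSILON, otherwise exploit the best known action."""
    if random.random() < EPSILON:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[state][a])

def sarsa_episode(env, Q, n_actions):
    """One episode of SARSA: the TD target uses the action actually taken next."""
    state = env.reset()
    action = epsilon_greedy(Q, state, n_actions)
    done = False
    total_reward = 0.0
    while not done:
        next_state, reward, done = env.step(action)
        next_action = epsilon_greedy(Q, next_state, n_actions)
        # On-policy TD target: reward + gamma * Q(s', a')
        Q[state][action] += ALPHA * (
            reward + GAMMA * Q[next_state][next_action] - Q[state][action]
        )
        state, action = next_state, next_action
        total_reward += reward
    return total_reward
```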

Cumulative Reward per episode SARSA (REWARD=0) Top 1000

Cumulative Reward per episode SARSA Top 1000

Q-LEARNING

• Off-policy learning (the update is sketched after this list).
• Learning rate, α = 0.1.
• Discount factor, γ = 1.
• Trained for 500,000 episodes.
• Total average cumulative reward: 10.92, with reward = 0 for picking up from the right location.
• Total average cumulative reward: 0.9812, with reward = 30 for picking up from the right location.
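
For comparison, a hedged sketch of the off-policy update under the same assumptions; it reuses the epsilon_greedy helper and the ALPHA/GAMMA constants from the SARSA sketch above:

```python
def q_learning_episode(env, Q, n_actions):
    """One episode of Q-learning: the TD target bootstraps from the greedy
    (maximising) action in the next state, regardless of what is done next."""
    state = env.reset()
    done = False
    total_reward = 0.0
    while not done:
        action = epsilon_greedy(Q, state, n_actions)
        next_state, reward, done = env.step(action)
        # Off-policy TD target: reward + gamma * max_a Q(s', a)
        Q[state][action] += ALPHA * (
            reward + GAMMA * max(Q[next_state]) - Q[state][action]
        )
        state = next_state
        total_reward += reward
    return total_reward
```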

Cumulative Reward per episode Q-Learning (Reward=0) Top 1000

Cumulative Reward per episode Q-Learning Top 1000

SARSA Vs. Q-LEARNING

Average cumulative reward per 500 episodes - SARSA vs. Q-Learning (REWARD = 0)

Average cumulative reward per 500 episodes - SARSA vs. Q-Learning
