This project was associated with "[MAT-DSAM3A] Advanced Data Assimilation and Modeling A: Reinforcement Learning", Summer Semester 2021, for my Master of Science in Data Science at the University of Potsdam, Germany.
The folder "Smart Cab RL" contains four subfolders: "Smart Cab Q-learning", "Smart Cab Q-Learning REWARD=0", "Smart Cab SARSA", and "Smart Cab SARSA REWARD=0". Each subfolder contains the files for running or training the smart cab with either the Q-learning or the SARSA algorithm and a different reward function.
To run/train the smart cab:
- Open a command prompt (cmd).
- Change the directory to the subfolder of the variant you want to run/train.
- Type "python RL.py" and hit Enter.
- It will then ask for your input: "Enter 'TRAIN' to train or 'RUN' to run the game:"
- Type 'TRAIN' or 'RUN' and it will perform the requested task.
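For example, a training session with the Q-learning variant looks like this (assuming the folder layout described above):

    cd "Smart Cab RL\Smart Cab Q-learning"
    python RL.py
    Enter 'TRAIN' to train or 'RUN' to run the game: TRAIN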
Objectives and approach:
• To study and compare the Q-learning and SARSA algorithms.
• To contrast model-free and model-based RL algorithms (Q-learning and SARSA are both model-free).
• Approach: temporal-difference (TD) learning, which learns from experience how to predict a quantity that depends on future values of a given signal.
• Temporal-difference update step:
  NewEstimate ← OldEstimate + StepSize × (Target − OldEstimate)
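In code, this update is a one-liner. A minimal sketch (names are illustrative, not taken from RL.py):

    def td_update(old_estimate, target, step_size=0.1):
        """Move the current estimate a fraction of the way toward the target."""
        return old_estimate + step_size * (target - old_estimate)

    # e.g. nudging an estimate of 0.0 toward an observed target of 10:
    # 0.0 + 0.1 * (10 - 0.0) = 1.0

Both SARSA and Q-learning below are instances of this rule; they differ only in how the target is built.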
Environment:
• Inspired by the OpenAI Gym environments.
• A 2D grid of 5×5 cells.
• Agent: the cab.
• Pick-up and drop-off locations.
• Objective of the game:
- Pick up the passenger.
- Drop off the passenger at the right location.
- Take as little time as possible.
• Coordinate system: see the screenshot.
• Pickup positions: [0, 0], [0, 4], [4, 0] and [4, 3].
• Dropoff positions: [0, 0], [0, 4], [4, 0] and [4, 3].
• Rules (a minimal code sketch follows this list):
- The drop-off location must not equal the pick-up location within one episode.
- The cab cannot pass through walls.
- The cab can move "UP", "DOWN", "LEFT", and "RIGHT"; no diagonal moves.
- The cab cannot move beyond the outermost rows and columns.
• Total number of states = 52 + 336 = 388.
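To make the rules concrete, here is a minimal sketch of the grid logic described above. It is an illustration under stated assumptions (coordinates as (row, column); interior walls omitted because their layout is only shown in the screenshot), not the code from RL.py:

    import random

    LOCATIONS = [(0, 0), (0, 4), (4, 0), (4, 3)]   # the four pick-up/drop-off cells
    MOVES = {"UP": (-1, 0), "DOWN": (1, 0), "LEFT": (0, -1), "RIGHT": (0, 1)}
    GRID = 5                                        # 5x5 grid

    def new_episode():
        """Pick distinct pick-up and drop-off cells (they must differ in an episode)."""
        pickup, dropoff = random.sample(LOCATIONS, 2)
        return pickup, dropoff

    def step(pos, action):
        """Move one cell; clamp at the outermost rows/columns. Walls are omitted."""
        dr, dc = MOVES[action]
        row = min(max(pos[0] + dr, 0), GRID - 1)
        col = min(max(pos[1] + dc, 0), GRID - 1)
        return (row, col)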
SARSA:
• On-policy learning (update rule sketched after this list).
• Learning rate α = 0.1.
• Discount factor γ = 1.
• ε-greedy action selection with ε = 0.4 (slightly more exploitation than exploration).
• Balances exploitation and exploration.
• Tries to visit every state.
• Trained for 500,000 episodes.
• Total average cumulative reward: 25.12, with reward = 0 for picking up from the right location.
• Total average cumulative reward: 5.72, with reward = 30 for picking up from the right location.
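The SARSA update is the TD rule above applied on-policy: the target bootstraps from the action the ε-greedy policy actually takes in the next state, Q(s,a) ← Q(s,a) + α[r + γ·Q(s',a') − Q(s,a)]. A minimal sketch with the hyperparameters listed above (the tabular Q dictionary and names are illustrative, not taken from RL.py):

    import random
    from collections import defaultdict

    ACTIONS = ["UP", "DOWN", "LEFT", "RIGHT"]
    ALPHA, GAMMA, EPSILON = 0.1, 1.0, 0.4

    Q = defaultdict(float)              # Q[(state, action)] -> estimated value

    def epsilon_greedy(state):
        """With probability EPSILON explore; otherwise take the best known action."""
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])

    def sarsa_update(s, a, reward, s_next, a_next):
        """On-policy: the target uses the next action a_next actually taken."""
        target = reward + GAMMA * Q[(s_next, a_next)]
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])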
Q-learning:
• Off-policy learning (update rule sketched after this list).
• Learning rate α = 0.1.
• Discount factor γ = 1.
• Trained for 500,000 episodes.
• Total average cumulative reward: 10.92, with reward = 0 for picking up from the right location.
• Total average cumulative reward: 0.9812, with reward = 30 for picking up from the right location.
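Q-learning differs from SARSA only in the target: being off-policy, it bootstraps from the greedy (maximum) next-state value regardless of which action the ε-greedy behaviour policy takes next, Q(s,a) ← Q(s,a) + α[r + γ·max_a' Q(s',a') − Q(s,a)]. A minimal self-contained sketch (names are illustrative, not taken from RL.py):

    from collections import defaultdict

    ACTIONS = ["UP", "DOWN", "LEFT", "RIGHT"]
    ALPHA, GAMMA = 0.1, 1.0
    Q = defaultdict(float)              # Q[(state, action)] -> estimated value

    def q_learning_update(s, a, reward, s_next):
        """Off-policy: the target uses the best next action, not the one taken."""
        best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (reward + GAMMA * best_next - Q[(s, a)])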