cs4341-a4

Mayank Govilla, Mago Sheehy, Molly Sunray

Group 30

To run the program, either open the project in IntelliJ and use the run configurations to pass in arguments, or run the jar file:

`java -jar qlearn-1.0.jar "boards/sample.txt" 1.3 0.9 -0.05`

The arguments, in order, are:

  • the file path to the board
  • the time in seconds to spend learning
  • the probability of moving in the desired direction
  • the reward for each action




Pseudocode:

class Position { x: int, y: int }

enum ACTION { UP, DOWN, LEFT, RIGHT }
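A minimal Java sketch of these two types (names taken from the pseudocode; the enum is written `Action` to follow Java naming conventions). A record gives value-based `equals`/`hashCode`, which matters later when positions become part of Q-table keys.

```java
// Sketch of the two basic types from the pseudocode above.
// A record provides value-based equals()/hashCode(), which is what we need
// when a Position later becomes part of a Q-table key.
record Position(int x, int y) {}

enum Action { UP, DOWN, LEFT, RIGHT }
```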

Main:

  • read input parameters
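A hedged sketch of the argument handling, assuming the argument order documented above; the commented-out calls are placeholders, not the project's actual API.

```java
// Sketch only: reads the four command-line arguments in the documented order.
public class Main {
    public static void main(String[] args) {
        String boardPath = args[0];                            // path to the board file
        double secondsToLearn = Double.parseDouble(args[1]);   // e.g. 1.3
        double pDesired = Double.parseDouble(args[2]);         // probability of moving as intended, e.g. 0.9
        double stepReward = Double.parseDouble(args[3]);       // reward for each action, e.g. -0.05

        System.out.printf("board=%s time=%.2fs p=%.2f r=%.3f%n",
                boardPath, secondsToLearn, pDesired, stepReward);
        // GridWorld world = GridWorld.fromFile(boardPath);          // placeholder
        // new QLearning(world, pDesired, stepReward).train(...);    // placeholder
    }
}
```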

GridWorld

  • World : int[][] // mostly 0s with some non-zero values that represent the terminal states
  • startingLocation: Position
  • getValue(Position): int
  • getRandomPos(): Position
  • getNextState(action, previousAgent) : Agent
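A rough Java sketch of GridWorld, covering only what the bullet list specifies. Indexing the board rows by y is an assumption about the board layout, and getNextState is left out here because it depends on the Agent class below.

```java
import java.util.Random;

// Sketch of GridWorld: a grid of ints where non-zero cells are terminal states.
// Indexing world[y][x] (rows by y) is an assumption about the board layout.
class GridWorld {
    private final int[][] world;   // mostly 0s; non-zero values mark terminal states
    private final Random rng = new Random();

    GridWorld(int[][] world) {
        this.world = world;
    }

    int getValue(Position p) {
        return world[p.y()][p.x()];
    }

    // Random non-terminal starting location for a training episode.
    Position getRandomPos() {
        Position p;
        do {
            p = new Position(rng.nextInt(world[0].length), rng.nextInt(world.length));
        } while (getValue(p) != 0);
        return p;
    }

    boolean inBounds(Position p) {
        return p.y() >= 0 && p.y() < world.length
            && p.x() >= 0 && p.x() < world[0].length;
    }
}
```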

Agent

  • position: Position
  • chanceOfDeflection: double // probability of deflection
  • getActualDirection(ACTION): ACTION // picks the direction actually taken, based on the deflection probability
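A sketch of the Agent, assuming that a "deflection" means sliding 90° to either side of the intended direction with equal probability (the actual rule may differ).

```java
import java.util.Random;

// Sketch of the Agent. chanceOfDeflection is the probability that the move
// slides 90 degrees left or right of the intended direction (an assumption
// about what "deflection" means here).
class Agent {
    final Position position;
    final double chanceOfDeflection;
    private static final Random rng = new Random();

    Agent(Position position, double chanceOfDeflection) {
        this.position = position;
        this.chanceOfDeflection = chanceOfDeflection;
    }

    // Direction actually taken after applying the deflection probability.
    Action getActualDirection(Action intended) {
        if (rng.nextDouble() >= chanceOfDeflection) {
            return intended; // moved as intended
        }
        boolean movingVertically = (intended == Action.UP || intended == Action.DOWN);
        Action[] perpendicular = movingVertically
                ? new Action[] { Action.LEFT, Action.RIGHT }
                : new Action[] { Action.UP, Action.DOWN };
        return perpendicular[rng.nextInt(2)];
    }
}
```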

Q Learning Algo

  • static class QInput: (Position, ACTION)

  • QTable: Map<QInput, double>
  • chooseNextMove(): ACTION
  • train(Board: GridWorld, time: int, rewardFunction: (s)->int)
  • returnPolicy()
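Because QInput is used as a map key, it needs value-based equals/hashCode; a nested Java record gives that for free. A minimal sketch of the table skeleton (defaulting unseen state-action pairs to 0.0 is an assumption):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the Q-table storage. The record key hashes on the (position, action)
// pair, so two QInputs for the same state-action compare equal in the map.
class QLearning {
    record QInput(Position position, Action action) {}

    private final Map<QInput, Double> qTable = new HashMap<>();

    double q(Position s, Action a) {
        return qTable.getOrDefault(new QInput(s, a), 0.0); // unseen pairs start at 0
    }

    void setQ(Position s, Action a, double value) {
        qTable.put(new QInput(s, a), value);
    }
}
```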

chooseNextMove(Position):

  • get the 4 values from the Q table that correspond to [Q(s, UP), Q(s, RIGHT), Q(s, DOWN), Q(s, LEFT)]
  • return the best action
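A greedy argmax over the four actions, meant to live inside the QLearning sketch above; tie-breaking and exploration (e.g. an epsilon-greedy choice) are left out because the pseudocode does not specify them.

```java
// Sketch of chooseNextMove for the QLearning class above: greedy argmax over
// the four actions for state s, using the q(s, a) lookup sketched earlier.
Action chooseNextMove(Position s) {
    Action best = Action.UP;
    double bestValue = Double.NEGATIVE_INFINITY;
    for (Action a : Action.values()) {
        double v = q(s, a);
        if (v > bestValue) {
            bestValue = v;
            best = a;
        }
    }
    return best;
}
```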

train:

  • loop while learning time remains
    • a = new Agent(randomPos)
    • while not terminal
      • move = chooseNextMove()
      • s_prime = a.getNextState(move)
      • val = getValue(s_prime.position)
      • r(s) = rewardFunction(val)
      • alpha = 0.1 (or a learning rate that decays with time)
      • QTable[{a.position, move}] = QTable[{a.position, move}] + alpha * (r(s) + discount * max(QTable[{s_prime.position, m}] for m in moves) - QTable[{a.position, move}])
      • if val != 0: break // reached a terminal state
      • a = s_prime
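The update inside the inner loop is the standard Q-learning rule, Q(s,a) ← Q(s,a) + α·(r(s) + γ·max_a' Q(s',a') − Q(s,a)). A sketch of that single step, meant for the QLearning class above; the parameter names are assumptions:

```java
// One Q-learning update for the QLearning class above:
// Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))
void update(Position s, Action move, Position sPrime,
            double reward, double alpha, double gamma) {
    double maxNext = Double.NEGATIVE_INFINITY;
    for (Action a : Action.values()) {
        maxNext = Math.max(maxNext, q(sPrime, a));
    }
    double old = q(s, move);
    setQ(s, move, old + alpha * (reward + gamma * maxNext - old));
}
```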