cs4341-a4

Mayank Govilla, Mago Sheehy, Molly Sunray

Group 30

To run the program, either open the project in IntelliJ and use the run configurations to pass in arguments, or run the jar file:

`java -jar qlearn-1.0.jar "boards/sample.txt" 1.3 0.9 -0.05`

The arguments, in order, are:

  • the file path to the board
  • the time in seconds to spend learning
  • the probability of moving in the desired direction
  • the reward for each action




Pseudocode:

class Position { x: int, y: int }

enum ACTION { UP, DOWN, LEFT, RIGHT }
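A minimal Java sketch of these two types (names taken from the pseudocode; the enum is written `Action` to follow Java naming conventions). A record gives value-based `equals`/`hashCode`, which matters later when positions become part of Q-table keys.

```java
// Sketch of the two basic types from the pseudocode above.
// A record provides value-based equals()/hashCode(), which is what we need
// when a Position later becomes part of a Q-table key.
record Position(int x, int y) {}

enum Action { UP, DOWN, LEFT, RIGHT }
```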

Main:

  • read input parameters
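A hedged sketch of the argument handling, assuming the argument order documented above; the commented-out calls are placeholders, not the project's actual API.

```java
// Sketch only: reads the four command-line arguments in the documented order.
public class Main {
    public static void main(String[] args) {
        String boardPath = args[0];                            // path to the board file
        double secondsToLearn = Double.parseDouble(args[1]);   // e.g. 1.3
        double pDesired = Double.parseDouble(args[2]);         // probability of moving as intended, e.g. 0.9
        double stepReward = Double.parseDouble(args[3]);       // reward for each action, e.g. -0.05

        System.out.printf("board=%s time=%.2fs p=%.2f r=%.3f%n",
                boardPath, secondsToLearn, pDesired, stepReward);
        // GridWorld world = GridWorld.fromFile(boardPath);          // placeholder
        // new QLearning(world, pDesired, stepReward).train(...);    // placeholder
    }
}
```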

GridWorld

  • World : int[][] // mostly 0s with some non-zero values that represent the terminal states
  • startingLocation: Position
  • getValue(Position): int
  • getRandomPos(): Position
  • getNextState(action, previousAgent) : Agent
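A rough Java sketch of GridWorld, covering only what the bullet list specifies. Indexing the board rows by y is an assumption about the board layout, and getNextState is left out here because it depends on the Agent class below.

```java
import java.util.Random;

// Sketch of GridWorld: a grid of ints where non-zero cells are terminal states.
// Indexing world[y][x] (rows by y) is an assumption about the board layout.
class GridWorld {
    private final int[][] world;   // mostly 0s; non-zero values mark terminal states
    private final Random rng = new Random();

    GridWorld(int[][] world) {
        this.world = world;
    }

    int getValue(Position p) {
        return world[p.y()][p.x()];
    }

    // Random non-terminal starting location for a training episode.
    Position getRandomPos() {
        Position p;
        do {
            p = new Position(rng.nextInt(world[0].length), rng.nextInt(world.length));
        } while (getValue(p) != 0);
        return p;
    }

    boolean inBounds(Position p) {
        return p.y() >= 0 && p.y() < world.length
            && p.x() >= 0 && p.x() < world[0].length;
    }
}
```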

Agent

  • position: Position
  • chanceOfDeflection: double // probability of deflection
  • getActualDirection(ACTION): ACTION // picks the direction actually taken, based on the deflection probability
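A sketch of the Agent, assuming that a "deflection" means sliding 90° to either side of the intended direction with equal probability (the actual rule may differ).

```java
import java.util.Random;

// Sketch of the Agent. chanceOfDeflection is the probability that the move
// slides 90 degrees left or right of the intended direction (an assumption
// about what "deflection" means here).
class Agent {
    final Position position;
    final double chanceOfDeflection;
    private static final Random rng = new Random();

    Agent(Position position, double chanceOfDeflection) {
        this.position = position;
        this.chanceOfDeflection = chanceOfDeflection;
    }

    // Direction actually taken after applying the deflection probability.
    Action getActualDirection(Action intended) {
        if (rng.nextDouble() >= chanceOfDeflection) {
            return intended; // moved as intended
        }
        boolean movingVertically = (intended == Action.UP || intended == Action.DOWN);
        Action[] perpendicular = movingVertically
                ? new Action[] { Action.LEFT, Action.RIGHT }
                : new Action[] { Action.UP, Action.DOWN };
        return perpendicular[rng.nextInt(2)];
    }
}
```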

Q Learning Algo

  • static class QInput: (Position, ACTION)

  • QTable: Map<QInput, double>
  • chooseNextMove(): ACTION
  • train(Board: GridWorld, time: int, rewardFunction: (s)->int)
  • returnPolicy()
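Because QInput is used as a map key, it needs value-based equals/hashCode; a nested Java record gives that for free. A minimal sketch of the table skeleton (defaulting unseen state-action pairs to 0.0 is an assumption):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the Q-table storage. The record key hashes on the (position, action)
// pair, so two QInputs for the same state-action compare equal in the map.
class QLearning {
    record QInput(Position position, Action action) {}

    private final Map<QInput, Double> qTable = new HashMap<>();

    double q(Position s, Action a) {
        return qTable.getOrDefault(new QInput(s, a), 0.0); // unseen pairs start at 0
    }

    void setQ(Position s, Action a, double value) {
        qTable.put(new QInput(s, a), value);
    }
}
```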

chooseNextMove(Position):

  • get the 4 values from the Q table that correspond to [Q(s, UP), Q(s, RIGHT), Q(s, DOWN), Q(s, LEFT)]
  • return the best action
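A greedy argmax over the four actions, meant to live inside the QLearning sketch above; tie-breaking and exploration (e.g. an epsilon-greedy choice) are left out because the pseudocode does not specify them.

```java
// Sketch of chooseNextMove for the QLearning class above: greedy argmax over
// the four actions for state s, using the q(s, a) lookup sketched earlier.
Action chooseNextMove(Position s) {
    Action best = Action.UP;
    double bestValue = Double.NEGATIVE_INFINITY;
    for (Action a : Action.values()) {
        double v = q(s, a);
        if (v > bestValue) {
            bestValue = v;
            best = a;
        }
    }
    return best;
}
```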

train:

  • loop while learning time remains
    • a = new Agent(randomPos)
    • while not terminal
      • move = chooseNextMove()
      • s_prime = a.getNextState(move)
      • val = getValue(s_prime.position)
      • r(s) = rewardFunction(val)
      • alpha = 0.1 (or a learning rate that decays with time)
      • QTable[{a.position, move}] = QTable[{a.position, move}] + alpha * (r(s) + discount * max(QTable[{s_prime.position, m}] for m in moves) - QTable[{a.position, move}])
      • if val != 0: break // reached a terminal state
      • a = s_prime
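The update inside the inner loop is the standard Q-learning rule, Q(s,a) ← Q(s,a) + α·(r(s) + γ·max_a' Q(s',a') − Q(s,a)). A sketch of that single step, meant for the QLearning class above; the parameter names are assumptions:

```java
// One Q-learning update for the QLearning class above:
// Q(s,a) <- Q(s,a) + alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))
void update(Position s, Action move, Position sPrime,
            double reward, double alpha, double gamma) {
    double maxNext = Double.NEGATIVE_INFINITY;
    for (Action a : Action.values()) {
        maxNext = Math.max(maxNext, q(sPrime, a));
    }
    double old = q(s, move);
    setQ(s, move, old + alpha * (reward + gamma * maxNext - old));
}
```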