---
date: 2020-03-23
tags: paper, rl, large
---

# Handling Large-Scale Action Space in Deep Q Network

Link to the paper

Zhiheng Zhao, Yi Liang, Xiaoming Ji

2018 International Conference on Artificial Intelligence and Big Data

Year: 2018

* The authors highlight the issues that arise when tackling a high-dimensional action-space problem with DQN: the output layer has to have as many units as there are actions in the environment, which has undesirable effects (memory footprint, sparse training labels, etc.).
* Classical DQN receives the state as input and predicts the expected Q-value for every action, as in the sketch below.
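
A minimal sketch of this classical architecture, written in PyTorch purely for illustration (the layer sizes and the 10,000-action toy environment are assumptions, not details from the paper):

```python
import torch
import torch.nn as nn

class ClassicalDQN(nn.Module):
    """Classical DQN head: state in, one Q-value per discrete action out."""

    def __init__(self, state_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_actions),  # output layer grows with the action space
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)  # shape: (batch, num_actions)

# Greedy action selection is one forward pass plus an argmax, but with a large
# action space the final layer alone becomes expensive to store and to train
# (each transition only provides a learning signal for one of its outputs).
q_values = ClassicalDQN(state_dim=8, num_actions=10_000)(torch.randn(1, 8))
greedy_action = q_values.argmax(dim=1)
```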

* The first proposal is to move the actions to the input of the network.
* It consists of a model where the state and the action together form the input, and the expected Q-value is given as a single scalar output by the network (see the sketch after this list).
* The authors claim that this setting can be problematic: (S, A1) and (S, A2) may be mapped to very similar expected Q-values; in other words, it can be difficult for the network to learn that different actions applied to the same state can lead to quite different outcomes.
* The second proposal tries to address this flaw by restricting the method to deterministic environments, where a given state-action pair (S, A) always leads to the same next state S'. They propose using S' as the input instead of (S, A), which removes the difficulty of distinguishing different actions applied to the same state.
* The authors don't tackle the problem of how to select the maximum achievable Q-value, which the Q-learning update needs. This is a significant omission, since finding the action that maximizes the output of the network can be computationally expensive, if not infeasible.
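
A hedged sketch of the two proposed architectures, again in PyTorch for illustration only (layer sizes, the one-hot action encoding, and the toy dimensions are assumptions, not details from the paper). The final lines also illustrate the critique in the last bullet: with actions at the input, computing max_a Q(S, a) costs one network evaluation per candidate action.

```python
import torch
import torch.nn as nn

class StateActionQ(nn.Module):
    """First proposal: (state, action) at the input, a single scalar Q-value at the output."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # one Q-value for this particular (S, A) pair
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

class NextStateQ(nn.Module):
    """Second proposal (deterministic environments): predict Q directly from S'."""

    def __init__(self, state_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, next_state: torch.Tensor) -> torch.Tensor:
        return self.net(next_state).squeeze(-1)

# Cost of greedy selection with actions at the input: the network must be
# evaluated once per candidate action (batched below, but still linear in
# the number of actions).
state_dim, num_actions = 8, 1_000
q_net = StateActionQ(state_dim, action_dim=num_actions)
state = torch.randn(1, state_dim)
all_actions = torch.eye(num_actions)                       # one-hot encoding of every action
q_all = q_net(state.expand(num_actions, -1), all_actions)  # shape: (num_actions,)
best_action = q_all.argmax()
```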