Add c51 for dqn and dqfd #115

Curt-Park · 2019-03-16T07:21:39Z

Tested on LunarLander-v2.
Performance is not so good :(

@MrSyee You don't have to review this now. This PR is for code review with external contributors.

kkweon

LGTM

kkweon · 2019-03-16T14:32:35Z

algorithms/dqn/agent.py

@@ -252,7 +251,7 @@ def write_log(self, i: int, loss: np.ndarray, score: int):
        """Write log about loss and score"""
        print(
            "[INFO] episode %d, episode step: %d, total step: %d, total score: %d\n"
-            "epsilon: %f, loss: %f, avg q-value: %f at %s\n"
+            "epsilon: %f, loss: %f, avg_q_value: %f at %s\n"


Why no f-string?

Suggested change

-            "epsilon: %f, loss: %f, avg_q_value: %f at %s\n"
+            f"epsilon: {i}, loss: {loss[0]}, avg_q_value: {loss[1]} at {now()}"

Curt-Park · 2019-03-18

I didn't use an f-string because f-strings are not available in Python versions lower than 3.6.
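
To make the compatibility trade-off concrete, here is a minimal sketch of three equivalent ways to build the log line. The values and the `timestamp` name are made up for illustration and are not taken from the PR; only the last variant requires Python 3.6 or later.

```python
# Illustrative values only; they are not taken from the PR.
epsilon, loss, avg_q_value = 0.1, 0.5, 1.25
timestamp = "12:00:00"

# %-formatting: works on every Python version.
percent_line = "epsilon: %f, loss: %f, avg_q_value: %f at %s" % (
    epsilon, loss, avg_q_value, timestamp
)

# str.format: also available before Python 3.6.
format_line = "epsilon: {:f}, loss: {:f}, avg_q_value: {:f} at {}".format(
    epsilon, loss, avg_q_value, timestamp
)

# f-string: Python 3.6+ only; ":f" keeps the same six-decimal output as "%f".
fstring_line = (
    f"epsilon: {epsilon:f}, loss: {loss:f}, "
    f"avg_q_value: {avg_q_value:f} at {timestamp}"
)

print(percent_line)
```

All three strings are identical, so the choice mainly depends on which Python versions the project wants to support.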

kkweon · 2019-03-16T16:34:19Z

algorithms/dqn/agent.py

-        curr_q_value = q_values.gather(1, actions.long().unsqueeze(1))
-        next_q_value = next_target_q_values.gather(  # Double DQN
-            1, next_q_values.argmax(1).unsqueeze(1)
+        batch_size = self.hyper_params["BATCH_SIZE"]


Create a key instead of using str?

Suggested change

-        batch_size = self.hyper_params["BATCH_SIZE"]
+        from params.keys import BATCH_SIZE
+
+        batch_size = self.hyper_params[BATCH_SIZE]

# params/keys.py
BATCH_SIZE = "BATCH_SIZE"

Curt-Park · 2019-03-18

I am looking for a way to avoid using strings as keys, something like an enum in C.
It would be better if I didn't have to create a new .py file just to define the keys.

Curt-Park · 2019-03-18

I made an issue: #116
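
One possible direction for the enum idea, sketched with Python's standard `enum` module. This is an assumption about how it could look, not what the project adopted; `HyperParam`, `GAMMA`, and the values are hypothetical, and only `BATCH_SIZE` appears in the PR.

```python
from enum import Enum


class HyperParam(Enum):
    """Hypothetical hyperparameter keys; only BATCH_SIZE appears in the PR."""

    BATCH_SIZE = "BATCH_SIZE"
    GAMMA = "GAMMA"


# Enum members are hashable, so they can be used directly as dict keys.
hyper_params = {HyperParam.BATCH_SIZE: 32, HyperParam.GAMMA: 0.99}
batch_size = hyper_params[HyperParam.BATCH_SIZE]

# A misspelled member fails at attribute lookup instead of as a KeyError
# somewhere deeper in the training code.
try:
    HyperParam.BATCH_SIZ
except AttributeError:
    typo_detected = True
else:
    typo_detected = False
```

The class could live in whichever module already defines the hyperparameters, so a separate keys file is not strictly required.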

kkweon · 2019-03-16T16:36:45Z

algorithms/dqn/networks.py

+        atom_size: int = 51,
+        v_min: int = -10,
+        v_max: int = 10,
+        hidden_activation: Callable = F.relu,


Suggested change

-        hidden_activation: Callable = F.relu,
+        hidden_activation: Callable[[torch.Tensor], torch.Tensor] = F.relu,

Curt-Park · 2019-03-18

I will open an issue for it. Thanks.

Curt-Park · 2019-03-18

I opened an issue: #117
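
To illustrate what the narrower annotation buys, here is a small sketch that uses plain lists of floats in place of `torch.Tensor` so it runs without PyTorch. `Activation`, `relu`, and `apply_hidden` are hypothetical names chosen for this example.

```python
from typing import Callable, List

# Stand-in for Callable[[torch.Tensor], torch.Tensor]:
# exactly one argument in, the same type out.
Activation = Callable[[List[float]], List[float]]


def relu(xs: List[float]) -> List[float]:
    """Element-wise max(x, 0)."""
    return [max(x, 0.0) for x in xs]


def apply_hidden(
    xs: List[float], hidden_activation: Activation = relu
) -> List[float]:
    """Apply the activation; a type checker can now verify its signature."""
    return hidden_activation(xs)


print(apply_hidden([-1.0, 2.0]))
```

With a bare `Callable`, a static type checker accepts any callable; with the parameterized form it can flag a function that takes the wrong number or type of arguments.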

Curt-Park · 2019-03-19T02:41:54Z

The test is ongoing: https://app.wandb.ai/curt-park/dqn/reports?view=curt-park%2FPong%20%28C51%20%2F%20Dueling%29

medipixel · 2019-03-19T03:08:29Z

This pull request introduces 1 alert when merging 75a4ac7 into 94d9fd8 - view on LGTM.com

new alerts:

1 for Variable defined multiple times

Comment posted by LGTM.com

MrSyee · 2019-03-19T08:24:36Z

algorithms/dqn/utils.py

+    return dq_loss_element_wise, q_values
+
+
+def get_dqn_loss(


This method name should be changed (duplicate name).
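
A minimal sketch of why a duplicated function name matters in Python: the second definition silently rebinds the name, so the first one can no longer be called. The names below are hypothetical and do not reproduce the functions in `utils.py`.

```python
def get_loss(x: float) -> str:
    return "first definition"


def get_loss(x: float) -> str:  # same name: rebinds get_loss, no error is raised
    return "second definition"


# Only the most recent definition is reachable under this name.
result = get_loss(1.0)
print(result)
```

Giving each function a distinct name keeps both callable and avoids static-analysis warnings about a name being defined multiple times.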

curt-park added 2 commits March 15, 2019 18:32

Add c51 draft

6a40f27

Fix nan issue on dqn and add c51 on dqfd

fd3a91b

Curt-Park self-assigned this Mar 16, 2019

Curt-Park requested a review from MrSyee March 16, 2019 07:21

kkweon reviewed Mar 16, 2019

curt-park and others added 2 commits March 19, 2019 11:37

Add C51 option (C51 is enabled by default on pong)

777693c

Merge branch 'master' into feature/c51

75a4ac7

curt-park added 3 commits March 19, 2019 13:05

Remove lgtm warning: multiple model definition

181a866

Change batch size 128 to 32

bfeb011

Change methods name (by kh)

4e4d5d5

MrSyee reviewed Mar 19, 2019

MrSyee approved these changes Mar 19, 2019

MrSyee merged commit 0269595 into master Mar 19, 2019

Curt-Park deleted the feature/c51 branch March 19, 2019 10:36
