How are the actor and critic networks trained without a forward method? I don't understand #8

Open
whwyhwy opened this issue Jan 5, 2024 · 1 comment
whwyhwy commented Jan 5, 2024

In the file PPO_model.py, forward is empty, so why does everything still work through the evaluate function? I really can't figure out how the actor and critic inside the HGNNScheduler network are trained in that case.
The evaluate function uses the actor and the critic, so what is the meaning of the actor network? Its output at initialization is only 1-dimensional, so how can it output a distribution over actions? Is that achieved through the act and get_action_prob functions inside?
And at test time, is the result still obtained through a function rather than through inference with the neural network?
Total newbie here, completely confused.
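For the "1-dimensional output" question, a common pattern in PPO code for scheduling is that the actor scores each candidate action with a single scalar, and a softmax over the candidates' scores forms the action distribution. A minimal sketch, assuming PyTorch; the class and dimensions here are hypothetical, not the repo's actual code:

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class Actor(nn.Module):
    """Hypothetical actor: maps one candidate's feature vector to a scalar score."""
    def __init__(self, in_dim, hidden=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),  # 1-dim output: a score per candidate, not per action space
        )

    def forward(self, x):
        return self.mlp(x)

torch.manual_seed(0)
actor = Actor(in_dim=4)
candidates = torch.randn(5, 4)            # e.g. 5 candidate (operation, machine) pairs
scores = actor(candidates).squeeze(-1)    # shape (5,): one scalar score per candidate
dist = Categorical(logits=scores)         # softmax over candidates -> action distribution
action = dist.sample()                    # pick a candidate
log_prob = dist.log_prob(action)          # what PPO's probability ratio needs
```

So the 1-dimensional head does not limit the action space: the number of actions is the number of candidates scored, and sampling/log-probabilities come from the resulting Categorical distribution.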

@ZhaoKe1024

Although the PPO_model file has no forward, evaluate() calls "self.get_machines[i]", "self.actor", and "self.critic", and these are all subclasses of nn.Module. The actor outputs actions, approximating the state-action function; the critic computes the value function, which is used to compute the advantage function.
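The key PyTorch mechanism behind this answer: calling a module instance (e.g. `self.actor(x)`) invokes that submodule's own `forward` via `nn.Module.__call__`, and submodules assigned as attributes are registered automatically, so the wrapper's `parameters()` includes them and the optimizer trains them even though the wrapper itself never defines or uses a `forward`. A minimal sketch (the class and loss are illustrative, not the repo's actual code):

```python
import torch
import torch.nn as nn

class Wrapper(nn.Module):
    """Container with no forward of its own, like the PPO policy class in question."""
    def __init__(self):
        super().__init__()
        self.actor = nn.Linear(3, 1)    # registered as a submodule automatically
        self.critic = nn.Linear(3, 1)

    # no forward() defined: Wrapper(...) is never called directly

    def evaluate(self, x):
        # calling the submodules runs THEIR forward via nn.Module.__call__
        return self.actor(x), self.critic(x)

torch.manual_seed(0)
w = Wrapper()
opt = torch.optim.SGD(w.parameters(), lr=0.1)   # parameters() covers actor + critic
x = torch.randn(8, 3)
a, v = w.evaluate(x)
loss = (a - v).pow(2).mean() + a.pow(2).mean()  # toy loss touching both networks
before = w.actor.weight.clone()
opt.zero_grad()
loss.backward()
opt.step()
# actor's weights change even though Wrapper.forward was never defined or called
```

The same applies at test time: "calling a function" like evaluate() is still neural-network inference, because each `self.actor(...)` / `self.critic(...)` call runs that network's forward pass.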
