DataPredict™ [Release 2.6] (Mature + Maintenance Mode) - Machine Learning, Deep Learning And Reinforcement Learning Library - 45+ Models!

I think you should’ve kept it. I don’t think this library is going to be used in real-life scenarios, and in a game-development context most tasks are episodic rather than non-episodic. You could just include support for both and have the user specify whether the task is episodic or not.

The target network is an improvement to DQN that helps stabilize its performance. DQN is prone to divergence because of the deadly triad issue (see Why is there a Deadly Triad issue and how to handle it ? | by Quentin Delfosse | Medium): combining TD-learning/bootstrapping, off-policy learning, and function approximation leads to instability and divergence. The target network works by creating a separate neural network that is a copy of the policy network. This target network, rather than the policy network, is used for calculating the target Q-value when training the DQN. The target network itself is not trained, however; instead, every x steps the weights from the policy network are copied into the target network. This reduces divergence and increases stability by creating a stationary target that is only periodically updated.

Essentially, you can think of the target network as a “frozen” target which changes periodically.
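To make that concrete, here is a minimal, self-contained sketch of the periodic hard update described above. This is illustrative Python, not DataPredict’s actual API: the `QNetwork` class (a toy linear Q-function standing in for a real neural network), `train_step`, and `TARGET_UPDATE_INTERVAL` are all hypothetical names, and the random transitions stand in for real environment data.

```python
# Illustrative sketch of a DQN target network -- NOT DataPredict's API.
import copy
import random

class QNetwork:
    """Toy linear Q-network: Q(s, a) = weights[a] . s (one weight row per action)."""
    def __init__(self, n_states, n_actions):
        self.weights = [[random.uniform(-0.1, 0.1) for _ in range(n_states)]
                        for _ in range(n_actions)]

    def predict(self, state):
        return [sum(w * s for w, s in zip(row, state)) for row in self.weights]

    def get_weights(self):
        return copy.deepcopy(self.weights)

    def set_weights(self, weights):
        self.weights = copy.deepcopy(weights)

def train_step(policy_net, target_net, transition, gamma=0.99, lr=0.01):
    state, action, reward, next_state, done = transition
    # The target Q-value comes from the *frozen* target network...
    next_q = 0.0 if done else max(target_net.predict(next_state))
    target = reward + gamma * next_q
    # ...while only the policy network is updated (simple SGD on the
    # squared TD error, which for this linear toy model is exact).
    error = target - policy_net.predict(state)[action]
    for i, s in enumerate(state):
        policy_net.weights[action][i] += lr * error * s

policy_net = QNetwork(n_states=4, n_actions=2)
target_net = QNetwork(n_states=4, n_actions=2)
target_net.set_weights(policy_net.get_weights())  # start from identical copies

TARGET_UPDATE_INTERVAL = 100  # the "every x steps" from the post
for step in range(1, 1001):
    transition = ([random.random() for _ in range(4)],  # state
                  random.randrange(2),                  # action
                  random.random(),                      # reward
                  [random.random() for _ in range(4)],  # next state
                  random.random() < 0.05)               # done flag
    train_step(policy_net, target_net, transition)
    if step % TARGET_UPDATE_INTERVAL == 0:
        # Periodic hard update: refresh the frozen target.
        target_net.set_weights(policy_net.get_weights())
```

Some implementations blend the weights gradually instead (a “soft” or Polyak update), but the periodic hard copy shown here is the variant used in the original DQN paper.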
Edit: This article, Why is there a Deadly Triad issue and how to handle it ? | by Quentin Delfosse | Medium, is much better at explaining the deadly triad issue than the one I linked before.
Also this one: Bootcamp Summer 2020 Week 7 – DQN And The Deadly Triad, Or, Why DQN Shouldn’t Work But Still Does (gatech.edu)
