Abstract: Stereoscopic Portfolio Optimisation Frameworks introduce the concept of bottom-up optimisation by applying machine learning ensembles to elements of market micro-structure, yet, contrary to common belief, this approach does not always perform as expected. The widely used Deep Q-Learning algorithm is unstable because of oscillations in the Q-values and because it over-estimates action values under certain conditions; both issues degrade its performance. Inspired by breakthroughs in DQN and DRQN, we propose a modification of the final layers to handle the pseudo-continuous action spaces required by the portfolio management task. The current implementation, the Deep Soft Recurrent Q-Network (DSRQN), relies on a fixed, implicit policy. In this paper we describe and develop an ensembled deep reinforcement learning architecture built on a temporal ensemble, which stabilizes the training process by reducing the variance of the target approximation error. Ensembling the target values mitigates overestimation and improves performance by producing more accurate Q-value estimates. The resulting aggregate architecture yields more accurate and better-optimized statistical results for this classical portfolio management and optimization problem.
Keywords: Temporal Ensemble, Reinforcement Learning, Deep Learning, Finance Technology, Algorithmic Trading
DOI: 10.17148/IJARCCE.2020.9313
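The temporal ensemble of target values described in the abstract can be read as averaging the bootstrap term of the Q-learning target over the K most recent target-network snapshots, in the spirit of Averaged-DQN. The following is a minimal sketch under that assumption; the names `temporal_ensemble_target`, `q_snapshots`, and `gamma` are illustrative and not taken from the paper.

```python
import numpy as np

def temporal_ensemble_target(reward, next_state, q_snapshots, gamma=0.99):
    """Sketch of a temporal-ensemble DQN target: the bootstrap term
    averages the action-value estimates of the K most recent frozen
    target networks before taking the max over actions."""
    # Each snapshot maps a state to a vector of per-action Q-values.
    q_values = np.stack([q(next_state) for q in q_snapshots])  # (K, n_actions)
    # Averaging across snapshots lowers the variance of the target
    # approximation error, the mechanism the abstract credits with
    # reduced overestimation and more stable training.
    avg_q = q_values.mean(axis=0)                               # (n_actions,)
    return reward + gamma * avg_q.max()

# Toy usage: three snapshots modelled as random linear maps (hypothetical).
rng = np.random.default_rng(0)
snapshots = [lambda s, W=rng.normal(size=(4, 2)): s @ W for _ in range(3)]
state = rng.normal(size=4)
print(temporal_ensemble_target(reward=1.0, next_state=state, q_snapshots=snapshots))
```

Averaging before the max (rather than maxing each snapshot and then averaging) is the design choice that directly dampens the upward bias of the max operator, which is why ensembling the targets yields the more accurate Q-value estimates claimed above.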