Feeding “parallel” dataset during the training phase

neural-networks, reinforcement-learning

I am planning to use Reinforcement Learning to predict stock price movement. For a stock like TSLA, some training features might be the pivot price values and the differences between consecutive pivot points.

I would like my model to capture the general behaviour of the stock market. In other words, if I want my model to predict the price movement of TSLA, then my dataset will be built only on TSLA data, and if I then try to predict the price movement of FB with that model, it won't work for many reasons. So if I want my model to predict the price movement of any stock, I have to build a dataset using all kinds of stocks. For the purpose of this question, instead of an example dataset using all stocks, I will use only three: TSLA, FB and AMZN.

So I generate two years of data for TSLA, two years for FB and two years for AMZN, and pass them back to back to my model; in this example, I pass six years of data for training. If I start with FB, the model learns and memorizes some patterns from the FB features. The problem is that when the model then trains on the AMZN features, it already starts to forget what it learned from the FB dataset.

Is there a way to parallelise the training on several stocks at the same time to avoid this forgetting issue? Instead of my action being a single real value, it would be an action vector whose size depends on the number of parallel stocks.


Best Answer

You should let your agent know in some way which environment it's currently playing. For example, pass in an additional one-hot vector e.g. [0,0,1] when it's playing the "FB" environment, [0,1,0] when it's playing "TSLA", etc.
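A minimal sketch of this idea (the stock ordering and helper names below are illustrative, not from the question): append a one-hot stock identifier to whatever state features you already use, so the same network can tell the environments apart.

```python
import numpy as np

# Assumed set and ordering of environments (stocks).
STOCKS = ["TSLA", "FB", "AMZN"]
STOCK_TO_ID = {s: i for i, s in enumerate(STOCKS)}

def augment_state(state_features, stock):
    """Concatenate the raw state features with a one-hot stock identifier."""
    one_hot = np.zeros(len(STOCKS), dtype=np.float32)
    one_hot[STOCK_TO_ID[stock]] = 1.0
    return np.concatenate([np.asarray(state_features, dtype=np.float32), one_hot])

# e.g. pivot-price features for an FB step become [..., 0, 1, 0]
fb_state = augment_state([312.4, -1.7, 2.3], "FB")
```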

Furthermore, it's good practice to store your rollouts in a replay buffer, and instead of training on one game at a time, to sample batches of (state, action, reward, next-state) tuples from the replay buffer to train on. This way, you can train on multiple environments simultaneously.
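As a rough sketch of that setup (class and method names are my own, assuming an off-policy setup such as DQN): transitions from all stocks go into a single buffer, and each gradient step samples a batch that mixes experience from TSLA, FB and AMZN instead of training on one stock's history at a time.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores (state, action, reward, next_state, done) tuples from every environment."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # Uniformly sample a mixed batch across all stocks for one training step.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

# Usage: interleave rollouts from every environment, then train on mixed batches, e.g.
# buffer.add(augment_state(s, "TSLA"), a, r, augment_state(s_next, "TSLA"), done)
# states, actions, rewards, next_states, dones = buffer.sample(64)
```

Because each batch is drawn from all environments at once, the network never spends a long stretch of updates on a single stock, which is what caused the forgetting in the back-to-back training described in the question.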
