Refinement of biologically inspired models of reinforcement learning
Reinforcement learning occurs when organisms adapt the propensities of given behaviours on the basis of associations with reward and punishment. To date, reinforcement learning models have been validated in minimalist environments in which only one or two environmental stimuli are present as possible predictors of reward. The exceptions are two studies that investigated the responses of the dopamine system to configurations of multiple stimuli; in both cases, however, the stimuli were presented simultaneously rather than in sequence. We therefore set out to understand how current models of reinforcement learning would respond under more complex conditions in which sequences of events are predictors of reward. In the two experimental chapters of this thesis, we attempted to understand whether midbrain dopaminergic neurons would respond to occasion setters (Chapter 3) and to the overexpectation effect (Chapter 4). In addition, we ran simulations of the behavioural paradigms using temporal difference models of reinforcement learning (Chapter 2) and compared the predictions of the model with the behavioural and neurophysiological data. In Chapter 3, by performing single-neuron recordings from VTA and SNpc dopaminergic cells, we demonstrated that our population of neurons was most responsive to the latest predictor of reward, the conditioned stimulus (CS), and not to the earliest, the occasion setter (OS). This is in stark contrast with the predictions of the model (Chapter 2), in which the greatest response occurs at OS onset. We also showed at a neural level that there was only a weak enhancement of the response to the discriminative stimulus (SD) when it was preceded by the OS. At a behavioural level, on the other hand, bar pressing was greatest when the SD was preceded by the OS, demonstrating that rats could use the information provided by the OS, but that dopamine was not controlling the conditioned response.
In Chapter 4, our population of dopaminergic neurons responded preferentially to only one of the two conditioned stimuli (CSA, CSB) in the overexpectation paradigm. The predictions of the model (Chapter 2) suggested that when the two stimuli were presented in compound, there would be an inhibitory response if the reward magnitude was kept constant and an excitatory response if the reward magnitude was doubled. The lack of neural firing to one of the two conditioned stimuli, however, does not make for easy interpretation of the data. Perhaps one of the conditioned stimuli acted as if it were overshadowing the other, resulting in no response to the second CS. Interestingly, at a behavioural level, we did not see increased licking frequency to presentation of the compound stimulus, a result that is somewhat at odds with the previous literature. Overall, the results of our experimental chapters suggest that the role that midbrain dopaminergic neurons play in reinforcement learning is more complex than that envisaged by previous investigations.
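The model prediction mentioned above for the occasion-setting sequence, namely that the greatest response occurs at the earliest predictor, can be illustrated with a minimal temporal difference sketch. This is not the model used in Chapter 2. It assumes a TD(0) learner with a tapped delay line (complete serial compound) stimulus representation, and the function name `simulate_td`, the onset times, the learning rate and the discount factor are all hypothetical values chosen only for illustration.

```python
import numpy as np


def simulate_td(n_trials=5000, n_steps=15, t_os=2, t_cs=6, t_reward=10,
                alpha=0.1, gamma=0.98, reward=1.0):
    """Return the TD error at each time step of the final training trial.

    All timings and parameters are illustrative assumptions, not values
    taken from the thesis.
    """
    onsets = (t_os, t_cs)             # stimulus 0 = OS, stimulus 1 = CS
    w = np.zeros((2, n_steps))        # one weight per (stimulus, time since onset)

    def features(t):
        # Tapped delay line: each stimulus activates one unit per time step,
        # from its onset until the step before reward delivery.
        x = np.zeros((2, n_steps))
        for i, onset in enumerate(onsets):
            if onset <= t < t_reward:
                x[i, t - onset] = 1.0
        return x

    delta = np.zeros(n_steps)
    for _ in range(n_trials):
        x_prev = features(0)
        for t in range(1, n_steps):
            x = features(t)
            r = reward if t == t_reward else 0.0
            # TD error: reward plus discounted new value minus previous value.
            delta[t] = r + gamma * np.sum(w * x) - np.sum(w * x_prev)
            # Update only the weights of features active at the previous step.
            w += alpha * delta[t] * x_prev
            x_prev = x
    return delta


if __name__ == "__main__":
    delta = simulate_td()
    for t, d in enumerate(delta):
        print(f"t={t:2d}  TD error={d:+.3f}")
```

Under these assumptions, once training has converged the prediction error is concentrated at OS onset, where it approaches `gamma ** (t_reward - t_os)`, and is close to zero at CS onset and at reward delivery, because the OS already predicts both fully. This is the pattern the recordings in Chapter 3 did not show.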
Thesis, PhD Doctor of Philosophy