Refinement of biologically inspired models of reinforcement learning
Abstract
Reinforcement learning occurs when organisms adapt the propensities of
given behaviours on the basis of associations with reward and punishment. To date,
reinforcement learning models have been validated only in minimalist environments in
which one or two environmental stimuli are present as possible predictors of reward.
The exceptions are two studies in which the responses of the dopamine system to
configurations of multiple stimuli were investigated; in both cases, however, the
stimuli were presented simultaneously rather than in sequence.
Therefore, we set out to understand how current models of reinforcement
learning would respond under more complex conditions in which sequences of events
are predictors of reward. In the two experimental chapters of this thesis, we investigated
whether midbrain dopaminergic neurons respond to occasion
setters (Chapter 3) and to the overexpectation effect (Chapter 4). In addition, we ran
simulations of the behavioural paradigms using temporal difference models of
reinforcement learning (Chapter 2) and compared the predictions of the model with
the behavioural and neurophysiological data.
In Chapter 3, by performing single-neuron recording from VTA and SNpc
dopaminergic cells, we demonstrated that our population of neurons was most
responsive to the latest predictor of reward, the conditioned stimulus (CS), and not to the
earliest, the occasion setter (OS). This is in stark contrast with the predictions of
the model (Chapter 2), where the greatest response is seen at the OS onset. We also
showed at a neural level that there was only a weak enhancement of the response to
the discriminative stimulus (SD) when this was preceded by the OS. On the other
hand, at a behavioural level, bar pressing was greatest when the SD was preceded by
the OS, demonstrating that rats could use the information provided by the OS, but that
dopamine was not controlling the conditioned response.
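
The model prediction described above, a prediction error concentrated at the earliest predictor, can be illustrated with a toy temporal difference simulation. This is a minimal sketch and not the model of Chapter 2: it assumes a TD(0) learner with a tapped-delay-line (complete serial compound) stimulus representation, an unpredictable OS onset, and stimuli that terminate at reward. All timings, parameter values, and function names are hypothetical.

```python
# All timings and parameters below are hypothetical, chosen only for illustration.
T_OS, T_CS, T_REWARD, N_STEPS = 2, 6, 10, 12   # onset steps and trial length
GAMMA, ALPHA, N_TRIALS = 0.98, 0.1, 2000       # discount, learning rate, trials


def features(t):
    """Active delay-line features at step t: (stimulus, steps since its onset)."""
    active = []
    if T_OS <= t < T_REWARD:
        active.append(("OS", t - T_OS))
    if T_CS <= t < T_REWARD:
        active.append(("CS", t - T_CS))
    return active


def value(weights, t):
    """Linear value estimate: sum of the weights of the active features."""
    return sum(weights.get(f, 0.0) for f in features(t))


def run_trial(weights, learn=True):
    """Run one OS -> CS -> reward trial and return the TD error at each step."""
    deltas = [0.0] * N_STEPS
    for t in range(1, N_STEPS):
        reward = 1.0 if t == T_REWARD else 0.0
        # TD error: reward plus discounted current value minus previous value.
        delta = reward + GAMMA * value(weights, t) - value(weights, t - 1)
        deltas[t] = delta
        if learn:
            # Credit the features that were active on the previous step.
            for f in features(t - 1):
                weights[f] = weights.get(f, 0.0) + ALPHA * delta
    return deltas


weights = {}
naive = run_trial(weights, learn=False)      # before any learning
for _ in range(N_TRIALS):
    run_trial(weights)
trained = run_trial(weights, learn=False)    # after learning

for name, t in (("OS onset", T_OS), ("CS onset", T_CS), ("reward", T_REWARD)):
    print(f"{name:9s} naive={naive[t]:+.3f} trained={trained[t]:+.3f}")
```

In this sketch the error of the naive learner occurs at reward delivery, whereas after training it is concentrated at OS onset and is close to zero at CS onset and at reward. This is the qualitative pattern attributed to the model, and it is the pattern that the recorded neurons did not show.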
In Chapter 4, our population of dopaminergic neurons responded
preferentially to only one of the two conditioned stimuli (CSA, CSB) in the
overexpectation paradigm. The model (Chapter 2) predicted that
when the two stimuli were presented in compound, there would be an inhibitory
response if the reward magnitude was kept constant and an excitatory response if the
reward magnitude was doubled. The lack of neural firing to one of the two
conditioned stimuli, however, does not make for easy interpretation of the data.
Perhaps one of the conditioned stimuli acted as if it were overshadowing the
other, resulting in no response to the second CS. Interestingly, at a behavioural level,
we did not see increased licking frequency in response to presentation of the compound
stimulus, a result that is somewhat at odds with the previous literature.
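
The sign change predicted for the compound trials can be illustrated with a short arithmetic sketch. It assumes a simple summation rule, in which the prediction for a compound is the sum of the values of its elements and the error at reward delivery is the reward minus that sum. This is not the model of Chapter 2; the function name and the associative strengths are hypothetical and chosen only for illustration.

```python
def reward_prediction_error(reward, cue_values):
    """Error at reward delivery when the prediction is the sum of the cue values."""
    return reward - sum(cue_values)


# Hypothetical strengths after each cue was trained separately, but not to
# asymptote, with a reward of magnitude 1.0. Illustrative numbers only.
V_A, V_B = 0.8, 0.8

# Compound AB with the original reward: the summed prediction exceeds the reward.
same_reward = reward_prediction_error(1.0, [V_A, V_B])

# Compound AB with a doubled reward: the summed prediction falls short of it.
doubled_reward = reward_prediction_error(2.0, [V_A, V_B])

print(f"same reward: {same_reward:+.2f}, doubled reward: {doubled_reward:+.2f}")
```

Under these assumptions the error is negative when the reward is unchanged and positive when it is doubled. If both cues had instead been trained fully to asymptote (V_A = V_B = 1.0), the same rule would give an error of zero for the doubled reward, so the sign in that case depends on the assumed level of training.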
Overall, the results of our experimental chapters suggest that the role that
midbrain dopaminergic neurons play in reinforcement learning is more complex than
that envisaged by previous investigations.
Type
Thesis, PhD Doctor of Philosophy
Items in the St Andrews Research Repository are protected by copyright, with all rights reserved, unless otherwise indicated.