A Neural Signature of Hierarchical Reinforcement Learning

J.J.F. Ribas-Fernandes, A. Solway, C. Diuk, J.T. McGuire, A.G. Barto, Y. Niv, and M.M. Botvinick
Neuron, Volume 71, Issue 2, 370-379, 28 July 2011

Human behavior displays hierarchical structure: simple actions cohere into subtask sequences, which work together to accomplish overall task goals. Although the neural substrates of such hierarchy have been the target of increasing research, they remain poorly understood. We propose that the computations supporting hierarchical behavior may relate to those in hierarchical reinforcement learning (HRL), a machine-learning framework that extends reinforcement-learning mechanisms into hierarchical domains. To test this, we leveraged a distinctive prediction arising from HRL. In ordinary reinforcement learning, reward prediction errors are computed when there is an unanticipated change in the prospects for accomplishing overall task goals. HRL entails that prediction errors should also occur in relation to task subgoals. In three neuroimaging studies we observed neural responses consistent with such subgoal-related reward prediction errors, within structures previously implicated in reinforcement learning. The results reported support the relevance of HRL to the neural processes underlying hierarchical behavior.
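The distinctive HRL prediction tested here can be illustrated with a toy temporal-difference calculation: alongside the ordinary reward prediction error tied to the overall goal, reaching (or losing ground toward) a subgoal generates a pseudo-reward prediction error. The function and all numeric values below are illustrative assumptions, not the paper's actual model.

```python
# Toy sketch of subgoal (pseudo-reward) prediction errors in HRL.
# All names and values are illustrative, not the paper's model.

def td_error(reward, value_next, value_current, gamma=0.95):
    """Standard temporal-difference reward prediction error."""
    return reward + gamma * value_next - value_current

# Top-level error: driven only by progress toward the overall task goal.
goal_rpe = td_error(reward=0.0, value_next=0.6, value_current=0.5)

# Option-level error: within a subtask, a pseudo-reward is delivered when
# the subgoal is reached, so an unanticipated change in the prospects for
# the *subgoal* produces a prediction error even when the prospects for
# the overall goal are unchanged.
pseudo_reward = 1.0  # granted on reaching the subgoal (assumption)
subgoal_rpe = td_error(reward=pseudo_reward, value_next=0.0, value_current=0.8)
```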



Shinsuke Suzuki* and Hiromichi Kimura, "Oscillatory dynamics in the coevolution of cooperation and mobility"
has been accepted by the Journal of Theoretical Biology.
The process went remarkably smoothly: submitted May 4 → reviews returned July 7 (minor revision) → revised version resubmitted July 21 → accepted July 22.


Excitatory transmission from the amygdala to nucleus accumbens facilitates reward seeking

Garret D. Stuber, Dennis R. Sparta, Alice M. Stamatakis, Wieke A. van Leeuwen, Juanita E. Hardjoprajitno, Saemi Cho, Kay M. Tye, Kimberly A. Kempadoo, Feng Zhang, Karl Deisseroth & Antonello Bonci
Nature 475, 377-380 (21 July 2011)

The basolateral amygdala (BLA) has a crucial role in emotional learning irrespective of valence [1-5, 21-23]. The BLA projection to the nucleus accumbens (NAc) is thought to modulate cue-triggered motivated behaviours [4, 6, 7, 24, 25], but our understanding of the interaction between these two brain regions has been limited by the inability to manipulate neural-circuit elements of this pathway selectively during behaviour. To circumvent this limitation, we used in vivo optogenetic stimulation or inhibition of glutamatergic fibres from the BLA to the NAc, coupled with intracranial pharmacology and ex vivo electrophysiology. Here we show that optical stimulation of the pathway from the BLA to the NAc in mice reinforces behavioural responding to earn additional optical stimulation of these synaptic inputs. Optical stimulation of these glutamatergic fibres required intra-NAc dopamine D1-type receptor signalling, but not D2-type receptor signalling. Brief optical inhibition of fibres from the BLA to the NAc reduced cue-evoked intake of sucrose, demonstrating an important role of this specific pathway in controlling naturally occurring reward-related behaviour. Moreover, although optical stimulation of glutamatergic fibres from the medial prefrontal cortex to the NAc also elicited reliable excitatory synaptic responses, optical self-stimulation behaviour was not observed by activation of this pathway. These data indicate that whereas the BLA is important for processing both positive and negative affect, the glutamatergic pathway from the BLA to the NAc, in conjunction with dopamine signalling in the NAc, promotes motivated behavioural responding. Thus, optogenetic manipulation of anatomically distinct synaptic inputs to the NAc reveals functionally distinct properties of these inputs in controlling reward-seeking behaviours.


Dissociable Effects of Subtotal Lesions within the Macaque Orbital Prefrontal Cortex on Reward-Guided Behavior

Peter H. Rudebeck and Elisabeth A. Murray
J. Neurosci. 2011;31:10569-10578

The macaque orbital prefrontal cortex (PFo) has been implicated in a wide range of reward-guided behaviors essential for efficient foraging. The PFo, however, is not a homogeneous structure. Two major subregions, distinct by their cytoarchitecture and connections to other brain structures, compose the PFo. One subregion encompasses Walker's areas 11 and 13 and the other centers on Walker's area 14. Although it has been suggested that these subregions play dissociable roles in reward-guided behavior, direct neuropsychological evidence for this hypothesis is limited. To explore the independent contributions of PFo subregions to behavior, we studied rhesus monkeys (Macaca mulatta) with restricted excitotoxic lesions targeting either Walker's areas 11/13 or area 14. The performance of these two groups was compared to that of a group of unoperated controls on a series of reward-based tasks that has been shown to be sensitive to lesions of the PFo as a whole (Walker's areas 11, 13, and 14). Lesions of areas 11/13, but not area 14, disrupted the rapid updating of object value during selective satiation. In contrast, lesions targeting area 14, but not areas 11/13, impaired the ability of monkeys to learn to stop responding to a previously rewarded object. Somewhat surprisingly, neither lesion disrupted performance on a serial object reversal learning task, although aspiration lesions of the entire PFo produce severe deficits on this task. Our data indicate that anatomically defined subregions within macaque PFo make dissociable contributions to reward-guided behavior.

Functional Connectivity of the Striatum Links Motivation to Action Control in Humans

Helga A. Harsay, Michael X. Cohen, Nikolaas N. Oosterhof, Birte U. Forstmann, Rogier B. Mars, and K. Richard Ridderinkhof
J. Neurosci. 2011;31:10701-10711

Motivation improves the efficiency of intentional behavior, but how this performance modulation is instantiated in the human brain remains unclear. We used a reward-cued antisaccade paradigm to investigate how motivational goals (the expectation of a reward for good performance) modulate patterns of neural activation and functional connectivity to improve preparation for antisaccade performance. Behaviorally, subjects performed better (faster and more accurate antisaccades) when they knew they would be rewarded for good performance. Reward anticipation was associated with increased activation in the ventral and dorsal striatum, and cortical oculomotor regions. Functional connectivity between the caudate nucleus and cortical oculomotor control structures predicted individual differences in the behavioral benefit of reward anticipation. We conclude that although both dorsal and ventral striatal circuitry are involved in the anticipation of reward, only the dorsal striatum and its connected cortical network is involved in the direct modulation of oculomotor behavior by motivational incentive.

Reward Value-Based Gain Control: Divisive Normalization in Parietal Cortex

Kenway Louie, Lauren E. Grattan, and Paul W. Glimcher
J. Neurosci. 2011;31:10627-10639

The representation of value is a critical component of decision making. Rational choice theory assumes that options are assigned absolute values, independent of the value or existence of other alternatives. However, context-dependent choice behavior in both animals and humans violates this assumption, suggesting that biological decision processes rely on comparative evaluation. Here we show that neurons in the monkey lateral intraparietal cortex encode a relative form of saccadic value, explicitly dependent on the values of the other available alternatives. Analogous to extra-classical receptive field effects in visual cortex, this relative representation incorporates target values outside the response field and is observed in both stimulus-driven activity and baseline firing rates. This context-dependent modulation is precisely described by divisive normalization, indicating that this standard form of sensory gain control may be a general mechanism of cortical computation. Such normalization in decision circuits effectively implements an adaptive gain control for value coding and provides a possible mechanistic basis for behavioral context-dependent violations of rationality.
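The gain-control mechanism described above follows the standard divisive normalization form: the response to the value in the neuron's response field is divided by a semisaturation constant plus the summed values of all available targets. The parameter values below are illustrative assumptions; only the functional form reflects the model.

```python
def normalized_response(v_in, v_all, r_max=1.0, sigma=0.1):
    """Divisive normalization of value: the response to the target inside
    the response field (v_in) is scaled by the summed values of all
    available targets (v_all), yielding a relative value code."""
    return r_max * v_in / (sigma + sum(v_all))

# Adding a valuable alternative outside the response field suppresses the
# response to the in-field target even though v_in itself is unchanged,
# reproducing the context dependence described in the abstract.
alone = normalized_response(v_in=1.0, v_all=[1.0])
with_rival = normalized_response(v_in=1.0, v_all=[1.0, 1.0])
```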


Dorsolateral Prefrontal Cortex Drives Mesolimbic Dopaminergic Regions to Initiate Motivated Behavior

Ian C. Ballard, Vishnu P. Murty, R. McKell Carter, Jeffrey J. MacInnes, Scott A. Huettel, and R. Alison Adcock
J. Neurosci. 2011;31:10340-10346

How does the brain translate information signaling potential rewards into motivation to get them? Motivation to obtain reward is thought to depend on the midbrain [particularly the ventral tegmental area (VTA)], the nucleus accumbens (NAcc), and the dorsolateral prefrontal cortex (dlPFC), but it is not clear how the interactions among these regions relate to reward-motivated behavior. To study the influence of motivation on these reward-responsive regions and on their interactions, we used dynamic causal modeling to analyze functional magnetic resonance imaging (fMRI) data from humans performing a simple task designed to isolate reward anticipation. The use of fMRI permitted the simultaneous measurement of multiple brain regions while human participants anticipated and prepared for opportunities to obtain reward, thus allowing characterization of how information about reward changes physiology underlying motivational drive. Furthermore, we modeled the impact of external reward cues on causal relationships within this network, thus elaborating a link between physiology, connectivity, and motivation. Specifically, our results indicated that dlPFC was the exclusive entry point of information about reward in this network, and that anticipated reward availability caused VTA activation only via its effect on the dlPFC. Anticipated reward thus increased dlPFC activation directly, whereas it influenced VTA and NAcc only indirectly, by enhancing intrinsically weak or inactive pathways from the dlPFC. Our findings of a directional prefrontal influence on dopaminergic regions during reward anticipation suggest a model in which the dlPFC integrates and transmits representations of reward to the mesolimbic and mesocortical dopamine systems, thereby initiating motivated behavior.
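The dynamic causal modeling result — reward cues enhancing an intrinsically weak dlPFC-to-VTA pathway — can be sketched with the standard bilinear DCM state equation, dx/dt = (A + u·B)x + Cu. The connectivity values below are illustrative assumptions, not the fitted parameters from the study.

```python
# Bilinear DCM state equation: dx/dt = (A + u*B) x + C*u
# Regions: 0 = dlPFC, 1 = VTA, 2 = NAcc.
# Connection strengths are illustrative, not the paper's estimates.

A = [[-1.0,  0.0,  0.0],   # intrinsic (fixed) connections, with self-decay
     [ 0.1, -1.0,  0.0],   # intrinsically weak dlPFC -> VTA pathway
     [ 0.1,  0.2, -1.0]]   # weak dlPFC -> NAcc, plus VTA -> NAcc
B = [[ 0.0,  0.0,  0.0],   # modulation by the reward cue u
     [ 0.8,  0.0,  0.0],   # cue strengthens dlPFC -> VTA
     [ 0.6,  0.0,  0.0]]   # cue strengthens dlPFC -> NAcc
C = [1.0, 0.0, 0.0]        # the reward cue enters the network via dlPFC only

def dxdt(x, u):
    """One evaluation of the bilinear DCM state equation."""
    n = len(x)
    eff = [[A[i][j] + u * B[i][j] for j in range(n)] for i in range(n)]
    return [sum(eff[i][j] * x[j] for j in range(n)) + C[i] * u
            for i in range(n)]
```

With the cue present (u = 1), the effective dlPFC-to-VTA coupling rises from 0.1 to 0.9, capturing the idea that anticipated reward drives VTA only via its effect on dlPFC.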


Neural and computational mechanisms of postponed decisions

Marina Martínez-García, Edmund T. Rolls, Gustavo Deco and Ranulfo Romo
PNAS July 12, 2011 vol. 108 no. 28 11626-11631


We consider the mechanisms that enable decisions to be postponed for a period after the evidence has been provided. Using an information theoretic approach, we show that information about the forthcoming action becomes available from the activity of neurons in the medial premotor cortex in a sequential decision-making task after the second stimulus is applied, providing the information for a decision about whether the first or second stimulus is higher in vibrotactile frequency. The information then decays in a 3-s delay period in which the neuronal activity declines before the behavioral response can be made. The information then increases again when the behavioral response is required. We model this neuronal activity using an attractor decision-making network in which information reflecting the decision is maintained at a low level during the delay period, and is then selectively restored by a nonspecific input when the response is required. One mechanism for the short-term memory is synaptic facilitation, which can implement a mechanism for postponed decisions that can be correct even when there is little neuronal firing during the delay period before the postponed decision. Another mechanism is graded firing rates by different neurons in the delay period, with restoration by the nonspecific input of the low-rate activity from the higher-rate neurons still firing in the delay period. These mechanisms can account for the decision making and for the memory of the decision before a response can be made, which are evident in the activity of neurons in the medial premotor cortex.
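The synaptic-facilitation mechanism for the delay period can be sketched with a single Tsodyks-Markram-style utilization variable that decays slowly toward baseline. Because its time constant is on the order of seconds, it remains elevated in the winning population after the 3-s delay, so a later nonspecific input restores that population preferentially. All parameters here are illustrative assumptions, not the paper's fitted network.

```python
import math

# Sketch: synaptic facilitation as the short-term memory of a decision.
# During the delay, firing is low, but the facilitation variable of the
# winning population decays slowly, so a nonspecific input applied at
# response time selectively restores the original attractor.
# Parameter values are illustrative assumptions.

TAU_F = 2.0      # facilitation time constant (s)
BASELINE = 0.1   # resting utilization

def u_after_delay(u_peak, delay):
    """Facilitation decays exponentially toward baseline during the delay."""
    return BASELINE + (u_peak - BASELINE) * math.exp(-delay / TAU_F)

# Winning population facilitated at decision time; loser stays near baseline.
u_winner = u_after_delay(0.6, delay=3.0)   # still elevated after 3 s
u_loser = u_after_delay(0.12, delay=3.0)

# A nonspecific input scaled by u drives the winner above threshold first.
drive_winner = u_winner * 1.0
drive_loser = u_loser * 1.0
```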

Punishment sustains large-scale cooperation in prestate warfare

Sarah Mathew and Robert Boyd
PNAS July 12, 2011 vol. 108 no. 28 11375-11380


Understanding cooperation and punishment in small-scale societies is crucial for explaining the origins of human cooperation. We studied warfare among the Turkana, a politically uncentralized, egalitarian, nomadic pastoral society in East Africa. Based on a representative sample of 88 recent raids, we show that the Turkana sustain costly cooperation in combat at a remarkably large scale, at least in part, through punishment of free-riders. Raiding parties comprised several hundred warriors and participants are not kin or day-to-day interactants. Warriors incur substantial risk of death and produce collective benefits. Cowardice and desertions occur, and are punished by community-imposed sanctions, including collective corporal punishment and fines. Furthermore, Turkana norms governing warfare benefit the ethnolinguistic group, a population of a half-million people, at the expense of smaller social groupings. These results challenge current views that punishment is unimportant in small-scale societies and that human cooperation evolved in small groups of kin and familiar individuals. Instead, these results suggest that cooperation at the larger scale of ethnolinguistic units enforced by third-party sanctions could have a deep evolutionary history in the human species.