Activity in Inferior Parietal and Medial Prefrontal Cortex Signals the Accumulation of Evidence in a Probability Learning Task
Mathieu d'Acremont, Eleonora Fornari, Peter Bossaerts
PLOS Computational Biology, published 31 Jan 2013, doi:10.1371/journal.pcbi.1002895
Learning of "probability". The experimental task is designed so that probabilities are learned independently of value (reward or punishment). The learned (updated) probabilities are encoded in medial prefrontal cortex and inferior parietal cortex.
In an uncertain environment, probabilities are key to predicting future events and making adaptive choices. However, little is known about how humans learn such probabilities and where and how they are encoded in the brain, especially when they concern more than two outcomes. During functional magnetic resonance imaging (fMRI), young adults learned the probabilities of uncertain stimuli through repetitive sampling. Stimuli represented payoffs and participants had to predict their occurrence to maximize their earnings. Choices indicated loss and risk aversion but unbiased estimation of probabilities. BOLD response in medial prefrontal cortex and angular gyri increased linearly with the probability of the currently observed stimulus, untainted by its value. Connectivity analyses during rest and task revealed that these regions belonged to the default mode network. The activation of past outcomes in memory is evoked as a possible mechanism to explain the engagement of the default mode network in probability learning. A BOLD response relating to value was detected only at decision time, mainly in striatum. It is concluded that activity in inferior parietal and medial prefrontal cortex reflects the amount of evidence accumulated in favor of competing and uncertain outcomes.
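To make the "unbiased estimation of probabilities" concrete, here is a minimal sketch of how an observer could track the probabilities of more than two outcomes by repetitive sampling. The Dirichlet-multinomial (count-based) form is my assumption, not the authors' model:

```python
def update_counts(counts, observed):
    """Increment the count for the stimulus observed this trial."""
    counts = list(counts)
    counts[observed] += 1
    return counts

def probability_estimate(counts, prior=1.0):
    """Posterior mean of a Dirichlet-multinomial estimator:
    (count + prior) / (total count + n_outcomes * prior)."""
    total = sum(counts) + prior * len(counts)
    return [(c + prior) / total for c in counts]

# three possible stimuli; sample repeatedly and update
counts = [0, 0, 0]
for stimulus in [0, 0, 1, 0, 2, 0]:
    counts = update_counts(counts, stimulus)
p = probability_estimate(counts)  # estimate favors stimulus 0
```

With a symmetric prior the posterior mean converges to the empirical frequencies, which is one way a count-based learner could end up with unbiased probability estimates.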
Wouter van den Bos, Arjun Talwar, and Samuel M. McClure
The Journal of Neuroscience, 30 January 2013, 33(5): 2137-2146; doi: 10.1523/JNEUROSCI.3095-12.2013
In competitive social environments, people often deviate from what rational choice theory prescribes, resulting in losses or suboptimal monetary gains. We investigate how competition affects learning and decision-making in a common value auction task. During the experiment, groups of five human participants were simultaneously scanned using fMRI while playing the auction task. We first demonstrate that bidding is well characterized by reinforcement learning with biased reward representations dependent on social preferences. Indicative of reinforcement learning, we found that estimated trial-by-trial prediction errors correlated with activity in the striatum and ventromedial prefrontal cortex. Additionally, we found that individual differences in social preferences were related to activity in the temporal-parietal junction and anterior insula. Connectivity analyses suggest that monetary and social value signals are integrated in the ventromedial prefrontal cortex and striatum. Based on these results, we argue for a novel mechanistic account for the integration of reinforcement history and social preferences in competitive decision-making.
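The "reinforcement learning with biased reward representations" can be sketched as a delta rule over a socially shifted reward; the bonus term and learning rate below are illustrative assumptions, not the authors' fitted parameters:

```python
def rl_bid_update(value, reward, social_bonus, alpha=0.1):
    """Delta-rule update in which the experienced reward is
    shifted by a social-preference term (e.g., extra utility
    from winning against competitors)."""
    biased_reward = reward + social_bonus
    prediction_error = biased_reward - value
    return value + alpha * prediction_error, prediction_error

# winning a round yields monetary reward plus a social bonus
value, pe = rl_bid_update(value=0.0, reward=1.0, social_bonus=0.5)
```

The point of the bias term is that the prediction error driving learning (and, per the paper, striatal/vmPFC activity) tracks subjectively experienced rather than purely monetary outcomes.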
Jaime S. Ide, Pradeep Shenoy, Angela J. Yu, and Chiang-shan R. Li
The Journal of Neuroscience, 30 January 2013, 33(5): 2039-2047; doi: 10.1523/JNEUROSCI.2201-12.2013
Subjects' behavior in the stop-signal task (correct/incorrect responses and reaction times) can be modeled by Bayesian prediction. Moreover, dACC is modulated by both the "reward prediction error" and the "absolute value of the reward prediction error (unsigned prediction error)".
The dorsal anterior cingulate cortex (dACC) has been implicated in a variety of cognitive control functions, among them the monitoring of conflict, error, and volatility, error anticipation, reward learning, and reward prediction errors. In this work, we used a Bayesian ideal observer model, which predicts trial-by-trial probabilistic expectation of stop trials and response errors in the stop-signal task, to differentiate these proposed functions quantitatively. We found that dACC hemodynamic response, as measured by functional magnetic resonance imaging, encodes both the absolute prediction error between stimulus expectation and outcome, and the signed prediction error related to response outcome. After accounting for these factors, dACC has no residual correlation with conflict or error likelihood in the stop-signal task. Consistent with recent monkey neural recording studies, and in contrast with other neuroimaging studies, our work demonstrates that dACC reports at least two different types of prediction errors, in contexts extending beyond reward processing.
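A minimal sketch of a trial-by-trial Bayesian estimate of stop-trial probability, assuming a leaky Beta-Bernoulli update (the paper's ideal observer model may differ in detail; the decay factor is my simplification of a changing environment):

```python
def update_stop_belief(a, b, stop_trial, decay=0.9):
    """Leaky Beta-Bernoulli update of the belief that the next
    trial is a stop trial. `a` and `b` are pseudo-counts of stop
    and go trials; `decay` discounts old evidence."""
    a, b = decay * a, decay * b
    if stop_trial:
        a += 1.0
    else:
        b += 1.0
    return a, b

a, b = 1.0, 1.0  # uniform prior
for trial in [False, False, True, False]:
    a, b = update_stop_belief(a, b, trial)
p_stop = a / (a + b)  # trial-by-trial expectation of a stop trial
```

The unsigned (absolute) prediction error on a given trial would then be |outcome − p_stop|, which is what the paper reports dACC encoding alongside the signed response-outcome error.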
Chung-Hay Luk and Jonathan D. Wallis
The Journal of Neuroscience, 30 January 2013, 33(5): 1864-1871; doi: 10.1523/JNEUROSCI.4920-12.2013
To optimally obtain desirable outcomes, organisms must track outcomes predicted by stimuli in the environment (stimulus–outcome or SO associations) and outcomes predicted by their own actions (action–outcome or AO associations). Anterior cingulate cortex (ACC) and orbitofrontal cortex (OFC) are implicated in tracking outcomes, but anatomical and functional studies suggest a dissociation, with ACC and OFC responsible for encoding AO and SO associations, respectively. To examine whether this dissociation held at the single neuron level, we trained two subjects to perform choice tasks that required using AO or SO associations. OFC and ACC neurons encoded the action that the subject used to indicate its choice, but this encoding was stronger in OFC during the SO task and stronger in ACC during the AO task. These results are consistent with a division of labor between the two areas in terms of using rewards associated with either stimuli or actions to guide decision-making.
Interaction Between Orbital Prefrontal and Rhinal Cortex Is Required for Normal Estimates of Expected Value
Andrew M. Clark, Sebastien Bouret, Adrienne M. Young, Elisabeth A. Murray, and Barry J. Richmond
The Journal of Neuroscience, 30 January 2013, 33(5): 1833-1845; doi: 10.1523/JNEUROSCI.3605-12.2013
Predicting and valuing potential rewards requires integrating sensory, associative, and contextual information with subjective reward preferences. Previous work has identified regions in the prefrontal cortex and medial temporal lobe believed to be important for each of these functions. For example, activity in the orbital prefrontal cortex (PFo) encodes the specific sensory properties of and preferences for rewards, while activity in the rhinal cortex (Rh) encodes stimulus–stimulus and stimulus–reward associations. Lesions of either structure impair the ability to use visual cues or the history of previous reinforcement to value expected rewards. These areas are linked via reciprocal connections, suggesting it might be their interaction that is critical for estimating expected value. To test this hypothesis, we interrupted direct, intra-hemispheric PFo–Rh interaction in monkeys by performing crossed unilateral ablations of these regions (functional disconnection). We asked whether this circuit is crucial primarily for cue–reward association or for estimating expected value per se, by testing these monkeys, as well as intact controls, on tasks in which expected value was either visually cued or had to be inferred from block-wise changes in reward size in uncued trials. Functional disconnection significantly affected performance in both tasks. Specifically, monkeys with functional disconnection showed less of a difference in error rates and reaction times across reward sizes, in some cases behaving as if they expected rewards to be of equal magnitude. These results support a model whereby information about rewards signaled in PFo is combined with associative and contextual information signaled within Rh to estimate expected value.
Christopher J. Burke, Christian Brunger, Thorsten Kahnt, Soyoung Q. Park, and Philippe N. Tobler
J. Neurosci. 2013;33:1706-1713
Rewards in real life are rarely received without incurring costs and successful reward harvesting often involves weighing and minimizing different types of costs. In the natural environment, such costs often include the physical effort required to obtain rewards and potential risks attached to them. In this study, we applied fMRI to explore the neural coding of physical effort costs as opposed to costs associated with risky rewards. Using an incentive-compatible valuation mechanism, we separately measured the subjective costs associated with effortful and risky options. As expected, subjective costs of options increased with both increasing effort and increasing risk. Despite the similar nature of behavioral discounting of effort and risk, distinct regions of the brain coded these two cost types separately, with anterior insula primarily processing risk costs and midcingulate and supplementary motor area (SMA) processing effort costs. To investigate integration of the two cost types, we also presented participants with options that combined effortful and risky elements. We found that the frontal pole integrates effort and risk costs through functional coupling with the SMA and insula. The degree to which the latter two regions influenced frontal pole activity correlated with participant-specific behavioral sensitivity to effort and risk costs. These data support the notion that, although physical effort costs may appear to be behaviorally similar to other types of costs, such as risk, they are treated separately at the neural level and are integrated only if there is a need to do so.
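The behavioral discounting of effort and risk could be sketched with a simple subtractive cost model; the linear form and the weights below are illustrative assumptions, with the weights playing the role of the participant-specific sensitivities the paper relates to frontal pole coupling:

```python
def subjective_value(reward, effort, risk, k_effort=0.1, k_risk=0.5):
    """Subjective value of an option after discounting by its
    physical effort cost and its risk cost (linear form assumed)."""
    return reward - k_effort * effort - k_risk * risk

# same reward, discounted by effort (e.g., grip force) and risk (variance)
v = subjective_value(reward=10.0, effort=20.0, risk=4.0)
```

In this toy form the two cost terms enter the same value equation, even though the paper's point is that they are computed in separate regions and integrated downstream.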
Masaaki Ogawa, Matthijs A.A. van der Meer, Guillem R. Esber, Domenic H. Cerri, Thomas A. Stalnaker, Geoffrey Schoenbaum
Neuron, Volume 77, Issue 2, 251-258, 23 January 2013
Decision making is impacted by uncertainty and risk (i.e., variance). Activity in the orbitofrontal cortex, an area implicated in decision making, covaries with these quantities. However, this activity could reflect the heightened salience of situations in which multiple outcomes—reward and reward omission—are expected. To resolve these accounts, rats were trained to respond to cues predicting 100%, 67%, 33%, or 0% reward. Consistent with prior reports, some orbitofrontal neurons fired differently in anticipation of uncertain (33% and 67%) versus certain (100% and 0%) reward. However, over 90% of these neurons also fired differently prior to 100% versus 0% reward (or baseline) or prior to 33% versus 67% reward. These responses are inconsistent with risk but fit well with the representation of acquired salience linked to the sum of cue-outcome and cue-no-outcome associative strengths. These results expand our understanding of how the orbitofrontal cortex might regulate learning and behavior.
Ting Xiang, Terry Lohrenz, and P. Read Montague
J. Neurosci. 2013;33:1099-1108
Social norms in humans constrain individual behaviors to establish shared expectations within a social group. Previous work has probed social norm violations and the feelings that such violations engender; however, a computational rendering of the underlying neural and emotional responses has been lacking. We probed norm violations using a two-party, repeated fairness game (ultimatum game) where proposers offer a split of a monetary resource to a responder who either accepts or rejects the offer. Using a norm-training paradigm where subject groups are preadapted to either high or low offers, we demonstrate that unpredictable shifts in expected offers create a difference in rejection rates exhibited by the two responder groups for otherwise identical offers. We constructed an ideal observer model that identified neural correlates of norm prediction errors in the ventral striatum and anterior insula, regions that also showed strong responses to variance-prediction errors generated by the same model. Subjective feelings about offers correlated with these norm prediction errors, and the two signals displayed overlapping, but not identical, neural correlates in striatum, insula, and medial orbitofrontal cortex. These results provide evidence for the hypothesis that responses in anterior insula can encode information about social norm violations that correlate with changes in overt behavior (changes in rejection rates). Together, these results demonstrate that the brain regions involved in reward prediction and risk prediction are also recruited in signaling social norm violations.
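The norm prediction error can be sketched as a Rescorla-Wagner-style update of the expected offer; the learning rate and the numbers are assumed values for illustration:

```python
def update_norm(norm, offer, alpha=0.2):
    """Norm prediction error (offer minus expected norm) and a
    delta-rule update of the norm toward the observed offer."""
    norm_pe = offer - norm
    return norm + alpha * norm_pe, norm_pe

# a responder pre-adapted to high offers then sees a low offer
norm, pe = update_norm(norm=8.0, offer=3.0)
```

A strongly negative norm prediction error like this is the kind of signal the model ties to striatal/insula responses and to the elevated rejection rates after a downward norm shift.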
Thorsten Kahnt and Philippe N. Tobler
J. Neurosci. 2013;33:863-869
Value-based decisions optimize the relation of costs and benefits. Costs and benefits confer not only value but also salience, which may influence decision making through attentional mechanisms. However, the computational and neurobiological role of salience in value-based decisions remains elusive. Here we develop and contrast two formal concepts of salience for value-based choices involving costs and benefits. Specifically, global salience (GS) first integrates costs and benefits and then determines salience based on this overall sum, whereas elemental salience (ES) first determines the salience of costs and benefits before integrating them. We dissociate the behavioral and neural effects of GS and ES from those of value using a value-based decision-making task and fMRI in humans. Specifically, we show that value guides choices and correlates with neural signals in the striatum. In contrast, only ES but not GS impacts decision making by speeding up reaction times. Moreover, activity in the right temporoparietal junction (RTPJ) reflects only ES and correlates with its response-accelerating behavioral effects. Finally, we report an ES-dependent change in functional connectivity between the RTPJ and the locus ceruleus, suggesting noradrenergic processes underlying the response-facilitating effects of ES on decision making. Together, these results support a novel concept of salience in value-based decision making and suggest a computational, anatomical, and neurochemical dissociation of value- and salience-based factors supporting value-based choices.
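One simple way to formalize the GS/ES contrast, assuming salience is the absolute magnitude of a quantity (the paper's exact operationalization may differ):

```python
def global_salience(benefit, cost):
    """GS: integrate costs and benefits first, then take the
    salience (absolute magnitude) of the resulting net value."""
    return abs(benefit - cost)

def elemental_salience(benefit, cost):
    """ES: take the salience of each element first, then integrate."""
    return abs(benefit) + abs(cost)

# a costly but rewarding option: net value is small, ES is large
gs = global_salience(5.0, 4.0)
es = elemental_salience(5.0, 4.0)
```

The example shows why the two concepts dissociate: an option whose costs and benefits nearly cancel has low GS but high ES, and it is ES that the paper links to faster reaction times and RTPJ activity.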
Emmanuele Tidoni, Sara Borgomaneri, Giuseppe di Pellegrino, and Alessio Avenanti
J. Neurosci. 2013;33:611-623
The ability to infer deceptive intents from nonverbal behavior is critical for social interactions. By combining single-pulse and repetitive transcranial magnetic stimulation (TMS) in healthy humans, we provide both correlational and causative evidence that action simulation is actively involved in the ability to recognize deceptive body movements. We recorded motor-evoked potentials during a faked-action discrimination (FAD) task: participants watched videos of actors lifting a cube and judged whether the actors were trying to deceive them concerning the real weight of the cube. Seeing faked actions facilitated the observers' motor system more than truthful actions in a body-part-specific manner, suggesting that motor resonance was sensitive to deceptive movements. Furthermore, we found that TMS virtual lesion to the anterior node of the action observation network, namely the left inferior frontal cortex (IFC), reduced perceptual sensitivity in the FAD task. In contrast, no change in FAD task performance was found after virtual lesions to the left temporoparietal junction (control site). Moreover, virtual lesion to the IFC failed to affect performance in a difficulty-matched spatial-control task that did not require processing of spatiotemporal (acceleration) and configurational (limb displacement) features of seen actions, which are critical to detecting deceptive intent in the actions of others. These findings indicate that the human IFC is critical for recognizing deceptive body movements and suggest that FAD relies on the simulation of subtle changes in action kinematics within the motor system.
Steve W C Chang, Jean-François Gariépy, Michael L Platt
Nature Neuroscience (2012) doi:10.1038/nn.3287
Received 01 October 2012; accepted 20 November 2012; published online 23 December 2012
Note: ACCs is the anterior cingulate "sulcus", while ACCg is the anterior cingulate "gyrus", the region between ACCs and the corpus callosum. Also, the OFC here is "lateral OFC", a different location from the so-called vmPFC. See http://www.ncbi.nlm.nih.gov/pubmed/21689594 for details.
Social decisions are crucial for the success of individuals and the groups that they comprise. Group members respond vicariously to benefits obtained by others, and impairments in this capacity contribute to neuropsychiatric disorders such as autism and sociopathy. We examined the manner in which neurons in three frontal cortical areas encoded the outcomes of social decisions as monkeys performed a reward-allocation task. Neurons in the orbitofrontal cortex (OFC) predominantly encoded rewards that were delivered to oneself. Neurons in the anterior cingulate gyrus (ACCg) encoded reward allocations to the other monkey, to oneself or to both. Neurons in the anterior cingulate sulcus (ACCs) signaled reward allocations to the other monkey or to no one. In this network of received (OFC) and foregone (ACCs) reward signaling, ACCg emerged as an important nexus for the computation of shared experience and social reward. Individual and species-specific variations in social decision-making might result from the relative activation and influence of these areas.
Impulsivity and Self-Control during Intertemporal Decision Making Linked to the Neural Dynamics of Reward Value Representation
Koji Jimura, Maria S. Chushak, and Todd S. Braver
The Journal of Neuroscience, 2 January 2013, 33(1):344-357; doi:10.1523/JNEUROSCI.0919-12.2013
A characteristic marker of impulsive decision making is the discounting of delayed rewards, demonstrated via choice preferences and choice-related brain activity. However, delay discounting may also arise from how subjective reward value is dynamically represented in the brain when anticipating an upcoming chosen reward. In the current study, brain activity was continuously monitored as human participants freely selected an immediate or delayed primary liquid reward and then waited for the specified delay before consuming it. The ventromedial prefrontal cortex (vmPFC) exhibited a characteristic pattern of activity dynamics during the delay period, as well as modulation during choice, that is consistent with the time-discounted coding of subjective value. The ventral striatum (VS) exhibited a similar activity pattern, but preferentially in impulsive individuals. A contrasting profile of delay-related and choice activation was observed in the anterior PFC (aPFC), but selectively in patient individuals. Functional connectivity analyses indicated that both vmPFC and aPFC exerted modulatory, but opposite, influences on VS activation. These results link behavioral impulsivity and self-control to dynamically evolving neural representations of future reward value, not just during choice, but also during postchoice delay periods.
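Delay discounting in such tasks is commonly modeled with a hyperbolic function; the abstract does not give the model, so the form and the k values below are illustrative, with larger k standing in for the more impulsive participants:

```python
def hyperbolic_value(amount, delay, k):
    """Hyperbolic discounting of a delayed reward:
    subjective value = amount / (1 + k * delay).
    Larger k means steeper discounting (more impulsive)."""
    return amount / (1.0 + k * delay)

# impulsive vs patient valuation of 10 units delivered after delay 10
v_impulsive = hyperbolic_value(10.0, 10.0, k=0.5)
v_patient = hyperbolic_value(10.0, 10.0, k=0.05)
```

The paper's twist is that the time-discounted value is not only computed at choice but also tracked dynamically (e.g., in vmPFC) while waiting out the delay, as the remaining delay shrinks.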
Curr Opin Neurobiol. 2012 Dec 22
Recent work has advanced our knowledge of phasic dopamine reward prediction error signals. The error signal is bidirectional, reflects well the higher order prediction error described by temporal difference learning models, is compatible with model-free and model-based reinforcement learning, reports the subjective rather than physical reward value during temporal discounting and reflects subjective stimulus perception rather than physical stimulus aspects. Dopamine activations are primarily driven by reward, and to some extent risk, whereas punishment and salience have only limited activating effects when appropriate controls are respected. The signal is homogeneous in terms of time course but heterogeneous in many other aspects. It is essential for synaptic plasticity and a range of behavioural learning situations.
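The higher-order prediction error of temporal difference learning mentioned above can be written as a one-line sketch (the discount factor is an assumed value):

```python
def td_error(reward, v_next, v_current, gamma=0.95):
    """Temporal-difference reward prediction error:
    delta = r + gamma * V(next state) - V(current state).
    Positive when outcomes are better than predicted, negative
    when a predicted reward is omitted (i.e., bidirectional)."""
    return reward + gamma * v_next - v_current

unexpected = td_error(reward=1.0, v_next=0.0, v_current=0.0)
omitted = td_error(reward=0.0, v_next=0.0, v_current=1.0)
```

On this account, the subjective-value findings in the review amount to the r and V terms carrying discounted or perceived value rather than physical reward magnitude.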
Hoseok Kim, Daeyeol Lee, and Min Whan Jung
The Journal of Neuroscience, 2 January 2013, 33(1):52-63; doi:10.1523/JNEUROSCI.2422-12.2013
The cortico-basal ganglia network has been proposed to consist of parallel loops serving distinct functions. However, it is still uncertain how the content of processed information varies across different loops and how it is related to the functions of each loop. We investigated this issue by comparing neuronal activity in the dorsolateral (sensorimotor) and dorsomedial (associative) striatum, which have been linked to habitual and goal-directed action selection, respectively, in rats performing a dynamic foraging task. Both regions conveyed significant neural signals for the animal's goal choice and its outcome. Moreover, both regions conveyed similar levels of neural signals for action value before the animal's goal choice and chosen value after the outcome of the animal's choice was revealed. However, a striking difference was found in the persistence of neural signals for the animal's chosen action. Signals for the animal's goal choice persisted in the dorsomedial striatum until the outcome of the animal's next goal choice was revealed, whereas they dissipated rapidly in the dorsolateral striatum. These persistent choice signals might be used for causally linking temporally discontiguous responses and their outcomes in the dorsomedial striatum, thereby contributing to its role in goal-directed action selection.
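Action value and chosen value in a dynamic foraging task of this kind can be sketched with a per-goal delta-rule update; the learning rate and values are assumed for illustration:

```python
def choose_and_update(q, action, reward, alpha=0.1):
    """Delta-rule update of the value of the chosen goal only;
    `q` holds one action value per goal location."""
    q = dict(q)
    q[action] += alpha * (reward - q[action])
    return q

q = {"left": 0.5, "right": 0.5}      # action values before choice
q = choose_and_update(q, "left", reward=1.0)
# after the outcome, the chosen value q["left"] moves toward 1.0
```

The persistence question in the paper maps onto how long a representation of `action` (here, "left") must be held after the outcome to support this kind of credit assignment.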