Wednesday, May 25, 2011

Distributed Coding of Actual and Hypothetical Outcomes in the Orbital and Dorsolateral Prefrontal Cortex

H. Abe and D. Lee
Neuron, Volume 70, Issue 4, 731-741, 26 May 2011

If you lose with scissors, you learn that scissors was a bad choice (learning from the actual outcome). But if you know the rules of rock-paper-scissors, you also know that playing paper would have won (learning from a hypothetical outcome). Monkeys are capable of both kinds of learning, and neurons in dlPFC and OFC encode both actual and hypothetical outcomes (with dlPFC coding mainly hypothetical outcomes).

How are decision-making strategies altered by hypothetical outcomes resulting from unchosen actions? Abe and Lee find that monkeys adjust their strategies in a rock-paper-scissors task according to both actual and hypothetical outcomes. Neurons in the prefrontal cortex modulated their activity related to actual and hypothetical outcomes differently depending on the animal's choices, thereby encoding choice-outcome conjunctions for both experienced and hypothetical outcomes.
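The distinction the summary draws can be sketched in a few lines. This is not the authors' model; the learning rule and rates below are illustrative. The point is that the game's rules let a learner update the unchosen actions from the payoffs they would have produced, not just the chosen action from the payoff it did produce:

```python
# Illustrative sketch of learning from actual and hypothetical (fictive)
# outcomes in rock-paper-scissors. Learning rates and update rule are
# made up for illustration, not taken from Abe and Lee's model.

ACTIONS = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(my, opp):
    """Return +1 for a win, 0 for a tie, -1 for a loss."""
    if my == opp:
        return 0
    return 1 if BEATS[my] == opp else -1

def update_values(values, chosen, opp, alpha_actual=0.5, alpha_fictive=0.3):
    """Delta-rule update: the chosen action learns from the actual
    outcome; unchosen actions learn from the outcomes the rules imply."""
    for a in ACTIONS:
        r = payoff(a, opp)
        alpha = alpha_actual if a == chosen else alpha_fictive
        values[a] += alpha * (r - values[a])
    return values

values = {a: 0.0 for a in ACTIONS}
# losing with scissors against rock also teaches that paper would have won
update_values(values, chosen="scissors", opp="rock")
```

After one such trial the value of paper rises even though it was never played, which is the behavioral signature of learning from hypothetical outcomes.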

Action Dominates Valence in Anticipatory Representations in the Human Striatum and Dopaminergic Midbrain

Marc Guitart-Masip, Lluis Fuentemilla, Dominik R. Bach, Quentin J. M. Huys, Peter Dayan, Raymond J. Dolan, and Emrah Duzel
J. Neurosci. 2011;31 7867-7875

The acquisition of reward and the avoidance of punishment could logically be contingent on either emitting or withholding particular actions. However, the separate pathways in the striatum for go and no-go appear to violate this independence, instead coupling affect and effect. Respect for this interdependence has biased many studies of reward and punishment, so potential action–outcome valence interactions during anticipatory phases remain unexplored. In a functional magnetic resonance imaging study with healthy human volunteers, we manipulated subjects' requirement to emit or withhold an action independent from subsequent receipt of reward or avoidance of punishment. During anticipation, in the striatum and a lateral region within the substantia nigra/ventral tegmental area (SN/VTA), action representations dominated over valence representations. Moreover, we did not observe any representation associated with different state values through accumulation of outcomes, challenging a conventional and dominant association between these areas and state value representations. In contrast, a more medial sector of the SN/VTA responded preferentially to valence, with opposite signs depending on whether action was anticipated to be emitted or withheld. This dominant influence of action requires an enriched notion of opponency between reward and punishment.
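The design described in the abstract amounts to a 2x2 factorial crossing of action and valence, so each factor can be analyzed independently of the other (the condition labels below are ours, not the paper's):

```python
from itertools import product

# The 2x2 design implied by the abstract: action (go / no-go) crossed
# with valence (reward acquisition / punishment avoidance).
conditions = [
    {"action": a, "valence": v}
    for a, v in product(["go", "no-go"], ["win reward", "avoid punishment"])
]
```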

Thursday, May 19, 2011

Ventromedial Frontal Lobe Damage Disrupts Value Maximization in Humans

Nathalie Camille, Cathryn A. Griffiths, Khoi Vo, Lesley K. Fellows, and Joseph W. Kable
The Journal of Neuroscience, 18 May 2011, 31(20): 7527-7532

Recent work in neuroeconomics has shown that regions in orbitofrontal and medial prefrontal cortex encode the subjective value of different options during choice. However, these electrophysiological and neuroimaging studies cannot demonstrate whether such signals are necessary for value-maximizing choices. Here we used a paradigm developed in experimental economics to empirically measure and quantify violations of utility theory in humans with damage to the ventromedial frontal lobe (VMF). We show that people with such damage are more likely to make choices that violate the generalized axiom of revealed preference, which is the one necessary and sufficient condition for choices to be consistent with value maximization. These results demonstrate that the VMF plays a critical role in value-maximizing choice.
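The consistency condition at stake can be illustrated with a toy check (a deliberate simplification over discrete choice sets; the paper's budget-set machinery is omitted): if option x is chosen when y was available, x is revealed preferred to y, and choices are inconsistent with value maximization when this relation contains a cycle.

```python
# Toy revealed-preference consistency check (a simplification of GARP
# over discrete choice sets, for illustration only).

def garp_violation(choices):
    """choices: list of (chosen, available_set) pairs. Returns True if
    the revealed-preference relation contains a cycle."""
    prefs = set()
    for chosen, available in choices:
        for other in available:
            if other != chosen:
                prefs.add((chosen, other))  # chosen revealed preferred
    # transitive closure of the relation
    closure = set(prefs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for b2, c in list(closure):
                if b == b2 and (a, c) not in closure:
                    closure.add((a, c))
                    changed = True
    # a cycle means some pair is revealed preferred in both directions
    return any((b, a) in closure for a, b in closure)

consistent = [("A", {"A", "B"}), ("B", {"B", "C"})]   # A > B, B > C
cyclic = consistent + [("C", {"C", "A"})]             # adds C > A: a cycle
```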

Wednesday, May 4, 2011

Elapsed Decision Time Affects the Weighting of Prior Probability in a Perceptual Decision Task

Timothy D. Hanks, Mark E. Mazurek, Roozbeh Kiani, Elisabeth Hopp, and Michael N. Shadlen
The Journal of Neuroscience, 27 April 2011, 31(17): 6339-6352

Decisions are often based on a combination of new evidence with prior knowledge of the probable best choice. Optimal combination requires knowledge about the reliability of evidence, but in many realistic situations, this is unknown. Here we propose and test a novel theory: the brain exploits elapsed time during decision formation to combine sensory evidence with prior probability. Elapsed time is useful because (1) decisions that linger tend to arise from less reliable evidence, and (2) the expected accuracy at a given decision time depends on the reliability of the evidence gathered up to that point. These regularities allow the brain to combine prior information with sensory evidence by weighting the latter in accordance with reliability. To test this theory, we manipulated the prior probability of the rewarded choice while subjects performed a reaction-time discrimination of motion direction using a range of stimulus reliabilities that varied from trial to trial. The theory explains the effect of prior probability on choice and reaction time over a wide range of stimulus strengths. We found that prior probability was incorporated into the decision process as a dynamic bias signal that increases as a function of decision time. This bias signal depends on the speed–accuracy setting of human subjects, and it is reflected in the firing rates of neurons in the lateral intraparietal area (LIP) of rhesus monkeys performing this task.
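The proposed mechanism can be caricatured as a bounded accumulator in which the prior enters as a bias that grows with elapsed decision time, so slow trials (which tend to carry weak evidence) are influenced more by the prior. This is a minimal sketch with made-up parameters, not the model fitted in the paper:

```python
import random

# Minimal sketch: evidence accumulation to bound, with the prior added
# as a dynamic bias proportional to elapsed time. Parameters (bound,
# noise, time step) are illustrative.

def decide(drift, prior_logodds, bound=1.0, dt=0.01, noise=1.0, seed=0):
    """Return (choice, decision_time): choice is +1 or -1."""
    rng = random.Random(seed)
    x, t = 0.0, 0.0
    while True:
        t += dt
        x += drift * dt + rng.gauss(0.0, noise) * dt ** 0.5
        total = x + prior_logodds * t  # prior's weight grows with time
        if total >= bound:
            return +1, t
        if total <= -bound:
            return -1, t

choice, rt = decide(drift=0.0, prior_logodds=2.0, seed=1)
```

With zero-mean evidence, the time-growing bias makes the prior-favored choice dominate, and the later a decision terminates, the larger the prior's contribution at the moment of commitment.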

Human Dorsal Striatal Activity during Choice Discriminates Reinforcement Learning Behavior from the Gambler's Fallacy

Ryan K. Jessup and John P. O'Doherty
The Journal of Neuroscience, 27 April 2011, 31(17): 6296-6304

Reinforcement learning theory has generated substantial interest in neurobiology, particularly because of the resemblance between phasic dopamine and reward prediction errors. Actor–critic theories have been adapted to account for the functions of the striatum, with parts of the dorsal striatum equated to the actor. Here, we specifically test whether the human dorsal striatum—as predicted by an actor–critic instantiation—is used on a trial-to-trial basis at the time of choice to choose in accordance with reinforcement learning theory, as opposed to a competing strategy: the gambler's fallacy. Using a partial-brain functional magnetic resonance imaging scanning protocol focused on the striatum and other ventral brain areas, we found that the dorsal striatum is more active when choosing consistent with reinforcement learning compared with the competing strategy. Moreover, an overlapping area of dorsal striatum along with the ventral striatum was found to be correlated with reward prediction errors at the time of outcome, as predicted by the actor–critic framework. These findings suggest that the same region of dorsal striatum involved in learning stimulus–response associations may contribute to the control of behavior during choice, thereby using those learned associations. Intriguingly, neither reinforcement learning nor the gambler's fallacy conformed to the optimal choice strategy on the specific decision-making task we used. Thus, the dorsal striatum may contribute to the control of behavior according to reinforcement learning even when the prescriptions of such an algorithm are suboptimal in terms of maximizing future rewards.
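The contrast between the two strategies can be sketched as follows (the delta rule and the "switch after a win streak" rule are our illustrative stand-ins, not the paper's fitted models): after a run of rewards on one option, a reinforcement learner values it highly and stays, while the gambler's fallacy treats the streak as due to end and switches.

```python
# Illustrative contrast between reinforcement learning and the
# gambler's fallacy after a reward streak on one option.

def rl_value(outcomes, alpha=0.5):
    """Incremental value estimate driven by reward prediction errors."""
    v = 0.0
    for r in outcomes:
        v += alpha * (r - v)   # delta rule: prediction error is r - v
    return v

def gamblers_fallacy_choice(outcomes):
    """Fallacy: after a rewarded trial, a reversal feels 'due'."""
    return "switch" if outcomes and outcomes[-1] == 1 else "stay"

history = [1, 1, 1]                        # three rewarded choices of option A
rl_prefers_a = rl_value(history) > 0       # RL: high value, keep choosing A
fallacy = gamblers_fallacy_choice(history) # fallacy: "switch"
```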

Dissociable Effects of Lesions to Orbitofrontal Cortex Subregions on Impulsive Choice in the Rat

Adam C. Mar, Alice L. J. Walker, David E. Theobald, Dawn M. Eagle, and Trevor W. Robbins
The Journal of Neuroscience, 27 April 2011, 31(17): 6398-640

The orbitofrontal cortex (OFC) is implicated in a variety of adaptive decision-making processes. Human studies suggest that there is a functional dissociation between medial and lateral OFC (mOFC and lOFC, respectively) subregions when performing certain choice procedures. However, little work has examined the functional consequences of manipulations of OFC subregions on decision making in rodents. In the present experiments, impulsive choice was assessed by evaluating intolerance to delayed, but economically optimal, reward options using a delay-discounting paradigm. Following initial delay-discounting training, rats received bilateral neurotoxic or sham lesions targeting whole OFC (wOFC) or restricted to either mOFC or lOFC subregions. A transient flattening of delay-discounting curves was observed in wOFC-lesioned animals relative to shams—differences that disappeared with further training. Stable, dissociable effects were found when lesions were restricted to OFC subregions; mOFC-lesioned rats showed increased, whereas lOFC-lesioned rats showed decreased, preference for the larger-delayed reward relative to sham-controls—a pattern that remained significant during retraining after all delays were removed. When locations of levers leading to small–immediate versus large–delayed rewards were reversed, wOFC- and lOFC-lesioned rats showed retarded, whereas mOFC-lesioned rats showed accelerated, trajectories for reversal of lever preference. These results provide the first direct evidence for dissociable functional roles of the mOFC and lOFC for impulsive choice in rodents. The findings are consistent with recent human functional imaging studies and suggest that functions of mOFC and lOFC subregions may be evolutionarily conserved and contribute differentially to decision-making processes.
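The delay-discounting paradigm rests on the standard hyperbolic form V = A / (1 + kD); a small sketch follows (the value of k is illustrative). A larger k means the delayed reward loses value faster, i.e. more impulsive choice, and the lesion effects described above correspond to shifts in the steepness of this curve:

```python
# Standard hyperbolic delay discounting; the value of k is illustrative.

def discounted_value(amount, delay, k=0.1):
    """V = A / (1 + k*D): subjective value of a reward after a delay."""
    return amount / (1.0 + k * delay)

def prefers_larger_later(small, large, delay, k=0.1):
    """True if the delayed larger reward still outvalues the immediate one."""
    return discounted_value(large, delay, k) > small

# with k = 0.1, a reward of 4 delayed by 10 s is worth 4 / 2 = 2
v = discounted_value(4, 10, 0.1)
```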

Roles of Nucleus Accumbens Core and Shell in Incentive-Cue Responding and Behavioral Inhibition

Frederic Ambroggi, Ali Ghazizadeh, Saleem M. Nicola, and Howard L. Fields
The Journal of Neuroscience, 4 May 2011, 31(18): 6820-6830

The nucleus accumbens (NAc) is involved in many reward-related behaviors. The NAc has two major components, the core and the shell. These two areas have different inputs and outputs, suggesting that they contribute differentially to goal-directed behaviors. Using a discriminative stimulus (DS) task in rats and inactivating the NAc by blocking excitatory inputs with glutamate antagonists, we dissociated core and shell contributions to task performance. NAc core but not shell inactivation decreased responding to a reward-predictive cue. In contrast, inactivation of either subregion induced a general behavioral disinhibition. This reveals that the NAc actively suppresses actions inappropriate to the DS task. Importantly, selective inactivation of the shell but not core significantly increased responding to the nonrewarded cue. To determine whether the different contributions of the NAc core and shell depend on the information encoded in their constituent neurons, we performed electrophysiological recording in rats performing the DS task. Although there was no firing pattern unique to either core or shell, the reward-predictive cue elicited more frequent and larger magnitude responses in the NAc core than in the shell. Conversely, more NAc shell neurons selectively responded to the nonrewarded stimulus. These quantitative differences might account for the different behavioral patterns that require either core or shell. Neurons with similar firing patterns could also have different effects on behavior due to their distinct projection targets.