Wednesday, December 28, 2011

Wrapping Up Work for 2011


Somehow I managed to wrap up another year of work.
To everyone who supported me this year: thank you.
This year I mostly skipped conferences and the like and devoted myself almost entirely to writing papers.
In the end, one paper's revision has been carried over into next year, but since the goal is acceptance, I've decided not to worry about it.

Also, this year I was able to publish papers for the first time in three years (!).
Having changed fields when I came to my current institution in 2008, I had been struggling, so this was a relief.
Now, if the paper from the main project I've been working on since 2008 gets accepted, I think I'll be able to say that these four years in my current position were not bad at all.

Shinsuke Suzuki*, Kazuhisa Niki, Syoken Fujisaki and Eizo Akiyama, “Neural basis of conditional cooperation”, Social Cognitive and Affective Neuroscience, Vol. 6, No. 3, pp. 338-347, 2011.
A collaboration with Kazuhisa Niki of the National Institute of Advanced Industrial Science and Technology (AIST).
Using an fMRI experiment, we aimed to elucidate the neural basis of cooperative behavior based on indirect reciprocity.
It was the final project of my graduate school days and what drew me into neuroscience, so this study holds deep memories for me. It took a long time to reach publication (it had actually been accepted quite a while ago…).

Shinsuke Suzuki* and Hiromichi Kimura, "Oscillatory dynamics in the coevolution of cooperation and mobility", Journal of Theoretical Biology, Vol. 287, No. 1, pp. 42-47, 2011.
A collaboration with Mr. Kimura (Tome R&D Institute), a classmate from my graduate school days.
Using computer simulations, we showed that organisms' mobility and cooperative behavior can coevolve.

That's all for now.
Have a happy New Year!

Wednesday, December 14, 2011

Attention for Learning Signals in Anterior Cingulate Cortex


Daniel W. Bryden, Emily E. Johnson, Steven C. Tobia, Vadim Kashtelyan, and Matthew R. Roesch
J. Neurosci. 2011;31 18266-18274

Learning theory suggests that animals attend to pertinent environmental cues when reward contingencies unexpectedly change so that learning can occur. We have previously shown that activity in basolateral nucleus of amygdala (ABL) responds to unexpected changes in reward value, consistent with unsigned prediction error signals theorized by Pearce and Hall. However, changes in activity were present only at the time of unexpected reward delivery, not during the time when the animal needed to attend to conditioned stimuli that would come to predict the reward. This suggested that a different brain area must be signaling the need for attention necessary for learning. One likely candidate to fulfill this role is the anterior cingulate cortex (ACC). To test this hypothesis, we recorded from single neurons in ACC as rats performed the same behavioral task that we have used to dissociate signed from unsigned prediction errors in dopamine and ABL neurons. In this task, rats chose between two fluid wells that produced varying magnitudes of and delays to reward. Consistent with previous work, we found that ACC detected errors of commission and reward prediction errors. We also found that activity during cue sampling encoded reward size, but not expected delay to reward. Finally, activity in ACC was elevated during trials in which attention was increased following unexpected upshifts and downshifts in value. We conclude that ACC not only signals errors in reward prediction, as previously reported, but also signals the need for enhanced neural resources during learning on trials subsequent to those errors.
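(Reader's note: the Pearce-Hall-style unsigned prediction error referred to here is simple to write down. Below is a minimal sketch in Python, with arbitrary illustrative numbers of my own, contrasting the signed, dopamine-style error with its unsigned counterpart, the quantity proposed to drive attention in ACC.)

# Signed vs. unsigned (Pearce-Hall style) prediction errors.
# All values are illustrative, not taken from the paper.
def signed_rpe(reward, value):
    return reward - value          # dopamine-like: sign carries better/worse

def unsigned_rpe(reward, value):
    return abs(reward - value)     # Pearce-Hall: any surprise is positive

value = 0.5                        # current reward expectation
for reward in (1.0, 0.0):          # unexpected upshift / unexpected downshift
    print(f"reward={reward}: signed={signed_rpe(reward, value):+.2f}, "
          f"unsigned={unsigned_rpe(reward, value):.2f}")
# Upshifts and downshifts yield the same unsigned error, which is why it
# works as an attention-for-learning signal rather than a value signal.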

Wednesday, December 7, 2011

Encoding of Both Positive and Negative Reward Prediction Errors by Neurons of the Primate Lateral Prefrontal Cortex and Caudate Nucleus


Wael F. Asaad and Emad N. Eskandar
The Journal of Neuroscience, 7 December 2011, 31(49): 17772-17787; doi: 10.1523/JNEUROSCI.3793-11.2011

Learning can be motivated by unanticipated success or unexpected failure. The former encourages us to repeat an action or activity, whereas the latter leads us to find an alternative strategy. Understanding the neural representation of these unexpected events is therefore critical to elucidate learning-related circuits. We examined the activity of neurons in the lateral prefrontal cortex (PFC) and caudate nucleus of monkeys as they performed a trial-and-error learning task. Unexpected outcomes were widely represented in both structures, and neurons driven by unexpectedly negative outcomes were as frequent as those activated by unexpectedly positive outcomes. Moreover, both positive and negative reward prediction errors (RPEs) were represented primarily by increases in firing rate, unlike the manner in which dopamine neurons have been observed to reflect these values. Interestingly, positive RPEs tended to appear with shorter latency than negative RPEs, perhaps reflecting the mechanism of their generation. Last, in the PFC but not the caudate, trial-by-trial variations in outcome-related activity were linked to the animals' subsequent behavioral decisions. More broadly, the robustness of RPE signaling by these neurons suggests that actor-critic models of reinforcement learning in which the PFC and particularly the caudate are considered primarily to be “actors” rather than “critics,” should be reconsidered to include a prominent evaluative role for these structures.
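(A quick way to picture the coding scheme reported here, with both error signs expressed as firing increases, is two rectified channels instead of dopamine's single signed channel. A minimal sketch with made-up numbers:)

import numpy as np

rpe = np.array([-0.8, -0.3, 0.0, 0.4, 0.9])   # hypothetical trial-wise RPEs

dopamine_like = rpe                  # one signed signal around baseline
positive_pop = np.maximum(rpe, 0)    # increases firing for positive surprise
negative_pop = np.maximum(-rpe, 0)   # increases firing for negative surprise

for d, p, n in zip(rpe, positive_pop, negative_pop):
    print(f"RPE={d:+.1f}  +population={p:.1f}  -population={n:.1f}")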

Attentional Enhancement via Selection and Pooling of Early Sensory Responses in Human Visual Cortex


F. Pestilli, M. Carrasco, D.J. Heeger, and J.L. Gardner
Neuron, Volume 72, Issue 5, 832-846, 8 December 2011
10.1016/j.neuron.2011.09.025

A paper by Justin, who has been a great help to me.
And it's in Neuron!

The computational processes by which attention improves behavioral performance were characterized by measuring visual cortical activity with functional magnetic resonance imaging as humans performed a contrast-discrimination task with focal and distributed attention. Focal attention yielded robust improvements in behavioral performance accompanied by increases in cortical responses. Quantitative analysis revealed that if performance were limited only by the sensitivity of the measured sensory signals, the improvements in behavioral performance would have corresponded to an unrealistically large reduction in response variability. Instead, behavioral performance was well characterized by a pooling and selection process for which the largest sensory responses, those most strongly modulated by attention, dominated the perceptual decision. This characterization predicts that high-contrast distracters that evoke large responses should negatively impact behavioral performance. We tested and confirmed this prediction. We conclude that attention enhanced behavioral performance predominantly by enabling efficient selection of the behaviorally relevant sensory signals.
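(The "pooling and selection" rule described here, in which the largest, most attention-boosted responses dominate the decision, can be caricatured by power-weighted pooling that approaches max-pooling as the exponent grows. This is my own simplification, not the authors' fitted model:)

import numpy as np

def pooled(responses, k):
    """k = 0 is plain averaging; large k approaches taking the maximum."""
    r = np.asarray(responses, dtype=float)
    return np.sum(r**k / np.sum(r**k) * r)

responses = [1.0, 1.1, 1.2, 2.0]     # last entry: attended, boosted response
for k in (0, 4, 16):
    print(f"k={k:2d}: pooled response = {pooled(responses, k):.2f}")
# As k grows the pooled signal is dominated by the largest response, so
# attention can help via selection without reducing every channel's noise.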

Tuesday, December 6, 2011

Critical Contributions of the Orbitofrontal Cortex to Behavior


Annals of the New York Academy of Sciences
December 2011, Volume 1239, Pages 1–163

A special issue on the orbitofrontal cortex (OFC).
Every one of the papers looks interesting.

Balkanizing the primate orbitofrontal cortex: distinct subregions for comparing and contrasting values (pages 1–13)
Peter H. Rudebeck and Elisabeth A. Murray

Giving credit where credit is due: orbitofrontal cortex and valuation in an uncertain world (pages 14–24)
Mark E. Walton, Timothy E.J. Behrens, MaryAnn P. Noonan and Matthew F.S. Rushworth

The orbitofrontal cortex and response selection (pages 25–32)
James J. Young and Matthew L. Shapiro

Contrasting reward signals in the orbitofrontal cortex and anterior cingulate cortex (pages 33–42)
Jonathan D. Wallis and Steven W. Kennerley

The orbitofrontal cortex, predicted value, and choice (pages 43–50)
Bernard W. Balleine, Beatrice K. Leung and Sean B. Ostlund

Orbitofrontal contributions to value-based decision making: evidence from humans with frontal lobe damage (pages 51–58)
Lesley K. Fellows

Representations of appetitive and aversive information in the primate orbitofrontal cortex (pages 59–70)
Sara E. Morrison and C. Daniel Salzman

Behavioral outcomes of late-onset or early-onset orbital frontal cortex (areas 11/13) lesions in rhesus monkeys (pages 71–86)
Jocelyne Bachevalier, Christopher J. Machado and Andy Kazama

Does the orbitofrontal cortex signal value? (pages 87–99)
Geoffrey Schoenbaum, Yuji Takahashi, Tzu-Lan Liu and Michael A. McDannald

The prefrontal cortex and hybrid learning during iterative competitive games (pages 100–108)
Hiroshi Abe, Hyojung Seo and Daeyeol Lee

Neuronal signals for reward risk in frontal cortex (pages 109–117)
Wolfram Schultz, Martin O’Neill, Philippe N. Tobler and Shunsuke Kobayashi

Contributions of the ventromedial prefrontal cortex to goal-directed action selection (pages 118–129)
John P. O’Doherty

The orbitofrontal cortex and the computation of subjective value: consolidated concepts and new perspectives (pages 130–137)
Camillo Padoa-Schioppa and Xinying Cai

The value of identity: olfactory notes on orbitofrontal cortex function (pages 138–148)
Jay A. Gottfried and Christina Zelano

Population coding and neural rhythmicity in the orbitofrontal cortex (pages 149–161)
Cyriel M.A. Pennartz, Marijn van Wingerden and Martin Vinck

Dopamine neurons code subjective sensory experience and uncertainty of perceptual decisions


Victor de Lafuente and Ranulfo Romo
PNAS December 6, 2011 vol. 108 no. 49 19767-19771

Midbrain dopamine (DA) neurons respond to sensory stimuli associated with future rewards. When reward is delivered probabilistically, DA neurons reflect this uncertainty by increasing their firing rates in a period between the sensory cue and reward delivery time. Probability of reward, however, has been externally conveyed by visual cues, and it is not known whether DA neurons would signal uncertainty arising internally. Here we show that DA neurons code the uncertainty associated with a perceptual judgment about the presence or absence of a vibrotactile stimulus. We observed that uncertainty modulates the activity elicited by a go cue instructing monkey subjects to communicate their decisions. That is, the same go cue generates different DA responses depending on the uncertainty level of a judgment made a few seconds before the go instruction. Easily detected suprathreshold stimuli elicit small DA responses, indicating that future reward will not be a surprising event. In contrast, the absence of a sensory stimulus generates large DA responses associated with uncertainty: was the stimulus truly absent, or did a low-amplitude vibration go undetected? In addition, the responses of DA neurons to the stimulus itself increase with vibration amplitude, but only when monkeys correctly detect its presence. This finding suggests that DA activity is not related to actual intensity but rather to perceived intensity. Therefore, in addition to their well-known role in reward prediction, DA neurons code subjective sensory experience and uncertainty arising internally from perceptual decisions.

Equitable decision making is associated with neural markers of intrinsic value


Jamil Zaki and Jason P. Mitchell
PNAS December 6, 2011 vol. 108 no. 49 19761-19766

Standard economic and evolutionary models assume that humans are fundamentally selfish. On this view, any acts of prosociality—such as cooperation, giving, and other forms of altruism—result from covert attempts to avoid social injunctions against selfishness. However, even in the absence of social pressure, individuals routinely forego personal gain to share resources with others. Such anomalous giving cannot be accounted for by standard models of social behavior. Recent observations have suggested that, instead, prosocial behavior may reflect an intrinsic value placed on social ideals such as equity and charity. Here, we show that, consistent with this alternative account, making equitable interpersonal decisions engaged neural structures involved in computing subjective value, even when doing so required foregoing material resources. By contrast, making inequitable decisions produced activity in the anterior insula, a region linked to the experience of subjective disutility. Moreover, inequity-related insula response predicted individuals’ unwillingness to make inequitable choices. Together, these data suggest that prosocial behavior is not simply a response to external pressure, but instead represents an intrinsic, and intrinsically social, class of reward.

Wednesday, November 30, 2011

Parietal Cortex and Insula Relate to Evidence Seeking Relevant to Reward-Related Decisions

Nicholas Furl and Bruno B. Averbeck
J. Neurosci. 2011;31 17572-17582 Open Access
http://www.jneurosci.org/cgi/content/abstract/31/48/17572?etoc

Keep paying a cost to gather more evidence, or decide right away? Humans tend to decide too quickly (compared with the optimal behavior computed from a Bayesian model), and the parietal cortex and insula play a large role in this. http://bit.ly/s7SHPK

Decisions are most effective after collecting sufficient evidence to accurately predict rewarding outcomes. We investigated whether human participants optimally seek evidence and we characterized the brain areas associated with their evidence seeking. Participants viewed sequences of bead colors drawn from hidden urns and attempted to infer the majority bead color in each urn. When viewing each bead color, participants chose either to seek more evidence about the urn by drawing another bead (draw choices) or to infer the urn contents (urn choices). We then compared their evidence seeking against that predicted by a Bayesian ideal observer model. By this standard, participants sampled less evidence than optimal. Also, when faced with urns that had bead color splits closer to chance (60/40 versus 80/20) or potential monetary losses, participants increased their evidence seeking, but they showed less increase than predicted by the ideal observer model. Functional magnetic resonance imaging showed that urn choices evoked larger hemodynamic responses than draw choices in the insula, striatum, anterior cingulate, and parietal cortex. These parietal responses were greater for participants who sought more evidence on average and for participants who increased more their evidence seeking when draws came from 60/40 urns. The parietal cortex and insula were associated with potential monetary loss. Insula responses also showed modulation with estimates of the expected gains of urn choices. Our findings show that participants sought less evidence than predicted by an ideal observer model and their evidence-seeking behavior may relate to responses in the insula and parietal cortex.
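(The Bayesian ideal observer for this beads-style task is easy to reproduce; the 80/20 and 60/40 splits come from the abstract, while the example draw sequence is my own. A minimal sketch:)

def posterior_majority_blue(draws, p):
    """Posterior that the urn is majority blue, given draws
    (1 = blue, 0 = red), majority proportion p, and a flat prior."""
    n_blue, n = sum(draws), len(draws)
    like_blue = p**n_blue * (1 - p)**(n - n_blue)
    like_red = (1 - p)**n_blue * p**(n - n_blue)
    return like_blue / (like_blue + like_red)

draws = [1, 1, 0, 1]
for p in (0.8, 0.6):
    print(f"{int(p*100)}/{int(100 - p*100)} urn: "
          f"P(majority blue) = {posterior_majority_blue(draws, p):.3f}")
# The same evidence is less diagnostic under a 60/40 split, so an ideal
# observer draws longer; participants drew more too, but less than optimal.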

Dopaminergic Modulation of Risky Decision-Making

Nicholas W. Simon, Karienn S. Montgomery, Blanca S. Beas, Marci R. Mitchell, Candi L. LaSarge, Ian A. Mendez, Cristina Banuelos, Colin M. Vokes, Aaron B. Taylor, Rebecca P. Haberman, Jennifer L. Bizon, and Barry Setlow
J. Neurosci. 2011;31 17460-17470
http://www.jneurosci.org/cgi/content/abstract/31/48/17460?etoc

Many psychiatric disorders are characterized by abnormal risky decision-making and dysregulated dopamine receptor expression. The current study was designed to determine how different dopamine receptor subtypes modulate risk-taking in young adult rats, using a “Risky Decision-making Task” that involves choices between small “safe” rewards and large “risky” rewards accompanied by adverse consequences. Rats showed considerable, stable individual differences in risk preference in the task, which were not related to multiple measures of reward motivation, anxiety, or pain sensitivity. Systemic activation of D2-like receptors robustly attenuated risk-taking, whereas drugs acting on D1-like receptors had no effect. Systemic amphetamine also reduced risk-taking, an effect which was attenuated by D2-like (but not D1-like) receptor blockade. Dopamine receptor mRNA expression was evaluated in a separate cohort of drug-naive rats characterized in the task. D1 mRNA expression in both nucleus accumbens shell and insular cortex was positively associated with risk-taking, while D2 mRNA expression in orbitofrontal and medial prefrontal cortex predicted risk preference in opposing nonlinear patterns. Additionally, lower levels of D2 mRNA in dorsal striatum were associated with greater risk-taking. These data strongly implicate dopamine signaling in prefrontal cortical-striatal circuitry in modulating decision-making processes involving integration of reward information with risks of adverse consequences.

Wednesday, November 23, 2011

Ubiquity and Specificity of Reinforcement Signals throughout the Human Brain

Vickery TJ, Chun MM, and Lee D
Neuron, Volume 72, Issue 1, 166-177, 6 October 2011

Brain activity during a matching-pennies game, measured with fMRI and analyzed with support vector machines. The main conclusion is not "the next action can be predicted from anterior cingulate activity!" but rather that reward and punishment are encoded across the whole brain, not just in the striatum and orbitofrontal cortex. http://1.usa.gov/t0FOfn

Reinforcements and punishments facilitate adaptive behavior in diverse domains ranging from perception to social interactions. A conventional approach to understanding the corresponding neural substrates focuses on the basal ganglia and its dopaminergic projections. Here, we show that reinforcement and punishment signals are surprisingly ubiquitous in the gray matter of nearly every subdivision of the human brain. Humans played either matching-pennies or rock-paper-scissors games against computerized opponents while being scanned using fMRI. Multivoxel pattern analysis was used to decode previous choices and their outcomes, and to predict upcoming choices. Whereas choices were decodable from a confined set of brain structures, their outcomes were decodable from nearly all cortical and subcortical structures. In addition, signals related to both reinforcements and punishments were recovered reliably in many areas and displayed patterns not consistent with salience-based explanations. Thus, reinforcement and punishment might play global modulatory roles in the entire brain.
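(For readers new to the method: the multivoxel pattern analysis used here is, at its core, a cross-validated classifier on voxel patterns. A generic sketch with synthetic data and scikit-learn's linear SVM; the authors' actual preprocessing and regions are not reproduced:)

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_voxels = 200, 50
outcome = rng.integers(0, 2, size=n_trials)       # 0 = punishment, 1 = reward
pattern = 0.5 * rng.standard_normal(n_voxels)     # outcome-related pattern
X = rng.standard_normal((n_trials, n_voxels)) + np.outer(outcome, pattern)

acc = cross_val_score(SVC(kernel="linear"), X, outcome, cv=5).mean()
print(f"cross-validated decoding accuracy: {acc:.2f}   (chance = 0.50)")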

Double dissociation of value computations in orbitofrontal and anterior cingulate neurons

Steven W Kennerley, Timothy E J Behrens and Jonathan D Wallis
Nature Neuroscience 14, 1581–1589 (2011)

This study reports a double dissociation in the neuronal correlates of value-based decision making in monkey prefrontal cortex, with orbitofrontal cortex neurons encoding choice value relative to recent choice values, while anterior cingulate cortex neurons flexibly encode multiple decision parameters and reward prediction errors using a 'common valuation currency'.

Expectancy-related changes in firing of dopamine neurons depend on orbitofrontal cortex

Yuji K Takahashi, Matthew R Roesch, Robert C Wilson, Kathy Toreson, Patricio O'Donnell, Yael Niv and Geoffrey Schoenbaum
Nature Neuroscience 14, 1590–1597 (2011)

The orbitofrontal cortex (OFC) has been hypothesized to carry information regarding the value of expected rewards. Such information could be used for generating instructive error signals conveyed by dopamine neurons. Here the authors report that this is indeed the case. However, contrary to the simplest hypothesis, OFC lesions did not result in the loss of all value information. Instead, lesions caused the loss of value information derived from model-based representations.

It's a pleasure: a tale of two cortical areas

Daeyeol Lee
Nature Neuroscience 14, 1491–1492 (2011) doi:10.1038/nn.2981

Reward signals are widespread in the brain, but why? A study now identifies an important difference in the reward signals encoded by the neurons in the primate anterior cingulate and orbitofrontal cortices during decision making, suggesting that reward-related activity in these areas is shaped by different contextual factors.

Thursday, November 17, 2011

Dissociable Reward and Timing Signals in Human Midbrain and Ventral Striatum

M.C. Klein-Flügge, L.T. Hunt, D.R. Bach, R.J. Dolan, and T.E.J. Behrens
Neuron, Volume 72, Issue 4, 654-664, 17 November 2011
http://www.cell.com/neuron/fulltext/S0896-6273(11)00779-3

Reward prediction error (RPE) signals are central to current models of reward-learning. Temporal difference (TD) learning models posit that these signals should be modulated by predictions, not only of magnitude but also timing of reward. Here we show that BOLD activity in the VTA conforms to such TD predictions: responses to unexpected rewards are modulated by a temporal hazard function and activity between a predictive stimulus and reward is depressed in proportion to predicted reward. By contrast, BOLD activity in ventral striatum (VS) does not reflect a TD RPE, but instead encodes a signal on the variable relevant for behavior, here timing but not magnitude of reward. The results have important implications for dopaminergic models of cortico-striatal learning and suggest a modification of the conventional view that VS BOLD necessarily reflects inputs from dopaminergic VTA neurons signaling an RPE.
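(The TD prediction being tested, that the error depends on the predicted timing of reward and not just its size, already shows up in a tapped-delay-line TD(0) model. A minimal sketch under my own simplifying assumptions:)

import numpy as np

T, reward_time = 10, 7      # timesteps per trial; cue at t=0, reward at t=7
alpha, gamma = 0.1, 1.0
w = np.zeros(T)             # one value weight per post-cue timestep

for trial in range(300):
    for t in range(T - 1):
        r = 1.0 if t == reward_time else 0.0
        w[t] += alpha * (r + gamma * w[t + 1] - w[t])   # TD update

t = reward_time
print("TD error, reward on time:", round(1.0 + w[t + 1] - w[t], 2))  # ~0
print("TD error, reward omitted:", round(0.0 + w[t + 1] - w[t], 2))  # ~-1
# Because the model learns *when* reward arrives, an omission produces a
# precisely timed negative error, the signature tested against VTA BOLD.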

Friday, November 11, 2011

Hedging your bets by learning reward correlations in the human brain.

Wunderlich K, Symmonds M, Bossaerts P, Dolan RJ.
Neuron. 2011 Sep 22;71(6):1141-52. Epub 2011 Sep 21.

Choosing a good portfolio requires taking into account the correlations among the (fluctuating) values of the assets (e.g., the values of assets A and B move together). Humans can learn such correlations through trial and error; the learned correlation is encoded in the insula, and the learning signal (the correlation prediction error) in the ACC.

Human subjects are proficient at tracking the mean and variance of rewards and updating these via prediction errors. Here, we addressed whether humans can also learn about higher-order relationships between distinct environmental outcomes, a defining ecological feature of contexts where multiple sources of rewards are available. By manipulating the degree to which distinct outcomes are correlated, we show that subjects implemented an explicit model-based strategy to learn the associated outcome correlations and were adept in using that information to dynamically adjust their choices in a task that required a minimization of outcome variance. Importantly, the experimentally generated outcome correlations were explicitly represented neuronally in right midinsula with a learning prediction error signal expressed in rostral anterior cingulate cortex. Thus, our data show that the human brain represents higher-order correlation structures between rewards, a core adaptive ability whose immediate benefit is optimized sampling.
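(The computational idea, updating a correlation estimate with a prediction error and using it to minimize outcome variance, can be sketched as follows. The delta-rule form and all parameter values are my own illustrative choices, not the paper's exact model:)

import numpy as np

rng = np.random.default_rng(1)
alpha, rho_true = 0.02, -0.6
cov = [[1.0, rho_true], [rho_true, 1.0]]

rho_hat = 0.0
for _ in range(4000):
    x, y = rng.multivariate_normal([0.0, 0.0], cov)
    rho_hat += alpha * (x * y - rho_hat)   # "correlation prediction error" update

# For two unit-variance outcomes mixed 50/50, portfolio variance = (1+rho)/2,
# so a learned negative correlation tells the agent that hedging pays off.
print(f"estimated correlation (noisy): {rho_hat:+.2f}")
print(f"variance, single asset: 1.00   variance, 50/50 hedge: {(1 + rho_hat) / 2:.2f}")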

Dorsolateral and ventromedial prefrontal cortex orchestrate normative choice

Thomas Baumgartner, Daria Knoch, Philine Hotz, Christoph Eisenegger & Ernst Fehr
Nature Neuroscience 14, 1468–1474 (2011)

Functional coupling from the right dlPFC to the vmPFC is crucial for rejecting unfair offers in the ultimatum game. The key point is that the authors combined fMRI with TMS, which lets them get at causality.

Humans are noted for their capacity to over-ride self-interest in favor of normatively valued goals. We examined the neural circuitry that is causally involved in normative, fairness-related decisions by generating a temporarily diminished capacity for costly normative behavior, a 'deviant' case, through non-invasive brain stimulation (repetitive transcranial magnetic stimulation) and compared normal subjects' functional magnetic resonance imaging signals with those of the deviant subjects. When fairness and economic self-interest were in conflict, normal subjects (who make costly normative decisions at a much higher frequency) displayed significantly higher activity in, and connectivity between, the right dorsolateral prefrontal cortex (DLPFC) and the posterior ventromedial prefrontal cortex (pVMPFC). In contrast, when there was no conflict between fairness and economic self-interest, both types of subjects displayed identical neural patterns and behaved identically. These findings suggest that a parsimonious prefrontal network, the activation of right DLPFC and pVMPFC, and the connectivity between them, facilitates subjects' willingness to incur the cost of normative decisions.

Thursday, September 15, 2011

Differential roles of human striatum and amygdala in associative learning

Jian Li, Daniela Schiller, Geoffrey Schoenbaum, Elizabeth A Phelps & Nathaniel D Daw
Nature Neuroscience (2011)

Although the human amygdala and striatum have both been implicated in associative learning, only the striatum's contribution has been consistently computationally characterized. Using a reversal learning task, we found that amygdala blood oxygen level–dependent activity tracked associability as estimated by a computational model, and dissociated it from the striatal representation of reinforcement prediction error. These results extend the computational learning approach from striatum to amygdala, demonstrating their complementary roles in aversive learning.
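(The "associability" tracked by the amygdala here is, in the hybrid Pearce-Hall model this literature uses, a running average of recent unsigned prediction errors that gates the learning rate. A minimal sketch with illustrative parameters:)

kappa, eta = 0.3, 0.2        # base learning rate; associability decay
value, assoc = 0.5, 1.0

def step(value, assoc, reward):
    delta = reward - value                          # signed RPE (striatum-like)
    value += kappa * assoc * delta                  # associability gates learning
    assoc = (1 - eta) * assoc + eta * abs(delta)    # associability (amygdala-like)
    return value, assoc

for reward in [1, 1, 1, 1, 0, 0, 0]:                # acquisition, then reversal
    value, assoc = step(value, assoc, reward)
    print(f"reward={reward}  value={value:.2f}  associability={assoc:.2f}")
# The reversal pushes associability back up, speeding relearning; it is this
# quantity, not the signed RPE, that the amygdala BOLD signal tracked.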

The evolution of overconfidence

Dominic D. P. Johnson and James H. Fowler
Nature 477, 317–320 (15 September 2011)

Confidence is an essential ingredient of success in a wide range of domains ranging from job performance and mental health to sports, business and combat [1-4]. Some authors have suggested that not just confidence but overconfidence—believing you are better than you are in reality—is advantageous because it serves to increase ambition, morale, resolve, persistence or the credibility of bluffing, generating a self-fulfilling prophecy in which exaggerated confidence actually increases the probability of success [3-8]. However, overconfidence also leads to faulty assessments, unrealistic expectations and hazardous decisions, so it remains a puzzle how such a false belief could evolve or remain stable in a population of competing strategies that include accurate, unbiased beliefs. Here we present an evolutionary model showing that, counterintuitively, overconfidence maximizes individual fitness and populations tend to become overconfident, as long as benefits from contested resources are sufficiently large compared with the cost of competition. In contrast, unbiased strategies are only stable under limited conditions. The fact that overconfident populations are evolutionarily stable in a wide range of environments may help to explain why overconfidence remains prevalent today, even if it contributes to hubris, market bubbles, financial collapses, policy failures, disasters and costly wars [9-13].
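(A stripped-down version of the model's core logic: agents claim a contested resource when they believe they are stronger, and a positive self-assessment bias pays off when the prize is large relative to the cost of fighting. This simulation is my own simplification of the published model, with arbitrary noise levels:)

import numpy as np

rng = np.random.default_rng(2)

def mean_payoff(bias, r, c, n=100_000):
    """Mean payoff of a focal agent with self-assessment bias `bias`
    against an unbiased rival, with noisy perception of the rival."""
    own, opp = rng.standard_normal(n), rng.standard_normal(n)
    opp_hat = opp + 0.5 * rng.standard_normal(n)   # focal's noisy view of rival
    own_hat = own + 0.5 * rng.standard_normal(n)   # rival's noisy view of focal
    claim_f = own + bias > opp_hat                 # bias inflates own ability
    claim_o = opp > own_hat
    pay = np.where(claim_f & ~claim_o, r, 0.0)     # uncontested gain
    fight = claim_f & claim_o
    pay += fight * (np.where(own > opp, r, 0.0) - c)   # costly conflict
    return pay.mean()

r = 2.0
biases = np.linspace(-1.0, 1.0, 21)
for c in (0.2, 2.0):
    best = biases[np.argmax([mean_payoff(b, r, c) for b in biases])]
    print(f"benefit/cost = {r/c:4.1f}: payoff-maximizing bias ~ {best:+.1f}")
# Cheap conflict (high r/c) favors overconfidence; costly conflict does not.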

Wednesday, September 14, 2011

Behavioral and Neural Properties of Social Reinforcement Learning

Rebecca M. Jones, Leah H. Somerville, Jian Li, Erika J. Ruberry, Victoria Libby, Gary Glover, Henning U. Voss, Douglas J. Ballon, and B. J. Casey
J. Neurosci. 2011;31 13039-13045
http://www.jneurosci.org/cgi/content/abstract/31/37/13039?etoc

Social learning is critical for engaging in complex interactions with other individuals. Learning from positive social exchanges, such as acceptance from peers, may be similar to basic reinforcement learning. We formally test this hypothesis by developing a novel paradigm that is based on work in nonhuman primates and human imaging studies of reinforcement learning. The probability of receiving positive social reinforcement from three distinct peers was parametrically manipulated while brain activity was recorded in healthy adults using event-related functional magnetic resonance imaging. Over the course of the experiment, participants responded more quickly to faces of peers who provided more frequent positive social reinforcement, and rated them as more likeable. Modeling trial-by-trial learning showed ventral striatum and orbital frontal cortex activity correlated positively with forming expectations about receiving social reinforcement. Rostral anterior cingulate cortex activity tracked positively with modulations of expected value of the cues (peers). Together, the findings across three levels of analysis—social preferences, response latencies, and modeling neural responses—are consistent with reinforcement learning theory and nonhuman primate electrophysiological studies of reward. This work highlights the fundamental influence of acceptance by one's peers in altering subsequent behavior.

The Decision Value Computations in the vmPFC and Striatum Use a Relative Value Code That is Guided by Visual Attention

Seung-Lark Lim, John P. O'Doherty, and Antonio Rangel
J. Neurosci. 2011;31 13214-13223
http://www.jneurosci.org/cgi/content/abstract/31/37/13214?etoc

There is a growing consensus in behavioral neuroscience that the brain makes simple choices by first assigning a value to the options under consideration and then comparing them. Two important open questions are whether the brain encodes absolute or relative value signals, and what role attention might play in these computations. We investigated these questions using a human fMRI experiment with a binary choice task in which the fixations to both stimuli were exogenously manipulated to control for the role of visual attention in the valuation computation. We found that the ventromedial prefrontal cortex and the ventral striatum encoded fixation-dependent relative value signals: activity in these areas correlated with the difference in value between the attended and the unattended items. These attention-modulated relative value signals might serve as the input of a comparator system that is used to make a choice.

Feedback Timing Modulates Brain Systems for Learning in Humans

Karin Foerde and Daphna Shohamy
J. Neurosci. 2011;31 13157-13167
http://www.jneurosci.org/cgi/content/abstract/31/37/13157?etoc

The ability to learn from the consequences of actions—no matter when those consequences take place—is central to adaptive behavior. Despite major advances in understanding how immediate feedback drives learning, it remains unknown precisely how the brain learns from delayed feedback. Here, we present converging evidence from neuropsychology and neuroimaging for distinct roles for the striatum and the hippocampus in learning, depending on whether feedback is immediate or delayed. We show that individuals with striatal dysfunction due to Parkinson's disease are impaired at learning when feedback is immediate, but not when feedback is delayed by a few seconds. Using functional imaging (fMRI) combined with computational model-derived analyses, we further demonstrate that healthy individuals show activation in the striatum during learning from immediate feedback and activation in the hippocampus during learning from delayed feedback. Additionally, later episodic memory for delayed feedback events was enhanced, suggesting that engaging distinct neural systems during learning had consequences for the representation of what was learned. Together, these findings provide direct evidence from humans that striatal systems are necessary for learning from immediate feedback and that delaying feedback leads to a shift in learning from the striatum to the hippocampus. The results provide a link between learning impairments in Parkinson's disease and evidence from single-unit recordings demonstrating that the timing of reinforcement modulates activity of midbrain dopamine neurons. Collectively, these findings indicate that relatively small changes in the circumstances under which information is learned can shift learning from one brain system to another.

Friday, September 9, 2011

The Neural and Cognitive Time Course of Theory of Mind

Joseph P. McCleery, Andrew D. R. Surtees, Katharine A. Graham, John E. Richards, and Ian A. Apperly
J. Neurosci. 2011;31 12849-12854
http://www.jneurosci.org/cgi/content/abstract/31/36/12849?etoc

Neuroimaging and neuropsychological studies implicate both frontal and temporoparietal cortices when humans reason about the mental states of others. Here, we report an event-related potentials study of the time course of one such “theory of mind” ability: visual perspective taking. The findings suggest that posterior cortex, perhaps the temporoparietal cortex, calculates and represents the perspective of self versus other, and then, later, the right frontal cortex resolves conflict between perspectives during response selection.

Contextual Novelty Modulates the Neural Dynamics of Reward Anticipation

Nico Bunzeck, Marc Guitart-Masip, Ray J. Dolan, and Emrah Duzel
J. Neurosci. 2011;31 12816-12822
http://www.jneurosci.org/cgi/content/abstract/31/36/12816?etoc

We investigated how rapidly the reward-predicting properties of visual cues are signaled in the human brain and the extent to which these reward prediction signals are contextually modifiable. In a magnetoencephalography study, we presented participants with fractal visual cues that predicted monetary rewards with different probabilities. These cues were presented in the temporal context of a preceding novel or familiar image of a natural scene. Starting at ∼100 ms after cue onset, reward probability was signaled in the event-related fields (ERFs) over temporo-occipital sensors and in the power of theta (5–8 Hz) and beta (20–30 Hz) band oscillations over frontal sensors. While theta decreased with reward probability, beta power showed the opposite effect. Thus, in humans anticipatory reward responses are generated rapidly, within 100 ms after the onset of reward-predicting cues, which is similar to the timing established in non-human primates. Contextual novelty enhanced the reward anticipation responses in both ERFs and in beta oscillations starting at ∼100 ms after cue onset. This very early context effect is compatible with a physiological model that invokes the mediation of a hippocampal-VTA loop according to which novelty modulates neural response properties within the reward circuitry. We conclude that the neural processing of cues that predict future rewards is temporally highly efficient and contextually modifiable.

Wednesday, August 31, 2011

Perceptual classification in a rapidly changing environment.


Summerfield C, Behrens TE, Koechlin E.
Neuron. 2011 Aug 25;71(4):725-36.

Via the blog 大「脳」洋航海記:
http://viking-neurosci.sakura.ne.jp/blog-wp/?p=6134

Humans and monkeys can learn to classify perceptual information in a statistically optimal fashion if the functional groupings remain stable over many hundreds of trials, but little is known about categorization when the environment changes rapidly. Here, we used a combination of computational modeling and functional neuroimaging to understand how humans classify visual stimuli drawn from categories whose mean and variance jumped unpredictably. Models based on optimal learning (Bayesian model) and a cognitive strategy (working memory model) both explained unique variance in choice, reaction time, and brain activity. However, the working memory model was the best predictor of performance in volatile environments, whereas statistically optimal performance emerged in periods of relative stability. Bayesian and working memory models predicted decision-related activity in distinct regions of the prefrontal cortex and midbrain. These findings suggest that perceptual category judgments, like value-guided choices, may be guided by multiple controllers.

Wednesday, August 24, 2011

Anatomical Evidence for the Involvement of the Macaque Ventrolateral Prefrontal Area 12r in Controlling Goal-Directed Actions


Elena Borra, Marzio Gerbella, Stefano Rozzi, and Giuseppe Luppino
J. Neurosci. 2011;31 12351-12363
http://www.jneurosci.org/cgi/content/abstract/31/34/12351?etoc

The macaque ventrolateral prefrontal (VLPF) area 12r is thought to be involved in higher-order nonspatial information processing. We found that this area is connectionally heterogeneous, and the intermediate part is fully integrated in a cortical network involved in selecting and controlling object-oriented hand and mouth actions. Specifically, intermediate area 12r displayed dense connections with the caudal half of area 46v and orbitofrontal areas and relatively strong extraprefrontal connections involving the following: (1) the hand- and mouth-related ventral premotor area F5 and the anterior intraparietal (AIP) area, jointly involved in visuomotor transformations for grasping; (2) the SII sector that is connected to AIP and F5; (3) a sector of the inferotemporal area TEa/m, primarily corresponding to the sector densely connected to AIP; and (4) the insular and opercular frontal sectors, which are connected to AIP and F5. This connectivity pattern differed markedly from those of the caudal and rostral parts of area 12r. Caudal area 12r displayed dense connections with the caudal part of the VLPF, including oculomotor areas 8/FEF and 45B, relatively weak orbitofrontal connections and extraprefrontal connections limited to the inferotemporal cortex. Rostral area 12r displayed connections mostly with rostral prefrontal and orbitofrontal areas and relatively weaker connections with the fundus and the upper bank of the superior temporal sulcus. The present data suggest that the intermediate part of area 12r is involved in nonspatial information processing related to object properties and identity, for selecting and controlling goal-directed hand and mouth actions.

Sunday, August 21, 2011

The Control of Mimicry by Eye Contact Is Mediated by Medial Prefrontal Cortex


Yin Wang, Richard Ramsey, and Antonia F. de C. Hamilton
The Journal of Neuroscience, 17 August 2011, 31(33): 12001-12010.

Introduced by 1hc0m on Twitter:
http://twitter.com/#!/1hc0m/status/105468633362341888

Spontaneous mimicry of other people's actions serves an important social function, enhancing affiliation and social interaction. This mimicry can be subtly modulated by different social contexts. We recently found behavioral evidence that direct eye gaze rapidly and specifically enhances mimicry of intransitive hand movements (Wang et al., 2011). Based on past findings linking medial prefrontal cortex (mPFC) to both eye contact and the control of mimicry, we hypothesized that mPFC might be the neural origin of this behavioral effect. The present study aimed to test this hypothesis. During functional magnetic resonance imaging (fMRI) scanning, 20 human participants performed a simple mimicry or no-mimicry task, as previously described (Wang et al., 2011), with direct gaze present on half of the trials. As predicted, fMRI results showed that performing the task activated mirror systems, while direct gaze and inhibition of the natural tendency to mimic both engaged mPFC. Critically, we found an interaction between mimicry and eye contact in mPFC, superior temporal sulcus (STS) and inferior frontal gyrus. We then used dynamic causal modeling to contrast 12 possible models of information processing in this network. Results supported a model in which eye contact controls mimicry by modulating the connection strength from mPFC to STS. This suggests that mPFC is the originator of the gaze–mimicry interaction and that it modulates sensory input to the mirror system. Thus, our results demonstrate how different components of the social brain work together to control mimicry on-line according to the social context.

Thursday, August 18, 2011

Associative Learning Increases Trial-by-Trial Similarity of BOLD-MRI Patterns


Renee M. Visser, H. Steven Scholte, and Merel Kindt
J. Neurosci. 2011;31 12021-12028 Open Access
http://www.jneurosci.org/cgi/content/abstract/31/33/12021?etoc

Associative learning is a dynamic process that allows us to incorporate new knowledge within existing semantic networks. Even after years, a seemingly stable association can be altered by a single significant experience. Here, we investigate whether the acquisition of new associations affects the neural representation of stimuli and how the brain categorizes stimuli according to preexisting and emerging associations. Functional MRI data were collected during a differential fear conditioning procedure and at test (4–5 weeks later). Two pictures of faces and two pictures of houses served as stimuli. One of each pair coterminated with a shock in half of the trials (partial reinforcement). Applying Multivoxel Pattern Analysis (MVPA) in a trial-by-trial manner, we quantified changes in the similarity of neural representations of stimuli over the course of conditioning. Our findings show an increase in similarity of neural patterns throughout the cortex on consecutive trials of the reinforced stimuli. Furthermore, neural pattern similarity reveals a shift from original categories (faces/houses) toward new categories (reinforced/unreinforced) over the course of conditioning. This effect was differentially represented in the cortex, with visual areas primarily reflecting similarity of low-level stimulus properties (original categories) and frontal areas reflecting similarity of stimulus significance (new categories). Effects were not dependent on overall response amplitude and were still present during follow-up. We conclude that trial-by-trial MVPA is a useful tool for examining how the human brain encodes relevant associations and forms new associative networks.

Wednesday, August 10, 2011

Negative Reward Signals from the Lateral Habenula to Dopamine Neurons Are Mediated by Rostromedial Tegmental Nucleus in Primates


Simon Hong, Thomas C. Jhou, Mitchell Smith, Kadharbatcha S. Saleem, and Okihide Hikosaka
The Journal of Neuroscience, 10 August 2011, 31(32):11457-11471.

Lateral habenula (LHb) neurons signal negative “reward-prediction errors” and inhibit midbrain dopamine (DA) neurons. Yet LHb neurons are largely glutamatergic, indicating that this inhibition may occur through an intermediate structure. Recent studies in rats have suggested a candidate for this role, the GABAergic rostromedial tegmental nucleus (RMTg), but this neural pathway has not yet been tested directly. We now show using electrophysiology and anatomic tracing that (1) the monkey has an inhibitory structure similar to the rat RMTg; (2) RMTg neurons receive excitatory input from the LHb, exhibit negative reward-prediction errors, and send axonal projections near DA soma; and (3) stimulating this structure inhibits DA neurons. Surprisingly, some RMTg neurons responded to reward cues earlier than the LHb, and carry “state-value” signals not found in DA neurons. Thus, our data suggest that the RMTg translates LHb reward-prediction errors (negative) into DA reward-prediction errors (positive), while transmitting additional motivational signals to non-DA networks.

Wednesday, July 27, 2011

A Neural Signature of Hierarchical Reinforcement Learning

J.J.F. Ribas-Fernandes, A. Solway, C. Diuk, J.T. McGuire, A.G. Barto, Y. Niv, and M.M. Botvinick
Neuron, Volume 71, Issue 2, 370-379, 28 July 2011

Human behavior displays hierarchical structure: simple actions cohere into subtask sequences, which work together to accomplish overall task goals. Although the neural substrates of such hierarchy have been the target of increasing research, they remain poorly understood. We propose that the computations supporting hierarchical behavior may relate to those in hierarchical reinforcement learning (HRL), a machine-learning framework that extends reinforcement-learning mechanisms into hierarchical domains. To test this, we leveraged a distinctive prediction arising from HRL. In ordinary reinforcement learning, reward prediction errors are computed when there is an unanticipated change in the prospects for accomplishing overall task goals. HRL entails that prediction errors should also occur in relation to task subgoals. In three neuroimaging studies we observed neural responses consistent with such subgoal-related reward prediction errors, within structures previously implicated in reinforcement learning. The results reported support the relevance of HRL to the neural processes underlying hierarchical behavior.
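(The distinctive HRL prediction, an error signal tied to subgoal progress even when the overall goal's prospects are unchanged, can be shown in toy form. The paper's task was a delivery problem; the sketch below mimics it with made-up distances:)

# Toy illustration: deliver a package (overall goal) after picking it up
# (subgoal). Suppose the package jumps to a new location such that the
# total path length agent -> package -> destination is unchanged.

def flat_rpe(dist_goal_before, dist_goal_after):
    # Flat RL: value ~ -(remaining distance to the overall goal)
    return -dist_goal_after + dist_goal_before

def subgoal_rpe(dist_sub_before, dist_sub_after):
    # HRL option level: value ~ -(remaining distance to the subgoal)
    return -dist_sub_after + dist_sub_before

print("flat RPE   :", flat_rpe(10, 10))   # 0: overall prospects unchanged
print("subgoal RPE:", subgoal_rpe(6, 3))  # +3: subgoal suddenly closer
# Only the hierarchical learner registers an error here, which is the
# neural signature the three imaging studies looked for.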

Friday, July 22, 2011

Paper Accepted!

Our co-authored paper with Hiromichi Kimura (Tome R&D Institute):
Shinsuke Suzuki* and Hiromichi Kimura, "Oscillatory dynamics in the coevolution of cooperation and mobility"
has been accepted by the Journal of Theoretical Biology.

Everything went remarkably smoothly: submitted May 4 → reviews returned July 7 (minor revision) → revised version resubmitted July 21 → accepted July 22.
When things go well, this is how they go.

Thursday, July 21, 2011

Excitatory transmission from the amygdala to nucleus accumbens facilitates reward seeking

Garret D. Stuber, Dennis R. Sparta, Alice M. Stamatakis, Wieke A. van Leeuwen, Juanita E. Hardjoprajitno, Saemi Cho, Kay M. Tye, Kimberly A. Kempadoo, Feng Zhang, Karl Deisseroth & Antonello Bonci
Nature 475, 377–380 (21 July 2011)

The basolateral amygdala (BLA) has a crucial role in emotional learning irrespective of valence [1-5, 21-23]. The BLA projection to the nucleus accumbens (NAc) is thought to modulate cue-triggered motivated behaviours [4, 6, 7, 24, 25], but our understanding of the interaction between these two brain regions has been limited by the inability to manipulate neural-circuit elements of this pathway selectively during behaviour. To circumvent this limitation, we used in vivo optogenetic stimulation or inhibition of glutamatergic fibres from the BLA to the NAc, coupled with intracranial pharmacology and ex vivo electrophysiology. Here we show that optical stimulation of the pathway from the BLA to the NAc in mice reinforces behavioural responding to earn additional optical stimulation of these synaptic inputs. Optical stimulation of these glutamatergic fibres required intra-NAc dopamine D1-type receptor signalling, but not D2-type receptor signalling. Brief optical inhibition of fibres from the BLA to the NAc reduced cue-evoked intake of sucrose, demonstrating an important role of this specific pathway in controlling naturally occurring reward-related behaviour. Moreover, although optical stimulation of glutamatergic fibres from the medial prefrontal cortex to the NAc also elicited reliable excitatory synaptic responses, optical self-stimulation behaviour was not observed by activation of this pathway. These data indicate that whereas the BLA is important for processing both positive and negative affect, the glutamatergic pathway from the BLA to the NAc, in conjunction with dopamine signalling in the NAc, promotes motivated behavioural responding. Thus, optogenetic manipulation of anatomically distinct synaptic inputs to the NAc reveals functionally distinct properties of these inputs in controlling reward-seeking behaviours.

Wednesday, July 20, 2011

Dissociable Effects of Subtotal Lesions within the Macaque Orbital Prefrontal Cortex on Reward-Guided Behavior

Peter H. Rudebeck and Elisabeth A. Murray
J. Neurosci. 2011;31 10569-10578

The macaque orbital prefrontal cortex (PFo) has been implicated in a wide range of reward-guided behaviors essential for efficient foraging. The PFo, however, is not a homogeneous structure. Two major subregions, distinct by their cytoarchitecture and connections to other brain structures, compose the PFo. One subregion encompasses Walker's areas 11 and 13 and the other centers on Walker's area 14. Although it has been suggested that these subregions play dissociable roles in reward-guided behavior, direct neuropsychological evidence for this hypothesis is limited. To explore the independent contributions of PFo subregions to behavior, we studied rhesus monkeys (Macaca mulatta) with restricted excitotoxic lesions targeting either Walker's areas 11/13 or area 14. The performance of these two groups was compared to that of a group of unoperated controls on a series of reward-based tasks that has been shown to be sensitive to lesions of the PFo as a whole (Walker's areas 11, 13, and 14). Lesions of areas 11/13, but not area 14, disrupted the rapid updating of object value during selective satiation. In contrast, lesions targeting area 14, but not areas 11/13, impaired the ability of monkeys to learn to stop responding to a previously rewarded object. Somewhat surprisingly, neither lesion disrupted performance on a serial object reversal learning task, although aspiration lesions of the entire PFo produce severe deficits on this task. Our data indicate that anatomically defined subregions within macaque PFo make dissociable contributions to reward-guided behavior.

Functional Connectivity of the Striatum Links Motivation to Action Control in Humans

Helga A. Harsay, Michael X. Cohen, Nikolaas N. Oosterhof, Birte U. Forstmann, Rogier B. Mars, and K. Richard Ridderinkhof
J. Neurosci. 2011;31 10701-10711

Motivation improves the efficiency of intentional behavior, but how this performance modulation is instantiated in the human brain remains unclear. We used a reward-cued antisaccade paradigm to investigate how motivational goals (the expectation of a reward for good performance) modulate patterns of neural activation and functional connectivity to improve preparation for antisaccade performance. Behaviorally, subjects performed better (faster and more accurate antisaccades) when they knew they would be rewarded for good performance. Reward anticipation was associated with increased activation in the ventral and dorsal striatum, and cortical oculomotor regions. Functional connectivity between the caudate nucleus and cortical oculomotor control structures predicted individual differences in the behavioral benefit of reward anticipation. We conclude that although both dorsal and ventral striatal circuitry are involved in the anticipation of reward, only the dorsal striatum and its connected cortical network is involved in the direct modulation of oculomotor behavior by motivational incentive.

Reward Value-Based Gain Control: Divisive Normalization in Parietal Cortex

Kenway Louie, Lauren E. Grattan, and Paul W. Glimcher
J. Neurosci. 2011;31 10627-10639

The representation of value is a critical component of decision making. Rational choice theory assumes that options are assigned absolute values, independent of the value or existence of other alternatives. However, context-dependent choice behavior in both animals and humans violates this assumption, suggesting that biological decision processes rely on comparative evaluation. Here we show that neurons in the monkey lateral intraparietal cortex encode a relative form of saccadic value, explicitly dependent on the values of the other available alternatives. Analogous to extra-classical receptive field effects in visual cortex, this relative representation incorporates target values outside the response field and is observed in both stimulus-driven activity and baseline firing rates. This context-dependent modulation is precisely described by divisive normalization, indicating that this standard form of sensory gain control may be a general mechanism of cortical computation. Such normalization in decision circuits effectively implements an adaptive gain control for value coding and provides a possible mechanistic basis for behavioral context-dependent violations of rationality.
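(The normalization at the heart of the paper has the same divisive form as in sensory cortex; in the value domain it reads roughly R_i = v_i / (sigma + sum_j v_j). A numeric sketch with arbitrary values:)

def normalized_responses(values, sigma=1.0):
    """Divisive normalization: each option's response is scaled by the
    summed value of all available options."""
    total = sum(values)
    return [v / (sigma + total) for v in values]

# The response to a fixed-value target (v = 2) shrinks as alternatives
# outside the response field gain value -- a relative, contextual code.
print(normalized_responses([2.0, 0.0, 0.0]))
print(normalized_responses([2.0, 2.0, 2.0]))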

Wednesday, July 13, 2011

Dorsolateral Prefrontal Cortex Drives Mesolimbic Dopaminergic Regions to Initiate Motivated Behavior

Ian C. Ballard, Vishnu P. Murty, R. McKell Carter, Jeffrey J. MacInnes, Scott A. Huettel, and R. Alison Adcock
J. Neurosci. 2011;31 10340-10346

How does the brain translate information signaling potential rewards into motivation to get them? Motivation to obtain reward is thought to depend on the midbrain [particularly the ventral tegmental area (VTA)], the nucleus accumbens (NAcc), and the dorsolateral prefrontal cortex (dlPFC), but it is not clear how the interactions among these regions relate to reward-motivated behavior. To study the influence of motivation on these reward-responsive regions and on their interactions, we used dynamic causal modeling to analyze functional magnetic resonance imaging (fMRI) data from humans performing a simple task designed to isolate reward anticipation. The use of fMRI permitted the simultaneous measurement of multiple brain regions while human participants anticipated and prepared for opportunities to obtain reward, thus allowing characterization of how information about reward changes physiology underlying motivational drive. Furthermore, we modeled the impact of external reward cues on causal relationships within this network, thus elaborating a link between physiology, connectivity, and motivation. Specifically, our results indicated that dlPFC was the exclusive entry point of information about reward in this network, and that anticipated reward availability caused VTA activation only via its effect on the dlPFC. Anticipated reward thus increased dlPFC activation directly, whereas it influenced VTA and NAcc only indirectly, by enhancing intrinsically weak or inactive pathways from the dlPFC. Our findings of a directional prefrontal influence on dopaminergic regions during reward anticipation suggest a model in which the dlPFC integrates and transmits representations of reward to the mesolimbic and mesocortical dopamine systems, thereby initiating motivated behavior.

Tuesday, July 12, 2011

Neural and computational mechanisms of postponed decisions

Marina Martínez-García, Edmund T. Rolls, Gustavo Deco and Ranulfo Romo
PNAS July 12, 2011 vol. 108 no. 28 11626-11631

To read later. Definitely. Probably. If I can find the time…

We consider the mechanisms that enable decisions to be postponed for a period after the evidence has been provided. Using an information theoretic approach, we show that information about the forthcoming action becomes available from the activity of neurons in the medial premotor cortex in a sequential decision-making task after the second stimulus is applied, providing the information for a decision about whether the first or second stimulus is higher in vibrotactile frequency. The information then decays in a 3-s delay period in which the neuronal activity declines before the behavioral response can be made. The information then increases again when the behavioral response is required. We model this neuronal activity using an attractor decision-making network in which information reflecting the decision is maintained at a low level during the delay period, and is then selectively restored by a nonspecific input when the response is required. One mechanism for the short-term memory is synaptic facilitation, which can implement a mechanism for postponed decisions that can be correct even when there is little neuronal firing during the delay period before the postponed decision. Another mechanism is graded firing rates by different neurons in the delay period, with restoration by the nonspecific input of the low-rate activity from the higher-rate neurons still firing in the delay period. These mechanisms can account for the decision making and for the memory of the decision before a response can be made, which are evident in the activity of neurons in the medial premotor cortex.

Punishment sustains large-scale cooperation in prestate warfare

Sarah Mathew and Robert Boyd
PNAS July 12, 2011 vol. 108 no. 28 11375-11380

A PNAS paper by Rob and Sarah, who looked after me during my stay at UCLA in 2007!

Understanding cooperation and punishment in small-scale societies is crucial for explaining the origins of human cooperation. We studied warfare among the Turkana, a politically uncentralized, egalitarian, nomadic pastoral society in East Africa. Based on a representative sample of 88 recent raids, we show that the Turkana sustain costly cooperation in combat at a remarkably large scale, at least in part, through punishment of free-riders. Raiding parties comprised several hundred warriors and participants are not kin or day-to-day interactants. Warriors incur substantial risk of death and produce collective benefits. Cowardice and desertions occur, and are punished by community-imposed sanctions, including collective corporal punishment and fines. Furthermore, Turkana norms governing warfare benefit the ethnolinguistic group, a population of a half-million people, at the expense of smaller social groupings. These results challenge current views that punishment is unimportant in small-scale societies and that human cooperation evolved in small groups of kin and familiar individuals. Instead, these results suggest that cooperation at the larger scale of ethnolinguistic units enforced by third-party sanctions could have a deep evolutionary history in the human species.

Thursday, June 30, 2011

Risk of collective failure provides an escape from the tragedy of the commons

Francisco C. Santos and Jorge M. Pacheco
PNAS June 28, 2011 vol. 108 no. 26 10421-10425

From group hunting to global warming, how to deal with collective action may be formulated in terms of a public goods game of cooperation. In most cases, contributions depend on the risk of future losses. Here, we introduce an evolutionary dynamics approach to a broad class of cooperation problems in which attempting to minimize future losses turns the risk of failure into a central issue in individual decisions. We find that decisions within small groups under high risk and stringent requirements to success significantly raise the chances of coordinating actions and escaping the tragedy of the commons. We also offer insights on the scale at which public goods problems of cooperation are best solved. Instead of large-scale endeavors involving most of the population, which as we argue, may be counterproductive to achieve cooperation, the joint combination of local agreements within groups that are small compared with the population at risk is prone to significantly raise the probability of success. In addition, our model predicts that, if one takes into consideration that groups of different sizes are interwoven in complex networks of contacts, the chances for global coordination in an overall cooperating state are further enhanced.
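(The payoff structure of such a collective-risk dilemma is compact enough to write out. A minimal sketch in the spirit of the model, with threshold, contribution, and risk values chosen purely for illustration:)

def payoffs(n_cooperators, N=6, M=3, b=1.0, c=0.1, risk=0.9):
    """Expected payoffs in one group: everyone holds endowment b,
    cooperators also pay contribution c; if fewer than M of the N
    members cooperate, the group loses everything with probability `risk`."""
    keep = 1.0 if n_cooperators >= M else 1.0 - risk
    return (b - c) * keep, b * keep     # (cooperator, defector)

for k in range(7):
    pc, pd = payoffs(k)
    print(f"{k} cooperators: cooperator = {pc:.2f}, defector = {pd:.2f}")
# Near the threshold M a single contribution flips the group's fate, which
# is why high risk and small groups make coordination far more likely.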

Trial by trial data analysis using computational models

Nathaniel D. Daw
Chapter for "affect, learning, and decision making: attention and performance xxiii"
August 27, 2009

In numerous and high-profile studies, researchers have recently begun to integrate computational models into the analysis of data from experiments on reward learning and decision making (Platt and Glimcher, 1999; O'Doherty et al., 2003; Sugrue et al., 2004; Barraclough et al., 2004; Samejima et al., 2005; Daw et al., 2006; Li et al., 2006; Frank et al., 2007; Tom et al., 2007; Kable and Glimcher, 2007; Lohrenz et al., 2007; Schonberg et al., 2007; Wittmann et al., 2008; Hare et al., 2008; Hampton et al., 2008; Plassmann et al., 2008). As these techniques are spreading rapidly, but have been developed and documented somewhat sporadically alongside the studies themselves, the present review aims to clarify the toolbox (see also O'Doherty et al., 2007). In particular, we discuss the rationale for these methods and the questions they are suited to address. We then offer a relatively practical tutorial about the basic statistical methods for their answer and how they can be applied to data analysis. The techniques are illustrated with fits of simple models to simulated datasets. Throughout, we flag interpretational and technical pitfalls of which we believe authors, reviewers, and readers should be aware. We focus on cataloging the particular, admittedly somewhat idiosyncratic, combination of techniques frequently used in this literature, but also on exposing these techniques as instances of a general set of tools that can be applied to analyze behavioral and neural data of many sorts. A number of other reviews (Daw and Doya, 2006; Dayan and Niv, 2008) have focused on the scientific conclusions that have been obtained with these methods, an issue we omit almost entirely here. There are also excellent books that cover statistical inference of this general sort with much greater generality, formal precision, and detail (MacKay, 2003; Gelman et al., 2004; Bishop, 2006; Gelman and Hill, 2007).
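(The core recipe the chapter teaches, computing the trial-by-trial likelihood of the observed choices under an RL model and maximizing it over parameters, fits in a few lines. A minimal sketch for a two-armed bandit with a Q-learning/softmax model; the simulated "subject" and the scipy optimizer are my own illustrative choices:)

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

def neg_log_lik(params, choices, rewards):
    """Negative log-likelihood of the choices under Q-learning + softmax."""
    alpha, beta = params
    Q, nll = np.zeros(2), 0.0
    for c, r in zip(choices, rewards):
        p = np.exp(beta * Q) / np.sum(np.exp(beta * Q))   # softmax choice rule
        nll -= np.log(p[c] + 1e-12)
        Q[c] += alpha * (r - Q[c])                        # delta-rule update
    return nll

# Simulate a subject (alpha = 0.3, beta = 4) on a 2-armed bandit...
Q, choices, rewards = np.zeros(2), [], []
for _ in range(300):
    p = np.exp(4.0 * Q) / np.sum(np.exp(4.0 * Q))
    c = rng.choice(2, p=p)
    r = float(rng.random() < (0.7, 0.3)[c])
    Q[c] += 0.3 * (r - Q[c])
    choices.append(c); rewards.append(r)

# ...then recover the parameters by maximum likelihood.
fit = minimize(neg_log_lik, x0=[0.5, 1.0], args=(choices, rewards),
               bounds=[(0.01, 1.0), (0.1, 20.0)])
print("recovered (alpha, beta):", np.round(fit.x, 2))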

Tuesday, June 28, 2011

Distributed Neural Representation of Expected Value

The Journal of Neuroscience, May 11, 2005, 25(19): 4806-4812
Brian Knutson, Jonathan Taylor, Matthew Kaufman, Richard Peterson and Gary Glover

Anticipated reward magnitude and probability comprise dual components of expected value (EV), a cornerstone of economic and psychological theory. However, the neural mechanisms that compute EV have not been characterized. Using event-related functional magnetic resonance imaging, we examined neural activation as subjects anticipated monetary gains and losses that varied in magnitude and probability. Group analyses indicated that, although the subcortical nucleus accumbens (NAcc) activated proportional to anticipated gain magnitude, the cortical mesial prefrontal cortex (MPFC) additionally activated according to anticipated gain probability. Individual difference analyses indicated that, although NAcc activation correlated with self-reported positive arousal, MPFC activation correlated with probability estimates. These findings suggest that mesolimbic brain regions support the computation of EV in an ascending and distributed manner: whereas subcortical regions represent an affective component, cortical regions also represent a probabilistic component, and, furthermore, may integrate the two.
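
In a design like this, EV is just the product of the two cue-signaled components, which can be entered alongside them as parametric regressors. A toy illustration with my own numbers:

# EV decomposition (toy numbers): a region coding EV tracks the product,
# while a region coding only magnitude (as NAcc did for gains) tracks
# the magnitude column alone.
import numpy as np

magnitude   = np.array([0.0, 1.0, 5.0, 0.0, 1.0, 5.0])   # amount at stake
probability = np.array([0.2, 0.2, 0.2, 0.8, 0.8, 0.8])   # cue-signaled
expected_value = probability * magnitude

design = np.column_stack([magnitude, probability, expected_value])
print(design)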

Monday, June 27, 2011

Neuronal basis of sequential foraging decisions in a patchy environment

Benjamin Y Hayden, John M Pearson & Michael L Platt
Nature Neuroscience 14, 933–939 (2011)


Deciding when to leave a depleting resource to exploit another is a fundamental problem for all decision makers. The neuronal mechanisms mediating patch-leaving decisions remain unknown. We found that neurons in primate (Macaca mulatta) dorsal anterior cingulate cortex, an area that is linked to reward monitoring and executive control, encode a decision variable signaling the relative value of leaving a depleting resource for a new one. Neurons fired during each sequential decision to stay in a patch and, for each travel time, these responses reached a fixed threshold for patch-leaving. Longer travel times reduced the gain of neural responses for choosing to stay in a patch and increased the firing rate threshold mandating patch-leaving. These modulations more closely matched behavioral decisions than any single task variable. These findings portend an understanding of the neural basis of foraging decisions and endorse the unification of theoretical and experimental work in ecology and neuroscience.
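
The normative benchmark behind such patch-leaving tasks is the marginal value theorem: leave when the patch's intake rate falls to the environment's average rate, so longer travel times mandate longer stays. A numerical sketch with my own parameterization:

# Marginal value theorem sketch (my parameters): a patch yields cumulative
# gain g(t) = A*(1 - exp(-t/tau)); the optimal leaving time maximizes the
# overall rate g(t) / (t + travel_time) and grows with travel time, as the
# monkeys' behavior (and the dACC firing threshold) did.
import numpy as np

def optimal_leave_time(travel_time, A=10.0, tau=5.0):
    t = np.linspace(0.01, 60.0, 6000)
    rate = A * (1.0 - np.exp(-t / tau)) / (t + travel_time)
    return t[np.argmax(rate)]

for travel in (1.0, 5.0, 10.0):
    print(travel, round(optimal_leave_time(travel), 2))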

Wednesday, June 22, 2011

Neurobiology of Value Integration: When Value Impacts Valuation

Soyoung Q Park, Thorsten Kahnt, Jörg Rieskamp, and Hauke R. Heekeren
The Journal of Neuroscience, 22 June 2011, 31(25): 9307-9314


How does the brain integrate the utilities of money and electric shocks? Are they added independently, or is there an interaction? Fitting choice models to the behavioral data could not discriminate between the two, but fitting them to neural activity (fMRI) in prefrontal (reward-related) regions supported the model with an interaction term.

The idea of fitting choice (decision-making) models to fMRI signals and doing model comparison there is interesting, but as long as we do not know how the models' variables (utility, value, and so on) are represented at the level of neural activity, I think it is a difficult approach to defend.


Everyday choice options have advantages (positive values) and disadvantages (negative values) that need to be integrated into an overall subjective value. For decades, economic models have assumed that when a person evaluates a choice option, different values contribute independently to the overall subjective value of the option. However, human choice behavior often violates this assumption, suggesting interactions between values. To investigate how qualitatively different advantages and disadvantages are integrated into an overall subjective value, we measured the brain activity of human subjects using fMRI while they were accepting or rejecting choice options that were combinations of monetary reward and physical pain. We compared different subjective value models on behavioral and neural data. These models all made similar predictions of choice behavior, suggesting that behavioral data alone are not sufficient to uncover the underlying integration mechanism. Strikingly, a direct model comparison on brain data decisively demonstrated that interactive value integration (where values interact and affect overall valuation) predicts neural activity in value-sensitive brain regions significantly better than the independent mechanism. Furthermore, effective connectivity analyses revealed that value-dependent changes in valuation are associated with modulations in subgenual anterior cingulate cortex–amygdala coupling. These results provide novel insights into the neurobiological underpinnings of human decision making involving the integration of different values.
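
The two integration hypotheses being compared here can be written in a few lines. A sketch in my own notation (the paper's exact specification may differ):

# Independent vs. interactive value integration (my notation):
#   independent:  V = w_m*money - w_p*pain
#   interactive:  V = w_m*money - w_p*pain + w_i*money*pain
# Either V can be fit to accept/reject choices, or to BOLD in
# value-sensitive regions, and the models compared by BIC or similar.
import numpy as np

def value_independent(money, pain, w_m=1.0, w_p=1.5):
    return w_m * money - w_p * pain

def value_interactive(money, pain, w_m=1.0, w_p=1.5, w_i=-0.3):
    return w_m * money - w_p * pain + w_i * money * pain

def p_accept(v):                      # logistic choice rule
    return 1.0 / (1.0 + np.exp(-v))

print(p_accept(value_independent(2.0, 1.0)))
print(p_accept(value_interactive(2.0, 1.0)))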

Frontal Cortex and Reward-Guided Learning and Decision-Making

Matthew F.S. Rushworth, MaryAnn P. Noonan, Erie D. Boorman, Mark E. Walton and Timothy E. Behrens
Neuron Volume 70, Issue 6, 23 June 2011, Pages 1054-1069

Reward-guided decision-making and learning depend on distributed neural circuits with many components. Here we focus on recent evidence that suggests four frontal lobe regions make distinct contributions to reward-guided learning and decision-making: the lateral orbitofrontal cortex, the ventromedial prefrontal cortex and adjacent medial orbitofrontal cortex, anterior cingulate cortex, and the anterior lateral prefrontal cortex. We attempt to identify common themes in experiments with human participants and with animal models, which suggest roles that the areas play in learning about reward associations, selecting reward goals, choosing actions to obtain reward, and monitoring the potential value of switching to alternative courses of action.

Wednesday, June 15, 2011

The Neural Correlates of Subjective Utility of Monetary Outcome and Probability Weight in Economic and in Motor Decision under Risk

Shih-Wei Wu, Mauricio R. Delgado, and Laurence T. Maloney
The Journal of Neuroscience, 15 June 2011, 31(24): 8822-8831

In decision under risk, people choose between lotteries that contain a list of potential outcomes paired with their probabilities of occurrence. We previously developed a method for translating such lotteries to mathematically equivalent “motor lotteries.” The probability of each outcome in a motor lottery is determined by the subject's noise in executing a movement. In this study, we used functional magnetic resonance imaging in humans to compare the neural correlates of monetary outcome and probability in classical lottery tasks in which information about probability was explicitly communicated to the subjects and in mathematically equivalent motor lottery tasks in which probability was implicit in the subjects' own motor noise. We found that activity in the medial prefrontal cortex (mPFC) and the posterior cingulate cortex quantitatively represent the subjective utility of monetary outcome in both tasks. For probability, we found that the mPFC significantly tracked the distortion of such information in both tasks. Specifically, activity in mPFC represents probability information but not the physical properties of the stimuli correlated with this information. Together, the results demonstrate that mPFC represents probability from two distinct forms of decision under risk.
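
The "distortion" of probability is conventionally modeled with a nonlinear weighting function. Below is one common one-parameter form (Tversky and Kahneman, 1992); I am not asserting this is the exact form used in the paper, it is just for illustration.

# Probability weighting: gamma < 1 overweights small probabilities and
# underweights large ones.
def weight(p, gamma=0.6):
    return p**gamma / ((p**gamma + (1.0 - p)**gamma) ** (1.0 / gamma))

for p in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(p, round(weight(p), 3))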

Wednesday, June 8, 2011

Neural basis of conditional cooperation

Shinsuke Suzuki, Kazuhisa Niki, Syoken Fujisaki and Eizo Akiyama
Social Cognitive and Affective Neuroscience, Volume 6, Issue 3, pp. 338-347

This is my own paper, so allow me a little self-promotion.

An fMRI experiment on reciprocal cooperation.
Participants played prisoner's dilemma games (repeated games with random matching) inside the scanner.

Behavioral data: participants tended to cooperate by default, but to defect against defectors (players who had defected frequently in past games).
fMRI data: dlPFC (dorsolateral prefrontal cortex) was activated when defecting against a defector.
→ A neuroscientific basis for reciprocity (tit-for-tat and trigger strategies)


Cooperation among genetically unrelated individuals is a fundamental aspect of society, but it has been a longstanding puzzle in biological and social sciences. Recently, theoretical studies in biology and economics showed that conditional cooperation – cooperating only with those who have exhibited cooperative behavior – can spread over a society. Furthermore, experimental studies in psychology demonstrated that people are actually conditional cooperators. In this study, we used functional magnetic resonance imaging to investigate the neural system underlying conditional cooperation by scanning participants during interaction with cooperative, neutral and non-cooperative opponents in prisoner's dilemma games. The results showed that: (i) participants cooperated more frequently with both cooperative and neutral opponents than with non-cooperative opponents; and (ii) a brain area related to cognitive inhibition of pre-potent responses (right dorsolateral prefrontal cortex) showed greater activation, especially when participants confronted non-cooperative opponents. Consequently, we suggest that cognitive inhibition of the motivation to cooperate with non-cooperators drives the conditional behavior.
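
The strategy class at issue is easy to state in code. A toy sketch of a conditional cooperator (my own formalization, not the task code from the paper):

# Conditional cooperation: cooperate by default, defect against opponents
# whose observed defection rate exceeds a threshold.
def conditional_cooperate(opponent_history, threshold=0.5):
    """opponent_history: list of the opponent's past moves, 'C' or 'D'."""
    if not opponent_history:
        return 'C'                      # cooperate with strangers
    defection_rate = opponent_history.count('D') / len(opponent_history)
    return 'D' if defection_rate > threshold else 'C'

print(conditional_cooperate(['C', 'C', 'D']))   # -> 'C'
print(conditional_cooperate(['D', 'D', 'C']))   # -> 'D'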

Wednesday, May 25, 2011

Distributed Coding of Actual and Hypothetical Outcomes in the Orbital and Dorsolateral Prefrontal Cortex

H. Abe and D. Lee
Neuron, Volume 70, Issue 4, 731-741, 26 May 2011

Lose with scissors and you learn that scissors was a bad choice (learning from the actual outcome). But if you know the rules of rock-paper-scissors, you also know that paper would have won (learning from the hypothetical outcome). Monkeys can learn in both of these ways, and neurons in dlPFC and OFC encode both actual and hypothetical outcomes (with dlPFC mainly encoding the hypothetical ones).

How are decision-making strategies altered by hypothetical outcomes resulting from unchosen actions? Abe and Lee find that monkeys adjust their strategies in a rock-paper-scissors task according to both actual and hypothetical outcomes. Neurons in the prefrontal cortex modulated their activity related to actual and hypothetical outcomes differently depending on the animal's choices, thereby encoding choice-outcome conjunctions for both experienced and hypothetical outcomes.

Action Dominates Valence in Anticipatory Representations in the Human Striatum and Dopaminergic Midbrain

Marc Guitart-Masip, Lluis Fuentemilla, Dominik R. Bach, Quentin J. M. Huys, Peter Dayan, Raymond J. Dolan, and Emrah Duzel
J. Neurosci. 2011;31 7867-7875

The acquisition of reward and the avoidance of punishment could logically be contingent on either emitting or withholding particular actions. However, the separate pathways in the striatum for go and no-go appear to violate this independence, instead coupling affect and effect. Respect for this interdependence has biased many studies of reward and punishment, so potential action–outcome valence interactions during anticipatory phases remain unexplored. In a functional magnetic resonance imaging study with healthy human volunteers, we manipulated subjects' requirement to emit or withhold an action independent from subsequent receipt of reward or avoidance of punishment. During anticipation, in the striatum and a lateral region within the substantia nigra/ventral tegmental area (SN/VTA), action representations dominated over valence representations. Moreover, we did not observe any representation associated with different state values through accumulation of outcomes, challenging a conventional and dominant association between these areas and state value representations. In contrast, a more medial sector of the SN/VTA responded preferentially to valence, with opposite signs depending on whether action was anticipated to be emitted or withheld. This dominant influence of action requires an enriched notion of opponency between reward and punishment.

Thursday, May 19, 2011

Ventromedial Frontal Lobe Damage Disrupts Value Maximization in Humans

Nathalie Camille, Cathryn A. Griffiths, Khoi Vo, Lesley K. Fellows, and Joseph W. Kable
The Journal of Neuroscience, 18 May 2011, 31(20): 7527-7532

Recent work in neuroeconomics has shown that regions in orbitofrontal and medial prefrontal cortex encode the subjective value of different options during choice. However, these electrophysiological and neuroimaging studies cannot demonstrate whether such signals are necessary for value-maximizing choices. Here we used a paradigm developed in experimental economics to empirically measure and quantify violations of utility theory in humans with damage to the ventromedial frontal lobe (VMF). We show that people with such damage are more likely to make choices that violate the generalized axiom of revealed preference, which is the one necessary and sufficient condition for choices to be consistent with value maximization. These results demonstrate that the VMF plays a critical role in value-maximizing choice.
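
GARP itself is mechanical enough to check in a few lines. A sketch of the standard revealed-preference test (the implementation is mine, not the paper's):

# Bundle i is directly revealed preferred to j if i was chosen while j was
# affordable (p_i.x_i >= p_i.x_j). GARP is violated when the transitive
# closure makes i preferred to j even though j's own choice was strictly
# cheaper than i at j's prices (p_j.x_j > p_j.x_i).
import numpy as np

def violates_garp(prices, bundles):
    prices  = np.asarray(prices, float)
    bundles = np.asarray(bundles, float)
    n = len(bundles)
    spent = np.einsum('ij,ij->i', prices, bundles)   # p_i . x_i
    cost  = prices @ bundles.T                       # p_i . x_j
    R = spent[:, None] >= cost                       # direct revealed pref.
    for k in range(n):                               # transitive closure
        R = R | (R[:, k:k+1] & R[k:k+1, :])
    strict = spent[:, None] > cost
    return bool((R & strict.T).any())

# Two observations that form a preference cycle (a violation):
prices  = [[1.0, 1.0], [1.0, 2.0]]
bundles = [[4.0, 0.0], [0.0, 2.0]]
print(violates_garp(prices, bundles))                # -> True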

Wednesday, May 4, 2011

Elapsed Decision Time Affects the Weighting of Prior Probability in a Perceptual Decision Task

Timothy D. Hanks, Mark E. Mazurek, Roozbeh Kiani, Elisabeth Hopp, and Michael N. Shadlen
The Journal of Neuroscience, 27 April 2011, 31(17): 6339-6352

Decisions are often based on a combination of new evidence with prior knowledge of the probable best choice. Optimal combination requires knowledge about the reliability of evidence, but in many realistic situations, this is unknown. Here we propose and test a novel theory: the brain exploits elapsed time during decision formation to combine sensory evidence with prior probability. Elapsed time is useful because (1) decisions that linger tend to arise from less reliable evidence, and (2) the expected accuracy at a given decision time depends on the reliability of the evidence gathered up to that point. These regularities allow the brain to combine prior information with sensory evidence by weighting the latter in accordance with reliability. To test this theory, we manipulated the prior probability of the rewarded choice while subjects performed a reaction-time discrimination of motion direction using a range of stimulus reliabilities that varied from trial to trial. The theory explains the effect of prior probability on choice and reaction time over a wide range of stimulus strengths. We found that prior probability was incorporated into the decision process as a dynamic bias signal that increases as a function of decision time. This bias signal depends on the speed–accuracy setting of human subjects, and it is reflected in the firing rates of neurons in the lateral intraparietal area (LIP) of rhesus monkeys performing this task.
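
The proposed mechanism drops naturally into a bounded-accumulation simulation. A sketch with my own simplified parameterization:

# Drift diffusion with a time-growing prior bias (my simplification):
# the decision variable is momentary evidence plus a bias k*t*(2*prior-1),
# so slow (low-reliability) decisions are pulled toward the prior.
import numpy as np

rng = np.random.default_rng(1)

def trial(coherence, prior=0.7, k=0.5, bound=1.0, dt=0.005, noise=1.0):
    """Returns choice (+1 = high-prior option, -1 = other) and decision time."""
    x, t = 0.0, 0.0
    while True:
        t += dt
        x += coherence * dt + noise * np.sqrt(dt) * rng.standard_normal()
        total = x + k * t * (2.0 * prior - 1.0)      # evidence + growing bias
        if abs(total) >= bound:
            return (1 if total > 0 else -1), t

choices = [trial(coherence=0.0)[0] for _ in range(200)]
print(np.mean(np.array(choices) == 1))  # > 0.5: the prior wins at zero coherence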

Human Dorsal Striatal Activity during Choice Discriminates Reinforcement Learning Behavior from the Gambler's Fallacy

Ryan K. Jessup and John P. O'Doherty
The Journal of Neuroscience, 27 April 2011, 31(17): 6296-6304

Reinforcement learning theory has generated substantial interest in neurobiology, particularly because of the resemblance between phasic dopamine and reward prediction errors. Actor–critic theories have been adapted to account for the functions of the striatum, with parts of the dorsal striatum equated to the actor. Here, we specifically test whether the human dorsal striatum—as predicted by an actor–critic instantiation—is used on a trial-to-trial basis at the time of choice to choose in accordance with reinforcement learning theory, as opposed to a competing strategy: the gambler's fallacy. Using a partial-brain functional magnetic resonance imaging scanning protocol focused on the striatum and other ventral brain areas, we found that the dorsal striatum is more active when choosing consistent with reinforcement learning compared with the competing strategy. Moreover, an overlapping area of dorsal striatum along with the ventral striatum was found to be correlated with reward prediction errors at the time of outcome, as predicted by the actor–critic framework. These findings suggest that the same region of dorsal striatum involved in learning stimulus–response associations may contribute to the control of behavior during choice, thereby using those learned associations. Intriguingly, neither reinforcement learning nor the gambler's fallacy conformed to the optimal choice strategy on the specific decision-making task we used. Thus, the dorsal striatum may contribute to the control of behavior according to reinforcement learning even when the prescriptions of such an algorithm are suboptimal in terms of maximizing future rewards.
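
The two strategies make opposite trial-by-trial predictions after a run of rewards on one option. A toy formalization (both update rules below are my own illustrative choices, not the paper's fitted models):

# After a reward streak, reinforcement learning raises the option's value
# (repeat it), while the gambler's fallacy treats the streak as "due" to
# end (switch away).
def rl_value_after_streak(Q, alpha, streak):
    for _ in range(streak):
        Q += alpha * (1.0 - Q)          # repeated positive prediction errors
    return Q                            # higher value -> more likely to repeat

def fallacy_p_repeat(streak, decay=0.8):
    return 0.5 * decay**streak          # longer streak -> more likely to switch

print(rl_value_after_streak(0.5, 0.3, 3))   # grows toward 1
print(fallacy_p_repeat(3))                  # shrinks toward 0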

Dissociable Effects of Lesions to Orbitofrontal Cortex Subregions on Impulsive Choice in the Rat

Adam C. Mar, Alice L. J. Walker, David E. Theobald, Dawn M. Eagle, and Trevor W. Robbins
The Journal of Neuroscience, 27 April 2011, 31(17): 6398-640

The orbitofrontal cortex (OFC) is implicated in a variety of adaptive decision-making processes. Human studies suggest that there is a functional dissociation between medial and lateral OFC (mOFC and lOFC, respectively) subregions when performing certain choice procedures. However, little work has examined the functional consequences of manipulations of OFC subregions on decision making in rodents. In the present experiments, impulsive choice was assessed by evaluating intolerance to delayed, but economically optimal, reward options using a delay-discounting paradigm. Following initial delay-discounting training, rats received bilateral neurotoxic or sham lesions targeting whole OFC (wOFC) or restricted to either mOFC or lOFC subregions. A transient flattening of delay-discounting curves was observed in wOFC-lesioned animals relative to shams—differences that disappeared with further training. Stable, dissociable effects were found when lesions were restricted to OFC subregions; mOFC-lesioned rats showed increased, whereas lOFC-lesioned rats showed decreased, preference for the larger-delayed reward relative to sham-controls—a pattern that remained significant during retraining after all delays were removed. When locations of levers leading to small–immediate versus large–delayed rewards were reversed, wOFC- and lOFC-lesioned rats showed retarded, whereas mOFC-lesioned rats showed accelerated, trajectories for reversal of lever preference. These results provide the first direct evidence for dissociable functional roles of the mOFC and lOFC for impulsive choice in rodents. The findings are consistent with recent human functional imaging studies and suggest that functions of mOFC and lOFC subregions may be evolutionarily conserved and contribute differentially to decision-making processes.

Roles of Nucleus Accumbens Core and Shell in Incentive-Cue Responding and Behavioral Inhibition

Frederic Ambroggi, Ali Ghazizadeh, Saleem M. Nicola, and Howard L. Fields
The Journal of Neuroscience, 4 May 2011, 31(18): 6820-6830

The nucleus accumbens (NAc) is involved in many reward-related behaviors. The NAc has two major components, the core and the shell. These two areas have different inputs and outputs, suggesting that they contribute differentially to goal-directed behaviors. Using a discriminative stimulus (DS) task in rats and inactivating the NAc by blocking excitatory inputs with glutamate antagonists, we dissociated core and shell contributions to task performance. NAc core but not shell inactivation decreased responding to a reward-predictive cue. In contrast, inactivation of either subregion induced a general behavioral disinhibition. This reveals that the NAc actively suppresses actions inappropriate to the DS task. Importantly, selective inactivation of the shell but not core significantly increased responding to the nonrewarded cue. To determine whether the different contributions of the NAc core and shell depend on the information encoded in their constituent neurons, we performed electrophysiological recording in rats performing the DS task. Although there was no firing pattern unique to either core or shell, the reward-predictive cue elicited more frequent and larger magnitude responses in the NAc core than in the shell. Conversely, more NAc shell neurons selectively responded to the nonrewarded stimulus. These quantitative differences might account for the different behavioral patterns that require either core or shell. Neurons with similar firing patterns could also have different effects on behavior due to their distinct projection targets.

Wednesday, April 20, 2011

A proposal for researchers planning to present at international conferences

A proposal from Takanori Komatsu of Shinshu University.
http://tkomat-lab.com/logo.html

I fully support it.

-----------------------

A proposal to fellow researchers

In the face of an unprecedented disaster like the Great East Japan Earthquake, what can we as researchers do? Perhaps nothing special: simply keep doing what we are supposed to do, devoting ourselves to education so as to send capable people out into society, and pressing on with our research so as to keep communicating its results to the world.

Many researchers present their work at international conferences as one way of communicating their results, and many of them, I suspect, would also like to express in some form their gratitude for the support that came from overseas after the earthquake. We have therefore created a small logo that can be placed on slides and posters to convey this gratitude.

Researchers attending an international conference are, in effect, representatives of Japan at that conference. As one of them, I want to stand tall, smile, and say with confidence, "Thank you – Japan is doing fine!", conveying heartfelt gratitude for the support from overseas, along with the strong message that the Japanese people will never be beaten by this disaster and that Japanese researchers are alive and well.

If you share this sentiment even a little, would you consider placing the logo in a corner of your slides or poster? If many of the Japanese presenters at a conference use it, I believe it will send a powerful message that Japan is pulling together.

The logo was created by Professor Homei Miyashita (School of Science and Technology, Meiji University), 園山隆輔 (head of T-D-F), and the calligrapher 麗. I am deeply grateful to the three of them for turning my idea into such high-quality work so quickly (and frankly, the logo looks pretty cool!). I claim no copyright on its distribution, but if you wish to use it for purposes other than those described here, I would appreciate a quick note.

I sincerely ask for the understanding and cooperation of all researchers who will be representing Japan.

Finally, I pray for the repose of those who lost their lives in the Great East Japan Earthquake, and extend my heartfelt sympathy to everyone affected and to their families.

We will never walk alone!

April 21, 2011
Takanori Komatsu
Fiber-Nanotech International Young Researcher Empowerment Center, Shinshu University
e-mail: tkomat@acm.org

Tuesday, April 12, 2011

Social rejection shares somatosensory representations with physical pain

Ethan Kross, Marc G. Berman, Walter Mischel, Edward E. Smith, and Tor D. Wager
PNAS April 12, 2011 vol. 108 no. 15 6270-6275

How similar are the experiences of social rejection and physical pain? Extant research suggests that a network of brain regions that support the affective but not the sensory components of physical pain underlie both experiences. Here we demonstrate that when rejection is powerfully elicited – by having people who recently experienced an unwanted break-up view a photograph of their ex-partner as they think about being rejected – areas that support the sensory components of physical pain (secondary somatosensory cortex; dorsal posterior insula) become active. We demonstrate the overlap between social rejection and physical pain in these areas by comparing both conditions in the same individuals using functional MRI. We further demonstrate the specificity of the secondary somatosensory cortex and dorsal posterior insula activity to physical pain by comparing activated locations in our study with a database of over 500 published studies. Activation in these regions was highly diagnostic of physical pain, with positive predictive values up to 88%. These results give new meaning to the idea that rejection "hurts." They demonstrate that rejection and physical pain are similar not only in that they are both distressing – they share a common somatosensory representation as well.

Wednesday, April 6, 2011

Neural Correlates of Forward Planning in a Spatial Decision Task in Humans

The Journal of Neuroscience, 6 April 2011, 31(14): 5526-5539
Dylan Alexander Simon and Nathaniel D. Daw

Although reinforcement learning (RL) theories have been influential in characterizing the mechanisms for reward-guided choice in the brain, the predominant temporal difference (TD) algorithm cannot explain many flexible or goal-directed actions that have been demonstrated behaviorally. We investigate such actions by contrasting an RL algorithm that is model based, in that it relies on learning a map or model of the task and planning within it, to traditional model-free TD learning. To distinguish these approaches in humans, we used functional magnetic resonance imaging in a continuous spatial navigation task, in which frequent changes to the layout of the maze forced subjects continually to relearn their favored routes, thereby exposing the RL mechanisms used. We sought evidence for the neural substrates of such mechanisms by comparing choice behavior and blood oxygen level-dependent (BOLD) signals to decision variables extracted from simulations of either algorithm. Both choices and value-related BOLD signals in striatum, although most often associated with TD learning, were better explained by the model-based theory. Furthermore, predecessor quantities for the model-based value computation were correlated with BOLD signals in the medial temporal lobe and frontal cortex. These results point to a significant extension of both the computational and anatomical substrates for RL in the brain.

Signals in Human Striatum Are Appropriate for Policy Update Rather than Value Prediction

The Journal of Neuroscience, 6 April 2011, 31(14): 5504-5511
Jian Li and Nathaniel D. Daw

Influential reinforcement learning theories propose that prediction error signals in the brain's nigrostriatal system guide learning for trial-and-error decision-making. However, since different decision variables can be learned from quantitatively similar error signals, a critical question is: what is the content of decision representations trained by the error signals? We used fMRI to monitor neural activity in a two-armed bandit counterfactual decision task that provided human subjects with information about forgone and obtained monetary outcomes so as to dissociate teaching signals that update expected values for each action, versus signals that train relative preferences between actions (a policy). The reward probabilities of both choices varied independently from each other. This specific design allowed us to test whether subjects' choice behavior was guided by policy-based methods, which directly map states to advantageous actions, or value-based methods such as Q-learning, where choice policies are instead generated by learning an intermediate representation (reward expectancy). Behaviorally, we found human participants' choices were significantly influenced by obtained as well as forgone rewards from the previous trial. We also found subjects' blood oxygen level-dependent responses in striatum were modulated in opposite directions by the experienced and forgone rewards but not by reward expectancy. This neural pattern, as well as subjects' choice behavior, is consistent with a teaching signal for developing habits or relative action preferences, rather than prediction errors for updating separate action values.
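
The value-based versus policy-based distinction can be made concrete with two toy update rules (my minimal formalization, not the authors' fitted models):

# A Q-learner updates each action's expected value from its own outcome;
# a policy learner shifts a relative preference using obtained minus
# forgone reward, with no intermediate value representation.
import numpy as np

def q_update(Q, chosen, r_obtained, r_forgone, alpha=0.2):
    Q = Q.copy()
    Q[chosen]     += alpha * (r_obtained - Q[chosen])
    Q[1 - chosen] += alpha * (r_forgone - Q[1 - chosen])   # counterfactual value
    return Q

def policy_update(pref, chosen, r_obtained, r_forgone, alpha=0.2):
    pref = pref.copy()
    step = alpha * (r_obtained - r_forgone)   # obtained and forgone push
    pref[chosen]     += step                  # the preference in opposite
    pref[1 - chosen] -= step                  # directions
    return pref

print(q_update(np.zeros(2), 0, 1.0, 0.0))       # -> [0.2, 0.0]
print(policy_update(np.zeros(2), 0, 1.0, 0.0))  # -> [0.2, -0.2]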

Saturday, April 2, 2011

Watching My Mind Unfold versus Yours: An fMRI Study Using a Novel Camera Technology to Examine Neural Differences in Self-projection of Self versus Other Perspectives

Peggy L. St. Jacques, Martin A. Conway, Matthew W. Lowder, and Roberto Cabeza
Journal of Cognitive Neuroscience
June 2011, Vol. 23, No. 6, Pages 1275-1284

Self-projection, the capacity to re-experience the personal past and to mentally infer another person's perspective, has been linked to medial prefrontal cortex (mPFC). In particular, ventral mPFC is associated with inferences about one's own self, whereas dorsal mPFC is associated with inferences about another individual. In the present fMRI study, we examined self-projection using a novel camera technology, which employs a sensor and timer to automatically take hundreds of photographs when worn, in order to create dynamic visuospatial cues taken from a first-person perspective. This allowed us to ask participants to self-project into the personal past or into the life of another person. We predicted that self-projection to the personal past would elicit greater activity in ventral mPFC, whereas self-projection of another perspective would rely on dorsal mPFC. There were three main findings supporting this prediction. First, we found that self-projection to the personal past recruited greater ventral mPFC, whereas observing another person's perspective recruited dorsal mPFC. Second, activity in ventral versus dorsal mPFC was sensitive to parametric modulation on each trial by the ability to relive the personal past or to understand another's perspective, respectively. Third, task-related functional connectivity analysis revealed that ventral mPFC contributed to the medial temporal lobe network linked to memory processes, whereas dorsal mPFC contributed to the fronto-parietal network linked to controlled processes. In sum, these results suggest that ventral–dorsal subregions of the anterior midline are functionally dissociable and may differentially contribute to self-projection of self versus other.

Friday, March 25, 2011

Model-Based Influences on Humans' Choices and Striatal Prediction Errors

Nathaniel D. Daw, Samuel J. Gershman, Ben Seymour, Peter Dayan, Raymond J. Dolan
Neuron, Volume 69, Issue 6, 1204-1215, 24 March 2011

The mesostriatal dopamine system is prominently implicated in model-free reinforcement learning, with fMRI BOLD signals in ventral striatum notably covarying with model-free prediction errors. However, latent learning and devaluation studies show that behavior also shows hallmarks of model-based planning, and the interaction between model-based and model-free values, prediction errors, and preferences is underexplored. We designed a multistep decision task in which model-based and model-free influences on human choice behavior could be distinguished. By showing that choices reflected both influences we could then test the purity of the ventral striatal BOLD signal as a model-free report. Contrary to expectations, the signal reflected both model-free and model-based predictions in proportions matching those that best explained choice behavior. These results challenge the notion of a separate model-free learner and suggest a more integrated computational architecture for high-level human decision-making.
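
The hybrid valuation at the heart of this design is a single weighted mixture; the same weight w that best explained choices also described the striatal prediction-error signal. A sketch (the value of w below is made up):

import numpy as np

def hybrid_value(q_model_based, q_model_free, w):
    """w = 1: purely model-based; w = 0: purely model-free (TD)."""
    return w * q_model_based + (1.0 - w) * q_model_free

# Two options, each with model-based and model-free estimates:
q_net = [hybrid_value(mb, mf, w=0.4) for mb, mf in [(0.8, 0.2), (0.3, 0.6)]]
p_choose_first = 1.0 / (1.0 + np.exp(-5.0 * (q_net[0] - q_net[1])))  # softmax
print(q_net, round(p_choose_first, 3))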

Surprise Signals in Anterior Cingulate Cortex: Neuronal Encoding of Unsigned Reward Prediction Errors Driving Adjustment in Behavior

Benjamin Y. Hayden, Sarah R. Heilbronner, John M. Pearson, and Michael L. Platt
The Journal of Neuroscience, 16 March 2011, 31(11):4178-4187

In attentional models of learning, associations between actions and subsequent rewards are stronger when outcomes are surprising, regardless of their valence. Despite the behavioral evidence that surprising outcomes drive learning, neural correlates of unsigned reward prediction errors remain elusive. Here we show that in a probabilistic choice task, trial-to-trial variations in preference track outcome surprisingness. Concordant with this behavioral pattern, responses of neurons in macaque (Macaca mulatta) dorsal anterior cingulate cortex (dACC) to both large and small rewards were enhanced when the outcome was surprising. Moreover, when, on some trials, probabilities were hidden, neuronal responses to rewards were reduced, consistent with the idea that the absence of clear expectations diminishes surprise. These patterns are inconsistent with the idea that dACC neurons track signed errors in reward prediction, as dopamine neurons do. Our results also indicate that dACC neurons do not signal conflict. In the context of other studies of dACC function, these results suggest a link between reward-related modulations in dACC activity and attention and motor control processes involved in behavioral adjustment. More speculatively, these data point to a harmonious integration between reward and learning accounts of ACC function on one hand, and attention and cognitive control accounts on the other.
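
Unsigned prediction errors are usually paired with a Pearce-Hall-style associability update, in which the learning rate tracks recent absolute surprise regardless of valence. A sketch in my own notation:

# Pearce-Hall attention/associability: alpha follows |delta|, so both
# surprising upshifts and surprising downshifts boost subsequent learning.
def pearce_hall_step(V, alpha, reward, kappa=0.3, eta=0.5):
    delta = reward - V                                  # signed prediction error
    alpha = (1.0 - kappa) * alpha + kappa * abs(delta)  # unsigned surprise
    V = V + eta * alpha * delta                         # attention-gated update
    return V, alpha

V, a = 0.5, 0.5
for r in (1.0, 1.0, 0.0):                               # upshifts, then a downshift
    V, a = pearce_hall_step(V, a, r)
    print(round(V, 3), round(a, 3))                     # alpha rises after surprise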

Wednesday, March 2, 2011

A Regret-Induced Status Quo Bias

Antoinette Nicolle, Stephen M. Fleming, Dominik R. Bach, Jon Driver, and Raymond J. Dolan
The Journal of Neuroscience, 2 March 2011, 31(9): 3320-3327

Regret is larger when we change something and then fail, so people lean toward the status quo. When changing one's action leads to failure, the insula and mPFC (medial prefrontal cortex) are activated, and the larger that activation, the more the person leans toward the status quo on the next occasion.

Somehow this runs deep as a life lesson...

Impressive how far you can get with nothing but simple contrast-based analyses. Very much Ray Dolan.

A suboptimal bias toward accepting the status quo option in decision-making is well established behaviorally, but the underlying neural mechanisms are less clear. Behavioral evidence suggests the emotion of regret is higher when errors arise from rejection rather than acceptance of a status quo option. Such asymmetry in the genesis of regret might drive the status quo bias on subsequent decisions, if indeed erroneous status quo rejections have a greater neuronal impact than erroneous status quo acceptances. To test this, we acquired human fMRI data during a difficult perceptual decision task that incorporated a trial-to-trial intrinsic status quo option, with explicit signaling of outcomes (error or correct). Behaviorally, experienced regret was higher after an erroneous status quo rejection compared with acceptance. Anterior insula and medial prefrontal cortex showed increased blood oxygenation level-dependent signal after such status quo rejection errors. In line with our hypothesis, a similar pattern of signal change predicted acceptance of the status quo on a subsequent trial. Thus, our data link a regret-induced status quo bias to error-related activity on the preceding trial.

Thursday, February 24, 2011

A reservoir of time constants for memory traces in cortical neurons.

Nat Neurosci. 2011 Feb 13.
Bernacchia A, Seo H, Lee D, Wang XJ.

For learning to work well, it is important to evaluate rewards on an appropriate timescale: when the environment changes rapidly (slowly), rewards should be evaluated on a short (long) timescale.

In all three areas examined (ACC, dlPFC, and LIP) there are neurons that hold reward information over different timescales, and the distribution of those timescales follows a power law.

According to reinforcement learning theory of decision making, reward expectation is computed by integrating past rewards with a fixed timescale. In contrast, we found that a wide range of time constants is available across cortical neurons recorded from monkeys performing a competitive game task. By recognizing that reward modulates neural activity multiplicatively, we found that one or two time constants of reward memory can be extracted for each neuron in prefrontal, cingulate and parietal cortex. These timescales ranged from hundreds of milliseconds to tens of seconds, according to a power law distribution, which is consistent across areas and reproduced by a 'reservoir' neural network model. These neuronal memory timescales were weakly, but significantly, correlated with those of monkey's decisions. Our findings suggest a flexible memory system in which neural subpopulations with distinct sets of long or short memory timescales may be selectively deployed according to the task demands.
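
The key quantity is a reward trace integrated with a neuron-specific time constant. A sketch using my own discretization:

# Each (model) neuron keeps an exponentially weighted trace of past
# rewards with its own time constant tau; a population spanning a
# power-law "reservoir" of taus covers both volatile and stable regimes.
import numpy as np

def reward_memory(rewards, tau):
    trace = 0.0
    for r in rewards:                      # one step per trial
        trace = np.exp(-1.0 / tau) * trace + r
    return trace

rewards = [1, 0, 0, 1, 1, 0, 1]
for tau in (0.5, 2.0, 10.0):               # short to long memory
    print(tau, round(reward_memory(rewards, tau), 3))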

Monday, February 21, 2011

Separate value comparison and learning mechanisms in macaque medial and lateral orbitofrontal cortex.

Monkey lesion studies. In reward-based decision making, mOFC (medial orbitofrontal cortex) is involved in the decision, while lOFC (lateral orbitofrontal cortex) is involved in learning. http://www.ncbi.nlm.nih.gov/pubmed/21059901 http://www.ncbi.nlm.nih.gov/pubmed/20346766

The finding that credit assignment fails without lOFC is fascinating. Even plain "conditioning" seems to have plenty of frontier left.

Proc Natl Acad Sci U S A. 2010 Nov 23;107(47):20547-52. Epub 2010 Nov 8.
Separate value comparison and learning mechanisms in macaque medial and lateral orbitofrontal cortex.
Noonan MP, Walton ME, Behrens TE, Sallet J, Buckley MJ, Rushworth MF.

Uncertainty about the function of orbitofrontal cortex (OFC) in guiding decision-making may be a result of its medial (mOFC) and lateral (lOFC) divisions having distinct functions. Here we test the hypothesis that the mOFC is more concerned with reward-guided decision making, in contrast with the lOFC's role in reward-guided learning. Macaques performed three-armed bandit tasks and the effects of selective mOFC lesions were contrasted against lOFC lesions. First, we present analyses that make it possible to measure reward-credit assignment – a crucial component of reward-value learning – independently of the decisions animals make. The mOFC lesions do not lead to impairments in reward-credit assignment that are seen after lOFC lesions. Second, we examined how the reward values of choice options were compared. We present three analyses, one of which examines reward-guided decision making independently of reward-value learning. Lesions of the mOFC, but not the lOFC, disrupted reward-guided decision making. Impairments after mOFC lesions were a function of the multiple option contexts in which decisions were made. Contrary to axiomatic assumptions of decision theory, the mOFC-lesioned animals' value comparisons were no longer independent of irrelevant alternatives.

Neuron. 2010 Mar 25;65(6):927-39.
Separable learning systems in the macaque brain and the role of orbitofrontal cortex in contingent learning.
Walton ME, Behrens TE, Buckley MJ, Rudebeck PH, Rushworth MF.

Orbitofrontal cortex (OFC) is widely held to be critical for flexibility in decision-making when established choice values change. OFC's role in such decision making was investigated in macaques performing dynamically changing three-armed bandit tasks. After selective OFC lesions, animals were impaired at discovering the identity of the highest value stimulus following reversals. However, this was not caused either by diminished behavioral flexibility or by insensitivity to reinforcement changes, but instead by paradoxical increases in switching between all stimuli. This pattern of choice behavior could be explained by a causal role for OFC in appropriate contingent learning, the process by which causal responsibility for a particular reward is assigned to a particular choice. After OFC lesions, animals' choice behavior no longer reflected the history of precise conjoint relationships between particular choices and particular rewards. Nonetheless, OFC-lesioned animals could still approximate choice-outcome associations using a recency-weighted history of choices and rewards.
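
The credit-assignment distinction drawn here can be caricatured with two update rules (my formalization, not the authors' model): a contingent learner credits the outcome to the choice that actually produced it, while a "spread" learner credits recent choices by recency, as the lOFC-lesioned animals appeared to do.

import numpy as np

def contingent_update(V, choice, reward, alpha=0.3):
    V = V.copy()
    V[choice] += alpha * (reward - V[choice])          # precise conjunction
    return V

def spread_update(V, recent_choices, reward, alpha=0.3, decay=0.5):
    V = V.copy()
    weights = decay ** np.arange(len(recent_choices))  # most recent first
    for w, c in zip(weights, recent_choices):
        V[c] += alpha * w * (reward - V[c])            # credit leaks backward
    return V

V = np.zeros(3)
print(contingent_update(V, choice=0, reward=1.0))
print(spread_update(V, recent_choices=[0, 2, 1], reward=1.0))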

Ventral Striatum and Orbitofrontal Cortex Are Both Required for Model-Based, But Not Model-Free, Reinforcement Learning

The Journal of Neuroscience, February 16, 2011, 31(7):2700-2705
Michael A. McDannald, Federica Lucantonio, Kathryn A. Burke, Yael Niv, and Geoffrey Schoenbaum

A rat lesion study. Using unblocking, they show that the ventral striatum matters for both model-based and model-free reinforcement learning, whereas OFC matters only for the former. Elegant, but I did not come away feeling that my understanding of model-based RL had deepened...

In many cases, learning is thought to be driven by differences between the value of rewards we expect and rewards we actually receive. Yet learning can also occur when the identity of the reward we receive is not as expected, even if its value remains unchanged. Learning from changes in reward identity implies access to an internal model of the environment, from which information about the identity of the expected reward can be derived. As a result, such learning is not easily accounted for by model-free reinforcement learning theories such as temporal difference reinforcement learning (TDRL), which predicate learning on changes in reward value, but not identity. Here, we used unblocking procedures to assess learning driven by value- versus identity-based prediction errors. Rats were trained to associate distinct visual cues with different food quantities and identities. These cues were subsequently presented in compound with novel auditory cues and the reward quantity or identity was selectively changed. Unblocking was assessed by presenting the auditory cues alone in a probe test. Consistent with neural implementations of TDRL models, we found that the ventral striatum was necessary for learning in response to changes in reward value. However, this area, along with orbitofrontal cortex, was also required for learning driven by changes in reward identity. This observation requires that existing models of TDRL in the ventral striatum be modified to include information about the specific features of expected outcomes derived from model-based representations, and that the role of orbitofrontal cortex in these models be clearly delineated.

Friday, February 4, 2011

Prefrontal coding of temporally discounted values during intertemporal choice.

Kim S, Hwang J, Lee D.
Neuron. 2008 Jul 10;59(1):161-72.

Reward from a particular action is seldom immediate, and the influence of such delayed outcome on choice decreases with delay. It has been postulated that when faced with immediate and delayed rewards, decision makers choose the option with maximum temporally discounted value. We examined the preference of monkeys for delayed reward in an intertemporal choice task and the neural basis for real-time computation of temporally discounted values in the dorsolateral prefrontal cortex. During this task, the locations of the targets associated with small or large rewards and their corresponding delays were randomly varied. We found that prefrontal neurons often encoded the temporally discounted value of reward expected from a particular option. Furthermore, activity tended to increase with discounted values for targets presented in the neuron's preferred direction, suggesting that activity related to temporally discounted values in the prefrontal cortex might determine the animal's behavior during intertemporal choice.

Intertemporal choice in monkeys. http://bit.ly/eHK9LF Choice behavior is explained by hyperbolic discounting, and neurons in dorsolateral prefrontal cortex (DLPFC) carry the temporally discounted present value. As usual, though, Daeyeol Lee gets there by sheer brute force.
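
For reference, the hyperbolic discount function referred to above, with a made-up discount rate k (the characteristic preference reversal falls out as both delays grow):

# Hyperbolic discounting: present value of amount A at delay D is A/(1+k*D).
def discounted_value(amount, delay, k=0.25):
    return amount / (1.0 + k * delay)

# A smaller-sooner vs. larger-later choice flips as delays shift:
print(discounted_value(2.0, 0.0), discounted_value(4.0, 5.0))    # sooner wins
print(discounted_value(2.0, 5.0), discounted_value(4.0, 10.0))   # later wins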

Thursday, February 3, 2011

Temporal discounting predicts risk sensitivity in rhesus macaques.

Hayden BY, Platt ML.
Curr Biol. 2007 Jan 9;17(1):49-53.

Monkeys' risk preferences can be predicted from their time preferences.
Risk can be read as "a longer wait until reward": when the time until the next choice (the ITI) is long, the monkeys become risk averse (which temporal discounting can explain). As an overall tendency, monkeys appear to be risk seeking, which is a bit surprising.

Reading this paper makes clear that experiments with humans and experiments with animals use frameworks that look similar but are in fact quite different (economic experiments have no notion of an ITI, for one thing). Comparison and interpretation call for caution.

If risk preferences can be predicted from time preferences, how should we then explain the origin of temporal discounting? Explaining it by the existence of interest rates smells tautological. One professor once gave the blunt explanation that discounting exists "to keep present values from diverging"...

Wednesday, February 2, 2011

Dopamine-Mediated Reinforcement Learning Signals in the Striatum and Ventromedial Prefrontal Cortex Underlie Value-Based Choices

Gerhard Jocham, Tilmann A. Klein, and Markus Ullsperger
The Journal of Neuroscience, February 2, 2011, 31(5):1606-1613; doi:10.1523/JNEUROSCI.3904-10.2011

A large body of evidence exists on the role of dopamine in reinforcement learning. Less is known about how dopamine shapes the relative impact of positive and negative outcomes to guide value-based choices. We combined administration of the dopamine D2 receptor antagonist amisulpride with functional magnetic resonance imaging in healthy human volunteers. Amisulpride did not affect initial reinforcement learning. However, in a later transfer phase that involved novel choice situations requiring decisions between two symbols based on their previously learned values, amisulpride improved participants' ability to select the better of two highly rewarding options, while it had no effect on choices between two very poor options. During the learning phase, activity in the striatum encoded a reward prediction error. In the transfer phase, in the absence of any outcome, ventromedial prefrontal cortex (vmPFC) continually tracked the learned value of the available options on each trial. Both striatal prediction error coding and tracking of learned value in the vmPFC were predictive of subjects' choice performance in the transfer phase, and both were enhanced under amisulpride. These findings show that dopamine-dependent mechanisms enhance reinforcement learning signals in the striatum and sharpen representations of associative values in prefrontal cortex that are used to guide reinforcement-based decisions. 

Reinforcement learning and dopamine. Administering a D2 antagonist leaves learning itself unaffected, but partially improves choices that draw on what was learned; prediction-error and value-related fMRI signals are also enhanced. The data are clean, but the interpretation does not feel settled. The discussion is tough going...

From today through next Tuesday: behavioral experiments with 15 participants. Time to knuckle down.

Thursday, January 27, 2011

Representation of Others' Action by Neurons in Monkey Medial Frontal Cortex

Kyoko Yoshida, Nobuhito Saito, Atsushi Iriki, Masaki Isoda
Current Biology 21, 1–5, February 8, 2011

MFC neurons that distinguish self from other
(MFC: medial frontal cortex)

Trials in which the monkey watches another monkey make a decision alternate with trials in which it makes its own decision.
(The point is that information about the outcome of the other's decision can be used in one's own next decision.)

Two main types of neurons were observed:
Self type: responds during one's own decisions
Other type: responds during the other's decisions
A small number of Mirror-type neurons (responding during both one's own and the other's decisions) were also found.

Interestingly, Self-type neurons were more common dorsally (including preSMA), whereas Other-type neurons were more common ventrally (including rostral cingulate).

→ The MFC is probably important for distinguishing others' actions from one's own.

I thought this study could easily have appeared in Nature Neuroscience or the like...

Successful social interaction depends on not only the ability to identify with others but also the ability to distinguish between aspects of self and others [1-4]. Although there is considerable knowledge of a shared neural substrate between self-action and others' action [5], it remains unknown where and how in the brain the action of others is uniquely represented. Exploring such agent-specific neural codes is important because one's action and intention can differ between individuals [1]. Moreover, the assignment of social agency breaks down in a range of mental disorders [6-8]. Here, using two monkeys monitoring each other's action for adaptive behavioral planning, we show that the medial frontal cortex (MFC) contains a group of neurons that selectively encode others' action. These neurons, observed in both dominant and submissive monkeys, were significantly more prevalent in the dorsomedial convexity region of the MFC including the pre-supplementary motor area than in the cingulate sulcus region of the MFC including the rostral cingulate motor area. Further tests revealed that the difference in neuronal activity was not due to gaze direction or muscular activity. We suggest that the MFC is involved in self-other differentiation in the domain of motor action and provides a fundamental neural signal for social learning.

Tuesday, January 25, 2011

States versus Rewards: Dissociable Neural Prediction Error Signals Underlying Model-Based and Model-Free Reinforcement Learning.

Neuron, Volume 66, Issue 4, 585-595, 27 May 2010
States versus Rewards: Dissociable Neural Prediction Error Signals Underlying Model-Based and Model-Free Reinforcement Learning.
Jan Gläscher, Nathaniel Daw, Peter Dayan, John P. O'Doherty

Model-based RL vs. model-free RL: dissociable neural prediction errors, with state prediction errors training the task model and reward prediction errors driving model-free TD learning.

Learning from Others: Introduction to the Special Review Series on Social Neuroscience.

Neuron Volume 65 Issue 6
March 24, 2010.

A special issue on social neuroscience (review articles):

Learning from Others: Introduction to the Special Review Series on Social Neuroscience. Chris Frith, Uta Frith.
The Developing Social Brain: Implications for Education. Sarah-Jayne Blakemore.
Humans, Brains, and Their Environment: Marriage between Neuroscience and Anthropology? Georg Northoff.
Conceptual Challenges and Directions for Social Neuroscience. Ralph Adolphs.
The Challenge of Translation in Social Neuroscience: A Review of Oxytocin, Vasopressin, and Affiliative Behavior. Thomas R. Insel.
Social Interactions in “Simple” Model Systems. Marla B. Sokolowski.
Social Cognition and the Evolution of Language: Constructing Cognitive Phylogenies. W. Tecumseh Fitch, Ludwig Huber, Thomas Bugnyar.
Primate Social Cognition: Uniquely Primate, Uniquely Social, or Just Unique? Richard W. Byrne, Lucy A. Bates.
Genetics of Human Social Behavior. Richard P. Ebstein, Salomon Israel, Soo Hong Chew, Songfa Zhong, Ariel Knafo.