Neural Correlates of the Divergence of Instrumental Probability Distributions

Mimi Liljeholm, Shuo Wang, June Zhang, and John P. O'Doherty
J. Neurosci. 2013;33 12519-12527




Flexible action selection requires knowledge about how alternative actions impact the environment: a “cognitive map” of instrumental contingencies. Reinforcement learning theories formalize this map as a set of stochastic relationships between actions and states, such that for any given action considered in a current state, a probability distribution is specified over possible outcome states. Here, we show that activity in the human inferior parietal lobule correlates with the divergence of such outcome distributions–a measure that reflects whether discrimination between alternative actions increases the controllability of the future–and, further, that this effect is dissociable from those of other information theoretic and motivational variables, such as outcome entropy, action values, and outcome utilities. Our results suggest that, although ultimately combined with reward estimates to generate action values, outcome probability distributions associated with alternative actions may be contrasted independently of valence computations, to narrow the scope of the action selection problem.
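
To make "divergence of outcome distributions" concrete, here is a minimal sketch using the Jensen-Shannon divergence between the outcome distributions of two actions. The choice of this particular divergence and the example probabilities are assumptions for illustration and are not taken from the abstract.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two outcome distributions, in bits."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture of the two distributions
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical outcome distributions over three outcome states for two actions.
# Identical distributions: choosing between the actions changes nothing.
same = js_divergence([0.7, 0.2, 0.1], [0.7, 0.2, 0.1])
# Disjoint distributions: the choice fully determines the outcome.
different = js_divergence([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

In this sketch the divergence is zero when the two actions lead to the same outcomes and reaches its maximum (1 bit) when they lead to entirely different outcomes, which matches the sense in which divergence reflects controllability.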


How the Visual Brain Encodes and Keeps Track of Time

Paolo Salvioni, Micah M. Murray, Lysiann Kalmbach, and Domenica Bueti
J. Neurosci. 2013;33 12423-12429


Time is embedded in any sensory experience: the movements of a dance, the rhythm of a piece of music, the words of a speaker are all examples of temporally structured sensory events. In humans, if and how visual cortices perform temporal processing remains unclear. Here we show that both primary visual cortex (V1) and extrastriate area V5/MT are causally involved in encoding and keeping time in memory and that this involvement is independent from low-level visual processing. Most importantly we demonstrate that V1 and V5/MT come into play simultaneously and seem to be functionally linked during interval encoding, whereas they operate serially (V1 followed by V5/MT) and seem to be independent while maintaining temporal information in working memory. These data help to refine our knowledge of the functional properties of human visual cortex, highlighting the contribution and the temporal dynamics of V1 and V5/MT in the processing of the temporal aspects of visual information.


The Expected Value of Control: An Integrative Theory of Anterior Cingulate Cortex Function

Amitai Shenhav, Matthew M. Botvinick, Jonathan D. Cohen
Neuron, Volume 79, Issue 2, 217-240, 24 July 2013

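If we assume that the dACC is involved in "deciding what to control, and how much, after taking into account the expected payoff of each option (the expected value of control)", the miscellaneous results of earlier studies can be interpreted in a unified way.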

The dorsal anterior cingulate cortex (dACC) has a near-ubiquitous presence in the neuroscience of cognitive control. It has been implicated in a diversity of functions, from reward processing and performance monitoring to the execution of control and action selection. Here, we propose that this diversity can be understood in terms of a single underlying function: allocation of control based on an evaluation of the expected value of control (EVC). We present a normative model of EVC that integrates three critical factors: the expected payoff from a controlled process, the amount of control that must be invested to achieve that payoff, and the cost in terms of cognitive effort. We propose that dACC integrates this information, using it to determine whether, where and how much control to allocate. We then consider how the EVC model can explain the diverse array of findings concerning dACC function.
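
The three factors named in the abstract (payoff, amount of control, effort cost) can be sketched as a simple maximization. This is only a toy illustration: the linear success function, the quadratic cost, and all names below are hypothetical and are not the authors' normative model.

```python
def expected_value_of_control(intensity, reward, success_prob, cost):
    """EVC of one candidate control intensity: expected payoff minus effort cost."""
    return success_prob(intensity) * reward - cost(intensity)


def best_intensity(candidates, reward, success_prob, cost):
    """Return the candidate control intensity with the highest EVC."""
    return max(
        candidates,
        key=lambda x: expected_value_of_control(x, reward, success_prob, cost),
    )


def linear_success(intensity):
    """Hypothetical: success probability rises linearly with control (0 to 1)."""
    return 0.5 + 0.5 * intensity


def quadratic_cost(intensity):
    """Hypothetical: effort cost grows quadratically with control."""
    return 5.0 * intensity ** 2


levels = [0.0, 0.25, 0.5, 0.75, 1.0]
high_stakes = best_intensity(levels, 10.0, linear_success, quadratic_cost)
low_stakes = best_intensity(levels, 2.0, linear_success, quadratic_cost)
```

Under these assumptions a larger reward justifies investing more control, while a small reward makes the effort cost dominate, so the preferred intensity drops.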


A Dual Operator View of Habitual Behavior Reflecting Cortical and Striatal Dynamics

Kyle S. Smith, Ann M. Graybiel
Neuron, Volume 79, Issue 2, 361-374, 27 June 2013

On habit formation* in instrumental conditioning.

*Habit formation: in instrumental conditioning, the shift of behavior from goal-directed behavior to habitual (automatic) behavior as the same learning is repeated many times.

Habits are notoriously difficult to break and, if broken, are usually replaced by new routines. To examine the neural basis of these characteristics, we recorded spike activity in cortical and striatal habit sites as rats learned maze tasks. Overtraining induced a shift from purposeful to habitual behavior. This shift coincided with the activation of neuronal ensembles in the infralimbic neocortex and the sensorimotor striatum, which became engaged simultaneously but developed changes in spike activity with distinct time courses and stability. The striatum rapidly acquired an action-bracketing activity pattern insensitive to reward devaluation but sensitive to running automaticity. A similar pattern developed in the upper layers of the infralimbic cortex, but it formed only late during overtraining and closely tracked habit states. Selective optogenetic disruption of infralimbic activity during overtraining prevented habit formation. We suggest that learning-related spiking dynamics of both striatum and neocortex are necessary, as dual operators, for habit crystallization.


Restricting Temptations: Neural Mechanisms of Precommitment

Molly J. Crockett, Barbara R. Braams, Luke Clark, Philippe N. Tobler, Trevor W. Robbins, Tobias Kalenscher
Neuron, Volume 79, Issue 2, 391-401, 24 July 2013


Humans can resist temptations by exerting willpower, the effortful inhibition of impulses. But willpower can be disrupted by emotions and depleted over time. Luckily, humans can deploy alternative self-control strategies like precommitment, the voluntary restriction of access to temptations. Here, we examined the neural mechanisms of willpower and precommitment using fMRI. Behaviorally, precommitment facilitated choices for large delayed rewards, relative to willpower, especially in more impulsive individuals. While willpower was associated with activation in dorsolateral prefrontal cortex (DLPFC), posterior parietal cortex (PPC), and inferior frontal gyrus, precommitment engaged lateral frontopolar cortex (LFPC). During precommitment, LFPC showed increased functional connectivity with DLPFC and PPC, especially in more impulsive individuals, and the relationship between impulsivity and LFPC connectivity was mediated by value-related activation in ventromedial PFC. Our findings support a hierarchical model of self-control in which LFPC orchestrates precommitment by controlling action plans in more caudal prefrontal regions as a function of expected value.


The Differential Effects of Reward on Space- and Object-Based Attentional Allocation

Jeongmi Lee and Sarah Shomstein
J. Neurosci. 2013;33 10625-10633


Estimating reward contingencies and allocating attentional resources to a subset of relevant information are the most important contributors to increasing adaptability of an organism. Although recent evidence suggests that reward- and attention-based guidance recruits overlapping cortical regions and has similar effects on sensory responses, the exact nature of the relationship between the two remains elusive. Here, using event-related fMRI on human participants, we contrasted the effects of reward on space- and object-based selection in the same experimental setting. Reward was either distributed randomly or biased toward a particular object. Behavioral and neuroimaging results show that space- and object-based attention is influenced by reward differentially. Space-based attentional allocation is mandatory, integrating reward information over time, whereas object-based attentional allocation is a default setting that is completely replaced by the reward signal. Nonadditivity of the effects of reward and object-based attention was observed consistently at multiple levels of analysis in early visual areas as well as in control regions. These results provide strong evidence that space- and object-based allocation are two independent attentional mechanisms, and suggest that reward serves to constrain attentional selection.


Diffusion Dynamics of Socially Learned Foraging Techniques in Squirrel Monkeys

Nicolas Claidière, Emily J.E. Messer, William Hoppitt, Andrew Whiten
Current Biology, Volume 23, Issue 13, 1251-1255, 27 June 2013


Social network analyses [1,2,3,4,5] and experimental studies of social learning [6,7,8,9,10] have each become important domains of animal behavior research in recent years yet have remained largely separate. Here we bring them together, providing the first demonstration of how social networks may shape the diffusion of socially learned foraging techniques [11]. One technique for opening an artificial fruit was seeded in the dominant male of a group of squirrel monkeys and an alternative technique in the dominant male of a second group. We show that the two techniques spread preferentially in the groups in which they were initially seeded and that this process was influenced by monkeys’ association patterns. Eigenvector centrality predicted both the speed with which an individual would first succeed in opening the artificial fruit and the probability that they would acquire the cultural variant seeded in their group. These findings demonstrate a positive role of social networks in determining how a new foraging technique diffuses through a population.
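
Eigenvector centrality, the network measure mentioned in the abstract, scores an individual highly when it is connected to other highly scored individuals. Below is a minimal sketch using power iteration on a small association matrix. The matrix is hypothetical, and adding each node's own score at every step (iterating on A + I) is a choice made here to keep the iteration from oscillating. It does not change the resulting eigenvector.

```python
def eigenvector_centrality(adjacency, iterations=100):
    """Power iteration on (A + I); scores are scaled so the largest equals 1."""
    n = len(adjacency)
    scores = [1.0] * n
    for _ in range(iterations):
        # Each node collects its own score plus the scores of its neighbors.
        new = [
            scores[i] + sum(adjacency[i][j] * scores[j] for j in range(n))
            for i in range(n)
        ]
        largest = max(new)
        scores = [value / largest for value in new]
    return scores


# Hypothetical association network: individual 0 associates with everyone,
# individuals 1 to 3 associate only with individual 0.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
centrality = eigenvector_centrality(star)
```

In this toy network the hub individual receives the highest score, and the three peripheral individuals share the same lower score.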


Canceling actions involves a race between basal ganglia pathways

Robert Schmidt, Daniel K Leventhal, Nicolas Mallet, Fujun Chen & Joshua D Berke
Nature Neuroscience (2013) doi:10.1038/nn.3456
Received 22 March 2013; accepted 31 May 2013; published online 14 July 2013

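Results: Subthalamic nucleus (STN) neurons always respond to the Stop cue. In contrast, neurons in the substantia nigra pars reticulata (SNr), located downstream in the basal ganglia circuit, respond only when stopping succeeds.
Conclusion: As the race model suggests, there is a neural circuit running from the STN to the SNr that is involved in canceling actions (stopping).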

Salient cues can prompt the rapid interruption of planned actions. It has been proposed that fast, reactive behavioral inhibition involves specific basal ganglia pathways, and we tested this by comparing activity in multiple rat basal ganglia structures during performance of a stop-signal task. Subthalamic nucleus (STN) neurons exhibited low-latency responses to 'Stop' cues, irrespective of whether actions were canceled or not. By contrast, neurons downstream in the substantia nigra pars reticulata (SNr) only responded to Stop cues in trials with successful cancellation. Recordings and simulations together indicate that this sensorimotor gating arises from the relative timing of two distinct inputs to neurons in the SNr dorsolateral 'core' subregion: cue-related excitation from STN and movement-related inhibition from striatum. Our results support race models of action cancellation, with stopping requiring Stop-cue information to be transmitted from STN to SNr before increased striatal input creates a point of no return.
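
The race-model idea can be sketched in a few lines: an action is canceled only if the stop process finishes before the go process. The Gaussian finishing times and all parameter values below are hypothetical and are not taken from the paper's recordings or simulations.

```python
import random


def stop_succeeds(go_finish_time, stop_signal_delay, stop_latency):
    """The action is canceled only if the stop process wins the race."""
    return stop_signal_delay + stop_latency < go_finish_time


def p_successful_stop(stop_signal_delay, n_trials=10_000, seed=0):
    """Monte Carlo estimate of the stopping probability for one stop-signal delay."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_trials):
        go_finish = rng.gauss(400.0, 50.0)    # hypothetical go finishing time (ms)
        stop_latency = rng.gauss(200.0, 20.0)  # hypothetical stop latency (ms)
        if stop_succeeds(go_finish, stop_signal_delay, stop_latency):
            successes += 1
    return successes / n_trials


early_signal = p_successful_stop(stop_signal_delay=50.0)
late_signal = p_successful_stop(stop_signal_delay=300.0)
```

With these assumed values, an early Stop cue is almost always effective, whereas a late cue usually arrives after the go process has already passed the point of no return.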


The Curse of Planning: Dissecting Multiple Reinforcement-Learning Systems by Taxing the Central Executive

Otto AR, Gershman SJ, Markman AB, Daw ND.
Psychol Sci. 2013 May;24(5):751-61



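Depending on the cognitive resources available, humans switch between two kinds of reinforcement learning: model-free (simple but inflexible) and model-based (flexible but cognitively demanding). http://pss.sagepub.com/content/24/5/751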

A number of accounts of human and animal behavior posit the operation of parallel and competing valuation systems in the control of choice behavior. In these accounts, a flexible but computationally expensive model-based reinforcement-learning system has been contrasted with a less flexible but more efficient model-free reinforcement-learning system. The factors governing which system controls behavior—and under what circumstances—are still unclear. Following the hypothesis that model-based reinforcement learning requires cognitive resources, we demonstrated that having human decision makers perform a demanding secondary task engenders increased reliance on a model-free reinforcement-learning strategy. Further, we showed that, across trials, people negotiate the trade-off between the two systems dynamically as a function of concurrent executive-function demands, and people’s choice latencies reflect the computational expenses of the strategy they employ. These results demonstrate that competition between multiple learning systems can be controlled on a trial-by-trial basis by modulating the availability of cognitive resources.
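
One common way to formalize a trade-off between the two systems is a weighted mixture of model-based and model-free action values, followed by a softmax choice rule. The sketch below assumes that form. The weight values and the action values are hypothetical, and the code is not the authors' fitted model.

```python
import math


def hybrid_values(q_model_based, q_model_free, weight):
    """Mix the two value estimates; weight = 1 is purely model-based."""
    return [
        weight * mb + (1.0 - weight) * mf
        for mb, mf in zip(q_model_based, q_model_free)
    ]


def softmax_choice_probs(values, inverse_temperature=1.0):
    """Convert action values into choice probabilities."""
    largest = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(inverse_temperature * (v - largest)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical case in which the two systems disagree about the better action.
q_mb = [1.0, 0.0]
q_mf = [0.0, 1.0]
no_load = softmax_choice_probs(hybrid_values(q_mb, q_mf, weight=0.8))
under_load = softmax_choice_probs(hybrid_values(q_mb, q_mf, weight=0.2))
```

Lowering the weight, as a demanding secondary task is assumed to do here, shifts choice toward the action favored by the model-free system.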


How psychological framing affects economic market prices in the lab and field

Ulrich Sonnemann, Colin F. Camerer, Craig R. Fox, and Thomas Langer
PNAS July 16, 2013 vol. 110 no. 29 11779-11784


"judged likelihoods of possible events vary systematically with the way the entire event space is partitioned, with probabilities of each of N partitioned events biased toward 1/N"

A fundamental debate in social sciences concerns how individual judgments and choices, resulting from psychological mechanisms, are manifested in collective economic behavior. Economists emphasize the capacity of markets to aggregate information distributed among traders into rational equilibrium prices. However, psychologists have identified pervasive and systematic biases in individual judgment that they generally assume will affect collective behavior. In particular, recent studies have found that judged likelihoods of possible events vary systematically with the way the entire event space is partitioned, with probabilities of each of N partitioned events biased toward 1/N. Thus, combining events into a common partition lowers perceived probability, and unpacking events into separate partitions increases their perceived probability. We look for evidence of such bias in various prediction markets, in which prices can be interpreted as probabilities of upcoming events. In two highly controlled experimental studies, we find clear evidence of partition dependence in a 2-h laboratory experiment and a field experiment on National Basketball Association (NBA) and Federation Internationale de Football Association (FIFA World Cup) sports events spanning several weeks. We also find evidence consistent with partition dependence in nonexperimental field data from prediction markets for economic derivatives (guessing the values of important macroeconomic statistics) and horse races. Results in any one of the studies might be explained by a specialized alternative theory, but no alternative theories can explain the results of all four studies. We conclude that psychological biases in individual judgment can affect market prices, and understanding those effects requires combining a variety of methods from psychology and economics.
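
The bias toward 1/N can be illustrated with a simple linear mixture of the true probability and the "ignorance prior" 1/N. The mixture form, the weight, and the example probabilities are assumptions made for illustration only.

```python
def judged_probability(true_prob, n_events, ignorance_weight):
    """Judged probability pulled toward 1/N by a hypothetical ignorance weight."""
    return ignorance_weight * (1.0 / n_events) + (1.0 - ignorance_weight) * true_prob


# Hypothetical event space: X and Y each have true probability 0.1, Z has 0.8.
weight = 0.3

# Partition {X or Y, Z}: the two rare events are packed into one of N = 2 events.
packed = judged_probability(0.1 + 0.1, n_events=2, ignorance_weight=weight)

# Partition {X, Y, Z}: the same events are unpacked, so N = 3.
unpacked = (
    judged_probability(0.1, n_events=3, ignorance_weight=weight)
    + judged_probability(0.1, n_events=3, ignorance_weight=weight)
)
```

Under these assumptions the same pair of events is judged more likely in total when listed separately than when combined, which is the direction of partition dependence described above.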


Human cooperation

David G. Rand, Martin A. Nowak
Trends in Cognitive Sciences, 15 July 2013

A review article on the evolution of cooperative behavior by the master, Martin A. Nowak.
It is wonderful that it cites Suzuki & Kimura (2013) (lol)!

Why should you help a competitor? Why should you contribute to the public good if free riders reap the benefits of your generosity? Cooperation in a competitive world is a conundrum. Natural selection opposes the evolution of cooperation unless specific mechanisms are at work. Five such mechanisms have been proposed: direct reciprocity, indirect reciprocity, spatial selection, multilevel selection, and kin selection. Here we discuss empirical evidence from laboratory experiments and field studies of human interactions for each mechanism. We also consider cooperation in one-shot, anonymous interactions for which no mechanisms are apparent. We argue that this behavior reflects the overgeneralization of cooperative strategies learned in the context of direct and indirect reciprocity: we show that automatic, intuitive responses favor cooperative strategies that reciprocate.


The Neural Representation of Unexpected Uncertainty during Value-Based Decision Making

Elise Payzan-LeNestour, Simon Dunne, Peter Bossaerts, John P. O'Doherty
Neuron, Volume 79, Issue 1, 10 July 2013, Pages 191–201


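In reward learning, humans simultaneously estimate three kinds of uncertainty and adjust their learning rate accordingly. These three kinds (risk, i.e. the variance of reward; uncertainty about the estimated reward probability; and the possibility that the reward probability changes) are each processed in different brain regions. http://www.cell.com/neuron/abstract/S0896-6273(13)00368-1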

Uncertainty is an inherent property of the environment and a central feature of models of decision-making and learning. Theoretical propositions suggest that one form, unexpected uncertainty, may be used to rapidly adapt to changes in the environment, while being influenced by two other forms: risk and estimation uncertainty. While previous studies have reported neural representations of estimation uncertainty and risk, relatively little is known about unexpected uncertainty. Here, participants performed a decision-making task while undergoing functional magnetic resonance imaging (fMRI), which, in combination with a Bayesian model-based analysis, enabled us to examine each form of uncertainty separately. We found representations of unexpected uncertainty in multiple cortical areas, as well as the noradrenergic brainstem nucleus locus coeruleus. Other unique cortical regions were found to encode risk, estimation uncertainty, and learning rate. Collectively, these findings support theoretical models in which several formally separable uncertainty computations determine the speed of learning.


Neural Correlates of Risk Perception during Real-Life Risk Communication

Ralf Schmalzle, Frank Hacker, Britta Renner, Christopher J. Honey, and Harald T. Schupp
J. Neurosci. 2013;33 10340-10347

(Unlike standard fMRI analyses, this study focuses on intersubject correlation (ISC).)

During global health crises, such as the recent H1N1 pandemic, the mass media provide the public with timely information regarding risk. To obtain new insights into how these messages are received, we measured neural data while participants, who differed in their preexisting H1N1 risk perceptions, viewed a TV report about H1N1. Intersubject correlation (ISC) of neural time courses was used to assess how similarly the brains of viewers responded to the TV report. We found enhanced intersubject correlations among viewers with high-risk perception in the anterior cingulate, a region which classical fMRI studies associated with the appraisal of threatening information. By contrast, neural coupling in sensory-perceptual regions was similar for the high and low H1N1-risk perception groups. These results demonstrate a novel methodology for understanding how real-life health messages are processed in the human brain, with particular emphasis on the role of emotion and differences in risk perceptions.
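
As a rough illustration of intersubject correlation, the sketch below averages the Pearson correlation over all pairs of subjects' time courses for one region. Averaging over pairs is one common variant and is an assumption here. The abstract does not specify the exact procedure, and the time courses are made up.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equally long time courses."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def intersubject_correlation(time_courses):
    """Mean pairwise correlation across subjects for one brain region."""
    pairs = list(combinations(time_courses, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical regional time courses for three viewers of the same TV report.
coupled_group = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.5, 2.5, 3.5, 4.5],
]
isc = intersubject_correlation(coupled_group)
```

A value near 1 means the viewers' responses rise and fall together over time. Comparing this value between groups is the kind of contrast the abstract describes for high and low risk perception.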


Reduced Striatal Responses to Reward Prediction Errors in Older Compared with Younger Adults

Ben Eppinger, Nicolas W. Schuck, Leigh E. Nystrom, and Jonathan D. Cohen
J. Neurosci. 2013;33 9905-9912


We examined whether older adults differ from younger adults in how they learn from rewarding and aversive outcomes. Human participants were asked to either learn to choose actions that lead to monetary reward or learn to avoid actions that lead to monetary losses. To examine age differences in the neurophysiological mechanisms of learning, we applied a combination of computational modeling and fMRI. Behavioral results showed age-related impairments in learning from reward but not in learning from monetary losses. Consistent with these results, we observed age-related reductions in BOLD activity during learning from reward in the ventromedial PFC. Furthermore, the model-based fMRI analysis revealed a reduced responsivity of the ventral striatum to reward prediction errors during learning in older than younger adults. This age-related reduction in striatal sensitivity to reward prediction errors may result from a decline in phasic dopaminergic learning signals in the elderly.
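
A reward prediction error is the difference between the reward received and the reward expected. The sketch below uses a simple delta-rule update to generate trial-by-trial prediction errors of the kind that could serve as regressors in a model-based fMRI analysis. The learning rate and reward sequence are hypothetical, and this is not the authors' fitted model.

```python
def update_value(value, reward, learning_rate):
    """Delta-rule update; returns the new value and the prediction error."""
    prediction_error = reward - value
    return value + learning_rate * prediction_error, prediction_error


def simulate(rewards, learning_rate, initial_value=0.0):
    """Run the update over a reward sequence and collect values and errors."""
    value = initial_value
    values, errors = [], []
    for reward in rewards:
        value, error = update_value(value, reward, learning_rate)
        values.append(value)
        errors.append(error)
    return values, errors


# Hypothetical session: the chosen action is rewarded on every trial.
values, errors = simulate([1.0, 1.0, 1.0, 1.0], learning_rate=0.5)
```

As the expected value approaches the delivered reward, the prediction error shrinks from trial to trial.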


Oxytocin blunts social vigilance in the rhesus macaque

R. Becket Ebitz, Karli K. Watson, and Michael L. Platt
PNAS July 9, 2013 vol. 110 no. 28 11630-11635

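Oxytocin is known to promote prosocial behavior (trust, cooperation, and so on). However, it was unclear whether it promotes prosociality itself or promotes it as a result of reduced social vigilance. Experiments in monkeys suggest that it is probably the latter. http://www.pnas.org/content/110/28/11630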

Exogenous application of the neuromodulatory hormone oxytocin (OT) promotes prosocial behavior and can improve social function. It is unclear, however, whether OT promotes prosocial behavior per se, or whether it facilitates social interaction by reducing a state of vigilance toward potential social threats. To disambiguate these two possibilities, we exogenously delivered OT to male rhesus macaques, which have a characteristic pattern of species-typical social vigilance, and examined their performance in three social attention tasks. We first determined that, in the absence of competing task demands or goals, OT increased attention to faces and eyes, as in humans. By contrast, OT reduced species typical social vigilance for unfamiliar, dominant, and emotional faces in two additional tasks. OT eliminated the emergence of a typical state of vigilance when dominant face images were available during a social image choice task. Moreover, OT improved performance on a reward-guided saccade task, despite salient social distractors: OT reduced the interference of unfamiliar faces, particularly emotional ones, when these faces were task irrelevant. Together, these results demonstrate that OT suppresses vigilance toward potential social threats in the rhesus macaque. We hypothesize that a basic role for OT in regulating social vigilance may have facilitated the evolution of prosocial behaviors in humans.


Reward Value-Contingent Changes of Visual Responses in the Primate Caudate Tail Associated with a Visuomotor Skill

Shinya Yamamoto, Hyoung F. Kim, and Okihide Hikosaka
J. Neurosci. 2013;33 11227-11238 Open Access

The tail of the caudate nucleus (Caudate Tail) is involved in this processing.

A goal-directed action aiming at an incentive outcome, if repeated, becomes a skill that may be initiated automatically. We now report that the tail of the caudate nucleus (CDt) may serve to control a visuomotor skill. Monkeys looked at many fractal objects, half of which were always associated with a large reward (high-valued objects) and the other half with a small reward (low-valued objects). After several daily sessions, they developed a gaze bias, looking at high-valued objects even when no reward was associated. CDt neurons developed a response bias, typically showing stronger responses to high-valued objects. In contrast, their responses showed no change when object values were reversed frequently, although monkeys showed a strong gaze bias, looking at high-valued objects in a goal-directed manner. The biased activity of CDt neurons may be transmitted to the oculomotor region so that animals can choose high-valued objects automatically based on stable reward experiences.


The Human Brain Encodes Event Frequencies While Forming Subjective Beliefs

Mathieu d'Acremont, Wolfram Schultz, and Peter Bossaerts
J. Neurosci. 2013;33 10887-10897

Prior information is encoded in the striatum, information about the actual data in the default-mode network (angular gyri, posterior cingulate, medial prefrontal cortex), and posterior probabilities in the inferior frontal gyrus.

To make adaptive choices, humans need to estimate the probability of future events. Based on a Bayesian approach, it is assumed that probabilities are inferred by combining a priori, potentially subjective, knowledge with factual observations, but the precise neurobiological mechanism remains unknown. Here, we study whether neural encoding centers on subjective posterior probabilities, and data merely lead to updates of posteriors, or whether objective data are encoded separately alongside subjective knowledge. During fMRI, young adults acquired prior knowledge regarding uncertain events, repeatedly observed evidence in the form of stimuli, and estimated event probabilities. Participants combined prior knowledge with factual evidence using Bayesian principles. Expected reward inferred from prior knowledge was encoded in striatum. BOLD response in specific nodes of the default mode network (angular gyri, posterior cingulate, and medial prefrontal cortex) encoded the actual frequency of stimuli, unaffected by prior knowledge. In this network, activity increased with frequencies and thus reflected the accumulation of evidence. In contrast, Bayesian posterior probabilities, computed from prior knowledge and stimulus frequencies, were encoded in bilateral inferior frontal gyrus. Here activity increased for improbable events and thus signaled the violation of Bayesian predictions. Thus, subjective beliefs and stimulus frequencies were encoded in separate cortical regions. The advantage of such a separation is that objective evidence can be recombined with newly acquired knowledge when a reinterpretation of the evidence is called for. Overall this study reveals the coexistence in the brain of an experience-based system of inference and a knowledge-based system of inference.
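
The three quantities discussed in the abstract (prior knowledge, observed frequency, posterior probability) can be kept apart in a small Beta-Bernoulli sketch. The Beta prior and the numbers are assumptions chosen for illustration. The task model used in the study is not reproduced here.

```python
def prior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) prior over the event probability."""
    return alpha / (alpha + beta)


def observed_frequency(successes, trials):
    """Relative frequency of the event in the observed stimuli."""
    return successes / trials


def posterior_mean(alpha, beta, successes, trials):
    """Posterior mean after combining the Beta prior with the observations."""
    return (alpha + successes) / (alpha + beta + trials)


# Hypothetical example: prior knowledge says the event is rare (Beta(2, 8)),
# but it occurs in 6 of 10 observed stimuli.
prior = prior_mean(2.0, 8.0)
frequency = observed_frequency(6, 10)
posterior = posterior_mean(2.0, 8.0, 6, 10)
```

The posterior lies between the prior belief and the observed frequency. Because the frequency is stored separately, it could be recombined with a different prior later, which is the advantage of separate encoding that the abstract points out.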


Ventromedial Prefrontal Cortex Encodes Emotional Value

Amy Winecoff, John A. Clithero, R. McKell Carter, Sara R. Bergman, Lihong Wang, and Scott A. Huettel
J. Neurosci. 2013;33 11032-11039


The ventromedial prefrontal cortex (vmPFC) plays a critical role in processing appetitive stimuli. Recent investigations have shown that reward value signals in the vmPFC can be altered by emotion regulation processes; however, to what extent the processing of positive emotion relies on neural regions implicated in reward processing is unclear. Here, we investigated the effects of emotion regulation on the valuation of emotionally evocative images. Two independent experimental samples of human participants performed a cognitive reappraisal task while undergoing fMRI. The experience of positive emotions activated the vmPFC, whereas the regulation of positive emotions led to relative decreases in vmPFC activation. During the experience of positive emotions, vmPFC activation tracked participants' own subjective ratings of the valence of stimuli. Furthermore, vmPFC activation also tracked normative valence ratings of the stimuli when participants were asked to experience their emotions, but not when asked to regulate them. A separate analysis of the predictive power of vmPFC on behavior indicated that even after accounting for normative stimulus ratings and condition, increased signal in the vmPFC was associated with more positive valence ratings. These results suggest that the vmPFC encodes a domain-general value signal that tracks the value of not only external rewards, but also emotional stimuli.