Introduction

Reinforcement learning refers to the ability to associate events, stimuli, or actions with rewards and punishments1, and therefore plays an integral part in acquiring new skills, ranging from basic perceptual2 and motor skills3 to the development of higher-level cognitive strategies4,5. Identifying the factors that contribute to reinforcement learning and decision making is therefore key to understanding both normal and abnormal behavioral adaptation.

An intrinsic property of many common reinforcement learning paradigms is that more rewarded options (i.e., Good options) are sampled much more frequently than less rewarded or punished options (i.e., Bad options). The purpose of sampling is to reduce an option's uncertainty (i.e., its estimation-uncertainty), which increases the reliability of estimates of relevant behavioral factors (e.g., expected outcomes or expected values). Accordingly, because the estimation-uncertainty of frequently sampled Good options is small, expected-value estimates are useful for guiding decisions about them. By contrast, other decision heuristics may be more appropriate for less sampled Bad options, because their estimation-uncertainty remains large.

Some evidence suggests that estimation-uncertainty itself has utility for decision making. For example, options that were scarcely sampled in a passive viewing phase were more likely to be selected in a subsequent phase with informative feedback6,7, and options with more variable (vs. constant) outcomes were more frequently selected during learning8,9. Additionally, the time since an option was last sampled, a factor which in dynamic environments is monotonically and positively correlated with estimation-uncertainty, was used to guide decisions in a restless bandit task, and individuals more sensitive to this factor displayed increased exploration rates10. The utility of estimation-uncertainty likely depends on how much it can be reduced by informative feedback during learning (i.e., its prospective information-gain)11, suggesting that estimation-uncertainty plays its role during learning by incentivizing exploration.

However, it remains unclear whether estimation-uncertainty affects decisions over the longer term, namely when information-gains are not available and further learning is hindered. On the one hand, because estimation-uncertainty is believed to afford information-gains, it might only affect behavior during learning with informative feedback. On the other hand, other types of uncertainty, such as risk and ambiguity, are perceived as strongly aversive under no-learning conditions12,13, with ambiguity being closely related to estimation-uncertainty14.

The present study was designed to test two hypotheses: first, whether estimation-uncertainty acquired during learning lingers and affects decisions in the absence of information-gains and further learning; second, whether its impact is asymmetric with respect to the valence of the considered options (i.e., Good and Bad options). We designed a probabilistic reinforcement learning task with five different conditions, each consisting of a pair of one Good and one Bad option (a two-armed bandit). The absolute expected values of Good and Bad options differed between conditions, but the difference between the Good and Bad option within a condition was identical across conditions. In a subsequent test phase, participants (n = 50) made decisions between inter-mixed options from the learning phase using the learned information and without receiving feedback. Behavioral modeling was used to disentangle the contributions of different factors (e.g., estimation-uncertainty, expected value) during the test phase.
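To make the notion of estimation-uncertainty concrete, the short Python sketch below tracks an option's reward probability with a beta-Bernoulli belief whose posterior variance shrinks as sampling accumulates. This is a minimal illustration only: the beta-Bernoulli formulation and the example outcome tallies are assumptions for exposition, not the learning model used in the study.

```python
# Minimal sketch (assumed beta-Bernoulli belief with a uniform prior):
# posterior variance serves as estimation-uncertainty and shrinks with sampling.
def posterior_variance(successes: int, failures: int, prior: float = 1.0) -> float:
    """Variance of a Beta(prior + successes, prior + failures) belief."""
    a, b = prior + successes, prior + failures
    return a * b / ((a + b) ** 2 * (a + b + 1))

# A frequently sampled Good option vs. a rarely sampled Bad option
# (tallies are hypothetical, roughly matching 0.75 and 0.25 reward rates):
print(posterior_variance(30, 10))  # ~0.0045: value estimate is reliable
print(posterior_variance(2, 6))    # ~0.0191: value estimate remains unreliable
```

Because Bad options are sampled less often during learning, their estimation-uncertainty remains high at the end of the learning phase; this residual asymmetry is what the test phase probes.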
The results support the hypothesis that participants consider estimation-uncertainty both during learning and in the subsequent test phase without learning. In particular, model-agnostic analyses show that decisions involving only uncertain Bad options are less influenced by expected values (compared to decisions involving only Good options), and reveal significant correlations between an option's sampling-rate during learning and its selection-rate during the test phase. Critically, these effects cannot be explained by differences in expected values. Behavioral modeling supports these results by showing that estimation-uncertainty contributed significantly to test-phase decisions involving Bad options, but not to decisions between two Good options. These results were replicated in two additional experiments (100 participants each, publicly available datasets)15.

Our study highlights how acquired estimation-uncertainty affects decisions in the absence of further learning. Failing to account for estimation-uncertainty limits our understanding of the motivational factors that drive behavior, including reinforcement learning biases related to psychopathology.

Results

Participants performed a probabilistic reinforcement learning task with five different conditions, each consisting of pairs of one Good and one Bad option (probabilities of receiving positive/negative feedback of 0.75/0.25 and 0.25/0.75, respectively; Fig. 1A, B). The conditions differed in the probability (pAppetitive) that feedback would be drawn from an appetitive context (positive/negative feedback = +1₪/0₪) or an aversive context (positive/negative feedback = 0₪/-1₪). In other words, each option's expected value correlated positively with the value of pAppetitive (Supplementary Table 1). In a subsequent test phase, participants made decisions between pairs consisting of inter-mixed options from the learning phase (i.e., Good vs. Bad, Good vs. Good, and Bad vs. Bad pairs; Fig. 1C). During this phase, no feedback was provided. Besides behavioral measures, we use behavioral modeling to disentangle the contributions of different behavioral factors to performance (e.g., estimation-uncertainty, expected value), and replicate our initial findings by analyzing two separate independent experiments (100 participants each)15.

Fig. 1: Learning phase (n = 50 samples, unless stated otherwise). A Schematic of stimulus-outcome contingencies. Stimulus images from the Snodgrass and Vanderwart 'Like' Objects dataset39 are released under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License, courtesy of Michael J. Tarr, Carnegie Mellon University, http://tarrlab.org. B Schematic of trial progression during the learning phase. New objects were presented in each block, for a total of 15 different pairs of objects. C Schematic of trial progression during the test phase. D Actual learning curves. E Average actual learning performances (t(49) = 10.886, p
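As a concrete illustration of the task's generative structure and of the kind of model-based analysis described above, the Python sketch below samples feedback under the stated contingencies and implements a test-phase softmax whose utility combines an option's value estimate with a weighted estimation-uncertainty term. This is a sketch under assumptions, not the authors' fitted model: the parameter values (beta, phi, the example uncertainties) and the per-trial independent context draw are illustrative.

```python
import math
import random

def draw_feedback(p_positive: float, p_appetitive: float) -> int:
    """Sample one feedback value under the contingencies described above."""
    positive = random.random() < p_positive   # 0.75 for Good, 0.25 for Bad options
    # Assumption: the appetitive/aversive context is drawn independently per trial.
    if random.random() < p_appetitive:        # appetitive context: +1 / 0
        return 1 if positive else 0
    return 0 if positive else -1              # aversive context: 0 / -1

def expected_value(p_positive: float, p_appetitive: float) -> float:
    """E[feedback]; increases monotonically with p_appetitive, as stated."""
    return p_positive * p_appetitive - (1 - p_positive) * (1 - p_appetitive)

def p_choose_left(value_l: float, unc_l: float,
                  value_r: float, unc_r: float,
                  beta: float = 5.0, phi: float = -1.0) -> float:
    """Test-phase softmax: utility = value + phi * estimation-uncertainty.

    phi < 0 makes uncertain options aversive; phi > 0 makes them attractive.
    """
    diff = (value_l + phi * unc_l) - (value_r + phi * unc_r)
    return 1.0 / (1.0 + math.exp(-beta * diff))

# Example: Good vs. Bad option in a pAppetitive = 0.5 condition, with the Bad
# option carrying higher residual uncertainty (all values are hypothetical):
print(expected_value(0.75, 0.5), expected_value(0.25, 0.5))  # 0.25, -0.25
print(p_choose_left(0.25, 0.01, -0.25, 0.12))                # ~0.95
```

Fitting phi separately for Good vs. Good and Bad vs. Bad test pairs is one way such a model could probe the valence asymmetry described above, since only the rarely sampled Bad options retain enough estimation-uncertainty for phi to have leverage.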