Every day, our brains make thousands of decisions, big and small. Any of these decisions - from the least consequential, such as picking a restaurant, to the more important, such as pursuing a different career or moving to a new city - may result in better or worse outcomes.
How does the brain gauge risk and reward in making these calls? The answer to this question continues to puzzle scientists, but a new study carried out by researchers at Harvard Medical School and Harvard University offers intriguing clues.
The research, published Feb. 19 in Nature and supported in part by federal funding, incorporated machine-learning concepts into mouse experiments to study the brain circuitry that supports reward-based decisions.
The scientists uncovered two groups of brain cells in mice: one that helps mice learn about above-average outcomes and another associated with below-average outcomes. Together, the experiments showed, these cells allow the brain to gauge the full range of possible rewards associated with a choice.
"Our results suggest that mice - and by extension, other mammals - seem to be representing more fine-grained details about risk and reward than we thought before."
Jan Drugowitsch, co-senior author, associate professor of neurobiology, Blavatnik Institute at Harvard Medical School
If confirmed in humans, the findings could provide a framework for understanding how the human brain makes reward-based decisions and what happens to the ability to judge risk and reward when reward circuitry fails.
Machine learning illuminates reward-based decisions
Neuroscientists have long been interested in how the brain uses past experiences to make new decisions. However, according to Drugowitsch, many traditional theories about such decision-making fail to capture the complexity and nuance of real-world behavior.
Drugowitsch uses the example of selecting a restaurant: If you're in the mood to play it safe, you might choose a restaurant with a menu that experience tells you is reliably good. If you feel like taking a risk, you might opt for a restaurant that you know offers a mix of exceptional and subpar dishes.
In the above example, the restaurants differ considerably in the range of outcomes they offer, yet traditional neuroscience theories, which track only average expected reward, consider them equivalent and thus predict an equal likelihood of choosing either.
"We know that this is not how humans and animals act - we can decide between seeking risks and playing it safe," Drugowitsch said. "We have a sense of more than just average expected rewards associated with our choices."
In recent years, machine-learning researchers developed a theory of decision-making that better captures the full range of potential rewards linked to a choice. They incorporated this theory into a new machine-learning algorithm that outperformed alternative algorithms in Atari video games and a range of other tasks in which each decision has multiple possible outcomes.
"They basically asked what happens if rather than just learning average rewards for certain actions, the algorithm learns the whole distribution, and they found it improved performance significantly," Drugowitsch said.
In a 2020 Nature paper, Naoshige Uchida, professor of molecular and cellular biology at Harvard University, and colleagues reanalyzed existing data to explore whether this machine-learning theory applies to decision-making in rodent brains. The analysis showed that in mice, activity of the neurotransmitter dopamine - which plays a role in reward-seeking, pleasure, and motivation - corresponded to the reward-learning signals predicted by the algorithm.
In other words, Drugowitsch said, the work suggested that the new algorithm explained dopamine activity better than traditional average-based models did.
How mouse brains represent a range of rewards
In the new study, Drugowitsch teamed up with co-senior author Uchida to take the research a step further. Together, they designed mouse experiments to see how this process plays out in a brain region called the ventral striatum, which stores information about possible rewards associated with a decision.
"Dopamine activity only provides the learning signal for expected rewards, but we wanted to find representations of these learned rewards directly in the brain," Drugowitsch said.
The researchers trained mice to associate different odors with rewards of varying magnitudes - in essence, teaching the mice the range of possible outcomes of a choice. They then presented the mice with the odors and observed licking behavior (mice lick more in anticipation of better rewards) while recording neural activity in the ventral striatum.
The team identified two distinct groups of neurons in the brain: one that helps a mouse learn about better-than-expected outcomes and another tied to worse-than-expected outcomes.
"You can think of this as having an optimist and a pessimist in your brain, both giving you advice on what to do next," Drugowitsch explained.
When the researchers silenced the "optimistic" neurons, the mouse exhibited behavior suggesting that it anticipated a less appealing reward. Conversely, when the researchers silenced the "pessimistic" neurons, the mouse behaved as if it expected a higher-value treat.
"These two groups of brain cells work together to form a representation of the full distribution of potential rewards for a decision," Drugowitsch said.
The researchers see many future directions for their work, including how the brain makes decisions when there is greater uncertainty about what each option represents, and how their findings apply to more general reasoning about the world.
Drugowitsch noted that more research is needed to confirm the results in humans and to adapt the findings to the complexity of human decision-making. However, based on the parallels between mouse and human brains, he believes the work may already shed some light on how humans assess risk in decisions and why people with certain conditions such as depression or addiction may struggle with such assessments.
Journal reference:
Lowet, A. S., et al. (2025). An opponent striatal circuit for distributional reinforcement learning. Nature. https://doi.org/10.1038/s41586-024-08488-5