Cross-Cutting Edge

Good Systems Scholar Refines Alignment Research to Ensure AI Better Reflects Human Values

November 24, 2025
Brad Knox, a research associate professor in the Department of Computer Science, presents on AI alignment at Good Systems’ annual research kickoff event on September 26, 2025, at The University of Texas at Austin.

As artificial intelligence continues to shape daily life, Good Systems is bridging interconnected themes to study its social and ethical impacts. Last year, the initiative introduced cross-cutting themes to link its six core research projects and explore questions that span multiple fields. 

This year, one of the researchers who launched that effort is continuing his work. Brad Knox, a research associate professor in the Department of Computer Science, focuses on ensuring that AI systems act in ways that reflect human values, a problem known as AI alignment.

Over the past year, he has advanced this work through both theoretical and applied research. With the Living and Working with Robots team — Good Systems’ Sam Baker (Department of English, College of Liberal Arts), Desmond Ong (Department of Psychology, College of Liberal Arts) and Peter Stone (Computer Science, College of Natural Sciences), along with School of Law professor Sean Williams — Knox explored the risks of AI companionship, identifying traits of digital companions that could harm users, disrupt relationships or raise broader ethical concerns. The project highlights the societal significance of human-AI interaction as such systems become increasingly common.

Knox spends much of his time grappling with a deceptively simple question: How should AI systems infer what humans want from their behavior? Existing answers, he says, are often flawed. “Algorithms that infer how to serve humans are either implicitly or explicitly derived from a psychological model of how human behavior is influenced by their hidden values, desires, and norms,” Knox said. “The problem is that this influence is poorly understood and typically overlooked by algorithm designers who prefer to reuse whatever psychological model is most predominant in their subfield, giving little thought to its accuracy.”

Take online recommendation systems such as news feeds on social media. According to Knox, a user’s engagement with such content is often driven both by temptation and by a lack of information about what is being selected. Yet recommender algorithms typically assume that users click on the content that gives them the most value, so they optimize to maximize that engagement. The result: these systems serve tempting, addictive, low-quality content that users later regret engaging with.
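
To make the mismatch concrete, here is a stylized Python sketch, not an algorithm from Knox’s research: the item names, click probabilities, and post_hoc_value scores are all invented for illustration. One ranking treats every predicted click as a signal of value; the other ranks by how much a user would value the content after consuming it.

```python
# Toy illustration only: the items, click probabilities, and the
# "post_hoc_value" scores (how much a user values the content after
# consuming it) are invented, not drawn from any real system.
items = {
    "investigative_feature": {"click_prob": 0.20, "post_hoc_value": 0.90},
    "outrage_bait":          {"click_prob": 0.80, "post_hoc_value": 0.10},
    "celebrity_gossip":      {"click_prob": 0.60, "post_hoc_value": 0.30},
}

def rank_by_engagement(catalog):
    """Rank as if every click signals value, i.e., maximize predicted engagement."""
    return sorted(catalog, key=lambda name: catalog[name]["click_prob"], reverse=True)

def rank_by_post_hoc_value(catalog):
    """Rank by how much users value the content after consuming it."""
    return sorted(catalog, key=lambda name: catalog[name]["post_hoc_value"], reverse=True)

print(rank_by_engagement(items))      # ['outrage_bait', 'celebrity_gossip', 'investigative_feature']
print(rank_by_post_hoc_value(items))  # ['investigative_feature', 'celebrity_gossip', 'outrage_bait']
```

The gap between the two orderings is the misalignment Knox describes: optimizing the first ranking rewards tempting content even when users would not endorse it on reflection.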

This dilemma arises across AI applications, from large language models (LLMs) to robotics. There’s no easy fix, but Knox and his team are developing methods and algorithms aimed at improving how AI systems interpret what users actually mean. As part of his work on reinforcement learning from human feedback, a key process in training large language models, Knox said his team has “proposed psychological models of how people form preferences” and shown that “people can be guided toward one model or another, enabling an algorithm built from such a model to better interpret their input.”
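
For readers unfamiliar with how such a psychological model enters the training pipeline, the sketch below shows the assumption most commonly baked into reinforcement learning from human feedback: the probability that a labeler prefers one trajectory segment over another follows a logistic (Bradley-Terry-style) function of the difference in the segments’ summed rewards. This is a generic textbook model, not one of the specific models Knox’s team has proposed, and the function name and beta parameter are illustrative.

```python
import math

def preference_prob(return_a: float, return_b: float, beta: float = 1.0) -> float:
    """Probability that a human labeler prefers segment A over segment B,
    under the common RLHF assumption that preferences follow a logistic
    (Bradley-Terry) function of each segment's summed reward. `beta` models
    how deterministic the labeler is assumed to be (illustrative parameter)."""
    return 1.0 / (1.0 + math.exp(-beta * (return_a - return_b)))

# A segment whose summed reward is higher by 1.0 is preferred about 73% of the time.
print(round(preference_prob(2.0, 1.0), 3))  # 0.731
```

Swapping in a different assumed model, for instance one in which preferences track something other than summed reward, changes which reward function the algorithm infers from the very same human labels, which is why the choice of psychological model matters.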

Knox’s research achievements last year included an Outstanding Paper Award for Emerging Topics in Reinforcement Learning at the Reinforcement Learning Conference for an article he co-authored on designing aligned reward functions, which represent what people want from a reinforcement learning algorithm. He also received a Coefficient Giving grant supporting foundational research on the beliefs, values and decision-making of artificial agents. 

As part of that grant, Knox co-leads the AI + Human Objectives Institute (AHOI), a two-year initiative that brings together academics and members of the broader AI-safety community for dialogue on responsible AI design. The institute, he said, aims to bridge two worlds that would mutually benefit from more interaction: traditional academic research and the Effective-Altruism-influenced AI-safety movement.

“Brad’s research on alignment has made real contributions to how we think about AI safety,” said Stone, a founding member of Good Systems. “He brings a rare combination of technical depth and interdisciplinary breadth that’s essential to Good Systems’ mission.”

In 2025–26, Knox will continue to develop methods for aligning human input with how algorithms interpret it, advancing the broader goal of AI systems that more accurately reflect what people want. He will pursue that work alongside continued interdisciplinary collaborations within Good Systems.

Grand Challenge: Good Systems