Community Perspective – Gillian Hadfield

Q&A with Gillian Hadfield, AI2050 Senior Fellow

On any given day, we make judgments about what is “okay” and “not okay”. 

From mediating a dispute between co-workers to deciding if an outfit is appropriate for a dinner date, these decisions are part of modern life. For Gillian Hadfield, normativity, or the practice of classifying certain behavior as “okay” or “not okay”, is foundational to our ability to live in cooperative groups, and is more complex than it seems. 

This process of making normative judgments may have ramifications for AI alignment, the field of research focused on developing AI that will act in accordance with human goals and values. The typical approach to doing this is to embed values in AI as if they were a fixed ground truth. However, Hadfield’s research shows that human norms and values are not static, but the result of dynamic interactions within human groups. Understanding how we make and use normative judgments in multiple facets of society will be crucial in developing AI that makes similar judgments competently.

“Human values are not ground truth, they are the equilibrium. They are characteristics of the interaction at that group level,” says Hadfield. “If you’re going to build an [AI] system that is integrated into that dynamic equilibrium in the way that humans are, it’s going to have to be adaptive and responsive and flexible. That’s what it means to participate in that equilibrium.”

Gillian Hadfield is Professor of Government and Policy and Professor of Computer Science at Johns Hopkins University. She holds a Canada CIFAR AI Chair at the Vector Institute for AI, and is a faculty affiliate at the Center for Human-Compatible AI at the University of California, Berkeley.

Hadfield’s research focuses on designing legal and regulatory systems to govern AI and other complex emerging technologies. Her AI2050 project examines the infrastructure of human normative systems—how human groups establish, maintain, or adapt values, norms, and rules from interpersonal to institutional levels. Hadfield aims to facilitate the creation of cooperative AI systems that are able to seek out and learn relevant normative classifications in any social environment. This work addresses Hard Problem #3, which concerns AI alignment and building AI systems that are compatible with human society.

The AI2050 initiative gratefully acknowledges Fayth Tan for assistance in producing this community perspective.


The concept that underlies most of your work is normativity. What is normativity? How is it different from deciding if something is true?

A very simple definition is the way in which humans say: this is an okay thing to do, this is not an okay thing to do. We say that about everything from what we eat, how we speak to one another, how we behave, what partners we choose, [and] how we run our companies. I think of normativity as the fact that humans have this ability to label things in that way. We often associate it with all the “shoulds”—you should do this, you shouldn’t do that. And we capture that in things like norms and rules and etiquette, regulations, law, and governance.


This sort of labeling ability spans multiple scales—what is known about how, or at what level of organization, normativity arises?

I like to think about it in an evolutionary sense. How did it evolve? I do think that it evolved at the human boundary—and the key thing here is that it evolved. Because it’s a concept that relates to a group of humans; it is a shared labeling. We develop the capacity to articulate and to share others’ justifications, or debate [them].

I think we don’t know a lot about the earliest versions of this, but we can imagine that in the simplest human societies, it’s what the pattern of discussions among people settles into. We are a group that allows this, we are a group that does it this way—we build our houses this way, we treat our children this way, we wear this kind of clothing. It’s the defining of the group, [and] I think in the earliest human societies the process would be emergent in that way. I think the evolutionary path is to increasingly structured ways of doing this—I refer to this as the evolution of normative infrastructure: classification institutions that help us say this is okay, this is not okay.

One of the components of my AI2050 proposal is a project I’m doing with a cultural anthropologist and a primatologist on the Turkana of Kenya. We’re just analyzing the data from our fieldwork. What’s really interesting is that this is a group that has established their elders as their institution for resolving disagreements. Sometimes that gets referred to as an informal institution, but there’s actually a lot of structure—there are rules about how the elders are supposed to perform this function. We ask our respondents things like, ‘Would you help enforce the elders’ decision?’ If the elders say, ‘X stole a goat, X must pay three goats’—would you help the person who’s entitled to the three goats get them?

We see different answers depending on whether we say in our vignette that the elders followed the required process of announcing the meeting in all villages, or that they didn’t. We see that if [the elders] didn’t do that, people are very clear—no, I wouldn’t feel the same obligation or incentive to help them. We’re really excited to be able to document that structure—then we can track [that process] through human development and evolution, to more elaborate structures.


Are people aware that they're making a normative judgment? Do their answers change when they know they're making a normative decision?

Some colleagues and I just published a paper where we tested for that—and actually, we were surprised by this result. If you think about the way the law works [for example], it says: we have a rule that says you can’t wear short sleeves in the office. We think of that as: first, you make a factual judgment—does this outfit have short sleeves? Then, we just apply the rule—the outfit [has] short sleeves. Therefore, it violates the rule and isn’t allowed. What we discovered was that if we asked people, “Here’s an image, does it have short sleeves?”, they answered one way. But if we asked a different group of people [about the same image], “Does it violate a rule against wearing short sleeves?”, they answered differently. Generally, people are less likely to say a rule has been violated when making a normative judgment.

I think this indicates that we’re triggering different parts of the brain and different decision-making processes. We weren’t testing that, and that would be a great follow-up study. But when we know that we are making normative judgments, when we’re asked to make normative judgments, we know there are a lot of other things in play—the fairness of our society, and the stability of our systems. We may or may not be thinking about that consciously. If you think this ability is so fundamental, as I do, to the stability and productivity of human groups, then we should have sophisticated systems in our brains for being careful about how we make those judgments.


How are most AI systems trained to make decisions, and how would you train an AI to make normative judgments?

In the last seven years that I’ve been engaged in the AI alignment community, I’ve noticed this idea that we can just give the rules to the system. We’ve seen that with the emergence of large language models and reinforcement learning from human feedback, which is a fine-tuning technique used, for example, by OpenAI. Another approach, which Anthropic is using, is called Constitutional AI—there’s increasing sophistication here, but it’s still imbued with the sense that there is a ground truth about what normative judgments our machines should be making.

I think part of what we’re seeing right now with Gemini (Google’s AI chatbot, which had its ability to generate images of people suspended after it generated ahistorical images—drawing pictures of German soldiers in 1943 who were not all white males) is coming from this idea. Developers [are assuming] we can put the values in and condition the models on these values, but you need a way for it to be a conversation and a process with the community. I think Constitutional AI, for example, misses a key point: all of our constitutions only make sense as part of a constitutional process.

We resolve the constitutional questions through those processes, ultimately overseen by the Supreme Court, [which is] able to make a determination of what is or is not constitutional. What is or is not constitutional is not a fact that is in the environment, it’s what the community is thinking now. You can even see that with the shift we’re seeing in the US Supreme Court over the last couple of years—that it’s not like there’s a fixed answer to these questions. Ultimately, those normative decisions need to be the ones that inspire confidence in the general population—that they’re living in a group that has rules, that interprets those rules in impersonal and reliable ways, but is open to change or shifts in how we think about things.

I think one of the key things that I tried to bring to this area is this much more complex understanding of how human normativity works. It’s actually pretty amazing, and my answer to the secret of our success is that we built these phenomenal systems. But we don’t understand very much about them and tend to think of them in superficial ways. I think this will be dangerous for AI if we don’t bring a much more sophisticated understanding of normativity to bear.


What are some things that AI developers need to be thinking about when building “normatively competent” AI systems?

I think we may need new training techniques. I’m increasingly persuaded that normativity is fundamental to human reasoning and human intelligence, and that it’s not just something that gets added on—which is how we’re building machines now. I don’t think it’s a consideration that can be slapped onto the fine-tuning stage at the end.

They need to be thinking about how humans do this—we pay attention. We are constantly taking in information about the state of the equilibrium that we’re in. When we’re waiting for the bus to go to work, we are watching things like: How are people behaving on this bus? How are they responding when somebody’s using language that I thought was inappropriate? How are they responding when nobody’s getting up to give their seat to the pregnant or disabled person? That’s information we are taking in all the time. You can’t be too task-oriented, right? We’re operating on these multiple levels. We’re taking a bus going to work, but we’re also getting information all the time about the normative state. 

We’re building machines that are doing tasks like “answer this question” or “write this summary”—[but] they’re just disconnected. When are they getting the feedback? When are they reading the reactions? They’re not getting information from live responses in the community. We need to think about how we might attach our machines to actual resources, to be able to go and ask, “Would this be okay? What would a group of people think about this?” 

When we’re thinking about taking an action, and we’re not sure, we might ask a colleague for help: “Could you read this email over for me? I’m not sure I’m using the right words.” Maybe all models would need a capacity to say: I need to consult another model that may be specialized. Maybe I need to quickly pull together a little jury or do some reasoning. Critically, this is not just a set of values you could plug in, or a wrapper that you can put on the model—I think it’s actually core to building [it].
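To make the “little jury” idea concrete, here is a minimal sketch of what such a consultation step might look like, written in plain Python. It is purely illustrative: the reviewer functions, the approval threshold, and the name normative_check are hypothetical stand-ins, not an existing API or Hadfield’s actual proposal, and in a real system the jury would be specialized models or live community feedback rather than the toy rules shown here.

```python
from typing import Callable, List

# A "reviewer" is any callable that looks at a proposed action and answers
# the question "Would this be okay?" with True (okay) or False (not okay).
# In a real system these could be specialized models or channels to a human
# community; here they are toy stand-ins for illustration only.
Reviewer = Callable[[str], bool]


def normative_check(proposed_action: str,
                    jury: List[Reviewer],
                    approval_threshold: float = 0.8) -> str:
    """Consult a small jury of reviewers before acting.

    Returns "proceed" when the jury broadly agrees the action is okay,
    and "abstain_and_escalate" otherwise. The threshold is an assumption:
    how much consensus counts as "okay" is itself a normative choice that
    would have to come from the surrounding community, not from the code.
    """
    votes = [reviewer(proposed_action) for reviewer in jury]
    approval_rate = sum(votes) / len(votes)
    if approval_rate >= approval_threshold:
        return "proceed"
    return "abstain_and_escalate"  # e.g., ask a human or a specialized model


# Toy reviewers standing in for specialized models or community feedback.
def tone_reviewer(action: str) -> bool:
    return "insult" not in action.lower()


def privacy_reviewer(action: str) -> bool:
    return "share personal data" not in action.lower()


if __name__ == "__main__":
    jury = [tone_reviewer, privacy_reviewer]
    print(normative_check("Send a polite reminder email", jury))       # proceed
    print(normative_check("Share personal data with a vendor", jury))  # abstain_and_escalate
```

The design point the sketch tries to capture is the one Hadfield makes above: the normative check is a live consultation that can shift as the group shifts, rather than a fixed set of values baked into the model at training time.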