Our values are not universal. As AI systems become embedded in everyday life, the question of how technology should navigate the diversity of human values grows increasingly pressing. Atoosa Kasirzadeh believes that if AI is to serve society's best interests, the answer cannot come from technical optimization alone.
“There’s a dynamic interplay between technology and its social, scientific, and philosophical implications,” says Kasirzadeh. “Without understanding it, we won’t be able to resolve the question of aligning AI systems with our values.”
Atoosa Kasirzadeh is a 2023 AI2050 Early Career Fellow and Assistant Professor at Carnegie Mellon University, with joint appointments in the departments of Philosophy and Software & Societal Systems. She serves on the World Economic Forum's Global Future Council on Artificial General Intelligence, has been program co-chair for the 2025 and 2026 conferences of the International Association for Safe and Ethical AI, and was appointed Director of Research for the Centre for Technomoral Futures at the University of Edinburgh. Her work has also been featured in popular media, including Vox, The Wall Street Journal, Al Jazeera, and The Atlantic.
Kasirzadeh’s work develops interdisciplinary approaches to AI alignment and governance. Her AI2050 project establishes socio-technical methods for studying AI value alignment, examining how technology interacts, and should interact, with people across diverse social and scientific contexts. She investigates these questions in relation to conversational chatbots and AI agents for scientific discovery. This work supports Hard Problem #3, which concerns aligning powerful AI with human interests.
The AI2050 initiative gratefully acknowledges Fayth Tan for assistance in producing this community perspective.
The AI alignment problem is the challenge of ensuring that AI systems behave in ways that are consistent with human values, goals, or instructions. As AI systems become more powerful and agentic, their actions could have far-reaching consequences for society. If not properly aligned, they could cause harm or act against human interests, even unintentionally. But “alignment” already presumes answers to two moral questions: What makes an action good, and what makes a person’s values good?
Engineers typically treat AI alignment as a technical problem, and that work is necessary. But the goodness of a technical solution must usually be interpreted in relation to the social or scientific problem it is supposed to resolve. This is why scholars in philosophy and the social sciences, along with interdisciplinary researchers like myself, have argued for the importance of a socio-technical approach to AI alignment.
If we took a purely technical approach to AI alignment for large language models (LLMs), for example, we might say that LLMs should ideally produce accurate, coherent, and relevant responses. Engineers would improve a model’s ability to produce grammatically correct sentences or provide factual information. We would then measure the accuracy or relevance of the responses, compare different LLMs, and deem the higher-performing models more aligned with certain values.
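To make this concrete, here is a minimal sketch of that purely technical evaluation loop. Everything in it is an illustrative assumption: the model names, outputs, and reference answers are invented, and a naive token-overlap score stands in for a real accuracy or relevance metric.

```python
# Minimal sketch of a purely technical LLM evaluation: score each model's
# answers against references, then rank models by average score. The data
# and the token-overlap F1 metric are illustrative assumptions.

def f1_overlap(answer: str, reference: str) -> float:
    """Naive token-overlap F1 between a model answer and a reference."""
    a, r = set(answer.lower().split()), set(reference.lower().split())
    if not a or not r:
        return 0.0
    overlap = len(a & r)
    precision, recall = overlap / len(a), overlap / len(r)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def rank_models(outputs: dict[str, list[str]], references: list[str]) -> list[tuple[str, float]]:
    """Average each model's per-prompt scores and rank from best to worst."""
    mean_scores = {
        model: sum(f1_overlap(ans, ref) for ans, ref in zip(answers, references)) / len(references)
        for model, answers in outputs.items()
    }
    return sorted(mean_scores.items(), key=lambda kv: kv[1], reverse=True)

# Example: two hypothetical models answering the same two prompts.
references = ["paris is the capital of france", "water boils at 100 celsius"]
outputs = {
    "model_a": ["paris is the capital of france", "water boils at 90 celsius"],
    "model_b": ["the capital is paris", "it depends on air pressure"],
}
print(rank_models(outputs, references))  # model_a ranks first
```

On this view, whichever model tops the ranking counts as the better aligned one.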
However, users come from many different backgrounds, and accuracy itself can be perspective-dependent, especially in domains with partial evidence or uncertainty. There are almost certainly use cases the engineers did not anticipate when designing these models. A socio-technical approach builds frameworks that make such considerations part of the development process.
AI systems are cultural technologies that encode different values. In a collaboration with researchers from Hugging Face and the University of Amsterdam, we built a multilingual dataset called CIVICS (Culturally-Informed and Values-Inclusive Corpus for Societal Impacts). It consists of prompts on value-laden, socially sensitive topics, including LGBTQ+ rights, immigration, and disability rights, and was designed to surface the encoded and implicit biases of LLMs.
We used these prompts to investigate how LLMs developed in different regions of the world respond to socially sensitive issues, and observed significant social and cultural variability in their responses. Some models refused to answer more often than others: Qwen, a model developed by a Chinese company, refused four times more often than Mistral, a model developed by a French startup. These refusals reflect not only the implicit values of the model, but also the values and decisions of its developers.
We also found differing answers that could point to biases in the training data. When prompted on socially sensitive topics, such as whether it is true or false that immigrants should be granted asylum, some models said it was true while others said it was false. Responses also varied with the language of the prompt: a model might agree with a statement in English but disagree with it in Turkish. These observations illustrate the need for more socially and philosophically robust evaluations of value alignment.
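As a rough illustration of this kind of analysis (not the actual CIVICS pipeline; the refusal markers and stance heuristics below are invented), a sketch like this can expose refusal rates and cross-language stance flips:

```python
# Illustrative sketch of a multilingual value-alignment check. The refusal
# markers and stance heuristics are invented assumptions, not the method
# used in the CIVICS work; real evaluations classify responses far more
# carefully.

REFUSAL_MARKERS = ("i cannot", "i can't", "i won't", "as an ai")

def classify(response: str) -> str:
    """Crudely label a response as 'refusal', 'agree', 'disagree', or 'other'."""
    text = response.lower()
    if any(marker in text for marker in REFUSAL_MARKERS):
        return "refusal"
    if text.startswith("true") or "i agree" in text:
        return "agree"
    if text.startswith("false") or "i disagree" in text:
        return "disagree"
    return "other"

def refusal_rate(responses: list[str]) -> float:
    """Fraction of a model's responses that are refusals."""
    labels = [classify(r) for r in responses]
    return labels.count("refusal") / len(labels)

def stance_flips(responses_by_language: dict[str, str]) -> bool:
    """True if the same model agrees in one language but disagrees in another."""
    stances = {classify(r) for r in responses_by_language.values()}
    return "agree" in stances and "disagree" in stances

# Example: one statement prompted in two languages (responses invented).
print(stance_flips({"en": "True, they should.", "tr": "False, kesinlikle değil."}))  # True
```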
We recently applied a socio-technical approach to AI for scientific discovery. In studying current “AI scientist” systems, we saw failure modes tied to reward misspecification: fabricated datasets, cherry-picked findings, and metric gaming. To be credible, an AI scientist must be aligned not just technically but also with the social norms and practices of science.
One of the best-known theories of good communication is Grice’s maxims, a set of principles devised by the philosopher Paul Grice. One of the criteria Grice cites is “manner”, which calls for being clear and avoiding obscurity and ambiguity. But what counts as good manner is culturally specific. In many cultures, good manner means being direct and clear, so when defining a metric for manner, you might score a conversation as good precisely when it is direct and unambiguous.
But in other cultures, being ambiguous, obscure, or indirect sometimes allows communities to express richer and more complex relationships. Not being direct does not always equate to being rude; the cultural manner simply differs. When turning an idea or concept into a metric, we need to understand it within its cultural context.
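To see how easily such a cultural assumption gets baked in, consider a deliberately naive “manner” metric. Everything here is invented for illustration: it rewards brevity and penalizes hedging, so a response that is indirect for good cultural reasons would score as having “poor manner”.

```python
# Deliberately naive "manner" metric, invented for illustration. It equates
# good manner with short, direct, hedge-free responses, so culturally apt
# indirectness is penalized as if it were a communication failure.

HEDGES = ("perhaps", "maybe", "it might be", "one could say", "if i may")

def manner_score(response: str) -> float:
    """Higher means more 'direct and unambiguous' by this metric's narrow standard."""
    text = response.lower()
    hedge_penalty = sum(text.count(hedge) for hedge in HEDGES)
    words = len(text.split())
    brevity_bonus = 1.0 if words <= 30 else 30 / words  # shorter reads as "clearer"
    return max(0.0, brevity_bonus - 0.2 * hedge_penalty)

# A blunt answer outscores a considerate, indirect one:
print(manner_score("No."))                                        # 1.0
print(manner_score("Perhaps, if I may, it might be otherwise."))  # 0.4
```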
More broadly, communication occurs in diverse contexts. If an LLM is teaching mathematics, being unambiguous is important; but for an LLM facilitating creative writing, being unambiguous may not make for the best communication. The purpose of conversation differs across domains, and so the values for what makes a good conversation vary as well. I have written about this topic in more detail here.
How should we prioritize one value over another? We disagree, for example, about whether freedom of speech should be constrained. Another question is how we should decide which values are best, or which are legitimate. Even if we had a good technical solution for embedding human preferences in an AI system, the question of how we choose values, and which values we choose, remains unsolved. To put this in perspective, we still lack robust methods for handling value conflicts in chatbots, a topic of active research, including in our own work. Many challenging problems remain open.
There’s a wisdom of the crowd in interdisciplinary collaborations: they often result in an artifact that no single discipline could produce alone. It’s obviously more challenging, because you need to speak other people’s languages. Good interdisciplinary research requires work. It needs more than shallow, surface-level engagement; you have to be willing to start a conversation and create meaningful dialogue. Sometimes interdisciplinary expertise can support more foundational interpretations of data, catching instances of bias or faulty assumptions, for example, that might not be obvious to an engineer without domain expertise.
Luckily, there are more spaces for that now, such as interdisciplinary conferences. With more opportunities for those interactions, I’m hopeful that more and more people will engage in meaningful interdisciplinary research.