As legislation and policy for AI are developed, we are making decisions about the societal values AI systems should uphold. But how do we trust that AI systems are abiding by human values such as privacy or fairness? How can we ensure that AI systems are technically capable of meeting our expectations? Nicolas Papernot recognizes the separation of the technical development of AI from the complex societal expectations surrounding it. To bridge the gap between these areas, Papernot is developing methods to translate human values into technical guarantees.
“It’s important to have communication channels between different disciplines so that we can understand each other,” says Papernot. “We [need to] adapt legal frameworks to what technology can do, but also for technologists to understand where the technology needs to go based on [people’s] expectations in the legal environment.”
Nicolas Papernot is a 2023 AI2050 Early Career Fellow and Assistant Professor of Computer Engineering, Computer Science, and Law at the University of Toronto. He is a 2022 Sloan Research Fellow in Computer Science and a Member of the Royal Society of Canada’s College of New Scholars. Papernot’s research interests are at the intersection of security, privacy, and machine learning. His AI2050 project designs verifiable AI treaties that could enable regulators, companies, governments, and the public to trust that AI is following consistent, documentable standards. This project addresses Hard Problem #8, which concerns coordinating AI governance among countries, companies, and other key actors.
The AI2050 initiative gratefully acknowledges Fayth Tan for assistance in producing this community perspective.
What does AI governance research entail?

Essentially, the question we’re asking is: how do we trust AI or machine learning systems? To trust a machine learning system, we need a sense of how robust the correctness of the model will be under varying operating conditions — what changes to the inputs should we expect the model to be able to understand? Our previous work showed that if a malicious entity can [perturb] model inputs, they can force the model to make an incorrect prediction. Models are not only fragile in the predictions they make, but they’re also easily subverted. If we want to be able to trust these systems, we have to ensure that they’re making the right predictions.
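To make the fragility Papernot describes concrete, here is a minimal sketch of an input-perturbation attack in the style of the fast gradient sign method. The toy PyTorch classifier, the perturbation budget, and the choice of attack are illustrative assumptions, not details drawn from his work.

```python
# Sketch of an adversarial perturbation in the spirit of the attacks described above.
# The classifier is a toy stand-in; the fast gradient sign method (FGSM) is one
# illustrative attack, not necessarily the method used in Papernot's research.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 2))  # toy classifier
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(1, 20, requires_grad=True)   # clean input
y = torch.tensor([1])                        # its true label
epsilon = 0.1                                # perturbation budget

# Compute the gradient of the loss with respect to the input...
loss = loss_fn(model(x), y)
loss.backward()

# ...and nudge the input in the direction that increases the loss.
# On a trained model, even a small budget like this is often enough to flip a
# prediction that looks confident on the clean input.
x_adv = (x + epsilon * x.grad.sign()).detach()

print("clean prediction:    ", model(x).argmax(dim=1).item())
print("perturbed prediction:", model(x_adv).argmax(dim=1).item())
```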
Another [question] is related to privacy, because a lot of machine learning systems are handling sensitive data. How do we enable hospitals, companies, and so on to apply machine learning techniques to these sensitive datasets so that the models learn only patterns found across the population, not anything specific to individuals? That’s a difficult balancing act between protecting privacy and enabling institutions to train models that are beneficial to society.
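A common technical tool for that balancing act is differential privacy. The sketch below illustrates the general recipe behind differentially private training (clip each example's gradient, add noise, then average); it is a toy illustration with made-up constants, not the specific machinery Papernot's group uses.

```python
# Minimal sketch of differentially private SGD: clip each example's gradient and add
# noise before averaging, so no single record dominates what the model learns.
# Illustrative only; real systems use libraries such as Opacus or TensorFlow Privacy,
# and the constants below are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                      # stand-in for a sensitive dataset
y = (X @ np.array([1.0, -2.0, 0.5, 0.0, 1.5]) > 0).astype(float)

w = np.zeros(5)
clip_norm, noise_std, lr = 1.0, 1.0, 0.1

for step in range(200):
    batch = rng.choice(len(X), size=50, replace=False)
    grads = []
    for i in batch:
        p = 1 / (1 + np.exp(-X[i] @ w))                    # logistic regression prediction
        g = (p - y[i]) * X[i]                              # per-example gradient
        g = g / max(1.0, np.linalg.norm(g) / clip_norm)    # clip to bound each person's influence
        grads.append(g)
    noisy_mean = (np.sum(grads, axis=0)
                  + rng.normal(scale=noise_std * clip_norm, size=5)) / len(batch)
    w -= lr * noisy_mean                                   # update only with the noisy aggregate

print("learned weights (population-level pattern):", np.round(w, 2))
```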
How would those considerations be implemented in real-world scenarios?

You have different actors — institutions like companies, individuals, and governments or [regulatory] organizations. Governance looks at how these different actors interact in this ecosystem. In practice, different actors prioritize different aspects of trust. Maybe individuals care about their privacy, while regulators are concerned with fairness, and institutions want a robust model. You have to understand how these different actors collaborate to arrive at a machine learning system that [satisfies] all these constraints.
There’s a lot of research on all these different aspects — whether you can train a robust model that also respects privacy, for example. But there hasn’t been a lot of work done on how different actors communicate their preferences. How does a user [communicate] to a company how private they expect the model to be? How does a regulator inspect what a company has done?
There’s also a question of how much trust there is between different actors. Can we assume that they’re behaving as they claim to be? There might be misaligned incentives: if a company is trying to generate profit, they might have an incentive opposed to an individual who is trying to protect their privacy. Because of these misaligned incentives, something similar to the Volkswagen diesel scandal could happen in machine learning. Volkswagen figured out how to decrease emissions during testing, then reverted to higher emission levels once the test was over. Similarly, the properties of models are not evaluated by every actor using the same processes or datasets. This leads to discrepancies that a company, if malicious, can exploit to generate profit while claiming that they satisfy regulations. Conversely, if the company is honest, they might [be punished] for something that they didn’t do on purpose, by making a claim in good faith that the regulator is not able to confirm.
Are these misaligned incentives part of why regulating AI is so challenging?

Yes, it’s one of the reasons. Another reason is that sometimes there’s a discrepancy between the technology that the computer science community is building and what the broader population’s expectations are. Privacy is a good example — legal definitions of privacy are very different from definitions of privacy in computer science. This means that you have two worlds evolving in parallel. There can be cases where the technology is not adequate [to address] the expectations set up by legal definitions.
It also requires that we survey end users of technology to understand what their expectations are at varying degrees of expertise. Part of my AI2050 project is to perform such a survey, so that we better understand where the gaps are in terms of the guarantees that we should be able to demonstrate.
Last year, the US Supreme Court overturned the precedent set by its 1984 decision in Chevron v. Natural Resources Defense Council. While courts were once required to defer to the expertise of regulatory agencies when statutes were ambiguous, the authority to make those decisions is reverting to the courts. How does this shift in decision-making from experts to non-experts impact the importance of your work?

There’s an unrealistic assumption that either we trust the entity creating the model, or we defer the evaluation of that model to a [presumably unbiased] agency. Neither of these is satisfying. The latter is not satisfying [because] it puts a lot of work on these agencies — AI is applied to so many different fields, [and] you need domain expertise to evaluate models.
We’re keen on leveraging cryptography to place the onus [of evaluating models] on the entity creating the model. We’ve been collaborating with cryptographers to generate certificates of good behavior — attesting to the degree of privacy achieved during training, or to design choices accounting for bias in the dataset, for example. Since it’s done within a cryptographic protocol, the evidence cannot be falsified. The regulators don’t have to do all the work, but they can trust the result. The entity creating the model, in turn, can diminish the financial risks that they would otherwise face from uncertainty around regulation, [especially] if decisions are left to non-experts.
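As a deliberately simplified illustration of what a machine-checkable certificate could look like, the sketch below signs a record of training claims so that a regulator can verify the record was not altered after the fact. The field names and values are hypothetical, and a plain signature only protects the integrity of the claims; the cryptographic protocols described above go much further by attesting that the claims are actually true.

```python
# Simplified illustration of a signed "certificate of good behavior": the model builder
# signs a record of training claims, and a regulator can check that the record was not
# tampered with. Real proposals (e.g., zero-knowledge proofs of training) also prove the
# claims themselves; this sketch only protects integrity. Requires the third-party
# `cryptography` package; all field values below are made up.
import json
import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# --- model builder's side -------------------------------------------------
signing_key = Ed25519PrivateKey.generate()
attestation = {
    "model_sha256": hashlib.sha256(b"model weights would go here").hexdigest(),
    "training_data_sha256": hashlib.sha256(b"dataset manifest").hexdigest(),
    "claimed_dp_epsilon": 3.0,          # claimed privacy budget
    "bias_audit": "demographic parity gap measured at 0.02",
}
payload = json.dumps(attestation, sort_keys=True).encode()
signature = signing_key.sign(payload)

# --- regulator's side -----------------------------------------------------
public_key = signing_key.public_key()   # in practice, published by the builder
try:
    public_key.verify(signature, payload)
    print("certificate verifies: claims are exactly what the builder signed")
except InvalidSignature:
    print("certificate rejected: the record was altered")
```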
On an international level, different countries are trying to assess what the others are doing. If [countries] provide cryptographic certificates, there is a way for an international body to audit these claims.
How would you introduce the concept of thinking about how AI models use data? For most, the process might seem like a black box. What questions would you encourage people to consider?

We need to clarify what data is used for and how [AI predictions] are obtained. I’ll give you a concrete example — the everyday perception of what a file is. In the past, you had a CD, and a file was on that CD. If you wanted to give the file to somebody else, you gave them the CD. Now, you put something in a cloud storage space, and it’s not clear where the file is. It’s abstract, and that makes it difficult for people to understand what data is.
AI increases this complexity. We don’t know whether a model’s output is based on my personal data or on a statistical fact. For example, French people are more likely to like cheese — I’m French, so I’m allowed to say that — so if a model predicts that French people like cheese, is it basing that on my online shopping history or on a statistical pattern among people from France? If the model is making that prediction because it overfit to my purchase history specifically, that’s not good. But if it’s something that it has seen many times, and is inferring a correlation, then that’s alright — [unless] the prediction itself is something that should remain private.
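One way researchers probe exactly this question is with membership-inference-style tests: if a model is far more confident on one person's exact record than on comparable records it never saw, that gap suggests memorization rather than a population-level pattern. The sketch below captures the intuition with a toy model and made-up data; real audits are considerably more careful.

```python
# Rough sketch of the memorization-vs-pattern question: compare the model's loss on one
# person's record against its loss on similar records it never trained on. A large gap
# is the kind of signal membership-inference tests look for. Data and model are toy
# stand-ins; real audits use shadow models, calibrated thresholds, and more.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(1)
X_train = rng.normal(size=(500, 4))                 # e.g., shopping-history features
y_train = (X_train[:, 0] > 0).astype(int)           # e.g., "buys cheese"
X_unseen = rng.normal(size=(500, 4))                # comparable people not in training
y_unseen = (X_unseen[:, 0] > 0).astype(int)

model = LogisticRegression().fit(X_train, y_train)

my_record, my_label = X_train[0:1], y_train[0:1]    # one specific training record
loss_on_me = log_loss(my_label, model.predict_proba(my_record), labels=[0, 1])
loss_on_unseen = log_loss(y_unseen, model.predict_proba(X_unseen), labels=[0, 1])

print(f"loss on my record:      {loss_on_me:.3f}")
print(f"average loss on unseen: {loss_on_unseen:.3f}")
# If the first number is much smaller, the model may have memorized my record
# rather than learned a pattern that generalizes across the population.
```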
Those are the kinds of things that I want to discuss in an open-ended interview format. There’s a lot to be learned from what people share with us.