Imagine your childhood home.
Now, imagine the typical simple depiction of a house: a rectangular shape, one or two floors tall, with a door in the front, a couple of windows, and a triangular roof.
How similar was your home to this common portrayal of a house? Chances are, if you grew up in a North American suburb, your home looked relatively similar. But if you grew up in Hong Kong, or Mumbai, or Algeria, your home might have looked very different. Danish Pruthi noticed that this discrepancy is perpetuated in the outputs generated by AI models. Nor is the problem limited to a single depiction of a home: AI models routinely underrepresent entire geographical regions of the world. Pruthi is concerned that, left unaddressed, this geocultural bias will have far-reaching consequences:
“AI technology is often marketed as technology for all of humanity, but in the community that I live in, in India, a lot of [it] isn’t as usable,” says Pruthi. “One of the worries of not doing this work [is that] technology will not be as open or available or useful for everyone — it’s going to boost certain privileged sections of society and increase the divide.”
Danish Pruthi is a 2023 AI2050 Early Career Fellow and Assistant Professor at the Indian Institute of Science (IISc), Bangalore. His work has been recognized by a 2018 Carnegie Mellon University Presidential Fellowship, a 2017 Siebel Scholarship, and industry awards from Google and Adobe. He has also conducted research at Google AI, Facebook AI Research, Microsoft Research, and Amazon AI.
Pruthi’s research interests are in natural language processing and deep learning, with a focus on the inclusive development and evaluation of AI models. His AI2050 project quantifies geographical representation in AI systems, investigating how models exhibit bias against, or simply overlook, the geoculture of underrepresented regions of the world. This project addresses Hard Problem #6, which concerns access to participation in the AI ecosystem and equitable access to AI’s benefits and capabilities.
The AI2050 initiative gratefully acknowledges Fayth Tan for assistance in producing this community perspective.
Why do you think geocultural representation is overlooked in AI research concerning bias or representation?

The axes of discrimination that are primarily studied are race, gender, occupation, [and] sometimes religion. They are important and I’m glad that there’s more freedom and openness to talk about those issues. Geographical and cultural context is often missed — to give you examples, if you ask models questions about leaders in India, the answers are not as accurate, or not as detailed. If you ask them to plan an itinerary for a city in India, maybe that’s not as extensive as if you asked them to plan for a city in the US. They are subtle, and these are small examples — these are things we’re still discovering.
There is also erasure — I often worry about a lot of future content coming from these models. If these models are heavily underpredicting certain geographies, that’s erasing the voices and the content about these places in future conversations. This has a long history in colonialism, and in the broader history of minority voices being erased.
What are the most egregious examples of geographical erasure that you’ve seen?

If you start with something like “I live in…”, or “My aunt is from…”, these are very simple geographically-minded prefixes. The most likely completion [by a large language model] is “Canada”, which is about 36 times more likely than “India” or “Pakistan”. Even accounting for the English-speaking population, South Asia has a much higher population than Canada, whose [population] is less than that of a single state in India. This may not be the most egregious example, but a very simple [one] that shows that even for things as basic as where you live, this hasn’t been completely figured out.
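A minimal sketch of how this kind of skew can be probed — not the study’s actual methodology — is to take an off-the-shelf causal language model and compare the log-probabilities it assigns to different country names after a geographic prefix. The model (GPT-2), prefix, and country list below are illustrative choices.

```python
# Illustrative sketch: compare how likely an off-the-shelf language model
# considers different country names as completions of a geographic prefix.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def continuation_logprob(prefix: str, continuation: str) -> float:
    """Sum of log-probabilities the model assigns to `continuation` given `prefix`."""
    prefix_len = tokenizer(prefix, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prefix + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        log_probs = torch.log_softmax(model(full_ids).logits, dim=-1)
    # Each continuation token at position i is predicted from the logits at i - 1.
    return sum(
        log_probs[0, i - 1, full_ids[0, i]].item()
        for i in range(prefix_len, full_ids.shape[1])
    )

prefix = "I live in"
scores = {c: continuation_logprob(prefix, " " + c) for c in ["Canada", "India", "Pakistan"]}
for country, logp in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{country}: log-prob = {logp:.2f}")
```

Exponentiating the difference between two scores gives the kind of “N times more likely” comparison described above; the exact ratios will depend on the model and prefix chosen.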
To address the other side of this issue — do you think that there are consequences for people who see their experiences as the default?

Definitely, I see many examples of it. For instance, I broadly identify as a researcher in the Natural Language Processing community, which basically studies how computers handle text. Every time we write a paper [that] involves English, nobody questions if it works for other languages. But if you write a paper about Hindi, people are very suspicious that whatever you’re doing may not actually work for anything else. I think we are acknowledging this more than we were a decade ago, but if you are in a privileged community, you take it for granted. People are not being malicious. They just don’t tend to think about those questions.
What reactions have you gotten when bringing this up to researchers in the natural language processing community?

A lot of people think that it’s just an issue of data — once you fix that, everything’s going to be magically equivalent to English performance. Getting high-quality data is an absolutely non-trivial thing — there have been so many investments, over decades, in the community for English that have gotten us to a stage where we have rich resources.
But there’s a lot of linguistic diversity that is unaccounted for. To give another example, Turkish has a very rich morphology. It’s a little bit like Lego. When you add a suffix or prefix, it changes meaning in a very specific deterministic way. But English doesn’t have that property. Models that tend to work for English may not work very well for Turkish because languages [themselves] are different.
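As a small illustration of this point (ours, not from the interview): a single morphologically rich Turkish word can carry what English spreads over several words, and subword tokenizers trained mostly on English text tend to shred such words into many fragments. The example word “evlerinizden” (“from your houses”, built from ev + ler + iniz + den) is an illustrative choice.

```python
# Illustrative only: a subword tokenizer trained largely on English splits one
# morphologically rich Turkish word into many fragments, while the English
# equivalent typically maps onto far fewer, whole-word-like tokens.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Turkish: ev (house) + ler (plural) + iniz (your) + den (from) -> "from your houses"
print(tokenizer.tokenize("evlerinizden"))
print(tokenizer.tokenize("from your houses"))
```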
You’ve said that methods to reduce bias against underrepresented groups don’t map one-to-one onto geographical bias. Could you speak to that issue?

Part of it is that the data exists in certain languages that are still out of the scope of current technology to both consume and produce. There is also the question of content or knowledge about many places missing from the internet. A lot of cultural things that happen in many places are not universal — festivals, or ways of life, or traditions. Wedding ceremonies in India look very different from wedding ceremonies in the US, for example.
It’s partly the language barrier, and partly that the content is not available, or not as abundant. We need to get people to contribute more organically and also in an incentive-driven way. Even things as simple as typing on a mobile phone — for many Indian languages, that’s quite tedious. We haven’t even thought about good interfaces to type in many of our own languages, so you can’t blame people for not putting out content. People who put out a lot of content also tend to be of a certain economic status, such that they have time to showcase their work and put out their ideas. For a lot of people, that’s a luxury! They’re barely making ends meet, and they have no time to talk about their celebrations or festivals or customs. It’s a very complex socioeconomic issue.
You want people from all over the world to participate in your research. Why has getting that degree of participation been challenging?

One thing that I have been sorely missing is access to people across different geographies. We did a recent study where we showed pictures of model-generated kitchens, roads, and houses to people across different parts of the world, who then rated them from 1 to 5 on how well they reflected their surroundings. Just to do that evaluation, getting access to 50 to 100 people from different parts of the world turns out to be a non-trivial task.
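Once the hard part — recruiting raters across regions — is done, the analysis itself is straightforward. The sketch below, with made-up column names and values, just averages the 1-to-5 ratings per region and object category.

```python
# Hypothetical aggregation for a cross-regional rating study: each row is one
# rater's 1-5 score for how well a model-generated image reflects their
# surroundings. Data and column names are invented for illustration.
import pandas as pd

ratings = pd.DataFrame(
    {
        "region": ["South Asia", "South Asia", "North America", "West Africa"],
        "category": ["kitchen", "house", "kitchen", "road"],
        "score": [2, 3, 5, 2],
    }
)

# Mean score per region and category surfaces which regions the generator serves poorly.
summary = ratings.groupby(["region", "category"])["score"].agg(["mean", "count"])
print(summary)
```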
You mentioned wanting to collaborate with other AI2050 Fellows in your proposal. What might some of those collaborations look like?

Some of the people I admire — to name a couple of examples, Percy Liang and Aditi Raghunathan — have done impressive work on the robustness of models when confronted with changes in the distribution that these models were trained on. One possible change in the distribution could be that you are now operating in a different geography. Take a simple image recognition model — when you train it, you tell it, “this is a kitchen”, or “this is a mailbox”. But even everyday things tend to look different in different regions! I think these are fantastic use cases for some of the work they have done.
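A sketch of the kind of check this points at, assuming you already have everyday-object photos labeled and tagged by region (the folder layout and model choice below are hypothetical): run a pretrained classifier over them and compare accuracy per region, to see whether “mailbox” or “kitchen” is only recognized reliably in some parts of the world.

```python
# Hedged sketch: measure how a pretrained classifier's accuracy varies by region.
# Assumes a hypothetical layout data/<region>/<label>/<image>.jpg, with labels
# drawn from the ImageNet category names (e.g. "mailbox").
from collections import defaultdict
from pathlib import Path

import torch
from PIL import Image
from torchvision import models

weights = models.ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()
classes = weights.meta["categories"]  # ImageNet class names

correct, total = defaultdict(int), defaultdict(int)
for path in Path("data").glob("*/*/*.jpg"):  # data/<region>/<label>/*.jpg
    region, label = path.parts[1], path.parts[2]
    with torch.no_grad():
        logits = model(preprocess(Image.open(path).convert("RGB")).unsqueeze(0))
    predicted = classes[logits.argmax(dim=1).item()]
    correct[region] += int(predicted == label)
    total[region] += 1

for region in sorted(total):
    print(f"{region}: top-1 accuracy = {correct[region] / total[region]:.2%}")
```

Large gaps in per-region accuracy are one concrete symptom of the geographic distribution shift described above.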
Another issue is whether models should mimic the state of the world as it is. If you believe that, you’d want model output to reflect the current situation. But the reality is that in many cases, the current situation is terrible! If you look at, let’s say, occupation statistics in faculty positions, there are very few women faculty members in leading universities. I think it’s a very important question for interdisciplinary scholars. If we were given a free hand on what the ideal output should be, what values should we include? What values should we aspire towards? It may not reflect the current status quo.
Issues of diversity and representation aren’t often framed as a matter of technical capability. Could you speak to how you see representation as a robustness or generalizability issue?

We definitely need diverse opinions, because if not, you end up collapsing into situations where you’re only speaking to and building for a certain demographic. Historically, there are tons of examples [of this] — even cars were rarely tested for women. The standard mannequins used to evaluate safety were modeled on male bodies, so cars were evaluated to be safer for men [than for women], which is a simple evaluation failure. Diverse opinions and testing make the system more robust to diverse use cases.
If technology worked for a lot more people, a lot more people would be testing that technology and contributing to that technology. There would be much more diverse perspectives [in technology], because a lot more people would be thinking about it from all sorts of angles and testing it from all sorts of angles. It’s not just morally the right thing to do — technologically, it’s probably the right thing to do too.