Community Perspective – Chelsea Finn

Q&A with Chelsea Finn, AI2050 Early Career Fellow

Everyone makes mistakes — it’s an unavoidable part of learning something new. With the appropriate guidance, being corrected is often a valuable learning opportunity, a chance to identify where we went wrong and understand how to prevent future mistakes. Chelsea Finn is developing methods that similarly allow machines to learn from their mistakes. She wants machines to be able to use human feedback to reassess their understanding — even if machines don’t learn in exactly the same ways we do.

“Machines pick up on things that a human wouldn’t, and make mistakes that a human would never make,” says Finn. “Sometimes, machines will do things that appear very complex and impressive, but at the same time, not be able to do something that seems very basic.”

Chelsea Finn is a 2023 AI2050 Early Career Fellow and Assistant Professor of Computer Science and Electrical Engineering at Stanford University. She was awarded a Sloan Fellowship and a National Science Foundation CAREER award in 2023, and an Association for Computing Machinery doctoral dissertation award in 2018. Her research has also been covered by various media outlets including the New York Times, Wired, and Bloomberg. 

Finn’s research focuses on the capability of robots and other agents to display broadly intelligent behavior through learning and interaction. Her AI2050 project develops machine learning methods for AI systems to learn from targeted human feedback and environmental interactions, creating more reliable AI systems that can be taught and corrected. This project addresses Hard Problem #2, which concerns solving challenges related to AI performance, especially in safety-critical applications.

The AI2050 initiative gratefully acknowledges Fayth Tan for assistance in producing this community perspective.


Why do machines struggle in the real world?

The way that machine learning algorithms work is that you typically present them with data, and that data provides examples of how the machine should behave and what it should output in different circumstances. Oftentimes, that data doesn’t necessarily reflect everything that it might encounter, [both] at test time and when it’s deployed in the real world.

The other challenge is that it’s only learning from examples. It’s trying to find patterns in those examples that will help it make predictions on new scenarios. Sometimes the patterns that it learns are not necessarily the correct patterns. It might pay attention to patterns in the data that aren’t actually reflective of how you want it to be making predictions. Some of the research that we’ve been thinking about is other ways of training machines, so that they’re learning the right patterns or able to handle new scenarios.
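
As a rough illustration of that gap (a sketch for this article, not code from Finn's group), the snippet below fits a simple classifier on synthetic data, then evaluates it both on data like its training set and on data whose inputs have shifted the way deployment data often does:

    # Synthetic stand-in data: "deployment" inputs are shifted relative to training.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    def make_data(n, shift=0.0):
        """Two classes separated along the features; `shift` moves the inputs."""
        y = rng.integers(0, 2, size=n)
        X = rng.normal(size=(n, 2))
        X[:, 0] += 2.0 * y + shift   # main class signal lives in feature 0
        X[:, 1] += 1.5 * y           # a second, weaker cue
        return X, y

    X_train, y_train = make_data(2000)                # curated training data
    X_iid, y_iid = make_data(500)                     # data like the training set
    X_shifted, y_shifted = make_data(500, shift=2.5)  # deployment-time shift

    model = LogisticRegression().fit(X_train, y_train)
    print("accuracy on familiar data:", model.score(X_iid, y_iid))
    print("accuracy on shifted data: ", model.score(X_shifted, y_shifted))

On the familiar data the accuracy stays high; on the shifted data it drops sharply, even though the task itself has not changed.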


What are some examples of how a machine could learn the wrong pattern?

One basic example is if you’re trying to train a machine to be able to detect a species of bird for ecological or recreational purposes. You provide examples of images of birds, [and] the model might learn patterns based on what the bird looks like, but it could also learn patterns based on the environment that the bird is in, for example. Some of these patterns, like the environment, might be reflective of where you will find these birds, like a bird that you often find in a forest or on the water. [They] are somewhat predictive of the bird itself, but they may also not be. Then you might get conflicting signals on what species of bird it is, and the machine might struggle to figure out which signals it should use to classify the bird.
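
A stripped-down version of that bird example can be sketched in a few lines (a hypothetical illustration, not the datasets or models from Finn's work): in the training data the background is almost perfectly correlated with the species, so the model is free to lean on it, and when that correlation breaks the accuracy collapses:

    # Hypothetical, simplified bird example with a spurious background feature.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    def make_birds(n, background_correlation):
        """Species 0 = waterbird, 1 = forest bird.
        Feature 0: noisy cue from the bird's appearance (genuine signal).
        Feature 1: background (0 = water, 1 = forest), a spurious signal."""
        species = rng.integers(0, 2, size=n)
        appearance = species + rng.normal(scale=1.5, size=n)
        background = np.where(rng.random(n) < background_correlation,
                              species, 1 - species)
        return np.column_stack([appearance, background]), species

    X_train, y_train = make_birds(5000, background_correlation=0.95)
    model = LogisticRegression().fit(X_train, y_train)

    print("accuracy when the correlation holds:", model.score(*make_birds(1000, 0.95)))
    print("accuracy when it breaks:            ", model.score(*make_birds(1000, 0.50)))
    print("learned weights [appearance, background]:", model.coef_[0])

Inspecting the learned weights shows the model relying far more on the background than on the bird’s appearance, which is exactly the kind of wrong pattern Finn describes.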

Finding patterns in data is how nearly all machine learning systems work. Large language models might pay attention to patterns of how someone speaks to make predictions, and try to infer something about them, even if we don’t want it to pay attention to that. All these systems are trained with machine learning, and they exhibit the same sorts of strengths and weaknesses.


What are the typical approaches in the field thus far to train machine learning models?

The dominant paradigm is called supervised learning. You collect a lot of data, and have inputs and outputs — basically an example saying, “when you see this input, this is what you should predict.” For example, when you see this image, you should predict that it’s a pigeon, or when you see this audio snippet, this is how you should transcribe it. You collect many, many of those examples, and then run an optimization in order to train a model that can correctly output the thing that you want it to.
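
In code, that recipe looks roughly like the sketch below (a generic supervised-learning loop on synthetic stand-in data, not any particular system from Finn's work): labelled input-output pairs, a model, and an optimization that nudges the model toward the desired outputs:

    # Generic supervised learning: labelled examples plus an optimization loop.
    import torch
    from torch import nn

    torch.manual_seed(0)

    # "When you see this input, this is what you should predict."
    inputs = torch.randn(256, 16)            # 256 examples, 16 features each
    labels = (inputs.sum(dim=1) > 0).long()  # a made-up labelling rule

    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(200):                  # the optimization
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), labels)
        loss.backward()
        optimizer.step()

    accuracy = (model(inputs).argmax(dim=1) == labels).float().mean()
    print(f"final loss {loss.item():.3f}, training accuracy {accuracy.item():.2%}")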


How does your proposed approach differ from supervised learning?

Some of our research is figuring out alternative ways of providing supervision to models. Something that we’re working on right now is looking at a model that’s been trained and then seeing if we notice some failure modes in it, [and] figuring out ways to provide small amounts of supervision to correct for those failure modes. In the bird example, if you notice that it’s paying attention to trees in the background when making a prediction, we’re developing an approach where you could verbally say that the model should ignore trees when making its prediction, and then adapt the model more globally to change the pattern it’s using to make predictions.

[In] typical machine learning methods, you provide lots and lots of these examples. Instead, we’re trying to target a failure mode, and then adjust how it behaves on all of those examples with a single bit of supervision. It’s like a teacher trying to describe a misconception to a student. Hopefully, the student can revise their understanding, and a good student would be able to take that information and change all of their predictions relevant to that misconception.
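
One way to picture correcting a whole pattern with a single piece of feedback, rather than relabelling examples one by one, is the sketch below. It is not the method from Finn's project; it simply assumes we already have a direction in the model's feature space that stands for the "trees" concept, and projects that direction out of every representation so the change applies globally:

    # Not Finn's method: a minimal numpy sketch of acting on "ignore trees"
    # globally, assuming a known feature-space direction for the concept.
    import numpy as np

    def remove_concept(features, concept_direction):
        """Project each row of `features` onto the subspace orthogonal to the concept."""
        u = concept_direction / np.linalg.norm(concept_direction)
        return features - np.outer(features @ u, u)

    rng = np.random.default_rng(0)
    features = rng.normal(size=(8, 4))                # stand-in model representations
    tree_direction = np.array([1.0, 0.0, 0.0, 0.0])   # assumed "trees" direction

    cleaned = remove_concept(features, tree_direction)
    print("alignment with 'trees' before:", np.abs(features @ tree_direction).mean())
    print("alignment with 'trees' after: ", np.abs(cleaned @ tree_direction).mean())

Getting from the verbal instruction to anything like that concept direction is, of course, the hard part, which the next questions touch on.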


When you’re teaching a human student, you have to make sure that the student understands where they went wrong. Is there a similar challenge in ensuring that misconceptions are communicated to a machine?

Students can typically describe [what they understand] back to you. Having that sort of dialogue with a machine learning model is a lot harder. Although we are starting to move towards machines that can verbalize and describe things, like language models and vision-language models, it’s definitely a challenge. For the example of trying to tell it to “ignore trees”, it needs to be able to understand what the word “trees” means, and ground it in the images that it sees, in its own representation of a tree. Something can potentially go wrong there.

Fortunately, we have vision-language models that can understand images quite well. We leverage those in order to try to understand what the teacher is trying to tell the model. In the case of trying to tell it to “ignore trees”, the model might understand that, but might develop a new misconception in trying to find a new pattern. Having an iterative process is something that we haven’t quite done yet, but you could definitely imagine that sort of iterative process with teachers and students.
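
As a sketch of what that grounding step can look like in practice, the snippet below asks an off-the-shelf vision-language model whether an image contains trees, using CLIP via the Hugging Face transformers library (an assumption for this illustration, not necessarily what Finn's group uses). The image path is a placeholder:

    # Zero-shot check with CLIP: does this image appear to contain trees?
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    image = Image.open("bird_photo.jpg")  # placeholder path
    texts = ["a photo with trees in the background", "a photo with no trees"]

    inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
    print(dict(zip(texts, probs[0].tolist())))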


Does this process also depend on a teacher being able to pick up on what the spurious correlation is?

Absolutely. Part of the initial work that we’ve done was collaborating with researchers working on human-computer interaction to design an interface that allows people to look at the predictions that the model is making and identify what misconceptions it might have.

To start, we took all the examples that the model was getting wrong and all the examples that it was getting right, and showed those to a person. Then you compare [them] and see if there are any noticeable differences between the things that it’s getting right and the things that it’s getting wrong. We found that that interface made it fairly straightforward, even for non-machine learning practitioners, to verbally describe some of the features that the model might be paying attention to, and which it shouldn’t be paying attention to.
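
The first step of that workflow is simple to write down; the sketch below (an illustration, not the actual interface) just buckets examples by whether the model got them right, so a person can scan the two groups for systematic differences:

    # Illustration only: split examples by model correctness for human review.
    def split_by_correctness(examples, labels, predict):
        """Return (correct, incorrect) lists of (example, true_label, prediction)."""
        correct, incorrect = [], []
        for example, label in zip(examples, labels):
            prediction = predict(example)
            bucket = correct if prediction == label else incorrect
            bucket.append((example, label, prediction))
        return correct, incorrect

    # Usage, with any model exposing a per-example predict callable:
    #   right, wrong = split_by_correctness(test_images, test_labels, model_predict)
    #   show_side_by_side(right, wrong)   # hypothetical review/annotation UI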


What are the major benefits to this approach?

It might be easier for people to provide supervision to machines, including people who are not experts. It might take less time if you can quickly describe what you want the model to do, rather than providing tons and tons of data, which can be very expensive to collect. In language models, you have to pay people to write programming code if you want your model to learn how to code, and it takes a lot of time and effort for people to do that.

The other benefit is just that the model might work better if the supervision is more targeted and provides more information, or information complementary to a dataset of examples.


If you were talking to someone without a background in machine learning, what would they find counterintuitive or surprising about it?

Especially as someone who works a lot on robotics, a very common unintuitive thing is that you would think that being able to reliably pick up objects and do something basic with them would be fairly straightforward, while something like debugging C++ code would be very challenging. It turns out that it’s easier to get a model to debug your C++ code than to pick up objects in the world.

The part of it that’s unintuitive to people is that we have a lot of data on StackOverflow on C++ programming, and we don’t have a lot of data on how robots should move their arms to pick up objects. Also, even though something is intuitive to humans, like picking up objects — that doesn’t mean that it’s inherently a simple thing to do. Many years of evolution have equipped us to be able to do that. Many other animals cannot do that. Sometimes it’s easier for us to work on getting a language model to do something that’s pretty complicated than to get a robot to do something basic by human standards.