Community Perspective – Josh Tenenbaum

Q&A with Josh Tenenbaum, AI2050 Senior Fellow

Josh Tenenbaum wants to develop AI that “thinks like a human”. 

While that might sound like the stuff of science fiction, the intent isn’t to build uncannily human-like androids. Rather, Tenenbaum is focused on how the human brain and mind work—the processes through which we learn, think, and generalize.

“I want to separate the goal of building AI that ‘thinks like humans’ from the goal of building synthetic humans,” says Tenenbaum. “When you understand aerodynamics, you can build planes and rockets and balloons—you’re not just reproducing big birds. It’s about understanding the general principles from the biology side.”

Josh Tenenbaum is a 2023 AI2050 Senior Fellow and Professor of Computational Cognitive Science at the Massachusetts Institute of Technology, where he is affiliated with the Department of Brain and Cognitive Sciences, the Computer Science and Artificial Intelligence Laboratory, and the Center for Brains, Minds and Machines.

Tenenbaum’s research combines computational modeling with behavioral experiments in adults and children to “reverse engineer” human intelligence and understand it in engineering terms. His AI2050 project aims to create a computational framework for AI built on this mechanistic understanding of human cognition. This work concerns Hard Problem #1, which seeks to overcome scientific limitations in current AI to enable further breakthroughs.

Though current generalist AI models such as ChatGPT can execute a wide array of tasks, Tenenbaum highlights that they do not function the way human cognition does. Current AI models are often trained with “lifetimes of data” and vast computational power—while humans learn to make sense of the world in only a few short years, and demonstrate intelligence even before acquiring language.

“Human minds are the only system that, just under normal growth processes, reliably grows into full human intelligence,” says Tenenbaum. “We’d like to understand how.”

The AI2050 initiative gratefully acknowledges Fayth Tan for assistance in producing this community perspective.


Your proposed research concerns creating AI that thinks “like humans do”. I don't think that a lot of people think about how we think, so—how do humans think?

A lot of the rhetoric around AI and all the machine learning progress that’s happened recently is: well, who really knows how the human mind works? It’s a great mystery, we’re never really going to figure that out, so let’s not worry about that. Let’s think about the inputs and outputs to human minds—language being obviously one of the main ones, but also vision—and let’s build some system that can produce the same outputs from the same inputs. If you think of human intelligence as a great big input-output function, then you might try to approach it as a machine learning problem. 

But the way we think about it is to try to understand the human mind in terms of computation. What the mind [is] doing is the inspiration for computation. If you go back to Turing, or Boolean logic, or Charles Babbage and Ada Lovelace—the people who invented what we now call modern formal computing—they were all trying to capture their intuition about what the human mind did, how it works. This is part of that same enterprise. When we say we want to build AI that thinks in a human-like way, that’s the goal. The same way that people study the weather, the atmosphere, the climate, by building models of this really complex thing and the underlying mechanisms by which it works—we can study the brain and the mind in that way as well.

All the tools of computer science are effectively now tools to understand the computations [of] the mind—what are the algorithms? What are the data structures, what are the circuits that implement those ideas? It’s approaching the study of the mind and the brain like a computer scientist to try to understand it. We can use that as the basis for building an artificial system, an analogous notion of intelligence.


You spoke about how, at the beginning of computer science, the inspiration was thinking about how the human mind works. Somewhere along the way the two became more separated—are you trying to re-integrate that understanding across fields?

What has typically happened is somebody on the AI side takes some intuition or some little idea and says, ‘Oh, that’s really interesting, let me see what I can do with that’. Then they kind of run with it and see how far they can go—they start off being inspired, and maybe even guided by some aspect of something that somebody has understood about the mind or the brain. But the point is that the actual guidance from anything to do with the brain was some little teeny thing many decades ago. 

In parallel, there are people working in cognitive science and neuroscience who have continued to try to keep the engineering and the science in much closer connection. That’s very hard to do because the two fields speak different languages in different communities. Many of us in my field call ourselves computational cognitive scientists. But the general idea is what you might call “reverse-engineering” human intelligence, which is trying to understand human intelligence, how it works, how it arises, in the same engineering terms that we have on the computer science side.


When you talk about “reverse engineering” human intelligence, are there specific features of it that you’d like to recapitulate?

I can lay out three contrasts, three key principles of how I think people are currently thinking about intelligence in AI systems—and how it works almost exactly the opposite, in a really interesting way, in human minds and brains. What we’re trying to capture has to do with the way our minds grow. There’s the difference between scaling up—this is the Silicon Valley, internet-based approach: scale up massive compute, massive data—versus the growing-up model. Every human starts out as a baby, who is clearly less intelligent in important ways than an adult, and also very intelligent in other ways. We’d like to understand that growing up, as opposed to scaling up.

Currently, in AI, intelligence is seen as the end product of learning. It’s the idea that we have some very simple, even sort of “dumb” learning mechanism, like [predicting] the next token or character in a language model, [or] some pixels in an image. The idea is if you do enough of that on big enough data and a big enough network, at the end of all of that, there’s this surprising emergent phenomenon, which is something called intelligence. The thing about emergent phenomena is that they can be fragile, and we don’t really know why they work, where they come from, [or] when they apply.

Point number two is: how do we generalize? Intelligence is also not just about learning, it’s about generalizing to new situations. How does generalization work in AI? Broadly, people would say it depends on similarity to training data—the more new situations are similar to ones you’ve seen in the past, the more we should expect generalization. We shouldn’t necessarily expect systems to do things that are totally different from the data that they’ve been trained on. If the basic idea of these simple mechanisms is that they find patterns in the data, then even when you make them really, really big, they’re still just finding the patterns in the data.

Point three [deals] with the relation between intelligence very generally and language. It’s striking that the things that everyone’s talking about as AI now are some kind of chatbot. These systems are the way AI presented itself to the general public in science fiction—a machine that you talked to, whether it was Star Trek or 2001: A Space Odyssey. The idea was always: a true AI system was one that you could talk to. Language is the most powerful way that human minds have always interfaced with each other, and AI has always been presented as [being able] to do that. What that all comes down to is the idea that thinking in general—reasoning, planning, problem solving, imagination—in this [modern, ‘scaling up’] approach, derives from language. It’s an emergent property of training on language, carefully constructed and curated language.

To summarize the three things: Intelligence is a surprising emergent phenomenon that comes at the end of learning. Its generalization depends on similarity to training data in sort of weak and somewhat fragile ways, although with enough data that can be very powerful. And number three, thinking derives from language, and specially curated language at that.


And then there’s how humans work.

It’s exactly the opposite. Intelligence isn’t the end result of learning. It’s what we start with. There are ways in which babies’ brains and minds—and these are things that cognitive scientists and neuroscientists study—are intelligent from the very beginning. For the first year or two of our life, we’re not linguistic creatures, but we’re still intelligent—[we] can do all sorts of things, make things, help people out, interact—any parent knows that. It’s what helps humans learn from so much less data and generalize robustly. There’s a key intelligence that’s built in from the beginning, not something that emerges at the end. That’s point one.

This connects to the second point—how does generalization work beyond experience? It’s about the models that our brains construct of the world. The mind is a modeling engine—it builds internal simulations of the world. In intuitive physics, we have a mental model of how objects interact, how forces work, how we can interact with the world. We also talk about intuitive psychology, or theory of mind. We understand other people’s beliefs and desires when people do things or say things. We intuitively understand that you might have your own beliefs about the world, so my model has to have a model of your model. We can have joint models to enable our collaboration, whether it’s cooperatively having a conversation or doing things in the world together.

Our minds are built from the very beginning to have models of the physical world and the social world, which is remarkable. These are probabilistic models; they’re designed to model the world approximately—to handle uncertainty and risk, and to be able to make good guesses and good bets. A core property of intelligence is to make good guesses about what could happen or might happen, and good bets on how to spend our own time and resources. That means that our generalization to new situations—it’s not unlimited, but it can go far beyond the data of our own experience because it’s filtered through that model.

To understand model-based generalization, think about what happened with COVID. We had this new kind of pandemic, we didn’t have the data, but we had people in epidemiology [and] public health building all kinds of models—large scale models of how epidemics can spread and small scale models, like, is this airborne or is it based on touch? Even before we had data in a new situation, we were able to make some reasonable guesses and bets. Human minds are designed to build models of the world intuitively, and then in science, more formally. That’s a key kind of intelligence that lets us handle new situations without having lots and lots of similar training data.
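To make this idea of model-based generalization concrete, here is a small illustrative sketch in Python (it is not taken from Tenenbaum's work, and every number in it is an invented assumption): a toy branching-process outbreak model. The predictions come entirely from the assumed structure and parameters of the model, not from any training data, which is exactly the kind of reasonable guess-before-the-data that the COVID example describes.

```python
import random

def simulate_outbreak(r0_guess, generations=5, initial_cases=10, trials=500):
    """Toy branching-process model: predictions from assumptions, not data.

    r0_guess is an assumed average number of new infections per case.
    Returns the median and 90th-percentile total case count across trials.
    """
    totals = []
    for _ in range(trials):
        cases, total = initial_cases, initial_cases
        for _ in range(generations):
            # Each current case infects roughly r0_guess new people on average
            # (a crude binomial stand-in for a Poisson offspring distribution).
            new_cases = sum(
                1 for _ in range(cases * 3) if random.random() < r0_guess / 3
            )
            cases = new_cases
            total += new_cases
        totals.append(total)
    totals.sort()
    return totals[len(totals) // 2], totals[int(len(totals) * 0.9)]

# Two competing hypotheses about transmissibility lead to very different
# "bets" about how bad things could get, before any outbreak data exists.
for r0 in (0.8, 2.5):
    median, p90 = simulate_outbreak(r0)
    print(f"assumed R0={r0}: median total cases ~{median}, 90th percentile ~{p90}")
```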

The third point is where language comes in. Language builds on all of this. All this stuff is present before language and it enables language—it’s exactly the opposite of thinking emerging from language. When children are first learning language, they learn it much more quickly than a large language model. Large language models are trained on hundreds of lifetimes of human data, maybe much more. A human only has one lifetime and only a few years—a three- or four-year-old is already quite linguistic and doing a lot of interaction through language. You have much less data and you have to learn much more robustly. That works because language builds on all this stuff that was already there.


How do those factors come into play when you think about humans being able to “scale up”, not just individually, but socially, culturally, and generationally?

For us, the social and cultural part of how humans learn is key to human intelligence, and it’s key to building AI systems that think and learn in human-like ways. The core notion of intelligence, before language, is a system for modeling the world—making good guesses and good bets. That’s what probabilistic programming tries to capture. That’s a toolkit that’s been around for a while. It’s had a lot of success in modeling human intelligence, but it’s faced its own scaling challenges.
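As a rough illustration of what "a system for modeling the world, making good guesses and good bets" can look like in code, here is a minimal probabilistic-program-style sketch written in plain Python. The scenario and all of its probabilities are invented for illustration, and real probabilistic programming languages of the kind referred to here provide far more powerful inference than the brute-force rejection sampling shown below.

```python
import random

def generative_model():
    """A tiny 'mental model' of the world: latent causes produce observations."""
    fragile = random.random() < 0.3        # prior: maybe 30% of cups are fragile
    dropped = random.random() < 0.5        # the cup gets dropped half the time
    broke = dropped and random.random() < (0.9 if fragile else 0.1)
    return {"fragile": fragile, "dropped": dropped, "broke": broke}

def infer(condition, query, samples=100_000):
    """Rejection sampling: imagine many worlds, keep only those consistent
    with what was observed, and average the query over the kept worlds."""
    kept = [w for w in (generative_model() for _ in range(samples)) if condition(w)]
    return sum(query(w) for w in kept) / len(kept)

# A "good guess": we saw the cup break; how likely is it that it was fragile?
p_fragile = infer(condition=lambda w: w["broke"], query=lambda w: w["fragile"])
print(f"P(fragile | broke) ~ {p_fragile:.2f}")   # roughly 0.79 with these numbers
```

The point of the sketch is that beliefs, and the bets made on them, fall out of running a model of the world forward and conditioning it on evidence, rather than out of pattern-matching against training data.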

The key part of what I’m working on for my AI2050 project is how to scale up the models that we have. This is where we look to language, including human language, as a medium of human cultural transmission and cultural growth of knowledge. Written language is this incredibly powerful substrate for thinking and sharing thoughts. Most of what we have learned about the world, we didn’t really learn from our direct experience [or] direct sense data; we learned it from people telling us things or from reading things, right? So we’re trying to do that.

We’re using [models] that are basically like language models—neural language models that are trained on natural language, but also on code. We’re taking code-trained LLMs that are really good at translating natural language into code in the probabilistic language of thought, [using] these probabilistic programming languages. We’re trying to use that to model and to “capture” the human language capacity as a means of transmitting and sharing information. It’s not a means of general thinking. It’s an interface, just like these code LLMs are interfaces between natural language and computer code.

The idea is [to] have a system that can build up lots and lots of code [and] can model the world probabilistically. It [doesn’t] build it up by humans writing all the code, but by talking to people and translating what they say into an internal programming language that it can use to model and simulate the world—which is how we’re starting to understand human minds. That allows our AI systems to build on the same cultural base of knowledge that humans do.
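The interview does not spell out the implementation details of the project, but the pipeline described (people say things, a code-trained LLM translates those statements into fragments of a probabilistic program, and the growing program is then used to simulate and answer questions about the world) might be sketched schematically as follows. Everything below is a hand-written toy: in particular, the translate function is only a stand-in for what an actual code-trained LLM would do, and the scenario and probabilities are invented.

```python
import random

# A growing "world model": named probabilistic code fragments, each produced
# by translating something a person said into executable form.
world_model = {}

def translate(utterance):
    """Stand-in for a code-trained LLM mapping natural language to probabilistic
    code. Here the mapping is hard-coded purely for illustration."""
    if utterance == "Most of the cups in that cupboard are fragile.":
        return "cup_fragile", lambda state: random.random() < 0.8
    if utterance == "Fragile cups usually break when dropped.":
        return "breaks_if_dropped", lambda state: random.random() < (
            0.9 if state["cup_fragile"] else 0.1
        )
    raise ValueError("utterance not handled by this toy translator")

# Knowledge accumulates as code, sentence by sentence.
for sentence in [
    "Most of the cups in that cupboard are fragile.",
    "Fragile cups usually break when dropped.",
]:
    name, fragment = translate(sentence)
    world_model[name] = fragment

def simulate():
    """Run the accumulated fragments forward to imagine one possible world.
    Later fragments may depend on values produced by earlier ones."""
    state = {}
    for name, fragment in world_model.items():
        state[name] = fragment(state)
    return state

# Query the model by simulation: how often would a dropped cup break?
runs = [simulate() for _ in range(50_000)]
p_break = sum(w["breaks_if_dropped"] for w in runs) / len(runs)
print(f"P(a dropped cup breaks) ~ {p_break:.2f}")   # about 0.74 with these numbers
```

The design point this is meant to echo is the one made above: the natural-language interface adds knowledge to an internal simulator rather than doing the thinking itself, mirroring the distinction between language as an interface and language as the substrate of thought.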