Qian Yang studies the interaction between humans and AI systems, with the goal of helping people make better use of increasingly complex AI systems. Her work is relevant to Hard Problem #6 (increasing the ability of all the world's population to make use of AI). She is an Assistant Professor in Computing and Information Science at Cornell University and the recipient of an AI2050 Early Career Fellowship.
Yang is a co-author of the research paper "Why Johnny Can't Prompt: How Non-AI Experts Try (and Fail) to Design LLM Prompts," which was presented at the 2023 ACM CHI Conference on Human Factors in Computing Systems in Hamburg, Germany. The research, which was partially funded by Yang's AI2050 fellowship, explores the following question: millions of people have tinkered with instructing pre-trained large language models (LLMs) to answer their questions, but does that mean they can all design effective instructions and create robust GPT-powered applications?
Yang and her collaborators created BotDesigner, software that helps people create GPT-3-based chatbots using natural language instructions alone. Beyond what existing tools (like those from OpenAI) can do, BotDesigner lets people test the robustness of their chatbots, that is, how well they respond to unexpected inputs across potential human-chatbot conversations. Yang et al. followed the progress of 10 people as they attempted to create a tutorial chatbot using BotDesigner. Four were professional programmers, three were amateur programmers, and three had little or no programming experience.
"Our findings suggest that, while end-users can explore prompt designs opportunistically, they struggle to make robust, systematic progress," the paper reports. "Their struggles echo many well-known struggles observed in end-user programming systems and non-expert users of interactive machine learning systems." Moreover, people's understandable inclination to design prompts that resemble human-to-human instructions becomes a hindrance when prompting GPT. For example, people tend to be polite with GPT, and they tend not to repeat instructions within a prompt even after being told that repetition can help get GPT to do what they want.
Learn more about Qian Yang:
Congratulations on having your paper presented at CHI! What is the difference between a basic large language model (LLM) such as ChatGPT and one that has been customized?
"Basic" LLMs like ChatGPT are what we sometimes call "foundation models" – models that people can train and improve further using additional data or "prompts" (natural language instructions and examples of desired responses). Our tool, BotDesigner, helps people with or without formal AI training improve a foundation model using prompts alone.
Prompting is an interesting technique. LLMs are trained on an enormous amount of text. A prompt tells the model, "This is the kind of output I want, so generate responses that resemble it." In other words, prompts directly bias the model toward generating the desired outputs without changing the model itself. That makes prompting very convenient, accessible, and compelling, but the resulting improvements are also not as stable as those from more laborious, data-intensive techniques for improving LLMs.
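To make the mechanics concrete, here is a minimal sketch (not code from the paper) of a "few-shot" prompt: a handful of example input-output pairs biases the model toward the desired kind of response without touching its weights. The `openai` Python client usage and the model name are illustrative assumptions.

```python
# A minimal sketch (not from the paper): steering a model with a prompt alone.
# Assumes the `openai` Python package and an API key; the model name is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A "few-shot" prompt: examples of desired responses bias the model's output
# without changing any model weights.
prompt = (
    "Rewrite each sentence in plain language.\n"
    "Input: The precipitation event has concluded.\n"
    "Output: It stopped raining.\n"
    "Input: Please effectuate the closure of the door.\n"
    "Output: Please close the door.\n"
    "Input: The meeting has been rescheduled to a subsequent date.\n"
    "Output:"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative; any chat model would do
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)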
Do you think that many people using ChatGPT realize that there is a large prompt that is sent to the model before anything that they type?
I think many – millions, according to OpenAI – have prompted ChatGPT without being fully aware that what they are doing is "prompting." When you provide ChatGPT with a paragraph of text and ask, "Summarize it," both the paragraph and the request to "Summarize it" are prompts. Both the paragraph's content and how you frame the request can determine the quality of ChatGPT's summary. For example, the request "Summarize it. Stick to the original text." is more likely to get ChatGPT to generate a factual summary than "Summarize it." alone.
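For illustration, here is a rough sketch (again assuming the `openai` Python client; the model name is illustrative) of how an application typically sends a hidden "system" prompt ahead of whatever the user types:

```python
# A minimal sketch of the hidden "system" prompt that applications send
# before anything the end user types. Assumes the `openai` package;
# the model name is illustrative.
from openai import OpenAI

client = OpenAI()

article = "..."  # the paragraph the user pastes in

messages = [
    # The application developer wrote this; the end user never sees it.
    {"role": "system", "content": "You are a helpful summarization assistant."},
    # How the user frames the request also shapes the output:
    # "Summarize it. Stick to the original text." tends to yield a more
    # factual summary than "Summarize it." alone.
    {"role": "user", "content": f"{article}\n\nSummarize it. Stick to the original text."},
]

reply = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(reply.choices[0].message.content)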
Our research shows that many people are unaware that they are "prompting." Instead, they are inclined to instruct ChatGPT much as they would instruct another human. Even after we remind them that repetition helps (e.g., telling ChatGPT "be succinct" three times is more likely to get it to generate succinct responses), people are reluctant to do so, feeling it's "impolite." This reminds me of the late Clifford Nass's research back in the 1990s showing that people apply social rules, like politeness, to computers just as they do in real social relationships. How fascinating is that?
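As a tiny sketch of the repetition tactic (illustrative only, not BotDesigner's interface):

```python
# A sketch of the repetition tactic people resist: repeating an instruction
# feels impolite to a human reader but can make the model comply more reliably.
instruction = "Be succinct."
task = "Explain how photosynthesis works."

human_style_prompt = f"{instruction} {task}"
repeated_prompt = " ".join([instruction] * 3) + " " + task
# -> "Be succinct. Be succinct. Be succinct. Explain how photosynthesis works."
# The second prompt is more likely to yield a short answer, even though a
# human reader would find the repetition strange.
print(repeated_prompt)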
Since chatbots already have a lot of capabilities, why would it be important for people or businesses to customize them with additional prompts?
Right, LLMs can already provide unprecedentedly fluent answers to people's questions. But it would be deeply problematic to assume that off-the-shelf LLMs like ChatGPT – with simple prompts or instructions – are ready to answer people's questions reliably. The best-known problem is probably GPT's hallucination: GPT can provide inaccurate or false information in a confident tone. Fixing such problems with techniques like prompting and fine-tuning is important and necessary.
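One common prompting-based mitigation, sketched below as a generic illustration rather than a method from the paper, is to ground the model in trusted context and give it an explicit way to decline:

```python
# A generic sketch of a grounding prompt, a common prompting-based mitigation
# for hallucination (not a method from the paper). The model is constrained to
# the supplied context and given an explicit way to decline.
context = "..."  # trusted reference text supplied by the application

grounded_prompt = (
    "Answer the question using ONLY the context below. If the context does "
    "not contain the answer, reply exactly: I don't know.\n\n"
    f"Context:\n{context}\n\n"
    "Question: When was the device approved for clinical use?"
)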
Your paper is squarely aligned with Hard Problem #6, which seeks to improve “access, participation, and agency in the development of AI and the growth of its ecosystem and its beneficial use.”
Totally. The accessibility of recent LLMs such as ChatGPT is arguably one of the biggest innovations they bring. Our research shows that, for a long time, the lack of data and the lack of expert – or even competent – machine learning engineers (rather than the lack of capable AI algorithms) has been the biggest barrier for companies innovating with AI. Both data and expert AI engineers are expensive, highly sought-after resources. LLMs raise the ceiling of what people can achieve without them. That's truly exciting.
The ease of access to LLMs also opens up new risks, though. This research explored one of them: the risk of people creating brittle – if not ineffective – prompts. [Brittle prompts work sometimes, but only for particular inputs; ineffective prompts rarely if ever work.] I am very excited to expand this line of research and explore ways to harness LLMs' accessibility while mitigating their risks.
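To make the distinction concrete, here is a simplified sketch of the kind of robustness check BotDesigner supports. It is not BotDesigner's actual interface; the bot prompt, model name, and test turns are illustrative assumptions.

```python
# A simplified sketch in the spirit of BotDesigner's robustness testing
# (not its actual interface). We send the same chatbot a variety of
# unexpected user turns and review the replies: a brittle prompt handles
# some turns but not others; an ineffective one handles almost none.
# Assumes the `openai` package; the model name and bot prompt are illustrative.
from openai import OpenAI

client = OpenAI()

BOT_PROMPT = "You are a patient cooking tutor. Walk the user through the recipe step by step."

def ask_bot(user_turn: str) -> str:
    """Send one user turn to the chatbot under test and return its reply."""
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative
        messages=[
            {"role": "system", "content": BOT_PROMPT},
            {"role": "user", "content": user_turn},
        ],
    )
    return reply.choices[0].message.content

# Unexpected phrasings and detours a real user might produce.
test_turns = [
    "How do I chop the onion?",
    "onion??",
    "I already chopped it, skip ahead.",
    "My eyes are watering, help!",
]

for turn in test_turns:
    print(f"USER: {turn}\nBOT:  {ask_bot(turn)}\n")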
Even though some of the participants in your study were not programmers, they needed many of the skills that programmers must exhibit, such as the ability to think of a solution as a step-by-step process and to debug systematically (find and fix defects in a program). Do you think the need for these skills means that, in the near future, only people who know how to think like programmers or scientists will be able to deploy AI in novel environments?
To a degree, yes. Some computational and AI literacy is necessary for people to deploy AI in novel environments. That's not to say people who cannot write programming code cannot innovate with AI. Rather, people need to know, at the very least, how to assess an AI system's capabilities and limitations. It is very dangerous to think ChatGPT is capable of giving medical advice just because you asked it three questions and it responded with correct-sounding answers.
It took us many years to begin to recognize how search engines and social media newsfeed rankers lead to unintended consequences like misinformation and bias. We need to be equally, if not more, vigilant about LLMs. So far, that vigilance does require some computer science and AI knowledge.
Based on what you and your colleagues have learned, what’s next?
Building on this research, my team is actively exploring the potential of LLMs to provide value to people who are not computer scientists. We are working with clinical scientists, for example, to see how they might use LLMs to formulate better hypotheses for new clinical trials. These scientists' knowledge needs are so niche and specialized that it has traditionally been difficult to create bespoke AI systems to help them. We are exploring how LLMs might help overcome these barriers.
My colleague Natalie Bazarova and I are also leading the Digital and AI Literacy Initiative, which aims to bring computational and AI literacy to traditionally underserved groups such as teens and immigrant communities. The goal is to help more people harness the advances in AI to serve their own needs.