Prompting: Better Ways of Using Language Models for NLP Tasks

Starting from BERT ( Devlin et al., 2019), fine-tuning pre-trained language models (LMs) with task-specific heads on downstream applications has become standard practice in NLP. However, the GPT-3 model with 175B parameters ( Brown et al., 2020) brought a new way of using LMs for downstream tasks: as the title “Language Models are Few-Shot Learners” suggests, GPT-3 can handle a wide range of tasks with only a few examples by leveraging natural-language prompts and task demonstrations as context, without updating the parameters of the underlying model. The giant model size of GPT-3 is an important factor in its success, while the concept of prompts and demonstrations also gives us new insight into how we can better use language models.

So what is a prompt? A prompt is a piece of text inserted into the input examples so that the original task can be formulated as a (masked) language modeling problem. For example, say we want to classify the sentiment of the movie review “No reason to watch”; we can append a prompt “It was” to the sentence, getting “No reason to watch. It was ____.” It is then natural to expect the LM to assign a higher probability to “terrible” than to “great”.
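To make this concrete, here is a minimal sketch of the idea using the Hugging Face transformers fill-mask pipeline; the model choice (bert-base-uncased) and the exact scores are my own illustrative assumptions, not results from any paper:

```python
# A minimal illustration of scoring label words with a masked LM.
# The model and template here are illustrative assumptions.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

review = "No reason to watch."
prompt = review + " It was [MASK]."

# Restrict predictions to the two label words we care about.
for prediction in fill_mask(prompt, targets=["great", "terrible"]):
    print(prediction["token_str"], prediction["score"])
# We would expect "terrible" to score higher than "great".
```

Because the task is now just masked-word prediction, the model produces this ranking without any new task-specific components.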
After the release of GPT-3, many prompt-related papers emerged, and many of them discussed prompt-based learning for medium-sized pre-trained models like BERT (BERT-base has 110M parameters, 1,000x smaller than the largest GPT-3). In this blog post, I will provide an overview of recent prompt-based methods and my perspective on prompting. At the end, I am going to introduce our ACL’21 paper, “ Making Pre-trained Language Models Better Few-shot Learners.”

Why we want prompts

An illustration of pre-training, standard fine-tuning, and prompt-based fine-tuning with demonstrations, taking a sentiment classification task as an example (from Gao et al., 2021).

In the standard “pre-training and fine-tuning” paradigm, the gap between the pre-training stage and the downstream task can be significant: the objectives are different, and for the downstream task we usually need to introduce new parameters. For example, a BERT-large model and a binary classification task require an additional set of 1,024 x 2 parameters. On the other hand, prompting makes it possible for downstream tasks to take the same format as the pre-training objectives, as illustrated in the figure above, and requires no new parameters. For a classification task, we just need to design a template (“It was”) and the expected text responses (we call these label words, e.g., “great” for the positive label and “terrible” for the negative label in the figure). By closing the gap between the two stages, deploying pre-trained models on specific tasks becomes much easier, especially in the few-shot case: when you only have a dozen training examples for a new task, it is hard to fine-tune the pre-trained model and the new task-specific parameters effectively, but the process is much smoother with prompting. Scao and Rush (2021) show that a prompt may be worth 100 conventional data points, suggesting that prompts can bring a giant leap in sample efficiency.

There are two different paradigms in the research of prompts, and they take quite different views. Inspired by the PET papers ( Schick and Schütze, 2021a, b), prompt-based fine-tuning (the critical point is that we still further optimize the parameters) is regarded as a path towards better few-shot learners for small language models (by small, I mean millions instead of billions of parameters, like BERT or RoBERTa). For super-large models like the 175B GPT-3 and the 11B T5 ( Raffel et al., 2020), since fine-tuning them is hard (this is just my guess; I never had the chance to do so) and also costly, it is expected instead to fix their parameters and apply them to different tasks through different prompts (either discrete ones or soft ones, which I will talk about later).
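To make the parameter contrast concrete, here is a rough sketch of standard fine-tuning’s new classification head versus prompt-based classification that reuses the MLM head with a template and label words; the model names and helper choices are my own illustrative assumptions, not code from the paper:

```python
# A rough sketch contrasting the two setups on binary sentiment classification.
# Model names and label-word choices are illustrative assumptions.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-large-uncased")

# Standard fine-tuning: a new classification head on top of the encoder,
# i.e., an extra 1,024 x 2 weight matrix (plus bias) trained from scratch.
classification_head = torch.nn.Linear(model.config.hidden_size, 2)

# Prompt-based classification: a template plus label words, no new parameters.
label_words = {"positive": "great", "negative": "terrible"}
label_ids = {label: tokenizer.convert_tokens_to_ids(word)
             for label, word in label_words.items()}

inputs = tokenizer("No reason to watch. It was [MASK].", return_tensors="pt")
mask_position = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]

with torch.no_grad():
    logits = model(**inputs).logits[0, mask_position]  # vocab logits at [MASK]

scores = {label: logits[token_id].item() for label, token_id in label_ids.items()}
print(max(scores, key=scores.get))  # predicted label, e.g., "negative"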