
Shot prompting in Large Language Models

I Introduction 

The advances of Large Language Models (LLMs) in recent months have brought significant visibility to the capabilities of these models. The most impressive of the current crop of models is GPT-4, released by OpenAI in March 2023. With an exploding user base of hundreds of millions of users, the usability and utility of these models is now undeniable. 

The process of interacting with these LLMs is referred to as 'prompting', and the process of developing these prompts is known as 'prompt engineering'. Prompt engineering can have a significant effect on the quality of the response from an LLM and is the subject of active research. It is a young, rapidly changing skill domain. That said, there are tried and trusted techniques that form the bedrock, or core set of skills, that every aspiring prompt engineer should be comfortable with before more advanced techniques can be applied. 

This is the first in a series of articles on these core skills. It is intended to provide a solid foundation both for those who are completely new to structured prompting beyond basic question/answer prompts, and for those who have had some exposure to structured prompting in a less systematic way.  

II How LLMs do what they do and how they can be 'tuned' 

LLMs take huge sets of data, billions and billions of data points, and feed them into a neural network. The neural network learns the structure of the language by being exposed to such large datasets. The LLM can then predict what the next token (roughly, a word or word fragment) in a sequence is most likely to be, based on probabilities. So, in essence, LLMs are just next-token prediction engines, and what they produce is a stream of characters and words that reads like what humans would naturally say, that is, natural language.  

The domain of knowledge of these LLMs is very general and covers everything from law to fictional writing and everything in between. The magic of these models lies in this general background, because it allows them to perform across a large range of tasks in a very human way. 

These models are known as pretrained models: when we as users interact with them, they have already been trained, and we simply use them. That said, we can steer these models by giving them additional language that sets up patterns the LLM will tend to follow when it generates its response. This is how we can tune a model's behaviour with what is known as in-context learning.  

We can structure our prompts so as to make the LLM produce output in a particular way, which is very useful when you are trying to get a generalist model to do specific things. 

This leads us to the topic of this article: the simplest of prompt structures that have been found to improve the outputs of LLMs, shots. 

III Shot prompting, the structure 

The main idea behind shot prompting is to give the LLM the structure of both the problem and the response. This is different from a simple demand prompt, also known as a standard prompt, where you demand a response from the LLM in some unstructured way. 

Here is an example of a simple demand prompt: 

Prompt 

The sky is  

Response 

Blue 

The issue with the above demand prompt is that the structure of the input and output is left to the user and the LLM. Research has shown that simply structuring a prompt with a colon (1), yes, just the addition of a colon, can significantly improve the performance of a prompt.  

Here is a structured prompt that makes the input and output more predictable. 

Prompt 

Question: The sky is 

Answer:  

Response 

Answer: blue 
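The Question/Answer structure above can be assembled with a small helper; here is a minimal Python sketch (the function name is my own, and this only builds the prompt string, it does not call any model):

```python
def qa_prompt(question: str) -> str:
    # The colon-delimited labels make the input/output
    # structure explicit for the model; the trailing
    # "Answer:" line is left for the model to complete.
    return f"Question: {question}\nAnswer:"

print(qa_prompt("The sky is"))
```

The resulting string is what you would send to the model as the prompt.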

What is the benefit of this type of structuring? Well… the structuring is exactly the benefit! Let’s take this a step further: 

Prompt 

Question related to the color of an object: The sky is 

Answer: 

Response 

Answer: blue 

The benefit of this structure is that it allows any object's color to be determined, such as in the example below: 

Question related to the color of an object: A Tomato is 

Answer: 

Response 

Answer: red 

This structure allows you to get a predictable and structured output every time. This is important when you are working with more complex or nuanced questions than just stating the color of an object. 
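This reusable structure can be captured as a simple template; a sketch in Python (the helper name is illustrative, and the code only builds the prompt text):

```python
def color_prompt(obj: str) -> str:
    # Fill the fixed question structure with the object of interest;
    # the model is expected to complete the "Answer:" line with a color.
    return (
        f"Question related to the color of an object: {obj} is\n"
        "Answer:"
    )

print(color_prompt("A tomato"))
```

Swapping in any object ("The sky", "A banana") yields the same predictable prompt shape every time.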

Now that we understand the basic idea behind shot prompting, let's look at the first and most widely used type for classification tasks, zero-shot prompting. 

IV Zero shot prompting, the simplest structured prompt 

Now that we understand that shots rely on a structure to set up the problem, we can look at the first of the shot prompt types, the zero-shot prompt.  

In fact, the prompt above for the color of the sky is an example of a zero-shot prompt, because the input and output are fed directly to the model with the expectation that the model will produce a desirable result. That is, the prompt contains no examples that prime the model on what a good output looks like. That said, zero-shot prompting has been shown to perform very well, sometimes outperforming the other shot prompt types that follow, for tasks where the model has already been exposed to the structure of the prompt, that is, the input/output structure, or, as is sometimes used, the Question/Answer prompt structure (2). 

Here is a classic example of a Zero shot text classification prompt that is used to determine the sentiment of a sentence. 

Prompt 

Sentence: The movie was ok 

Sentiment: 

Response 

Sentiment: neutral 

OK, that looks fine, right? But what if you only wanted positive and negative as your answers? No priming is done with zero-shot prompting that would give the LLM an idea of what a good output looks like, so the model outputs its best effort. The model was not wrong, it just wasn't what you wanted. And that is 90% of the battle with prompt engineering: getting the model to output what you actually want. 

Now, you can extend the structure of the zero-shot prompt; for the text classification prompt above, you could add an options section listing what may be included in the output. 

Prompt 

Sentence: The movie was ok 

Sentiment options: positive, negative 

Sentiment:  

Response 

Sentiment: negative 
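The zero-shot classification prompt, with its optional options line, can be assembled programmatically; a minimal Python sketch (the helper name and signature are my own, and nothing here calls a real model):

```python
def zero_shot_sentiment(sentence, options=None):
    # Build the zero-shot prompt: the sentence, an optional list of
    # allowed labels, and a trailing "Sentiment:" for the model to fill in.
    lines = [f"Sentence: {sentence}"]
    if options:
        lines.append("Sentiment options: " + ", ".join(options))
    lines.append("Sentiment:")
    return "\n".join(lines)

print(zero_shot_sentiment("The movie was ok", ["positive", "negative"]))
```

Leaving `options` out reproduces the first, unconstrained prompt; passing a list reproduces the constrained one.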

This moves closer to the output that you want, but you could still get undesirable results, which would need more prompt updates; that's the engineering part! 

There is a different way of showing the model what a good output looks like, and you may have guessed that it just takes showing the model what a good output looks like... As we saw in the section above on LLMs and how they can be tuned via prompts, let's look at a way to tune a model's output with a one-shot prompt. 

Side note: "zero shot" is sometimes used in research to mean that a model has not seen a situation before, usually in true classification models (think image classification models that distinguish a dog from a cat). In this context, zero shot is the ability of a model to categorize something that it has never seen before based on what it has seen. A good example is categorizing a zebra as a striped horse, even though the model has never seen a zebra before (3). 

V One shot prompting 

One shot prompting effectively gives the model a single example of what the complete structure of the output should look like. Let’s extend the sentiment classification prompt above to show the model what the desired output looks like.  

Prompt 

Sentence: The meal was good 

Sentiment: positive 

Sentence: The movie was ok  

Sentiment:  

Response 

Sentiment: positive 

As you can see from the above, a single complete example of the input and output structure is given to the model before the incomplete section is added. This is how you can tune an already trained model via its prompt. Note that the options section was removed; with a single example we could only demonstrate one of the two options. 
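The one-shot structure can be expressed as a small template; a sketch in Python (names are illustrative, and this only produces the prompt string):

```python
def one_shot_sentiment(sentence: str) -> str:
    # A single completed example precedes the incomplete query,
    # showing the model the exact output format to follow.
    return (
        "Sentence: The meal was good\n"
        "Sentiment: positive\n"
        "\n"
        f"Sentence: {sentence}\n"
        "Sentiment:"
    )

print(one_shot_sentiment("The movie was ok"))
```

The hardcoded example plays the role of the "shot"; only the final sentence varies per query.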

One shot prompting is not used very often, but when it is used it can be a powerful tool in very specific domains, especially when it is cumbersome to come up with even one example in the first place. Take the example of DePlot, which was used to extract information from charts (4). 

The power of the single shot is that it gives the model a view into what a good output would look like. Many times, a single example, or shot, is sufficient to get excellent results out of a model. However, sometimes there may be different outputs that are equally good, and this is where the next shot prompt type comes in, few shot prompting.  

VI Few shot prompting, if one is good… more must be better 

The idea here is that if one is good, then more must be better right? 

That is true to an extent, but if the output is simple, then adding more shots as examples may be redundant and could even confuse the model! 

In the first prompt related to color of objects, a single shot prompt would probably do the trick and would give acceptable results on a repeatable basis. 

However, with the text classification prompts, where there are several acceptable outputs, adding additional examples would be beneficial. 

Prompt 

Sentence: The meal was good 

Sentiment: positive 

Sentence: The play was awful 

Sentiment: negative 

Sentence: The movie was ok  

Sentiment: 

Response 

Sentiment: negative 
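Few-shot prompting generalizes the one-shot template to any number of examples; a Python sketch (the helper name and the pair-list convention are my own):

```python
def few_shot_sentiment(examples, sentence):
    # examples: list of (sentence, label) pairs used as in-context shots.
    # Each completed shot is followed by the incomplete query, which
    # the model is expected to finish in the same format.
    parts = [f"Sentence: {text}\nSentiment: {label}" for text, label in examples]
    parts.append(f"Sentence: {sentence}\nSentiment:")
    return "\n\n".join(parts)

shots = [("The meal was good", "positive"), ("The play was awful", "negative")]
print(few_shot_sentiment(shots, "The movie was ok"))
```

Note that a one-shot prompt is just the special case of a single pair in the list.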

This approach to prompting provides more in-context learning to the model. You will hear the term in-context learning often: it refers to the prompt content that comes before the intention section of the prompt, where you describe your problem, and it shows the model what the output should look like. 

Note that in-context learning is not the same as pretraining: pretraining updates the model's weights on huge datasets before you ever use the model, while in-context learning only conditions the model's output through examples in the prompt, without changing any weights.  

So, there you have the different shot prompt types: zero shot, one shot and few shot.  

VII Key updates from research on shot approaches 

  A. Questions or statements (5) 

In the research paper Learning from Task Descriptions, it was noted that it does not matter whether you give a task description as a statement or couch the request as a question. This means that, for our zero-shot example above, it would be equivalent to the model to prompt a statement such as: 

Prompt (statement) 

The sky is 

Response 

blue 

This would be the same as couching the request as a question, such as: 

Prompt 

What color is the sky? 

Response 

Blue 

Now, that said, the paper is all about generalizing large language models to learn from descriptions of tasks that have already been embedded into the model. As most LLMs have been exposed to loads of question-and-answer scenarios, it is preferable to use the question-and-answer format wherever possible. That way the model will have a large preexisting set of examples to base its response on. 

  B. Description or masking (6) 

There are two ways to get a model to give you a response. One is to provide a description of what you are after, the other is to provide a mask where the model will fill in the indicated blank. 

Consider the zero-shot example above again; the descriptive approach would be given as (note we use the question format from A. above): 

Prompt 

What color is the sky? 

Response 

blue 

For a mask style prompt, this can be done more concisely by using a mask in this way: 

Prompt 

Sky <COLOR> 

Response 

Blue 

As the context window that you have available is limited, it is better to use the fewest words (or in LLM speak, 'tokens'), and as such using masks is preferred. If more is better, less is more! 
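The saving is easy to see by counting words; a rough Python sketch (whitespace word count is only a crude proxy for real tokenizer output, and the helper name is mine):

```python
def word_count(prompt: str) -> int:
    # Rough proxy for token count: whitespace-separated words.
    # Real tokenizers split differently, but the relative saving holds.
    return len(prompt.split())

description = "What color is the sky?"
mask = "Sky <COLOR>"
print(word_count(description), word_count(mask))  # 5 2
```

The mask-style prompt conveys the same request in less than half the words, leaving more of the context window for shots.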

VIII When to use which approach 

Research results have shown a clear order of how these different approaches perform. In GENERAL, zero shot is not as good as one shot, and one shot is not as good as few shot.  

Further, from the section directly above, it is better to use the question-and-answer format for your shots, that would be a very familiar format for the LLM to follow and should give good results. 

Remember, this all depends on your ability to come up with good-quality examples; as we know, context really matters for prompts, and giving bad examples will be much worse than giving no examples at all! 

Also, you have to construct these examples yourself, which may be time consuming and prone to fat-finger errors. 

Lastly, the complexity of your tasks also matters in terms of how many shots an LLM will need in a prompt to give you good results. And that is the secret to prompting, stop when you get to good enough, leave perfection to the researchers, we as prompt engineers are interested in getting stuff done! 

One thing to remember: the more shots you give your prompt, the more STRUCTURE you are providing to the model and the LESS creative the model will be. You pay a novelty price with every example you add, so be aware of this, and maybe start with a zero- or one-shot prompt to see how the model performs before you jump straight to a large set of examples in a few-shot approach (7).  

IX Conclusion 

Shot prompting is the most basic of prompting techniques, and its simplicity is its power. For the vast majority of non-reasoning tasks, such as classification and composition, shot prompts are more than enough to get you good results. When you can get away with a zero- or one-shot prompt, do so. But when you are working on a more demanding task, or the model is not giving you what you want, you can often improve your results by increasing the number of shots you give as examples and the additional effort will bear fruit. 

I hope you have found this article informative, and I hope to see you again in another one of the series on prompt engineering. 

References in this post 

1 Reynolds, L., & McDonell, K. (2021). Prompt programming for large language models: Beyond the few-shot paradigm. In arXiv [cs.CL]. http://arxiv.org/abs/2102.07350  

2 Wei, J., Bosma, M., Zhao, V. Y., Guu, K., Yu, A. W., Lester, B., Du, N., Dai, A. M., & Le, Q. V. (2021). Finetuned language models are zero-shot learners. In arXiv [cs.CL]. http://arxiv.org/abs/2109.01652  

3 https://huggingface.co/tasks/zero-shot-image-classification 

4 Liu, F., Eisenschlos, J. M., Piccinno, F., Krichene, S., Pang, C., Lee, K., Joshi, M., Chen, W., Collier, N., & Altun, Y. (2022). DePlot: One-shot visual language reasoning by plot-to-table translation. In arXiv [cs.CL]. http://arxiv.org/abs/2212.10505  

5 Weller, O., Lourie, N., Gardner, M., & Peters, M. E. (2020). Learning from task descriptions. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1361–1375. Association for Computational Linguistics. 

6 https://huggingface.co/tasks/fill-mask 

7 Reynolds, L., & McDonell, K. (2021). Prompt programming for large language models: Beyond the few-shot paradigm. In Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3411763.3451760  
