How to Use Examples to Get Precise and Consistent LLM Output

By Iqbal Ali · April 24, 2025

LLMs don’t think. They’re mimics. And, like jazz musicians who spend years listening and absorbing patterns and then riffing on those patterns to create something new, LLMs generate text by riffing on the patterns in their extensive training data.

AI playing saxophone. But not exactly what we were looking for.

The good news is that this mimicry is programmable. By feeding them the right examples, we can transform LLMs into effective collaborative partners, guiding their output to better align with our needs.

Before I evangelise examples any further, I should mention that you might not even need them.

Zero-shot prompting (i.e., prompting without examples) will probably remain the primary way we interact with LLMs. And as AI gets better at understanding and anticipating our needs, zero-shot prompting will likely become increasingly effective. But, as things stand, plenty of situations still benefit from using examples.

In this article, I’ll explain the importance of patterns and attention and show some practical examples so you can see when and how to use them in your workflow.

What Problem Do Examples Solve?

In the paper “Do LLMs Understand Ambiguity in Text?”, Aryan Keluskar, Amrita Bhattacharjee, and Huan Liu wrote the following:

“LLMs often struggle with the inherent uncertainties of human communication, leading to misinterpretations, miscommunications, hallucinations, and biased responses.”

They use the following simple example to illustrate ambiguity and its impact:

Prompt: What is the home stadium of the Cardinals?

I ran this prompt twenty times. Sometimes, the LLM would explain that there’s both a football and a baseball team called the Cardinals, with different home stadiums. But most of the time, it defaulted to the football team. Give it a try and see what you get. So, why does this happen?

When the LLM receives this prompt, it doesn’t “plan” or “think.” Instead, the prompt is broken down into “tokens” (word fragments)…

How a prompt is broken into tokens
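If you want to see this token split for yourself, here’s a minimal sketch using OpenAI’s tiktoken library (an assumption on my part; other model families ship their own tokenizers):

```python
# pip install tiktoken
import tiktoken

# cl100k_base is the encoding used by several recent OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")

prompt = "What is the home stadium of the Cardinals?"
token_ids = enc.encode(prompt)

# Decode each id individually to see the word fragments.
print([enc.decode([tid]) for tid in token_ids])
# e.g. ['What', ' is', ' the', ' home', ' stadium', ' of', ' the', ' Cardinals', '?']
```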

Each token activates a context-sensitive mind map of associations: a token like “stadium” or “Cardinals” sits at the centre of its own network of related associations. The “context-sensitive” part means the model weighs the relationships between all the tokens in the prompt.

Web of associations created by AI

With this mind map of associations, the AI uses probabilistic prediction to determine which associations to use according to the context (i.e., the other text in the prompt). That’s a fancy way of saying the AI makes assumptions about which associations it should draw on for each token. The process repeats, token by token, until a final output is generated.
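To make “probabilistic prediction” concrete, here’s a toy sketch with invented numbers (real models score tens of thousands of tokens, not three candidate stadiums):

```python
import math

# Invented scores (logits) a model might assign to continuations of
# "The home stadium of the Cardinals is ..." -- purely illustrative.
logits = {
    "State Farm Stadium": 2.1,        # NFL Cardinals: dominant in training data
    "Busch Stadium": 1.3,             # MLB Cardinals
    "unclear, please specify": -0.5,
}

# Softmax turns raw scores into a probability distribution.
total = sum(math.exp(score) for score in logits.values())
probs = {option: math.exp(score) / total for option, score in logits.items()}

print(probs)  # ~0.66, ~0.29, ~0.05: the football answer usually wins
```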

AI delivers results

Ambiguity throws off where the LLM focuses its attention. With our ‘Cardinals’ prompt, the AI, drawing on its training data, assumes we mean the football team. But let’s say we actually want information about the baseball team. In that case, the result falls short of our expectations. The AI isn’t being a good collaborator here.

AI confused by the options. Which to choose?

While this can be fixed by being more specific with our prompts, this simple example highlights how ambiguity can trip us up. And when prompts get more complex, these small ambiguities add up, leading to bigger deviations from the answers we want.

Overcome by choices

As a result, we not only get an answer that’s wrong for us, but we also see inconsistencies in the formatting, structure, and length of the responses each time we run the prompt. I’m sure you’ve had the experience of wanting a single answer (like a stadium location) and getting a wall of text instead.

One solution is to get better at zero-shot prompting. We can sharpen our language by being extremely clear about how we want the LLM to focus its attention:

Prompt: What is the home stadium of the Cardinals baseball team? Be brief and concise with your response, telling me only the full team name, the name of the stadium, and the location.

Response: Cardinals, Busch Stadium, St. Louis
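As an aside, if you’re scripting this rather than typing into a chat window, the same sharpened prompt can be sent through an API. A minimal sketch using OpenAI’s Python SDK (the model name is an assumption; substitute whichever you use):

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: any chat-capable model will do
    messages=[{
        "role": "user",
        "content": "What is the home stadium of the Cardinals baseball team? "
                   "Be brief and concise with your response, telling me only the "
                   "full team name, the name of the stadium, and the location.",
    }],
)
print(response.choices[0].message.content)  # Cardinals, Busch Stadium, St. Louis
```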

We get a much better answer. But this doesn’t scale well. We’d need to anticipate all the ways ambiguity could creep into our prompt. The problem is that as prompts get more complex (e.g., asking the LLM for insights from research), it becomes harder to write with that level of clarity. This is even more challenging when the data the AI processes changes constantly, like in an application.

Here is where examples help. Instead of communicating complex instructions to the AI, we can shortcut this process by providing one or more well-chosen examples. Sometimes, a single example is enough:

Prompt: What is the home stadium of the Boston Red Sox? Boston Red Sox, Fenway Park, Boston

What is the home stadium of the Cardinals?

Response: Cardinals, Busch Stadium, St. Louis

The above is a ‘one-shot prompt’ – we gave it a single example: another baseball team. It’s enough to nudge the AI’s attention toward baseball most of the time. I say most of the time because a single example isn’t always enough in this case.

We can further amplify this nudge, with a ‘few-shot’ prompt, where we provide multiple examples:

Prompt: What is the home stadium of the Boston Red Sox? Boston Red Sox, Fenway Park, Boston

What is the home stadium of the Los Angeles Dodgers? Los Angeles Dodgers, Dodger Stadium, Los Angeles, California

What is the home stadium of the New York Yankees? New York Yankees, Yankee Stadium, Bronx, New York

What is the home stadium of the Cardinals?

Response: What is the home stadium of the Cardinals? Cardinals, Busch Stadium, St. Louis, Missouri

Now, not only is the LLM more consistent, but it has also picked up the format. We’ve effectively taught it how to focus and behave. Notice how this prompt also makes follow-up prompts much simpler. For instance, if we want to know the home stadium of the Chicago Cubs:

Prompt: Chicago Cubs

Response: What is the home stadium of the Chicago Cubs? Chicago Cubs, Wrigley Field, Chicago, Illinois

The response will now match the examples. Let’s try another.

Prompt: Yomiuri Giants

Response: What is the home stadium of the Yomiuri Giants? Yomiuri Giants, Tokyo Dome, Tokyo, Japan

We’ve created a mini-app that accepts a simple instruction (our prompt) and responds with a stable and consistently formatted response.
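In code, that mini-app is just the few-shot examples prepended to whatever team we pass in. A sketch (the stadium_lookup helper is mine, and it assumes the OpenAI SDK as before):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

FEW_SHOT = """What is the home stadium of the Boston Red Sox? Boston Red Sox, Fenway Park, Boston

What is the home stadium of the Los Angeles Dodgers? Los Angeles Dodgers, Dodger Stadium, Los Angeles, California

What is the home stadium of the New York Yankees? New York Yankees, Yankee Stadium, Bronx, New York
"""

def stadium_lookup(team: str) -> str:
    """Append a new question to the few-shot examples and return the reply."""
    prompt = f"{FEW_SHOT}\nWhat is the home stadium of the {team}?"
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any chat model works here
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(stadium_lookup("Chicago Cubs"))
```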

I don’t know about you, but I’ve learned way more about baseball than I ever wanted to. Let’s move on to use cases, techniques, and more complex applications.

Guiding Focus and Interpretation

User directing the AI focus

In early 2023, I started exploring ways to process large amounts of unstructured text data with AI—things like surveys and reviews. ChatGPT was new, and the idea was to see if it could categorise and label responses within large datasets. I quickly ran into the limits of the technology. One major limitation was the inconsistent accuracy across datasets—a problem that still exists today.

The aforementioned ‘ambiguity’ was a big reason for this limitation. Another factor was where the LLM focused its attention. Take, for instance, this example:

Prompt: You will be given user response input to the question for a sports supplement company: “What do you think of our product?”.

Give me a single category label for this response.

Input: “I really liked it. Like really loved it, it helped performance.”

A typical LLM response to this prompt is this:

Response: Category Label: Positive Feedback

That’s okay in some cases, but the goal of analysing these responses is to pull out useful insights. Ideally, we want to know what the user liked. We can improve the prompt by adding a bit of context, like this:

Prompt: You are the voice of the customer expert. You will be given user response input to the question for a sports supplement company: “What do you think of our product?”. Give me a single category label for this response.

Input: “I really liked it. Like really loved it, it helped performance.”

We’ve provided a persona for the LLM to inhabit. Doing this somewhat guides the LLM’s focus. The response is better as a result:

Response: Category Label: Performance Enhancement

But if we run this multiple times, we’ll see the LLM still slips up, sometimes defaulting back to “Positive Feedback.” When building my text-mining tool (now called Ressada, by the way), I processed thousands of responses like this, so consistency mattered. This is where examples helped.

Prompt: You are a voice of the customer expert. You will be given user response input to the question for a sports supplement company: “What do you think of our product?”.

Give me a single category label for this response.

%EXAMPLE%

Input: “I really like it. I like the nice smell.”

Output: Category label: Smells nice

Input: “I really liked it. Like really loved it, it helped performance.”

Adding the example above trained the LLM to direct its focus accordingly. After making that change, I saw a dramatic improvement in the accuracy of Ressada’s dynamic categorisation abilities. Usually, this kind of improvement would require expensive model training!

After working with more datasets, I realised I needed to be even more precise. Providing the LLM with both good and bad examples gave it a further boost in focus.

Prompt: You are a voice of the customer expert. You will be given user response input to the question for a sports supplement company: “What do you think of our product?”. Give me a single category label for this response.

%EXAMPLE (GOOD)%
Input: “I really like it. I like the nice smell.”

Output: Category label: Smells nice

%EXAMPLE (BAD)%
Input: “I really like it. I like the nice smell.”
Output: Category label: Positive feedback

Input: “I really liked it. Like really loved it, it helped performance.”
Output:

So, if you need tighter control like I did, try adding that extra layer of guidance for the LLM.
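If, like me, you’re processing thousands of responses, the layered prompt can be assembled programmatically. A sketch (the categorise helper and template layout are mine; the %EXAMPLE% markers mirror the ones above):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

TEMPLATE = """You are a voice of the customer expert. You will be given user \
response input to the question for a sports supplement company: \
"What do you think of our product?". Give me a single category label for this response.

%EXAMPLE (GOOD)%
Input: "I really like it. I like the nice smell."
Output: Category label: Smells nice

%EXAMPLE (BAD)%
Input: "I really like it. I like the nice smell."
Output: Category label: Positive feedback

Input: "{response}"
Output:"""

def categorise(user_response: str) -> str:
    """Fill the template with one survey response and return the model's label."""
    prompt = TEMPLATE.format(response=user_response)
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content

print(categorise("I really liked it. Like really loved it, it helped performance."))
```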

This approach also works well for summarising text. Sometimes, summaries can focus on all the wrong things, missing the core ideas the text explores. For instance, take the following article I wrote for my comic series “Mr Jones’s Smoking Bones”. I’ve truncated the article for the sake of brevity, but you can read the full article here:

Prompt: Sir Henry Lawrence was a prominent military officer and administrator in the British Empire. He died on 2 July 1857, killed by an artillery shell during a siege in Lucknow, India. … [article excerpt] … How apt then, that the place he now haunts is, to all intents and purposes, an artificial home—an almost Machiavellian deceit turned back upon the coloniser.

Summarise the article in less than 50 words.

The summary I got back was okay, I guess. But the most important aspect of the article is the themes and concepts rather than the events depicted. To get the summary to focus on those themes, I could add an example to the end of that same prompt:

Example: The article begins by introducing the concept of [concept], then for the main body explores [xyz], concluding with [xyz]

By adding that example, I’m guiding the LLM’s attention toward the essay’s key themes without spelling them out. The resulting summary is far more specific and helpful. The bottom line? Think of yourself as a traffic director, steering the LLM’s attention to achieve precision.

Ensuring Structural Consistency

AI mimicking a structure created by the user.

That last example hints at how examples can be used to shape the structure of content. While the earlier prompt included an example to guide the focus towards themes, it didn’t guarantee that the output would follow a specific format.

Let’s revisit that previous example of summarisation. This time, I want to ensure the AI returns an output in a specific format, like this:

Prompt: <previous prompt>

%FORMAT EXAMPLE%
## Overall summary

Overall summary of the article, summarising the introduction, main body, and conclusion in separate paragraphs

## Main themes explored

Summarise the article with a focus on the themes and concepts and how they are explored

## Key insights and points

List 5 key insights and points from the article

When we use the example above, we get an output like this:

Response: ## Overall summary

The article explores the concept of “unhoming” as a multifaceted consequence of British colonialism in India, examining its physical, psychological, and spiritual impacts. It uses the haunting of Sir Henry Lawrence to illustrate the dislocating nature of colonialism, suggesting that both the colonisers and colonised could experience this state of displacement.

## Main themes explored

The article explores the themes of colonialism, imperialism, displacement (“unhoming”), and the psychological and spiritual impacts of colonial rule. It introduces concepts like liminality, and the ghost of Sir Henry Lawrence as a metaphor for the lingering effects of the colonial experience and its unresolved traumas.

## Key insights and points

  1. “Unhoming” is a multi-dimensional consequence of colonialism, encompassing physical, psychological, and spiritual displacement.
  2. The colonisers, despite building structures to feel “at home,” also experienced a form of unhoming, trapped by the contradictions of their project.
  3. The ghost of Sir Henry Lawrence symbolizes the enduring presence of colonial trauma.
  4. Colonial architecture served as both a symbol of imperial power and an attempt to recreate “home” for the colonizers.
  5. The article encourages a confrontation with the unresolved legacies of the colonial past.

See how the example not only specifies a structure (which the LLM respects), but also directs the LLM’s attention. First, an overall summary, then a thematic summary, followed by key insights. That’s a lot packed into a simple prompt.

Notice also that the example is more a description of what each section should include than a concrete output. Examples are flexible. If it’s easier to describe where the LLM should focus, we can just do that. If not, we can provide concrete output examples. The key is to be consistent within the prompt itself.

We can use the same technique to structure responses in tables (I’m using CSV formatting in the example):

Prompt: <previous prompt>

%FORMAT EXAMPLE (TABLE)%
Overall summary, Main themes explored, Key insights and points
"Overall summary of the article, summarising the introduction, main body, and conclusion in separate paragraphs", "Summarise the article with a focus on the themes and concepts and how they are explored", "List 5 key insights and points from the article"

Returning a specific structure is particularly useful for applications, where we need the LLM to output a structured format like markdown (as above) or JSON:

Prompt: You are a voice of the customer expert. You will be given user response input to the question for a sports supplement company: “What do you think of our product?”. Give me a single category label for this response.

%EXAMPLE (GOOD)%

Input: “I really like it. I like the nice smell.”

Output: {"category": "Smells nice", "original_input": "I really like it. I like the nice smell."}

%EXAMPLE (BAD)%

Input: “I really like it. I like the nice smell.”

Output: {"category": "Positive feedback", "original_input": "I really like it. I like the nice smell."}

Input: “I really liked it. Like really loved it, it helped performance.”

Output:

Using the prompt above, we get the LLM to output in JSON format, ready to power our applications. And, as before, we’re leveraging good and bad examples to guide the LLM even further.
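One detail worth noting: valid JSON requires double quotes, which is why the examples use them. On the application side, parsing might look like this (a sketch; the raw string stands in for a live model reply):

```python
import json

# Stand-in for a live model reply to the prompt above.
raw = ('{"category": "Performance enhancement", '
       '"original_input": "I really liked it. Like really loved it, it helped performance."}')

try:
    record = json.loads(raw)
    print(record["category"])  # -> Performance enhancement
except json.JSONDecodeError:
    # LLMs occasionally break format: log the raw text, then retry or fall back.
    print("Could not parse model output:", raw)
```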

Achieving Tonal Consistency

Hurtful mimicking too close to home. I’m like “blah, blah”. You’re like “baw, baw”.

In 2004, Robert Zemeckis (director of Back to the Future) took on an ambitious project: The Polar Express (don’t worry if you haven’t watched it, it’s not really worth it). He wanted to create a super-realistic animation, so he used motion capture to record the actors’ performances. Every movement, facial expression, and physical detail was based on the real actors. The result? Animated characters that almost looked real.

The key term here is ‘almost’. Most reviews, and most people who’ve seen the movie, agree that the characters look creepy. By trying to mimic reality, the film created something that was neither real nor animated. While the term ‘uncanny valley’ existed before this film, its release popularised the idea.

The uncanny valley is the feeling of unease or revulsion we experience when encountering something that closely resembles, but is not quite, human.

Why am I telling you this? Because there’s a strong desire amongst many to use AI to mimic their specific tone of voice. The idea is to have AI write articles in our voice. Now, while there are good use cases and reasons to do this (like, say, for automated email notifications), we shouldn’t be going out there pretending we wrote articles we didn’t. There’s a fine line between productive and deceptive.

But let’s assume our intentions are pure. Something I’ve discussed in an earlier article is how we should break down tasks. It’s reasonable to include a step in our article-writing process where we generate a rough draft in our tone as part of the exploration process. I do this sometimes to understand what an article with a certain structure might look like. I also think it’s fine to use AI in an editorial role.

So, how do we get AI to nail our specific tone? Can we give it examples of our writing and have it mimic our style? Yep. But the results are often hit or miss: we can easily end up with a ‘textual’ version of the uncanny valley. Still, the match is often good enough for a quick rough draft. Sharing some relevant writing samples usually works, and we don’t need a lot of examples, either.

Prompt: Write in my tone of voice.

Example: <example of work />

An alternative approach might be to analyse our writing for its specific tone and style and use that description.

The best way I’ve found to match a tone of voice is to use a rough draft of the target content itself and ask AI to do a light line edit to improve flow and grammar. This likely works because there’s a higher degree of relevance and ‘closeness’ to the final content we want to create. I can share more tips on this later if this is of interest. It’s a big topic. For now, I also want to cover other use cases for tone of voice.

Other common use cases are less about mimicking us and more about hitting the right tone and style for the situation. For example, the tone needed for landing page content is very different from a technical specification document. Take a simple sentence like “Our product is great.” and rewrite it in a few different tones:

  • Formal: “We are confident that our product offers significant advantages.”
  • Friendly: “You’re going to love our product!”
  • Technical: “The product’s performance metrics exceed industry standards.”

Of course, there’s a lot more nuance than that, but you get the idea. Examples can act as a quick shortcut to hitting specific tonal and stylistic beats. We can simply paste relevant texts or documents into our prompts.

Prompt: INSTRUCTION. Write in a tone similar to the writing in this document

The key here is ‘relevance.’ As mentioned earlier, relevance is all-important when trying to hit a specific tone or style. Using examples that are closer to the requested output seems to work best.

What’s Next?

Examples are a powerful way to refine LLM output. They direct attention, ensure consistency, and align responses more closely with our goals. While zero-shot prompting will remain the primary mode of interacting with AI, examples let us express our intent with a higher degree of precision, making LLMs better collaborators. So, what can we do to maximise AI’s pattern-matching abilities?

During my graphic design degree, we were encouraged to create mood boards for our design projects. Mood boards are collages of images, text, and other samples intended to evoke and communicate a particular “feel”. They’re like projections of what we want the final piece to be.

In a way, examples are the equivalent of these mood boards. They force us to think ahead and consider how we want the AI to respond to us.

My course tutor took things a step further. He was always collecting interesting packaging designs, ad copy snippets, graffiti, etc.—basically, anything that he thought might serve some use in the future.

Perhaps we could also collect patterns this way for use as examples for AI? It’s something I’ve started doing at a project level for Ressada. Lorenzo Carreri took this idea further still, creating a database of examples.

Now, this is closer to what my design tutor was doing. This also brings us back to that jazz musician: constantly absorbing interesting patterns and thinking of new and interesting ways of customising and riffing on those patterns. If we do the same thing, we can purposefully interact with AI like a true artist.

References

Keluskar, A., Bhattacharjee, A., and Liu, H. “Do LLMs Understand Ambiguity in Text?”

Written by Iqbal Ali, experimentation consultant and coach. Edited by Carmen Apostu, Head of Marketing at Convert.