Last updated on August 7, 2024
We can use certain large language model (LLM) settings to control various aspects of the model's output, such as how 'random' it is. These settings can be adjusted to produce more creative, diverse, and interesting output. The Temperature, Top-P, and Max Length settings are the most important, but we describe every setting that the OpenAI Playground allows you to modify.
Temperature regulates the unpredictability of a language model's output. Higher temperature settings produce more creative, less predictable output by amplifying the likelihood of less probable tokens and reducing it for more probable ones. Conversely, lower temperatures yield more conservative and predictable results. The following example illustrates these differences in output:
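To make this concrete, here is a minimal sketch of how temperature is typically applied to a model's raw scores (logits) before they are turned into probabilities. The logit values are made up for illustration; real models work over tens of thousands of tokens, but the mechanism is the same.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw model scores (logits) into probabilities.

    Lower temperature sharpens the distribution (more predictable);
    higher temperature flattens it (more random).
    """
    scaled = [l / temperature for l in logits]
    max_s = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - max_s) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate tokens
logits = [2.0, 1.0, 0.5]

low = softmax_with_temperature(logits, temperature=0.5)
high = softmax_with_temperature(logits, temperature=2.0)

# At low temperature the top token dominates; at high temperature
# probability mass spreads toward the less likely tokens.
print(low[0] > high[0])  # True
```

Dividing the logits by the temperature before the softmax is why a high temperature makes rare tokens more competitive: the gaps between scores shrink, so the resulting probabilities even out.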
The output produced with a higher temperature setting offers a more imaginative and diverse list of activities to do at the beach. This can be very useful for creative writing.
If you adjust the temperature too high, you can get nonsensical outputs like "Start a sponge-ball baseball home run contest near Becksmith Stein Man Beach".
Top-P is a setting in language models that helps manage the randomness of their output. It works by establishing a probability threshold and then sampling only from the smallest set of top-ranked tokens whose combined likelihood surpasses that threshold.
For instance, let's consider an example where the model predicts the next word in "The cat climbed up the ___". The top five words it might be considering could be `tree` (probability 0.50), `roof` (probability 0.25), `wall` (probability 0.15), `window` (probability 0.07), and `carpet` (probability 0.03).

If we set Top-P to 0.90, the AI will only consider those tokens that cumulatively add up to at least ~90%. In our case:

- `tree` → total so far is 50%
- `roof` → total becomes 75%
- `wall` → total now reaches 90%

So, for generating output, the AI will randomly pick one among these three options (`tree`, `roof`, and `wall`), as they make up around 90% of all likelihoods. This method can produce more diverse outputs than traditional methods that sample from the entire vocabulary indiscriminately, because it narrows down choices based on cumulative rather than individual token probabilities.
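The steps above can be sketched in a few lines of Python. This is an illustration of the filtering idea, not a production sampler; the example reuses the probabilities from the text.

```python
import random

def top_p_filter(token_probs, top_p=0.90):
    """Keep the smallest set of top-ranked tokens whose cumulative
    probability reaches the top_p threshold, then renormalize."""
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        # Small epsilon guards against floating-point rounding at the threshold
        if cumulative >= top_p - 1e-9:
            break
    total = sum(p for _, p in kept)
    return {t: p / total for t, p in kept}

# The example distribution from the text
probs = {"tree": 0.50, "roof": 0.25, "wall": 0.15, "window": 0.07, "carpet": 0.03}
nucleus = top_p_filter(probs, top_p=0.90)
print(sorted(nucleus))  # ['roof', 'tree', 'wall']

# Sampling now happens only among the surviving tokens
token = random.choices(list(nucleus), weights=list(nucleus.values()), k=1)[0]
```

Note that `window` and `carpet` are discarded entirely: no matter how many samples you draw, the model will never pick them at this Top-P setting.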
The maximum length is the total number of tokens the AI is allowed to generate. This setting is useful because it lets users manage the length of the model's response, preventing overly long or irrelevant responses. The length is shared between the `USER` input in the Playground box and the `ASSISTANT` generated response. Notice how with a limit of 256 tokens, our PirateGPT from earlier is forced to cut its story short mid-sentence.
This also helps control costs if you're paying for the use of the model through the API rather than using the Playground.
There are many other settings that can affect language model output, such as stop sequences and frequency and presence penalties.
Stop sequences tell the model when to cease output generation, which allows you to control content length and structure. If you are prompting the AI to write an email, setting "Best regards," or "Sincerely," as the stop sequence ensures the model stops before the closing salutation, which keeps the email short and to the point. Stop sequences are useful for output that you expect to come out in a structured format such as an email, a numbered list, or dialogue.
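As a rough illustration of the effect, the helper below trims text at the earliest stop sequence. (In a real API the model stops generating as soon as it emits the sequence, rather than being trimmed afterward; the function name and email text here are made up for the example.)

```python
def truncate_at_stop(text, stop_sequences):
    """Cut text at the earliest occurrence of any stop sequence,
    mimicking how a stop parameter ends generation early."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

draft = "Thanks for your patience with the delayed order.\n\nBest regards,\nSupportBot"
email = truncate_at_stop(draft, ["Best regards,", "Sincerely,"])
print(email)  # The closing salutation and everything after it is gone
```

The same idea applies to structured output: stopping at `"\n\n"` can limit the model to a single paragraph, and stopping at `"11."` caps a numbered list at ten items.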
A frequency penalty is a setting that discourages repetition in the generated text by penalizing tokens proportionally to how frequently they appear. The more often a token is used in the text, the less likely the AI is to use it again.
The presence penalty is similar to the frequency penalty, but it applies a flat penalty based on whether a token has occurred at all, rather than scaling with how often it appears.
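The difference between the two penalties is easiest to see in code. The sketch below follows the commonly documented formulation (score minus count times the frequency penalty, minus a one-time presence penalty if the token has appeared); the tokens and scores are invented for illustration.

```python
from collections import Counter

def apply_penalties(logits, generated_tokens,
                    frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the scores of tokens that already appeared in the output.

    frequency_penalty scales with how many times a token appeared;
    presence_penalty is a flat one-time reduction per seen token.
    """
    counts = Counter(generated_tokens)
    adjusted = {}
    for token, logit in logits.items():
        count = counts.get(token, 0)
        adjusted[token] = (logit
                           - count * frequency_penalty
                           - (1.0 if count > 0 else 0.0) * presence_penalty)
    return adjusted

logits = {"the": 2.0, "cat": 1.5, "dog": 1.4}
history = ["the", "cat", "the"]  # "the" used twice, "cat" once, "dog" never
adjusted = apply_penalties(logits, history,
                           frequency_penalty=0.5, presence_penalty=0.2)
# "the": 2.0 - 2*0.5 - 0.2 = 0.8; "cat": 1.5 - 0.5 - 0.2 = 0.8; "dog": 1.4
```

After the penalties, the never-used `dog` outranks both repeated tokens, which is exactly how these settings nudge the model away from repetition.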
Even when Temperature and Top-P are set completely to zero, the AI may not give the same exact output every time. This is due to randomness in GPU (graphics processing unit) calculations being done in the AI's "brain".
In conclusion, mastering settings like temperature, Top-P, maximum length, and others is essential when working with language models. These parameters allow for precise control of the model's output to cater to specific tasks or applications. They manage aspects such as randomness in responses, response length, and repetition frequency among other things—all contributing towards improving your interaction with the AI.
Some LLM settings that can be updated in the OpenAI playground are Temperature, Top-P, maximum length, stop sequences, and frequency and presence penalties.
Understanding how to control the parameters of your language models can help you develop a more complex and unique user interaction with your chatbots, as well as set configurations that can contribute to more reliable AI responses.
Partly written by jackdickens382 and evintunador
Also known as Nucleus Sampling ↩