
LLM Hyperparameter Settings

Both top_p and temperature control the randomness and creativity of text generated by a language model, but they do so in different ways (top_k, length_penalty, frequency_penalty, and random_seed, covered below, offer further control):

  • Temperature
    • affects how confident or random the model is overall, i.e., the predictability of each word:
    • Lower values make the model more focused, predictable, and repetitive;
    • higher values make it more exploratory, varied, and creative.
  • Top_p
    • limits the pool of words the model can choose from based on their combined probabilities, focusing on the most likely options until their cumulative probability meets the threshold.
    • affects the range of words the model considers:
    • Higher top_p values allow a broader selection of words;
    • lower values restrict the model to the most likely words, reducing diversity.
  • Top_k
    • limits the number of words the model considers at each step, whereas top_p sets a probability threshold for selecting words.

Easy Analogies:

  • Temperature: Think of it as adjusting a spice level in food. Low temperature is like making a dish very mild (predictable and consistent), while high temperature is like making it very spicy (random and varied flavors).

  • Top_p: Think of it as setting a cut-off for a buffet. Top_p = 0.9 is like saying you can only choose among the most popular dishes that together account for 90% of all orders, ignoring the long tail of less popular ones.

  • Top_k: Imagine you are at a voting booth with a list of candidates. If you have a list of the top 3 candidates (Low Top_k), you are limited to choosing from these few. If you have a list of the top 50 candidates (High Top_k), you have a lot more options to choose from, making the decision less predictable.

  • Length_penalty: Imagine you’re writing a story. A high length penalty makes you end your story quickly, while a negative length penalty encourages you to add more details and lengthen the story.

  • Frequency_penalty: Think of it as a reminder to avoid repeating the same words. With a high frequency penalty, you’re encouraged to use synonyms and vary your language.

  • Random_seed: Consider it a recipe book. Using the same recipe (seed) will always give you the same dish (output). Changing the recipe (seed) gives you a different dish, and not using a recipe gives you a new, unique dish each time.

Using these parameters, you can control the balance between creativity and predictability in the model’s outputs.


Top_k

  • What it does: Top_k controls the randomness of the model's output by limiting the number of possible next words the model can choose from to the top k most probable words.

  • How it works: When generating text, the model ranks all possible next words by probability, considers only the top k for selection, and discards the rest (a code sketch follows this section).
    • a smaller k value means the model is more focused and less diverse;
    • a larger k allows for more variability.
  • Examples:
    • Using the example prompt "Once upon a time, in a distant land, there lived a wise old owl who loved to...":
      • Low Top_k (3): The model can choose from the top 3 most probable next words, leading to more focused and less diverse outputs. "Once upon a time, in a distant land, there lived a wise old owl who loved to teach young animals about the forest."
      • High Top_k (50): The model can choose from the top 50 most probable next words, allowing for more diverse and creative outputs. "Once upon a time, in a distant land, there lived a wise old owl who loved to explore ancient ruins, discover hidden treasures, and solve the mysteries of the night sky."
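
To make this concrete, here is a minimal sketch of top-k filtering over a vector of next-token logits. The vocabulary and logit values are made up for illustration; real decoders apply the same idea inside the generation loop:

```python
import numpy as np

def top_k_sample(logits, k, rng=None):
    """Sample a next-token id, considering only the k most probable tokens."""
    if rng is None:
        rng = np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)
    top_indices = np.argsort(logits)[-k:]          # keep the k highest logits
    top_logits = logits[top_indices]
    probs = np.exp(top_logits - top_logits.max())  # softmax over the survivors
    probs /= probs.sum()
    return int(rng.choice(top_indices, p=probs))

# Toy scores for four candidate continuations: teach, explore, watch, sing
print(top_k_sample([3.0, 2.0, 1.5, 0.1], k=3))    # "sing" can never be picked
```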

Temperature

  • What it does: Temperature controls the randomness of the model's output by scaling the probabilities of the next-word choices (see the code sketch after this section).

  • How it works:
    • When the temperature is set to a low value (e.g., 0.1), the model becomes more confident and deterministic, often choosing the most probable words, which leads to more conservative and repetitive outputs.
    • When the temperature is high (e.g., 1.0 or higher), the model’s output becomes more random and diverse, as it gives more consideration to less probable words.
  • Example:
    • Imagine you are trying to pick a word to complete the sentence "The cat is on the...":
      • Low temperature (0.1): The model might always pick "roof" if it’s the highest probability word.
      • High temperature (1.0): The model might pick "roof," "bed," "table," or even "mountain," providing more variety.
    • Using the example prompt "Once upon a time, in a distant land, there lived a wise old owl who loved to...":
      • Low Temperature (0.2): The output will be more deterministic and less diverse. "Once upon a time, in a distant land, there lived a wise old owl who loved to teach young animals about the forest and help them learn new things."
      • High Temperature (1.0): The output will be more creative and diverse. "Once upon a time, in a distant land, there lived a wise old owl who loved to explore hidden caves, decipher ancient scrolls, and unravel the mysteries of the stars."
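
Under the hood, temperature simply divides the logits before the softmax. A minimal sketch, with made-up logits for the "The cat is on the..." example (roof, bed, table, mountain):

```python
import numpy as np

def apply_temperature(logits, temperature):
    """Scale logits by 1/temperature, then softmax into probabilities."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())          # subtract max for stability
    return probs / probs.sum()

logits = [3.0, 2.0, 1.5, 0.1]                      # roof, bed, table, mountain
print(apply_temperature(logits, 0.1))  # nearly all mass lands on "roof"
print(apply_temperature(logits, 1.0))  # probability spread across all four
```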

Top_p (Nucleus Sampling)

  • What it does: Top_p sets a threshold to choose from a subset of the most probable next words whose cumulative probability reaches a specified value (p).

  • How it works: The model sorts all possible next words by their probabilities and samples from the smallest subset whose cumulative probability reaches p (sketched in code after this section).
    • For example, if top_p is set to 0.9, the model considers only the smallest set of words whose combined probability reaches 90%, however many words that turns out to be.
  • Example:

    • Imagine you are trying to pick a word to complete the sentence "The cat is on the...":
      • Top_p = 0.9: The model might choose between “roof,” “bed,” and “table” if these words together account for 90% of the probability distribution, ignoring less likely words like “mountain.”
    • Using the example prompt "Once upon a time, in a distant land, there lived a wise old owl who loved to...":
      • High Top_p (0.9): The model considers a larger pool of words, leading to more variety. "Once upon a time, in a distant land, there lived a wise old owl who loved to watch the moonlight dance on the river, listen to the whispers of the wind, and share stories with the night creatures."
      • Low Top_p (0.5): The model considers a smaller, more probable set of words, leading to less diversity. "Once upon a time, in a distant land, there lived a wise old owl who loved to teach young animals and tell them stories about the forest."
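
The nucleus step can be sketched as a filter over an already-computed probability vector. The numbers below mirror the "roof, bed, table, mountain" example and are made up:

```python
import numpy as np

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    probs = np.asarray(probs, dtype=np.float64)
    order = np.argsort(probs)[::-1]                # most probable first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1    # first index reaching p
    kept = np.zeros_like(probs)
    kept[order[:cutoff]] = probs[order[:cutoff]]
    return kept / kept.sum()                       # renormalize the nucleus

probs = [0.5, 0.25, 0.15, 0.1]                     # roof, bed, table, mountain
print(top_p_filter(probs, 0.9))  # "mountain" is dropped, the rest renormalized
```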

Length_penalty

  • What it does: Length_penalty modifies the probability of generating longer or shorter sequences.

  • How it works: A length penalty adjusts the log probabilities of candidate sequences based on their lengths. Conventions vary between libraries; this post uses the sign convention below, illustrated in the sketch after this section.
    • Positive values penalize longer sequences, encouraging shorter responses;
    • negative values do the opposite.
  • Example:

    • Using the example prompt "Once upon a time, in a distant land, there lived a wise old owl who loved to...":
      • Length_penalty = 1.0 (neutral): The model generates text without any bias toward length. "Once upon a time, in a distant land, there lived a wise old owl who loved to share stories."
      • Length_penalty = 2.0 (positive, penalizes length): The model prefers shorter outputs. "Once upon a time, there lived an owl."
      • Length_penalty = -1.0 (negative, rewards length): The model generates longer, more elaborate outputs. "Once upon a time, in a distant land surrounded by tall, ancient trees and flowing rivers, there lived a wise old owl who loved to share stories, explore the depths of the forest, and gather the animals for nightly tales."
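
Libraries implement length penalties differently (some use an exponent on the length, some a linear term), so the sketch below only illustrates the sign convention used in this post with a simple linear adjustment; the scores and lengths are invented:

```python
def length_adjusted_score(log_prob, length, length_penalty):
    """Linear length adjustment: positive penalties favor short candidates."""
    return log_prob - length_penalty * length

# Two hypothetical beam candidates: a short one and a longer, higher-scoring one
short = length_adjusted_score(log_prob=-4.0, length=8, length_penalty=2.0)
long_ = length_adjusted_score(log_prob=-3.5, length=30, length_penalty=2.0)
print(short > long_)   # True: the penalty makes the short candidate win
```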

Frequency_penalty

  • What it does: Frequency_penalty discourages the model from repeating the same words or phrases.

  • How it works: Each time a word is used, the model reduces the probability of that word being chosen again, in proportion to how often it has already appeared. This helps generate more varied and interesting text (see the sketch after this section).

  • Example:

    • Using the example prompt "Once upon a time, in a distant land, there lived a wise old owl who loved to...":
      • Frequency_penalty = 0.0 (no penalty): The model may repeat words more often. "Once upon a time, in a distant land, there lived a wise old owl who loved to tell stories. The owl's stories were loved by all the animals."
      • Frequency_penalty = 1.0 (high penalty): The model avoids repeating words. "Once upon a time, in a distant land, there lived a wise old owl who enjoyed narrating tales. The wise bird's narratives were cherished by all creatures."
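
One well-known formulation (used by the OpenAI API, among others) subtracts penalty × count from each repeated token's logit. The token ids and values below are made up:

```python
from collections import Counter
import numpy as np

def penalize_frequency(logits, generated_tokens, frequency_penalty):
    """Lower each token's logit in proportion to how often it has appeared."""
    logits = np.array(logits, dtype=np.float64)
    for token_id, count in Counter(generated_tokens).items():
        logits[token_id] -= frequency_penalty * count
    return logits

logits = [2.0, 1.0, 0.5]          # three-token toy vocabulary
history = [0, 0, 1]               # token 0 used twice, token 1 once
print(penalize_frequency(logits, history, frequency_penalty=1.0))
# [0. 0. 0.5]: the repeated tokens are now far less likely to be chosen
```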

Random_seed

  • What it does: Random_seed sets the starting point for the random number generator used in text generation.

  • How it works: By setting a specific seed value, you can ensure that the text generation process is repeatable, as the sketch after this section demonstrates.
    • Using the same seed with the same input will always produce the same output.
    • Different seeds produce different outputs.
  • Example:

    • Using the example prompt "Once upon a time, in a distant land, there lived a wise old owl who loved to...":
      • Random_seed = 42: Generates a specific, repeatable output. "Once upon a time, in a distant land, there lived a wise old owl who loved to share wisdom with the forest creatures."
      • Random_seed = 123: Generates a different specific, repeatable output. "Once upon a time, in a distant land, there lived a wise old owl who enjoyed teaching the young animals about the mysteries of the forest."
      • Random_seed not set (random each time): Generates different outputs each time.
        • Run 1: "Once upon a time, in a distant land, there lived a wise old owl who spent its days exploring ancient ruins."
        • Run 2: "Once upon a time, in a distant land, there lived a wise old owl who delighted in singing songs to the stars."
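
Reproducibility comes from seeding the random number generator that drives sampling. A minimal sketch, with numpy standing in for the model's sampler and a made-up next-token distribution:

```python
import numpy as np

probs = [0.5, 0.3, 0.2]                       # toy next-token distribution

rng_a = np.random.default_rng(42)             # same seed, same input...
rng_b = np.random.default_rng(42)
print(rng_a.choice(3, p=probs) == rng_b.choice(3, p=probs))  # True: same pick

rng_c = np.random.default_rng(123)            # a different seed can diverge
print(rng_c.choice(3, p=probs))
```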

Combined Effects:

  • Low Temperature (0.2) and High Top_p (0.9):
    • “Once upon a time, in a distant land, there lived a wise old owl who loved to teach young animals about the forest, share wisdom, and help them grow."
  • High Temperature (1.0) and High Top_p (0.9):
    • “Once upon a time, in a distant land, there lived a wise old owl who loved to fly over misty valleys, converse with mystical spirits, and discover new adventures with her forest friends."
  • Low Temperature (0.2) and Low Top_p (0.5):
    • “Once upon a time, in a distant land, there lived a wise old owl who loved to teach young animals and help them understand the forest."
  • High Temperature (1.0) and Low Top_p (0.5):
    • “Once upon a time, in a distant land, there lived a wise old owl who loved to teach, explore, and guide the forest creatures through their daily lives."
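
Putting the sampling parameters together: many decoders apply temperature first, then top-k, then top-p, before drawing the next token. The pipeline below is a toy illustration of that ordering, not any particular library's implementation:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Toy sampler: temperature scaling, then top-k, then top-p, then sample."""
    if rng is None:
        rng = np.random.default_rng()
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    if top_k is not None:                      # zero out all but the k best
        probs[np.argsort(probs)[:-top_k]] = 0.0
        probs /= probs.sum()
    if top_p is not None:                      # keep the smallest p-nucleus
        order = np.argsort(probs)[::-1]
        cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
        probs[order[cutoff:]] = 0.0
        probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Low temperature + low top_p: focused output, repeatable with a fixed seed
print(sample_next_token([3.0, 2.0, 1.5, 0.1], temperature=0.2, top_p=0.5,
                        rng=np.random.default_rng(42)))
```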