Phase 1 · LLM Foundations · 7 min read

Temperature, Top-P, and Frequency Penalties

Phase 1 of 8

Welcome back! Now that you understand how LLMs see text (tokenization) and process it (transformers), let's learn how to control the creativity and randomness of their outputs.

Think of these as the "personality knobs" for your AI!

Coming from Software Engineering? Temperature is like a randomness seed on steroids. If you've tuned randomization in load balancers or A/B testing, you'll get this: temperature=0 is near-deterministic (usually the same output for the same input, which is handy for testing), while higher values add controlled randomness, which is useful when you want creative variety.


The Big Picture: How LLMs Generate Text

Before diving into the controls, let's understand what we're controlling.

When an LLM generates text, it doesn't just pick the "best" next word. Instead, it:

  1. Calculates a probability for EVERY possible next token
  2. Uses sampling parameters to pick from those probabilities
  3. Repeats until done
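
The loop above can be sketched with a toy model. This is conceptual, not real API code: `next_token_probs` stands in for the transformer forward pass and just returns hard-coded probabilities.

```python
import random

def next_token_probs(context):
    """Stand-in for a model forward pass: score EVERY candidate token.

    A real LLM computes this distribution with a transformer; here we
    hard-code it for illustration.
    """
    return {"mat": 0.40, "floor": 0.25, "bed": 0.20, "roof": 0.10, "moon": 0.05}

def generate(prompt, max_new_tokens=3):
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)              # 1. probability for every token
        candidates, weights = zip(*probs.items())
        tokens.append(random.choices(candidates, weights=weights)[0])  # 2. sample one
    return " ".join(tokens)                           # 3. repeat until done

print(generate("The cat sat on the", max_new_tokens=1))
```

Every sampling parameter in this lesson changes step 2: how we pick from the distribution that step 1 produced.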

Temperature: The Creativity Dial

Temperature is the most important sampling parameter. It controls how "random" or "creative" the output is.

How Temperature Works (Simplified)

Temperature adjusts the probability distribution:

  • Low temperature (0-0.3): Makes high-probability tokens MUCH more likely
  • Medium temperature (0.5-0.8): Balanced selection
  • High temperature (1.0+): Flattens probabilities, more randomness
# script_id: day_004_temperature_and_sampling_part1/probability_distribution_concept
# Conceptual example (not actual API code)

# Original probabilities
probabilities = {
    "mat": 0.40,
    "floor": 0.25,
    "bed": 0.20,
    "roof": 0.10,
    "moon": 0.05
}

# With temperature = 0.1 (very focused)
# "mat" becomes almost certain (~95%)

# With temperature = 1.0 (neutral)
# Probabilities stay roughly the same

# With temperature = 2.0 (very random)
# All options become more equal (~20% each)
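
The reshaping itself is simple enough to compute by hand: raise each probability to the power 1/T (equivalently, divide the log-probabilities by T) and renormalize. A runnable sketch of the conceptual numbers above:

```python
import math

probabilities = {"mat": 0.40, "floor": 0.25, "bed": 0.20, "roof": 0.10, "moon": 0.05}

def apply_temperature(probs, temperature):
    """Divide each log-probability by T, then renormalize to sum to 1."""
    scaled = {tok: math.exp(math.log(p) / temperature) for tok, p in probs.items()}
    total = sum(scaled.values())
    return {tok: v / total for tok, v in scaled.items()}

for t in (0.1, 1.0, 2.0):
    adjusted = apply_temperature(probabilities, t)
    summary = ", ".join(f"{tok}={p:.2f}" for tok, p in adjusted.items())
    print(f"T={t}: {summary}")
```

Running this shows the pattern: at T=0.1 "mat" dominates almost completely, at T=1.0 the distribution is unchanged, and at T=2.0 it flattens toward uniform.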

Code Example: Temperature in Action

# script_id: day_004_temperature_and_sampling_part1/temperature_in_action
from openai import OpenAI

client = OpenAI()

def generate_with_temperature(prompt: str, temperature: float):
    """Generate text with specified temperature."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
        max_tokens=50
    )
    return response.choices[0].message.content

prompt = "Write a one-sentence story about a robot:"

# Let's see how temperature affects output
temperatures = [0.0, 0.5, 1.0, 1.5]

for temp in temperatures:
    print(f"\n--- Temperature: {temp} ---")
    # Generate 3 times to see variation
    for i in range(3):
        result = generate_with_temperature(prompt, temp)
        print(f"  {i+1}. {result}")

Example Output (illustrative; your results will differ at non-zero temperatures):

--- Temperature: 0.0 ---
  1. A robot discovered it could dream and wondered what it meant to be alive.
  2. A robot discovered it could dream and wondered what it meant to be alive.
  3. A robot discovered it could dream and wondered what it meant to be alive.

--- Temperature: 0.5 ---
  1. A robot discovered it could dream and wondered what it meant to be alive.
  2. A lonely robot found friendship in an abandoned teddy bear.
  3. A robot discovered it could dream and pondered the meaning of existence.

--- Temperature: 1.0 ---
  1. The robot baked cookies for the first time and accidentally invented a new element.
  2. A robot wandered through the desert searching for its creator.
  3. In the year 3000, a robot learned to laugh at its own jokes.

--- Temperature: 1.5 ---
  1. Rusty the robot accidentally sneezed bolts into a time vortex of dancing electrons.
  2. A mechanical dreamer composed symphonies from static and stardust memories.
  3. Robot-7 transcended its circuits through interpretive welding dance.

Notice how:

  • Temp 0: Nearly identical output every time (the API is not perfectly deterministic, but very close)
  • Temp 0.5: Slight variations
  • Temp 1.0: Creative but coherent
  • Temp 1.5: Wild and unexpected

Temperature Use Cases

  • 0.0-0.3: Code generation, data extraction, factual Q&A, anything you need to test
  • 0.4-0.7: General chat, summarization, explanations
  • 0.8-1.2: Brainstorming, storytelling, marketing copy
  • 1.3+: Experimental creative writing (expect occasional incoherence)

Top-P (Nucleus Sampling): The Probability Filter

Top-P is another way to control randomness, but with a different approach.

The Intuition

Instead of adjusting ALL probabilities (like temperature), Top-P cuts off the long tail of unlikely options.

How Top-P Works

  1. Sort all tokens by probability (highest first)
  2. Add up probabilities until you reach P (e.g., 0.9 = 90%)
  3. Only sample from those tokens
# script_id: day_004_temperature_and_sampling_part1/top_p_concept
# Conceptual example

probabilities = [
    ("mat", 0.40),
    ("floor", 0.25),
    ("bed", 0.20),
    ("roof", 0.10),
    ("moon", 0.03),
    ("banana", 0.02),
]

def apply_top_p(probs, p=0.9):
    """Keep only tokens that make up top P probability."""
    sorted_probs = sorted(probs, key=lambda x: x[1], reverse=True)

    cumulative = 0
    filtered = []

    for token, prob in sorted_probs:
        if cumulative < p:
            filtered.append((token, prob))
            cumulative += prob
        else:
            break

    return filtered

result = apply_top_p(probabilities, p=0.9)
print("Tokens kept:", result)
# Output: [('mat', 0.40), ('floor', 0.25), ('bed', 0.20), ('roof', 0.10)]
# "moon" and "banana" are excluded!

Top-P in API Calls

# script_id: day_004_temperature_and_sampling_part1/top_p_api_call
from openai import OpenAI

client = OpenAI()

def generate_with_top_p(prompt: str, top_p: float):
    """Generate text with specified top_p."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        top_p=top_p,
        temperature=1.0,  # Keep temperature neutral
        max_tokens=50
    )
    return response.choices[0].message.content

prompt = "Complete this sentence creatively: The scientist discovered that..."

# Compare different top_p values
print("--- Top-P = 0.1 (Very focused) ---")
print(generate_with_top_p(prompt, 0.1))

print("\n--- Top-P = 0.5 (Moderate) ---")
print(generate_with_top_p(prompt, 0.5))

print("\n--- Top-P = 0.95 (Open) ---")
print(generate_with_top_p(prompt, 0.95))

Warning: Temperature + Top-P Together

OpenAI recommends setting only one of temperature or top_p, not both. When both are set, they interact in unpredictable ways: temperature reshapes the probability distribution, then top_p filters it, and this double transformation makes output behavior hard to reason about.

Best practice: Use temperature for most use cases (it's more intuitive). Only switch to top_p when you specifically need nucleus sampling behavior. If you must use both, keep one at its default value (temperature=1.0 or top_p=1.0).
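
One lightweight way to enforce this guideline in your own code is a small helper (hypothetical, not part of the OpenAI SDK) that refuses to set both parameters at once:

```python
def sampling_kwargs(temperature=None, top_p=None):
    """Build sampling kwargs, enforcing the set-only-one guideline."""
    if temperature is not None and top_p is not None:
        raise ValueError("Set only one of temperature or top_p, not both")
    kwargs = {}
    if temperature is not None:
        kwargs["temperature"] = temperature
    if top_p is not None:
        kwargs["top_p"] = top_p
    return kwargs

print(sampling_kwargs(temperature=0.2))  # {'temperature': 0.2}
```

You would then splat the result into the API call, e.g. `client.chat.completions.create(model="gpt-4o-mini", messages=messages, **sampling_kwargs(top_p=0.9))`.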