Welcome back! Now that you understand how LLMs see text (tokenization) and process it (transformers), let's learn how to control the creativity and randomness of their outputs.
Think of these as the "personality knobs" for your AI!
Coming from Software Engineering? Temperature is less like a random seed and more like a randomness dial. If you've tuned randomization in load balancers or A/B testing, you'll get this: temperature=0 is effectively deterministic (same input → same output, great for testing), while higher values add controlled randomness, useful when you want creative variety.
The Big Picture: How LLMs Generate Text
Before diving into the controls, let's understand what we're controlling.
When an LLM generates text, it doesn't just pick the "best" next word. Instead, it:
- Calculates a probability for EVERY possible next token
- Uses sampling parameters to pick from those probabilities
- Repeats until done
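The loop above can be sketched in a few lines of Python. This is a toy stand-in for a real model's forward pass, with a made-up five-word vocabulary and hand-picked scores, but the sample step is exactly what an LLM does:

```python
import math
import random

def softmax(logits: list[float]) -> list[float]:
    """Convert raw scores (logits) into probabilities that sum to 1."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy "model": fixed scores for each candidate next token (made up for illustration)
vocab = ["mat", "floor", "bed", "roof", "moon"]
logits = [2.0, 1.5, 1.3, 0.6, -0.1]

random.seed(0)  # make this demo reproducible

# Step 1: a probability for EVERY candidate token
probs = softmax(logits)
# Step 2: sample one token from that distribution (then a real model repeats)
next_token = random.choices(vocab, weights=probs, k=1)[0]

print(dict(zip(vocab, [round(p, 3) for p in probs])))
print("sampled:", next_token)
```

Note that the model *samples* rather than always taking the highest-probability token; the parameters below control how that sampling behaves.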
Temperature: The Creativity Dial
Temperature is the most important sampling parameter. It controls how "random" or "creative" the output is.
The Intuition
Picture a writer choosing the next word. At low temperature, the writer always plays it safe and picks the obvious word; at high temperature, the writer takes risks and reaches for surprising ones.
How Temperature Works (Simplified)
Temperature adjusts the probability distribution:
- Low temperature (0-0.3): Makes high-probability tokens MUCH more likely
- Medium temperature (0.5-0.8): Balanced selection
- High temperature (1.0+): Flattens probabilities, more randomness
# script_id: day_004_temperature_and_sampling_part1/probability_distribution_concept
# Conceptual example (not actual API code)

# Original probabilities
probabilities = {
    "mat": 0.40,
    "floor": 0.25,
    "bed": 0.20,
    "roof": 0.10,
    "moon": 0.05,
}

# With temperature = 0.1 (very focused)
# "mat" becomes almost certain (~99%)

# With temperature = 1.0 (neutral)
# Probabilities stay roughly the same

# With temperature = 2.0 (very random)
# Probabilities move much closer together ("moon" climbs from 5% to ~11%)
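Those comments can be verified with a small, runnable sketch of temperature scaling: raise each probability to the power 1/T and renormalize, which is equivalent to dividing the logits by T before the softmax. The numbers mirror the conceptual example above:

```python
probabilities = {"mat": 0.40, "floor": 0.25, "bed": 0.20, "roof": 0.10, "moon": 0.05}

def apply_temperature(probs: dict[str, float], temperature: float) -> dict[str, float]:
    """Rescale a distribution: p_i ** (1/T), then renormalize to sum to 1."""
    scaled = {tok: p ** (1.0 / temperature) for tok, p in probs.items()}
    total = sum(scaled.values())
    return {tok: p / total for tok, p in scaled.items()}

for t in (0.1, 1.0, 2.0):
    dist = apply_temperature(probabilities, t)
    print(f"T={t}: mat={dist['mat']:.2f}, moon={dist['moon']:.2f}")

# T=0.1 concentrates nearly all mass on "mat";
# T=1.0 leaves the distribution unchanged;
# T=2.0 flattens it, so "moon" becomes a real possibility
```

(Real implementations work on logits and add numerical-stability tricks, but the effect on the distribution is the same.)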
Code Example: Temperature in Action
# script_id: day_004_temperature_and_sampling_part1/temperature_in_action
from openai import OpenAI

client = OpenAI()

def generate_with_temperature(prompt: str, temperature: float) -> str:
    """Generate text with the specified temperature."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
        max_tokens=50,
    )
    return response.choices[0].message.content

prompt = "Write a one-sentence story about a robot:"

# Let's see how temperature affects output
temperatures = [0.0, 0.5, 1.0, 1.5]

for temp in temperatures:
    print(f"\n--- Temperature: {temp} ---")
    # Generate 3 times to see variation
    for i in range(3):
        result = generate_with_temperature(prompt, temp)
        print(f"  {i+1}. {result}")
Example Output (runs at temperature > 0 will vary):
--- Temperature: 0.0 ---
1. A robot discovered it could dream and wondered what it meant to be alive.
2. A robot discovered it could dream and wondered what it meant to be alive.
3. A robot discovered it could dream and wondered what it meant to be alive.
--- Temperature: 0.5 ---
1. A robot discovered it could dream and wondered what it meant to be alive.
2. A lonely robot found friendship in an abandoned teddy bear.
3. A robot discovered it could dream and pondered the meaning of existence.
--- Temperature: 1.0 ---
1. The robot baked cookies for the first time and accidentally invented a new element.
2. A robot wandered through the desert searching for its creator.
3. In the year 3000, a robot learned to laugh at its own jokes.
--- Temperature: 1.5 ---
1. Rusty the robot accidentally sneezed bolts into a time vortex of dancing electrons.
2. A mechanical dreamer composed symphonies from static and stardust memories.
3. Robot-7 transcended its circuits through interpretive welding dance.
Notice how:
- Temp 0: Same output every time
- Temp 0.5: Slight variations
- Temp 1.0: Creative but coherent
- Temp 1.5: Wild and unexpected
Temperature Use Cases
- 0.0–0.3: Code generation, data extraction, factual Q&A, anything you need to be reproducible
- 0.5–0.8: General conversation, summaries, explanations
- 1.0+: Brainstorming, creative writing, generating diverse alternatives
Top-P (Nucleus Sampling): The Probability Filter
Top-P is another way to control randomness, but with a different approach.
The Intuition
Instead of adjusting ALL probabilities (like temperature), Top-P cuts off the long tail of unlikely options.
How Top-P Works
- Sort all tokens by probability (highest first)
- Add up probabilities until you reach P (e.g., 0.9 = 90%)
- Only sample from those tokens
# script_id: day_004_temperature_and_sampling_part1/top_p_concept
# Conceptual example
probabilities = [
    ("mat", 0.40),
    ("floor", 0.25),
    ("bed", 0.20),
    ("roof", 0.10),
    ("moon", 0.03),
    ("banana", 0.02),
]

def apply_top_p(probs, p=0.9):
    """Keep the smallest set of top tokens whose cumulative probability reaches P."""
    sorted_probs = sorted(probs, key=lambda x: x[1], reverse=True)
    cumulative = 0.0
    filtered = []
    for token, prob in sorted_probs:
        if cumulative < p:
            filtered.append((token, prob))
            cumulative += prob
        else:
            break
    return filtered

result = apply_top_p(probabilities, p=0.9)
print("Tokens kept:", result)
# Output: [('mat', 0.40), ('floor', 0.25), ('bed', 0.20), ('roof', 0.10)]
# "moon" and "banana" are excluded!
Top-P in API Calls
# script_id: day_004_temperature_and_sampling_part1/top_p_api_call
from openai import OpenAI

client = OpenAI()

def generate_with_top_p(prompt: str, top_p: float) -> str:
    """Generate text with the specified top_p."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        top_p=top_p,
        temperature=1.0,  # Leave temperature at its default
        max_tokens=50,
    )
    return response.choices[0].message.content

prompt = "Complete this sentence creatively: The scientist discovered that..."

# Compare different top_p values
print("--- Top-P = 0.1 (Very focused) ---")
print(generate_with_top_p(prompt, 0.1))

print("\n--- Top-P = 0.5 (Moderate) ---")
print(generate_with_top_p(prompt, 0.5))

print("\n--- Top-P = 0.95 (Open) ---")
print(generate_with_top_p(prompt, 0.95))
Warning: Temperature + Top-P Together
OpenAI recommends setting only one of temperature or top_p, not both. When both are set, they interact in hard-to-predict ways: temperature reshapes the probability distribution, then top_p filters it, and this double transformation makes output behavior difficult to reason about.
Best practice: use temperature for most use cases (it's more intuitive), and switch to top_p only when you specifically need nucleus sampling behavior. If you must set both, keep one at its default value (temperature=1.0 or top_p=1.0).
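You can enforce that advice in your own code with a small guard before building the request. The helper below is a hypothetical sketch (not part of the OpenAI SDK): it refuses calls that move both knobs away from their defaults.

```python
def sampling_kwargs(temperature: float = 1.0, top_p: float = 1.0) -> dict:
    """Build sampling kwargs, enforcing "tune one knob at a time".

    Raises ValueError if both temperature and top_p are changed from
    their defaults of 1.0, per OpenAI's recommendation.
    """
    if temperature != 1.0 and top_p != 1.0:
        raise ValueError(
            "Set only one of temperature or top_p; leave the other at its default (1.0)."
        )
    return {"temperature": temperature, "top_p": top_p}

print(sampling_kwargs(temperature=0.3))
```

Usage would look like `client.chat.completions.create(..., **sampling_kwargs(temperature=0.3))`, making the "one knob" rule impossible to violate by accident.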