Take your agent from localhost to the world! This guide covers deploying to major cloud platforms.
Coming from Software Engineering? You've deployed services before — this is the same workflow with different service names. ECS/Cloud Run for containers, managed databases for persistence, secrets manager for API keys. PaaS options like Render and Railway are even simpler — push to git and it deploys, just like Heroku. The AI-specific consideration is that LLM-backed services have higher latency and cost-per-request than typical web services, so right-size your infrastructure accordingly.
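For example, LLM responses often take many seconds, so a server configuration tuned for long-running, I/O-bound requests helps. A sketch (worker count and timeout are illustrative starting points, not recommendations):

```
# Fewer workers, longer timeout: LLM calls are slow and I/O-bound
gunicorn main:app \
  -k uvicorn.workers.UvicornWorker \
  --workers 2 \
  --timeout 120
```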
Deployment Options
| Platform | Complexity | Cost | Best For |
|---|---|---|---|
| AWS | High | Variable | Enterprise, full control |
| GCP | High | Variable | ML workloads, BigQuery |
| Render | Low | Predictable | Startups, simple apps |
| Railway | Low | Predictable | Side projects, MVPs |
Preparing for Deployment
1. Environment Variables
# script_id: day_093_cloud_deployment/env_config
# config.py
import os
class Config:
    OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
    DATABASE_URL = os.environ.get("DATABASE_URL")
    ENVIRONMENT = os.environ.get("ENVIRONMENT", "development")

    @classmethod
    def validate(cls):
        required = ["OPENAI_API_KEY"]
        missing = [v for v in required if not getattr(cls, v)]
        if missing:
            raise ValueError(f"Missing env vars: {missing}")
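Call validation once at startup so a misconfigured deploy fails immediately instead of erroring on the first request. A standalone sketch of the same fail-fast idea (function name is illustrative):

```python
import os

def validate_env(required: list[str]) -> None:
    """Raise at startup if any required environment variable is unset or empty."""
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        raise ValueError(f"Missing env vars: {missing}")

# Run before serving traffic, e.g. at the top of main.py:
# validate_env(["OPENAI_API_KEY"])
```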
2. Requirements File
# requirements.txt
fastapi==0.104.1
uvicorn==0.24.0
openai==1.3.0
pydantic==2.5.0
python-dotenv==1.0.0
gunicorn==21.2.0
3. Dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY . .
# Expose port
EXPOSE 8000
# Run
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
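Because `COPY . .` copies everything in the build context, a `.dockerignore` keeps secrets and local artifacts out of the image (entries below are typical examples):

```
# .dockerignore
.env
.git
__pycache__/
*.pyc
.venv/
```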
Deploy to Render
The easiest option for most cases:
1. Create render.yaml
# render.yaml
services:
  - type: web
    name: my-agent-api
    env: docker
    plan: starter  # or standard for more resources
    envVars:
      - key: OPENAI_API_KEY
        sync: false  # Set manually in dashboard
      - key: ENVIRONMENT
        value: production
    healthCheckPath: /health
    autoDeploy: true
2. Deploy
# Connect GitHub repo to Render
# Or use Render CLI
render deploy
3. FastAPI Health Check
# script_id: day_093_cloud_deployment/health_check_render
from fastapi import FastAPI
app = FastAPI()
@app.get("/health")
def health_check():
    return {"status": "healthy"}
Deploy to Railway
A developer-friendly platform:
1. railway.json (Optional)
{
  "build": {
    "builder": "DOCKERFILE"
  },
  "deploy": {
    "startCommand": "uvicorn main:app --host 0.0.0.0 --port $PORT",
    "healthcheckPath": "/health",
    "restartPolicyType": "ON_FAILURE"
  }
}
2. Deploy via CLI
# Install Railway CLI
npm install -g @railway/cli
# Login
railway login
# Initialize project
railway init
# Deploy
railway up
# Set environment variables
railway variables set OPENAI_API_KEY=sk-...
Deploy to AWS
For production workloads:
Option 1: AWS App Runner (Simplest)
# apprunner.yaml
version: 1.0
runtime: python3
build:
  commands:
    build:
      - pip install -r requirements.txt
run:
  command: uvicorn main:app --host 0.0.0.0 --port 8080
  network:
    port: 8080
Deploy. Note that App Runner reads apprunner.yaml from your repo root; `--source-configuration` takes a separate JSON document describing the source repository or ECR image (filename below is illustrative):
aws apprunner create-service \
  --service-name my-agent \
  --source-configuration file://source-config.json
Option 2: ECS with Fargate
# task-definition.json
{
  "family": "agent-task",
  "networkMode": "awsvpc",
  "containerDefinitions": [
    {
      "name": "agent-container",
      "image": "YOUR_ECR_IMAGE",
      "portMappings": [
        {
          "containerPort": 8000,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {
          "name": "ENVIRONMENT",
          "value": "production"
        }
      ],
      "secrets": [
        {
          "name": "OPENAI_API_KEY",
          "valueFrom": "arn:aws:secretsmanager:..."
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/agent",
          "awslogs-region": "us-east-1"
        }
      }
    }
  ],
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512"
}
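Register the task definition and create a service from it (cluster, service, and subnet names are illustrative; Fargate with `awsvpc` also requires a network configuration):

```
aws ecs register-task-definition --cli-input-json file://task-definition.json
aws ecs create-service \
  --cluster my-cluster \
  --service-name agent-service \
  --task-definition agent-task \
  --desired-count 1 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-...]}"
```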
Option 3: Lambda (Serverless)
# script_id: day_093_cloud_deployment/lambda_handler
# handler.py
from mangum import Mangum
from main import app
handler = Mangum(app)
# serverless.yml
service: agent-api

provider:
  name: aws
  runtime: python3.11

functions:
  api:
    handler: handler.handler
    events:
      - http:
          path: /{proxy+}
          method: ANY
    environment:
      OPENAI_API_KEY: ${ssm:/agent/openai-key}
Deploy to GCP
Option 1: Cloud Run (Recommended)
# Build and push to GCR
gcloud builds submit --tag gcr.io/PROJECT_ID/agent
# Deploy to Cloud Run
gcloud run deploy agent \
--image gcr.io/PROJECT_ID/agent \
--platform managed \
--region us-central1 \
--allow-unauthenticated \
--set-env-vars "ENVIRONMENT=production" \
--set-secrets "OPENAI_API_KEY=openai-key:latest"
Option 2: Kubernetes (GKE)
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: agent
  template:
    metadata:
      labels:
        app: agent
    spec:
      containers:
        - name: agent
          image: gcr.io/PROJECT_ID/agent:latest
          ports:
            - containerPort: 8000
          env:
            - name: ENVIRONMENT
              value: "production"
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: api-secrets
                  key: openai-key
          resources:
            limits:
              memory: "512Mi"
              cpu: "500m"
---
apiVersion: v1
kind: Service
metadata:
  name: agent-service
spec:
  selector:
    app: agent
  ports:
    - port: 80
      targetPort: 8000
  type: LoadBalancer
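Apply the manifests and wait for the load balancer to get an external address:

```
kubectl apply -f k8s/deployment.yaml
kubectl get service agent-service   # wait for an EXTERNAL-IP to appear
```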
CI/CD Pipeline
GitHub Actions
# .github/workflows/deploy.yml
name: Deploy

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Run tests
        run: |
          pip install -r requirements.txt
          pytest
      - name: Deploy to Render
        uses: johnbeynon/render-deploy-action@v0.0.8
        with:
          service-id: ${{ secrets.RENDER_SERVICE_ID }}
          api-key: ${{ secrets.RENDER_API_KEY }}
Monitoring & Logging
Structured Logging
# script_id: day_093_cloud_deployment/structured_logging
import logging
import json
from datetime import datetime, timezone

class JSONFormatter(logging.Formatter):
    # Attributes present on every LogRecord; anything else arrived via `extra`
    RESERVED = set(vars(logging.makeLogRecord({}))) | {"message", "asctime"}

    def format(self, record):
        log_data = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
            "module": record.module,
        }
        # `extra` kwargs are attached directly to the record as attributes,
        # not as a single record.extra dict
        for key, value in record.__dict__.items():
            if key not in self.RESERVED:
                log_data[key] = value
        return json.dumps(log_data, default=str)

# Setup
handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logger = logging.getLogger(__name__)
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Usage
logger.info("Agent started", extra={"agent_id": "123", "model": "gpt-4o"})
Health Checks
# script_id: day_093_cloud_deployment/health_checks
from fastapi import FastAPI
from datetime import datetime
app = FastAPI()
start_time = datetime.now()
@app.get("/health")
def health():
    return {
        "status": "healthy",
        "uptime": (datetime.now() - start_time).total_seconds()
    }

@app.get("/ready")
def readiness():
    # Check dependencies
    checks = {
        "database": check_database(),
        "openai": check_openai_connection()
    }
    all_healthy = all(checks.values())
    return {
        "ready": all_healthy,
        "checks": checks
    }
Observability & Tracing
Structured logging (above) is the minimum. For production AI systems, you need trace-level observability — seeing the full lifecycle of each request including LLM calls, tool executions, retrieval steps, and token costs.
LangSmith Integration
# script_id: day_093_cloud_deployment/langsmith_tracing
# pip install langsmith
import os
from openai import OpenAI

# Set environment variables (default to "" so a missing key doesn't raise TypeError)
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = os.getenv("LANGSMITH_API_KEY", "")
os.environ["LANGCHAIN_PROJECT"] = "production-agent"

# If using LangChain/LangGraph, tracing is automatic.
# For custom code, use the @traceable decorator:
from langsmith import traceable

client = OpenAI()

@traceable(name="generate_response")
def generate_response(query: str) -> str:
    """This function's inputs, outputs, and timing are automatically traced."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}]
    )
    return response.choices[0].message.content
Langfuse (Open Source Alternative)
# script_id: day_093_cloud_deployment/langfuse_tracing
# pip install langfuse
import os
from langfuse import Langfuse
from langfuse.decorators import observe

langfuse = Langfuse(
    public_key=os.getenv("LANGFUSE_PUBLIC_KEY"),
    secret_key=os.getenv("LANGFUSE_SECRET_KEY"),
    host=os.getenv("LANGFUSE_HOST", "https://cloud.langfuse.com")
)

@observe()
def rag_pipeline(query: str) -> str:
    """Each step is automatically traced — retrieval, generation, tool calls."""
    docs = retrieve_documents(query)
    context = format_context(docs)
    response = generate_with_context(query, context)
    return response
What to Track in Production
| Metric | Why It Matters | Tool |
|---|---|---|
| Latency per step | Find bottlenecks (retrieval vs generation) | LangSmith / Langfuse |
| Token usage per request | Cost attribution, budget enforcement | Any tracing tool |
| Error rates by type | Distinguish LLM errors from infra errors | Structured logs + traces |
| User feedback signals | Ground truth for eval dataset | Custom + Langfuse |
| Retrieval relevance scores | RAG quality degradation alerts | Custom metrics |
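Token usage can be turned into a per-request cost figure for attribution and budget alerts. A sketch, where the prices are placeholders (look up current rates for your model; they change often):

```python
# Illustrative per-1M-token prices in USD — NOT current rates
PRICES = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request, from the token counts in the API's usage field."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# e.g. feed in response.usage.prompt_tokens / completion_tokens from the OpenAI SDK
```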
Coming from Software Engineering? LangSmith / Langfuse are the Datadog APM of AI. Instead of tracing HTTP requests through microservices, you're tracing queries through retrieval → LLM → tool execution chains. Same observability mindset, different telemetry.
Cost Optimization
# script_id: day_093_cloud_deployment/cost_optimization
# Caching to reduce API calls
from functools import lru_cache

@lru_cache(maxsize=1000)
def get_embedding(text: str):
    # Strings are hashable, so lru_cache can key on the text directly —
    # no manual hashing needed. generate_embedding only runs on a cache miss.
    return generate_embedding(text)
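`lru_cache` also exposes hit statistics, which you can surface as a metric to verify the cache is actually saving API calls (the stand-in function below replaces a real embedding call):

```python
from functools import lru_cache

@lru_cache(maxsize=1000)
def expensive_call(x: str) -> str:  # stand-in for an embedding/LLM API call
    return x.upper()

expensive_call("hello")
expensive_call("hello")  # second call is served from the cache

info = expensive_call.cache_info()           # CacheInfo(hits=1, misses=1, ...)
hit_rate = info.hits / (info.hits + info.misses)
# Export hit_rate as a gauge metric; a low value means the cache isn't helping
```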
Summary
Quick Reference
# Render
render deploy
# Railway
railway up
# AWS App Runner
aws apprunner create-service ...
# GCP Cloud Run
gcloud run deploy ...
# Docker
docker build -t agent .
docker run -p 8000:8000 agent
Congratulations!
You've completed the 6-month AI Agent curriculum! You now know how to:
- Build LLM-powered applications
- Create RAG systems with vector databases
- Design single and multi-agent architectures
- Evaluate and secure your agents
- Deploy to production
Keep building amazing AI agents! 🚀