Phase 3: Single Agent · 9 min read

Introduction to LangChain

Phase 3 of 8

You've built agents from scratch. Now let's explore LangChain, a popular framework that provides pre-built components for LLM applications. It's like having LEGO blocks for AI development!

Coming from Software Engineering? LangChain is like Express.js or Flask for LLMs — it's a framework that gives you middleware, routing, and pre-built components so you don't start from scratch. Like any framework, it trades flexibility for speed. If you've debated 'framework vs library' before, the same considerations apply.


What is LangChain?

LangChain provides:

  • Models: Unified interface for different LLM providers
  • Prompts: Templating and management
  • Chains: Composable sequences of operations
  • Agents: Decision-making with tools
  • Memory: Conversation history management

Installation

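pip install langchain langchain-openai langchain-community langchain-chroma langchain-text-splitters
pip install pypdf beautifulsoup4  # used by PyPDFLoader and WebBaseLoader below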

Core Concepts

1. Chat Models

# script_id: day_038_langchain_basics/chat_models
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

# Initialize the model
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)

# Simple invocation
response = llm.invoke([
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="What is Python?")
])

print(response.content)

2. Prompt Templates

# script_id: day_038_langchain_basics/prompt_templates
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Simple template
simple_prompt = ChatPromptTemplate.from_template(
    "Explain {topic} in simple terms for a {audience}."
)

# Format the prompt
formatted = simple_prompt.format(topic="machine learning", audience="beginner")
print(formatted)

# Chat prompt with system message
chat_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a {role}. Be {tone}."),
    ("human", "{input}")
])

messages = chat_prompt.format_messages(
    role="teacher",
    tone="friendly",
    input="Explain recursion"
)
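
To make the mechanics concrete, here is a rough plain-Python sketch of what a chat prompt template does conceptually: it stores (role, template) pairs and fills the {placeholders} when you call it. The fill_messages helper is hypothetical and is not how LangChain implements ChatPromptTemplate; it only mirrors the idea.

```python
# Toy stand-in for a chat prompt template (illustrative only, not LangChain code).
def fill_messages(templates, **variables):
    """Fill each (role, template) pair and return a list of role/content dicts."""
    return [
        {"role": role, "content": template.format(**variables)}
        for role, template in templates
    ]

chat_templates = [
    ("system", "You are a {role}. Be {tone}."),
    ("human", "{input}"),
]

messages = fill_messages(
    chat_templates, role="teacher", tone="friendly", input="Explain recursion"
)
for message in messages:
    print(f"{message['role']}: {message['content']}")
```

In this sketch a missing variable raises KeyError from str.format, which is the plain-Python analogue of a template complaining about an unfilled placeholder.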

3. Output Parsers

# script_id: day_038_langchain_basics/output_parsers
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
from pydantic import BaseModel, Field

# Simple string parser
parser = StrOutputParser()

# JSON parser with schema
class MovieReview(BaseModel):
    title: str = Field(description="Movie title")
    rating: int = Field(description="Rating from 1-10")
    summary: str = Field(description="Brief summary")

json_parser = JsonOutputParser(pydantic_object=MovieReview)

# Get format instructions for the prompt
print(json_parser.get_format_instructions())

Modern Alternative: In modern LangChain (0.2+, including the 1.x line), prefer model.with_structured_output(MyPydanticModel) for production-grade structured output. JsonOutputParser shown above is the portable fallback. For complex multi-step workflows, consider LangGraph over LCEL chains.
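
Conceptually, a JSON output parser takes the model's raw text, decodes it, and checks it against the expected shape. The sketch below shows that idea with only the standard library, assuming the model returned the hard-coded string in raw. The parse_review function and the dataclass version of MovieReview are hypothetical stand-ins, not JsonOutputParser itself.

```python
import json
from dataclasses import dataclass

@dataclass
class MovieReview:
    title: str
    rating: int
    summary: str

def parse_review(raw_text: str) -> MovieReview:
    """Decode model output as JSON and check it against the expected fields."""
    data = json.loads(raw_text)  # raises json.JSONDecodeError on malformed output
    review = MovieReview(
        title=str(data["title"]),
        rating=int(data["rating"]),
        summary=str(data["summary"]),
    )
    if not 1 <= review.rating <= 10:
        raise ValueError(f"rating out of range: {review.rating}")
    return review

# Hypothetical model output, hard-coded so the example runs offline.
raw = '{"title": "Inception", "rating": 9, "summary": "A heist inside dreams."}'
review = parse_review(raw)
print(review)
```

Failing loudly on bad output is a deliberate choice here: a parse error is easier to retry or log than a silently wrong field.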


LangChain Expression Language (LCEL)

LCEL is LangChain's declarative way to compose chains using the pipe (|) operator:

If you've used Unix pipes (cat file | grep foo | sort), this is the same idea — each stage's output becomes the next stage's input, left to right. LangChain overloads Python's | so prompt | model | parser reads exactly like a shell pipeline (it is not bitwise-OR here).

# script_id: day_038_langchain_basics/lcel_basic_chain
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Define components
prompt = ChatPromptTemplate.from_template("Tell me a joke about {topic}")
model = ChatOpenAI(model="gpt-4o-mini")
parser = StrOutputParser()

# Compose the chain with LCEL
chain = prompt | model | parser

# Run the chain
result = chain.invoke({"topic": "programming"})
print(result)
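
If the overloaded | feels like magic, the toy below shows one way such an operator can be built in plain Python. It is a minimal sketch, not LangChain's actual Runnable implementation: the Step class and the fake_prompt, fake_model, and fake_parser stages are hypothetical, and no model is called.

```python
class Step:
    """Toy pipeline stage: wraps a function and supports the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # self | other: run self first, then feed its output to other.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)

# Hypothetical stand-ins for prompt, model, and parser.
fake_prompt = Step(lambda inputs: f"Tell me a joke about {inputs['topic']}")
fake_model = Step(lambda text: {"content": f"[model reply to: {text}]"})
fake_parser = Step(lambda message: message["content"])

toy_chain = fake_prompt | fake_model | fake_parser
toy_result = toy_chain.invoke({"topic": "programming"})
print(toy_result)
```

Each | returns a new Step, so a chain of any length is still a single object with one invoke method.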

Chaining Multiple Steps

# script_id: day_038_langchain_basics/chaining_multiple_steps
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

model = ChatOpenAI(model="gpt-4o-mini")

# Chain 1: Generate an outline
outline_prompt = ChatPromptTemplate.from_template(
    "Create a brief outline for an article about {topic}. Return just bullet points."
)

# Chain 2: Expand the outline
expand_prompt = ChatPromptTemplate.from_template(
    "Expand this outline into a short article:\n{outline}"
)

# Compose chains
outline_chain = outline_prompt | model | StrOutputParser()
expand_chain = expand_prompt | model | StrOutputParser()

# A dict in an LCEL pipe runs each value and builds a dict of results for the
# next stage — here it feeds the {outline} placeholder in expand_prompt.
full_chain = {"outline": outline_chain} | expand_chain

result = full_chain.invoke({"topic": "AI in healthcare"})
print(result)
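
The dict step is easier to picture in plain Python. The following sketch assumes every value in the dict is a callable that receives the same chain input; run_mapping and the two fake chains are hypothetical names, and unlike LangChain this toy runs the values one after another rather than concurrently.

```python
def run_mapping(steps, chain_input):
    """Call every function in the dict with the same input and keep the keys."""
    return {key: step(chain_input) for key, step in steps.items()}

def fake_outline_chain(inputs):
    return f"- intro to {inputs['topic']}\n- key points\n- conclusion"

def fake_expand_chain(inputs):
    return f"Article based on:\n{inputs['outline']}"

# Stage one produces {"outline": ...}, which is exactly what stage two expects.
stage_one = run_mapping({"outline": fake_outline_chain}, {"topic": "AI in healthcare"})
article = fake_expand_chain(stage_one)
print(article)
```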

Working with Documents

Document Loaders

# script_id: day_038_langchain_basics/document_loaders
from langchain_community.document_loaders import (
    TextLoader,
    PyPDFLoader,
    WebBaseLoader,
    DirectoryLoader
)

# Load a text file
text_loader = TextLoader("document.txt")
docs = text_loader.load()

# Load a PDF
pdf_loader = PyPDFLoader("report.pdf")
pdf_docs = pdf_loader.load()

# Load from web
web_loader = WebBaseLoader("https://example.com/article")
web_docs = web_loader.load()

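# Load all .txt files from a directory (loader_cls=TextLoader avoids the
# default loader, which requires the separate `unstructured` package)
dir_loader = DirectoryLoader("./documents", glob="**/*.txt", loader_cls=TextLoader)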
all_docs = dir_loader.load()

# Each document has content and metadata
for doc in docs:
    print(f"Content: {doc.page_content[:100]}...")
    print(f"Metadata: {doc.metadata}")

Text Splitters

# script_id: day_038_langchain_basics/text_splitters
from langchain_text_splitters import (
    RecursiveCharacterTextSplitter,
    CharacterTextSplitter,
    TokenTextSplitter
)

# Recursive splitter (recommended)
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " ", ""]
)

text = "Your long document text here..."
chunks = splitter.split_text(text)

# Or split documents directly
from langchain_community.document_loaders import TextLoader
loader = TextLoader("document.txt")
documents = loader.load()
split_docs = splitter.split_documents(documents)

print(f"Original: {len(documents)} document(s)")
print(f"After split: {len(split_docs)} chunks")
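
To see what chunk_size and chunk_overlap mean, here is a simplified fixed-window splitter in plain Python. It is a sketch only: split_fixed is a hypothetical helper that cuts at exact character positions, whereas RecursiveCharacterTextSplitter also tries the separators in order so that chunks break at natural boundaries.

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Slice text into windows of chunk_size that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_fixed(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last three characters of the previous one, which is the overlap that keeps a sentence from being cut off from its context at a chunk boundary.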

Building a RAG Chain

These pieces (embeddings, vector store, retriever) come from Phase 2 — see Day 34. k=3 just means fetch the 3 most relevant chunks.

# script_id: day_038_langchain_basics/rag_chain
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Setup components
embeddings = OpenAIEmbeddings()
vectorstore = Chroma(embedding_function=embeddings, persist_directory="./db")
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
llm = ChatOpenAI(model="gpt-4o-mini")

# RAG prompt
rag_prompt = ChatPromptTemplate.from_template("""
Answer the question based on the following context:

Context: {context}

Question: {question}

Answer:""")

# Helper to format documents
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Build RAG chain with LCEL.
# The dict keys (context, question) must match the {placeholders} in the prompt.
# RunnablePassthrough() is a no-op stage that forwards the chain input unchanged —
# here it copies the incoming question into the prompt, while
# retriever | format_docs fetches and formats the matching documents.
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)

# Use the chain (with error handling for production)
try:
    answer = rag_chain.invoke("What is machine learning?")
    print(answer)
except Exception as e:
    # Provider errors (rate limits, timeouts, bad API keys) surface here;
    # catch broadly at this boundary and log
    print(f"Chain failed: {type(e).__name__}: {e}")
    # In production: return a fallback response, log the error, alert if repeated
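
The production advice in the except block can be sketched as a small wrapper that retries, logs, and finally returns a fallback. Everything here is an assumption for illustration: invoke_with_fallback and flaky_chain are hypothetical, and the flaky callable stands in for rag_chain.invoke so the example runs offline. LangChain runnables also offer with_retry() and with_fallbacks(), which may be a better fit once you are inside the framework.

```python
import logging
import time

logger = logging.getLogger("rag")

def invoke_with_fallback(run, question, retries=2, delay_seconds=0.1,
                         fallback="Sorry, I could not answer that right now."):
    """Call run(question); retry on failure, then return a fallback string."""
    for attempt in range(1, retries + 2):  # first try plus `retries` retries
        try:
            return run(question)
        except Exception as exc:  # broad on purpose at the application boundary
            logger.warning("attempt %d failed: %s: %s",
                           attempt, type(exc).__name__, exc)
            if attempt <= retries:
                time.sleep(delay_seconds * attempt)  # simple linear backoff
    return fallback

# Hypothetical flaky callable standing in for rag_chain.invoke.
calls = {"count": 0}

def flaky_chain(question):
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return f"Answer to: {question}"

answer = invoke_with_fallback(flaky_chain, "What is machine learning?")
print(answer)
```

The first two calls fail and the third succeeds, so the caller sees a normal answer while the log records both failures.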

Streaming with LCEL

# script_id: day_038_langchain_basics/streaming_lcel
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("Write a story about {topic}")
model = ChatOpenAI(model="gpt-4o-mini", streaming=True)

chain = prompt | model

# Stream the response
for chunk in chain.stream({"topic": "a robot learning to paint"}):
    print(chunk.content, end="", flush=True)
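
The consuming side of streaming is just a loop over an iterator. This offline sketch uses a hypothetical fake_stream generator in place of chain.stream to show the two things you usually do with chunks: display each one immediately and keep them to rebuild the full text.

```python
from typing import Iterator

def fake_stream(text: str, chunk_size: int = 8) -> Iterator[str]:
    """Yield text a few characters at a time, like a model emitting tokens."""
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]

story = "A robot picked up a brush and painted its first sunrise."
received = []
for chunk in fake_stream(story):
    print(chunk, end="", flush=True)  # show each piece as soon as it arrives
    received.append(chunk)            # keep the pieces to rebuild the full text
print()

full_text = "".join(received)
```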

Async Support

Each LLM call spends most of its time waiting on the network, so firing several at once with asyncio.gather cuts total wait from the sum of calls to roughly the slowest single call — the same reason you'd parallelize independent HTTP requests.

# script_id: day_038_langchain_basics/async_support
import asyncio
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("Explain {topic} briefly")
model = ChatOpenAI(model="gpt-4o-mini")
chain = prompt | model

async def process_topics(topics: list):
    """Process multiple topics concurrently."""
    tasks = [chain.ainvoke({"topic": t}) for t in topics]
    results = await asyncio.gather(*tasks)
    return results

# Run async
topics = ["Python", "JavaScript", "Rust"]
results = asyncio.run(process_topics(topics))

for topic, result in zip(topics, results):
    print(f"{topic}: {result.content[:100]}...")
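
You can check the timing claim without spending API credits. The sketch below replaces the model call with asyncio.sleep; fake_llm_call, run_all, and the latencies are made up for illustration. Run concurrently, the three calls should finish in roughly the time of the slowest one rather than the sum.

```python
import asyncio
import time

async def fake_llm_call(topic: str, latency: float) -> str:
    """Pretend to call a model: wait for `latency` seconds, then answer."""
    await asyncio.sleep(latency)
    return f"{topic}: done"

async def run_all(jobs):
    # gather starts all calls at once and returns results in input order.
    return await asyncio.gather(*(fake_llm_call(t, s) for t, s in jobs))

jobs = [("Python", 0.1), ("JavaScript", 0.2), ("Rust", 0.3)]

start = time.perf_counter()
results = asyncio.run(run_all(jobs))
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f}s (sum of latencies would be 0.60s)")
```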

Comparison: Vanilla Python vs LangChain

# script_id: day_038_langchain_basics/vanilla_vs_langchain
# Vanilla Python
from openai import OpenAI
client = OpenAI()

def vanilla_rag(question, docs):
    context = "\n".join(docs)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Context: {context}"},
            {"role": "user", "content": question}
        ]
    )
    return response.choices[0].message.content

# LangChain
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_template("Context: {context}\n\nQuestion: {question}")
chain = prompt | ChatOpenAI(model="gpt-4o-mini") | StrOutputParser()
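# docs (a list of strings) and question are the same inputs vanilla_rag takes;
# join the docs so the prompt receives text rather than a list repr
result = chain.invoke({"context": "\n".join(docs), "question": question})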

Aspect          Vanilla   LangChain
Setup           Minimal   More imports
Flexibility     Maximum   Structured
Composability   Manual    Built-in (LCEL)
Ecosystem       DIY       Rich integrations
Learning curve  Lower     Higher

When to Use LangChain


Checkpoint

Run the LCEL basic chain (chain = prompt | model | parser) with chain.invoke({"topic": "programming"}). You should get back a plain Python string (a joke), not an AIMessage object or a dict. That's the StrOutputParser doing its job at the end of the pipe. If you get an object with a .content attribute instead of a string, you left StrOutputParser() off the end of the chain.

Summary


Quick Reference

# script_id: day_038_langchain_basics/quick_reference
# Basic chain
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

chain = (
    ChatPromptTemplate.from_template("{input}")
    | ChatOpenAI()
    | StrOutputParser()
)
result = chain.invoke({"input": "Hello"})

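# With retriever (assumes retriever and format_docs from the RAG section,
# plus a prompt with {context} and {question}, a model, and a parser)
from langchain_core.runnables import RunnablePassthrough
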
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt | model | parser
)

Exercises

  1. Template Gallery: Create 5 different prompt templates for various tasks (summarization, translation, code review)

  2. Multi-Step Chain: Build a chain that researches a topic, creates an outline, and writes an article

  3. Streaming RAG: Implement a RAG chain with streaming responses

Solutions (approaches)
  1. Build each template with ChatPromptTemplate.from_template(...) and store them in a dict, e.g. templates = {"summarize": ChatPromptTemplate.from_template("Summarize:\n{text}"), "translate": ChatPromptTemplate.from_template("Translate to {lang}:\n{text}"), ...}. Pick one and pipe it: chain = templates["summarize"] | ChatOpenAI(model="gpt-4o-mini") | StrOutputParser().
  2. Reuse the chaining pattern from this lesson:
    model = ChatOpenAI(model="gpt-4o-mini")
    outline_chain = ChatPromptTemplate.from_template(
        "Outline an article about {topic}"
    ) | model | StrOutputParser()
    expand_chain = ChatPromptTemplate.from_template(
        "Write a short article from this outline:\n{outline}"
    ) | model | StrOutputParser()
    article_chain = {"outline": outline_chain} | expand_chain
    print(article_chain.invoke({"topic": "vector databases"}))
    
  3. Take the rag_chain from this lesson and call .stream(...) instead of .invoke(...), printing each chunk: for chunk in rag_chain.stream("What is RAG?"): print(chunk, end="", flush=True). With StrOutputParser() at the end, each chunk is a plain string.

What's Next?

Next (Day 39), we explore LlamaIndex — another framework focused on data ingestion and query engines!