While LangChain focuses on composable chains, LlamaIndex specializes in connecting LLMs with your data. It excels at data ingestion, indexing, and building powerful query engines.
Coming from Software Engineering? Choosing between LangChain, LlamaIndex, and vanilla Python is like choosing between Django, Flask, and raw WSGI. Each makes a different trade-off between flexibility and batteries-included convenience. If you've made framework decisions before, apply the same criteria: team familiarity, project complexity, and long-term maintenance.
LlamaIndex vs LangChain
| Aspect | LangChain | LlamaIndex |
|---|---|---|
| Primary focus | Chains & agents | Data & retrieval |
| Indexing | Basic | Advanced |
| Query types | Simple | Complex (SQL, graphs) |
| Best for | General LLM apps | Knowledge-heavy apps |
Installation
pip install llama-index llama-index-llms-openai llama-index-embeddings-openai
Quick Start
# script_id: day_039_llamaindex_and_framework_comparison/quick_start
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
# Load documents from a directory
documents = SimpleDirectoryReader("./data").load_data()
# Create index (automatically chunks, embeds, and stores)
index = VectorStoreIndex.from_documents(documents)
# Query the index
query_engine = index.as_query_engine()
response = query_engine.query("What is this document about?")
print(response)
That's it! LlamaIndex handles chunking, embedding, and retrieval automatically.
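To demystify those three lines, here is a deliberately tiny sketch of the same pipeline written by hand: chunk, embed, store, retrieve. The "embedding" is a bag-of-words counter standing in for a real embedding model, and `ToyIndex` is an invented name for this sketch, not a LlamaIndex class.

```python
# Toy sketch of what VectorStoreIndex + as_query_engine() do under the hood.
# The "embedding" is a bag-of-words Counter, NOT a real model; it only
# illustrates the data flow: index chunks once, rank them per query.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ToyIndex:
    def __init__(self, chunks):
        # "Indexing": embed every chunk once and keep the vectors
        self.store = [(chunk, embed(chunk)) for chunk in chunks]

    def query(self, question: str, top_k: int = 1):
        # "Retrieval": rank stored chunks by similarity to the question
        q = embed(question)
        ranked = sorted(self.store, key=lambda p: cosine(q, p[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:top_k]]

index = ToyIndex(["LlamaIndex connects LLMs with data.",
                  "Django is a web framework."])
print(index.query("How do LLMs use data?"))
```

LlamaIndex replaces each piece (splitter, embedding model, vector store, synthesizer) with a production-grade, swappable component.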
Core Components
1. Documents and Nodes
# script_id: day_039_llamaindex_and_framework_comparison/documents_and_nodes
from llama_index.core import Document
from llama_index.core.node_parser import SentenceSplitter
# Create a document
doc = Document(
text="LlamaIndex is a data framework for LLM applications...",
metadata={"source": "manual", "author": "user"}
)
# Parse into nodes (chunks)
parser = SentenceSplitter(chunk_size=256, chunk_overlap=20)
nodes = parser.get_nodes_from_documents([doc])
print(f"Document split into {len(nodes)} nodes")
for node in nodes:
    print(f"  - {node.text[:50]}...")
2. Data Loaders
# script_id: day_039_llamaindex_and_framework_comparison/data_loaders
from llama_index.core import SimpleDirectoryReader
from llama_index.readers.web import SimpleWebPageReader  # pip install llama-index-readers-web
# Load from directory
dir_reader = SimpleDirectoryReader(
input_dir="./documents",
recursive=True,
required_exts=[".txt", ".pdf", ".md"]
)
docs = dir_reader.load_data()
# Load from web
web_reader = SimpleWebPageReader()
web_docs = web_reader.load_data(urls=["https://example.com/article"])
# Load from various sources using LlamaHub
# pip install llama-index-readers-github llama-index-readers-notion
from llama_index.readers.github import GithubRepositoryReader
from llama_index.readers.notion import NotionPageReader
# Hundreds of loaders are available on LlamaHub!
3. Index Types
# script_id: day_039_llamaindex_and_framework_comparison/index_types
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
embed_model = OpenAIEmbedding()
# Vector Index - semantic search (recommended primary path)
vector_index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)
Note (LlamaIndex 0.13+):
SummaryIndex, TreeIndex, and KeywordTableIndex have been deprecated. Use VectorStoreIndex as the primary index type. If you need summarization behaviour, use a query engine with response_mode="tree_summarize" on a vector index instead.
Query Engines
Basic Query Engine
# script_id: day_039_llamaindex_and_framework_comparison/basic_query_engine
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
# Load and index
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
# Create query engine
query_engine = index.as_query_engine(
similarity_top_k=3, # Retrieve top 3 chunks
response_mode="compact" # Compact response synthesis
)
response = query_engine.query("Explain the main concepts")
print(response)
print(f"\nSources: {len(response.source_nodes)}")
Response Modes
# script_id: day_039_llamaindex_and_framework_comparison/basic_query_engine
# Different ways to synthesize responses
query_engine = index.as_query_engine(
response_mode="refine" # Iteratively refine answer
)
query_engine = index.as_query_engine(
response_mode="compact" # Compact all chunks, answer once
)
query_engine = index.as_query_engine(
response_mode="tree_summarize" # Build summary tree
)
query_engine = index.as_query_engine(
response_mode="simple_summarize" # Simple concatenation
)
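The practical difference between these modes is how many LLM calls they make and what each prompt contains. This sketch shows the shape of "compact" (stuff everything into one call) versus "refine" (one call per chunk, each improving the previous answer); `fake_llm` is a stand-in for a real model, not a LlamaIndex API.

```python
# Sketch: compact vs refine synthesis, with a fake LLM that just answers.
def fake_llm(prompt: str) -> str:
    return f"answer({len(prompt)} chars of prompt)"

def compact(chunks, question):
    # One call: all retrieved chunks packed into a single prompt
    context = "\n".join(chunks)
    return fake_llm(f"Context:\n{context}\nQuestion: {question}"), 1

def refine(chunks, question):
    # N calls: answer on the first chunk, then refine with each remaining one
    answer = fake_llm(f"Context:\n{chunks[0]}\nQuestion: {question}")
    calls = 1
    for chunk in chunks[1:]:
        answer = fake_llm(f"Existing answer: {answer}\nNew context: {chunk}\n"
                          f"Refine the answer to: {question}")
        calls += 1
    return answer, calls

chunks = ["chunk one", "chunk two", "chunk three"]
_, compact_calls = compact(chunks, "Q?")
_, refine_calls = refine(chunks, "Q?")
```

So "refine" can use more context than fits in one prompt, at the cost of latency; "compact" is cheaper but bounded by the context window.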
Customizing Retrieval
# script_id: day_039_llamaindex_and_framework_comparison/custom_retrieval
from llama_index.core import VectorStoreIndex
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor
# Create custom retriever
retriever = VectorIndexRetriever(
index=index,
similarity_top_k=10
)
# Add post-processing
postprocessor = SimilarityPostprocessor(similarity_cutoff=0.7)
# Build custom query engine
query_engine = RetrieverQueryEngine(
retriever=retriever,
node_postprocessors=[postprocessor]
)
response = query_engine.query("Your question here")
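Conceptually, SimilarityPostprocessor is just a filter over scored nodes: retrieve generously (top 10 above), then drop anything below the cutoff. A sketch with made-up scores:

```python
# What SimilarityPostprocessor(similarity_cutoff=0.7) does conceptually:
# keep only retrieved nodes whose similarity score meets the cutoff.
def similarity_cutoff(nodes_with_scores, cutoff=0.7):
    return [(node, score) for node, score in nodes_with_scores if score >= cutoff]

retrieved = [("node A", 0.91), ("node B", 0.74), ("node C", 0.42)]
kept = similarity_cutoff(retrieved)
```

This is why the pattern "retrieve many, filter hard" works: a high top_k costs little, and the cutoff protects the prompt from weak matches.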
Chat Engines
For conversational interactions with memory:
# script_id: day_039_llamaindex_and_framework_comparison/chat_engine
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
# Create chat engine
chat_engine = index.as_chat_engine(
chat_mode="condense_question", # Reformulates questions with context
verbose=True
)
# Have a conversation
response1 = chat_engine.chat("What is this document about?")
print(response1)
response2 = chat_engine.chat("Can you tell me more about that?")
print(response2)
response3 = chat_engine.chat("How does it compare to alternatives?")
print(response3)
# Reset conversation
chat_engine.reset()
Chat Modes
# script_id: day_039_llamaindex_and_framework_comparison/chat_engine
# Different chat modes
chat_engine = index.as_chat_engine(chat_mode="simple") # Basic
chat_engine = index.as_chat_engine(chat_mode="condense_question") # Reformulates
chat_engine = index.as_chat_engine(chat_mode="context") # Always uses context
chat_engine = index.as_chat_engine(chat_mode="condense_plus_context") # Best of both
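The "condense_question" trick is worth seeing spelled out: before retrieval, the chat history plus the new message are rewritten into one standalone question, normally by an LLM call. The sketch below shows the prompt shape; `fake_llm` returns a canned answer and is a stand-in for the real condensing call.

```python
# Sketch of condense_question: history + follow-up -> standalone question.
def fake_llm(prompt: str) -> str:
    # Stand-in: a real LLM would rewrite the follow-up using the history
    return "What index types does LlamaIndex have?"

def condense_question(history, new_message):
    transcript = "\n".join(f"{role}: {text}" for role, text in history)
    prompt = (f"Chat history:\n{transcript}\n"
              f"Follow-up: {new_message}\n"
              f"Rewrite the follow-up as a standalone question.")
    return fake_llm(prompt)

history = [("user", "What is LlamaIndex?"),
           ("assistant", "A data framework for LLM apps.")]
standalone = condense_question(history, "What index types does it have?")
```

The standalone question, not the raw follow-up, is what gets embedded and retrieved against, which is why pronouns like "it" still work in chat mode.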
Persistence
# script_id: day_039_llamaindex_and_framework_comparison/persistence
from llama_index.core import VectorStoreIndex, StorageContext, load_index_from_storage
# Create and persist index
index = VectorStoreIndex.from_documents(documents)
index.storage_context.persist(persist_dir="./storage")
# Load existing index
storage_context = StorageContext.from_defaults(persist_dir="./storage")
loaded_index = load_index_from_storage(storage_context)
query_engine = loaded_index.as_query_engine()
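The persist/load round trip is an ordinary save-then-restore of JSON files (LlamaIndex writes files such as index_store.json and docstore.json into the persist directory). This sketch mimics the pattern with a plain dict so it runs without LlamaIndex; the file format here is invented for illustration.

```python
# Save/load round trip, sketched with plain JSON files in a temp directory.
import json
import os
import tempfile

def persist(index_data: dict, persist_dir: str):
    # Analogue of index.storage_context.persist(persist_dir=...)
    os.makedirs(persist_dir, exist_ok=True)
    with open(os.path.join(persist_dir, "index_store.json"), "w") as f:
        json.dump(index_data, f)

def load(persist_dir: str) -> dict:
    # Analogue of load_index_from_storage(StorageContext.from_defaults(...))
    with open(os.path.join(persist_dir, "index_store.json")) as f:
        return json.load(f)

with tempfile.TemporaryDirectory() as d:
    persist({"chunks": ["a", "b"]}, d)
    restored = load(d)
```

The payoff is the same in both cases: embeddings are computed once at indexing time, and every later run pays only for loading.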
Using Different Vector Stores
# script_id: day_039_llamaindex_and_framework_comparison/vector_stores
# ChromaDB
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb
chroma_client = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = chroma_client.get_or_create_collection("my_collection")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
# Create index with custom vector store
from llama_index.core import StorageContext
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
Advanced: Composable Indices
Combine multiple indices:
# script_id: day_039_llamaindex_and_framework_comparison/composable_indices
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
llm = OpenAI(model="gpt-4o-mini")
embed_model = OpenAIEmbedding()
# Create different query engines for different purposes
docs = SimpleDirectoryReader("./data").load_data()
vector_index = VectorStoreIndex.from_documents(docs, embed_model=embed_model)
# Create tools from query engines with different response modes
detail_tool = QueryEngineTool(
query_engine=vector_index.as_query_engine(llm=llm, response_mode="compact"),
metadata=ToolMetadata(
name="vector_search",
description="Useful for specific questions about details"
)
)
summary_tool = QueryEngineTool(
query_engine=vector_index.as_query_engine(llm=llm, response_mode="tree_summarize"),
metadata=ToolMetadata(
name="summary",
description="Useful for summarization questions"
)
)
# Router automatically selects the right tool
router_engine = RouterQueryEngine(
selector=LLMSingleSelector.from_defaults(llm=llm),
query_engine_tools=[detail_tool, summary_tool]
)
# Ask questions - router picks the right engine!
response = router_engine.query("Give me a summary") # Uses tree_summarize mode
response = router_engine.query("What is the exact definition of X?") # Uses compact mode
Complete Example: Knowledge Base
# script_id: day_039_llamaindex_and_framework_comparison/knowledge_base
from llama_index.core import (
VectorStoreIndex,
SimpleDirectoryReader,
StorageContext,
load_index_from_storage
)
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
import os
class KnowledgeBase:
    """A complete knowledge base using LlamaIndex."""

    def __init__(self, data_dir: str, persist_dir: str = "./kb_storage"):
        self.data_dir = data_dir
        self.persist_dir = persist_dir
        # Create model instances (passed directly to constructors, not via Settings)
        self.llm = OpenAI(model="gpt-4o-mini", temperature=0)
        self.embed_model = OpenAIEmbedding()
        # Load or create index
        self.index = self._load_or_create_index()
        self.query_engine = self.index.as_query_engine(llm=self.llm, similarity_top_k=5)
        self.chat_engine = self.index.as_chat_engine(llm=self.llm, chat_mode="condense_plus_context")

    def _load_or_create_index(self):
        """Load existing index or create new one."""
        if os.path.exists(self.persist_dir):
            print("Loading existing index...")
            storage_context = StorageContext.from_defaults(persist_dir=self.persist_dir)
            return load_index_from_storage(storage_context)
        else:
            print("Creating new index...")
            documents = SimpleDirectoryReader(self.data_dir).load_data()
            index = VectorStoreIndex.from_documents(documents, embed_model=self.embed_model)
            index.storage_context.persist(persist_dir=self.persist_dir)
            return index

    def query(self, question: str) -> str:
        """One-off query."""
        response = self.query_engine.query(question)
        return str(response)

    def chat(self, message: str) -> str:
        """Conversational query."""
        response = self.chat_engine.chat(message)
        return str(response)

    def add_document(self, text: str, metadata: dict | None = None):
        """Add a new document to the index."""
        from llama_index.core import Document
        doc = Document(text=text, metadata=metadata or {})
        self.index.insert(doc)
        self.index.storage_context.persist(persist_dir=self.persist_dir)

    def get_sources(self, question: str) -> list:
        """Get source nodes for a query."""
        response = self.query_engine.query(question)
        return [
            {
                "text": node.node.text[:200],
                "score": node.score,
                "metadata": node.node.metadata,
            }
            for node in response.source_nodes
        ]
# Usage
kb = KnowledgeBase(data_dir="./documents")
# Query
answer = kb.query("What are the main topics covered?")
print(answer)
# Chat
response1 = kb.chat("Tell me about the first topic")
response2 = kb.chat("How does that relate to the second one?")
# Get sources
sources = kb.get_sources("What is machine learning?")
for source in sources:
    print(f"Score: {source['score']:.3f} - {source['text'][:50]}...")
Summary
Quick Reference
# script_id: day_039_llamaindex_and_framework_comparison/quick_reference
# Quick start
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(docs)
response = index.as_query_engine().query("Question?")
# Persistence
from llama_index.core import StorageContext, load_index_from_storage
index.storage_context.persist(persist_dir="./storage")
index = load_index_from_storage(StorageContext.from_defaults(persist_dir="./storage"))
# Chat
chat = index.as_chat_engine()
chat.chat("First message")
chat.chat("Follow-up")
What's Next?
You've learned both LangChain and LlamaIndex! Next, we'll explore LangGraph for building stateful agent workflows.
When to Use a Framework vs. Vanilla Python
You've learned LangChain, LlamaIndex, and built agents from scratch. Now the important question: when should you use each approach?
The Trade-offs
Decision Matrix
| Factor | Vanilla | LangChain | LlamaIndex |
|---|---|---|---|
| Simple chat app | ✅ Best | Overkill | Overkill |
| Complex chains | More work | ✅ Best | Possible |
| RAG application | More work | Good | ✅ Best |
| Agent with tools | More work | ✅ Best | Possible |
| Custom logic | ✅ Best | Harder | Harder |
| Speed to prototype | Slower | ✅ Fast | ✅ Fast |
| Production control | ✅ Best | Less | Less |
| Team knowledge | Universal | Specialized | Specialized |
Decision Flowchart
Code Comparison
Simple Chat
# script_id: day_039_llamaindex_and_framework_comparison/simple_chat_comparison
# Vanilla Python - Simple and clear
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
# LangChain - More setup for simple task
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
llm = ChatOpenAI()
response = llm.invoke([HumanMessage(content="Hello!")])
print(response.content)
# Verdict: Vanilla wins for simplicity
RAG Application
# script_id: day_039_llamaindex_and_framework_comparison/rag_comparison
# Vanilla Python - Lots of code
from openai import OpenAI
import chromadb
client = OpenAI()
chroma = chromadb.EphemeralClient()
collection = chroma.create_collection("docs")
# Load documents
docs = load_documents() # You implement this
# Chunk documents
chunks = chunk_documents(docs) # You implement this
# Embed and store
for chunk in chunks:
    embedding = client.embeddings.create(
        model="text-embedding-3-small",
        input=chunk
    ).data[0].embedding
    collection.add(ids=[...], embeddings=[embedding], documents=[chunk])
# Query
query_emb = client.embeddings.create(...).data[0].embedding
results = collection.query(query_embeddings=[query_emb])
# Build prompt, call LLM...
# LlamaIndex - Few lines
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
index = VectorStoreIndex.from_documents(
SimpleDirectoryReader("./data").load_data()
)
response = index.as_query_engine().query("Question?")
# Verdict: LlamaIndex wins for RAG
Complex Agent
# script_id: day_039_llamaindex_and_framework_comparison/complex_agent_comparison
# Vanilla Python - Full control, more code
class Agent:
    def __init__(self):
        self.tools = {}

    def add_tool(self, name, func):
        self.tools[name] = func

    def run(self, task):
        # Implement ReAct loop
        # Parse responses
        # Execute tools
        # Manage state
        pass  # 50+ lines of code
# LangChain - Pre-built components
from langchain.agents import create_react_agent, AgentExecutor
from langchain_openai import ChatOpenAI
from langchain_core.tools import Tool
from langchain import hub
prompt = hub.pull("hwchase17/react")  # standard ReAct prompt template
tools = [Tool(name="search", func=search_fn, description="...")]
agent = create_react_agent(ChatOpenAI(), tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)
result = executor.invoke({"input": "task"})
# Verdict: LangChain wins for standard agents
# But vanilla wins if you need custom behavior
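To make the "50+ lines" concrete, here is the vanilla ReAct loop compressed to its essentials. The LLM is scripted (`fake_llm` returns canned Thought-free replies) so the sketch runs offline; a real agent would call a model and parse its Action / Final Answer text.

```python
# Minimal ReAct-style loop with a scripted fake LLM (offline illustration).
def fake_llm(history: str) -> str:
    # Scripted: first ask for the tool, then give the final answer
    if "Observation" not in history:
        return "Action: search[capital of France]"
    return "Final Answer: Paris"

def run_agent(task, tools, max_steps=5):
    history = f"Task: {task}"
    for _ in range(max_steps):
        reply = fake_llm(history)
        if reply.startswith("Final Answer:"):
            return reply.removeprefix("Final Answer:").strip()
        # Parse "Action: tool[input]" and execute the tool
        name, arg = reply.removeprefix("Action: ").rstrip("]").split("[", 1)
        observation = tools[name](arg)
        history += f"\n{reply}\nObservation: {observation}"
    return None  # step budget exhausted

tools = {"search": lambda q: "Paris is the capital of France."}
answer = run_agent("What is the capital of France?", tools)
```

The hard parts a framework handles for you are exactly the ones elided here: robust parsing of model output, retries, streaming, and state management.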
Hybrid Approach
Often the best solution combines approaches:
# script_id: day_039_llamaindex_and_framework_comparison/hybrid_approach
# Use LlamaIndex for data, vanilla for control
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from openai import OpenAI
# LlamaIndex for the heavy lifting
index = VectorStoreIndex.from_documents(
SimpleDirectoryReader("./data").load_data()
)
retriever = index.as_retriever(similarity_top_k=5)
# Vanilla Python for custom logic
client = OpenAI()
def custom_rag(question: str) -> str:
    # Retrieve with LlamaIndex
    nodes = retriever.retrieve(question)
    context = "\n".join([n.text for n in nodes])
    # Custom prompt logic
    if len(context) < 100:
        return "Not enough information found."
    # Custom LLM call
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Context: {context}"},
            {"role": "user", "content": question}
        ],
        temperature=0.2  # Custom setting
    )
    # Custom post-processing
    answer = response.choices[0].message.content
    if "I don't know" in answer:
        return fallback_response(question)  # your own fallback helper
    return answer
Framework Overhead
Production Considerations
| Consideration | Vanilla | Frameworks |
|---|---|---|
| Debugging | Easy - your code | Harder - framework internals |
| Upgrades | You control | Breaking changes possible |
| Performance | Optimized by you | May have overhead |
| Hiring | Any Python dev | Need framework knowledge |
| Documentation | Self-documenting | Depends on framework |
Recommendations
Use Vanilla Python When:
- Building simple chat applications
- Need maximum control over behavior
- Want minimal dependencies
- Team doesn't know frameworks
- Building for long-term maintenance
Use LangChain When:
- Building agents with tools
- Need complex chain compositions
- Want rapid prototyping
- Using many third-party integrations
- Building standard patterns
Use LlamaIndex When:
- Building RAG applications
- Working with lots of documents
- Need advanced retrieval strategies
- Building knowledge bases
- Want quick data-to-query setup
Use Hybrid When:
- Need best of both worlds
- Want framework convenience with custom control
- Building production systems that will evolve
Summary
Quick Decision Guide
Simple chat? → Vanilla
RAG app? → LlamaIndex
Agent with tools? → LangChain
Need control? → Vanilla
Quick prototype? → Framework
Production + maintenance? → Consider vanilla or hybrid
The best developers know when to use frameworks and when to write custom code. Master all approaches, then choose wisely!