
Introduction to LlamaIndex

Phase 3 of 8

While LangChain focuses on composable chains, LlamaIndex specializes in connecting LLMs with your data. It excels at data ingestion, indexing, and building powerful query engines.

Coming from Software Engineering? Choosing between LangChain, LlamaIndex, and vanilla Python is like choosing between Django, Flask, and raw WSGI: each makes a different trade-off between flexibility and batteries-included convenience. If you've made framework decisions before, apply the same criteria: team familiarity, project complexity, and long-term maintenance.


LlamaIndex vs LangChain

Aspect        | LangChain        | LlamaIndex
Primary focus | Chains & agents  | Data & retrieval
Indexing      | Basic            | Advanced
Query types   | Simple           | Complex (SQL, graphs)
Best for      | General LLM apps | Knowledge-heavy apps

Installation

pip install llama-index llama-index-llms-openai llama-index-embeddings-openai

Quick Start

# script_id: day_039_llamaindex_and_framework_comparison/quick_start
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load documents from a directory
documents = SimpleDirectoryReader("./data").load_data()

# Create index (automatically chunks, embeds, and stores)
index = VectorStoreIndex.from_documents(documents)

# Query the index
query_engine = index.as_query_engine()
response = query_engine.query("What is this document about?")
print(response)

That's it! LlamaIndex handles chunking, embedding, and retrieval automatically.


Core Components

1. Documents and Nodes

# script_id: day_039_llamaindex_and_framework_comparison/documents_and_nodes
from llama_index.core import Document
from llama_index.core.node_parser import SentenceSplitter

# Create a document
doc = Document(
    text="LlamaIndex is a data framework for LLM applications...",
    metadata={"source": "manual", "author": "user"}
)

# Parse into nodes (chunks)
parser = SentenceSplitter(chunk_size=256, chunk_overlap=20)
nodes = parser.get_nodes_from_documents([doc])

print(f"Document split into {len(nodes)} nodes")
for node in nodes:
    print(f"  - {node.text[:50]}...")
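Under the hood, splitters walk the text and emit windows that share `chunk_overlap` with their neighbor, so no idea is cut off at a boundary without context. A simplified character-based sketch of the idea (LlamaIndex's actual `SentenceSplitter` is token- and sentence-aware, so this is an illustration, not its implementation):

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into overlapping windows (character-based sketch)."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

chunks = chunk_text("abcdefghij" * 10, chunk_size=40, overlap=10)
# Each chunk begins with the last `overlap` characters of the previous one.
```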

2. Data Loaders

# script_id: day_039_llamaindex_and_framework_comparison/data_loaders
from llama_index.core import SimpleDirectoryReader
from llama_index.readers.web import SimpleWebPageReader

# Load from directory
dir_reader = SimpleDirectoryReader(
    input_dir="./documents",
    recursive=True,
    required_exts=[".txt", ".pdf", ".md"]
)
docs = dir_reader.load_data()

# Load from web (requires: pip install llama-index-readers-web)
web_reader = SimpleWebPageReader()
web_docs = web_reader.load_data(urls=["https://example.com/article"])

# Load from other sources via LlamaHub readers;
# each reader is installed as its own package, e.g.:
# pip install llama-index-readers-github llama-index-readers-notion
from llama_index.readers.github import GithubRepositoryReader
from llama_index.readers.notion import NotionPageReader

# Hundreds of loaders are available on LlamaHub!

3. Index Types

# script_id: day_039_llamaindex_and_framework_comparison/index_types
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding()

# Vector Index - semantic search (recommended primary path)
vector_index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)

Note (LlamaIndex 0.13+): SummaryIndex, TreeIndex, and KeywordTableIndex have been deprecated. Use VectorStoreIndex as the primary index type. If you need summarization behaviour, use a query engine with response_mode="tree_summarize" on a vector index instead.
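The idea behind tree summarization is simple: summarize chunks in small groups, then summarize the summaries, recursively, until one answer remains. A toy sketch with a stand-in summarize function (a real query engine makes an LLM call at each step; `fake_llm` here is just a deterministic placeholder):

```python
def tree_summarize(texts: list[str], summarize, fanout: int = 2) -> str:
    """Recursively collapse texts into one summary, `fanout` at a time."""
    if len(texts) == 1:
        return texts[0]
    grouped = [
        summarize(texts[i:i + fanout])
        for i in range(0, len(texts), fanout)
    ]
    return tree_summarize(grouped, summarize, fanout)

# Stand-in "LLM": joins the first word of each input text.
fake_llm = lambda group: " ".join(t.split()[0] for t in group)
result = tree_summarize(["alpha one", "beta two", "gamma three"], fake_llm)
```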


Query Engines

Basic Query Engine

# script_id: day_039_llamaindex_and_framework_comparison/basic_query_engine
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load and index
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Create query engine
query_engine = index.as_query_engine(
    similarity_top_k=3,  # Retrieve top 3 chunks
    response_mode="compact"  # Compact response synthesis
)

response = query_engine.query("Explain the main concepts")
print(response)
print(f"\nSources: {len(response.source_nodes)}")

Response Modes

# script_id: day_039_llamaindex_and_framework_comparison/basic_query_engine
# Different ways to synthesize responses
query_engine = index.as_query_engine(
    response_mode="refine"  # Iteratively refine answer
)

query_engine = index.as_query_engine(
    response_mode="compact"  # Compact all chunks, answer once
)

query_engine = index.as_query_engine(
    response_mode="tree_summarize"  # Build summary tree
)

query_engine = index.as_query_engine(
    response_mode="simple_summarize"  # Simple concatenation
)
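The difference between these modes matters for cost and latency: refine makes one LLM call per chunk, feeding the running answer forward, while compact packs as many chunks as fit into a single call. A sketch of the refine loop with a stand-in LLM call (the prompt wording is illustrative, not LlamaIndex's actual template):

```python
def refine_answer(question, chunks, llm_call):
    """One call per chunk; each call sees the previous draft answer."""
    answer = None
    for chunk in chunks:
        if answer is None:
            prompt = f"Context: {chunk}\nQuestion: {question}"
        else:
            prompt = (f"Existing answer: {answer}\n"
                      f"New context: {chunk}\nRefine the answer to: {question}")
        answer = llm_call(prompt)
    return answer

# Stand-in LLM that records each prompt and returns a numbered draft.
calls = []
fake_llm = lambda prompt: (calls.append(prompt) or f"draft {len(calls)}")
final = refine_answer("What is X?", ["chunk A", "chunk B", "chunk C"], fake_llm)
```

Three chunks means three LLM calls, which is why refine can be slow on large `similarity_top_k` values.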

Customizing Retrieval

# script_id: day_039_llamaindex_and_framework_comparison/custom_retrieval
from llama_index.core import VectorStoreIndex
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor

# Create custom retriever
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=10
)

# Add post-processing
postprocessor = SimilarityPostprocessor(similarity_cutoff=0.7)

# Build custom query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    node_postprocessors=[postprocessor]
)

response = query_engine.query("Your question here")
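Conceptually, `SimilarityPostprocessor` is just a score filter: any retrieved node below the cutoff is dropped before the response is synthesized. The equivalent logic as a hypothetical standalone function over `(text, score)` pairs:

```python
def apply_similarity_cutoff(scored_nodes, cutoff):
    """Keep only (text, score) pairs at or above the cutoff."""
    return [(text, score) for text, score in scored_nodes if score >= cutoff]

hits = [("relevant passage", 0.91), ("borderline", 0.70), ("noise", 0.42)]
kept = apply_similarity_cutoff(hits, cutoff=0.7)
# "noise" is filtered out; two results remain for synthesis.
```

Retrieving a generous `similarity_top_k=10` and then cutting by score, as above, keeps recall high while filtering obvious noise.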

Chat Engines

For conversational interactions with memory:

# script_id: day_039_llamaindex_and_framework_comparison/chat_engine
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Create chat engine
chat_engine = index.as_chat_engine(
    chat_mode="condense_question",  # Reformulates questions with context
    verbose=True
)

# Have a conversation
response1 = chat_engine.chat("What is this document about?")
print(response1)

response2 = chat_engine.chat("Can you tell me more about that?")
print(response2)

response3 = chat_engine.chat("How does it compare to alternatives?")
print(response3)

# Reset conversation
chat_engine.reset()

Chat Modes

# script_id: day_039_llamaindex_and_framework_comparison/chat_engine
# Different chat modes
chat_engine = index.as_chat_engine(chat_mode="simple")  # Basic
chat_engine = index.as_chat_engine(chat_mode="condense_question")  # Reformulates
chat_engine = index.as_chat_engine(chat_mode="context")  # Always uses context
chat_engine = index.as_chat_engine(chat_mode="condense_plus_context")  # Best of both
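What makes condense_question work is rewriting each follow-up into a standalone query before retrieval, so "tell me more about that" still matches the right documents. A sketch of the pattern with stand-in condense and query functions (a real chat engine asks the LLM to perform the rewrite):

```python
def condense_and_query(history, message, condense, query):
    """Rewrite a follow-up using chat history, then run a normal query."""
    standalone = condense(history, message) if history else message
    history.append(message)
    return query(standalone)

# Stand-ins: the "condenser" splices the previous topic into the follow-up.
condense = lambda hist, msg: f"{msg} (re: {hist[-1]})"
query = lambda q: f"answer to: {q}"

history: list = []
first = condense_and_query(history, "What is LlamaIndex?", condense, query)
second = condense_and_query(history, "Tell me more", condense, query)
```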

Persistence

# script_id: day_039_llamaindex_and_framework_comparison/persistence
from llama_index.core import VectorStoreIndex, StorageContext, load_index_from_storage

# Create and persist index
index = VectorStoreIndex.from_documents(documents)
index.storage_context.persist(persist_dir="./storage")

# Load existing index
storage_context = StorageContext.from_defaults(persist_dir="./storage")
loaded_index = load_index_from_storage(storage_context)

query_engine = loaded_index.as_query_engine()

Using Different Vector Stores

# script_id: day_039_llamaindex_and_framework_comparison/vector_stores
# ChromaDB
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

chroma_client = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = chroma_client.get_or_create_collection("my_collection")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

# Create index with custom vector store
from llama_index.core import StorageContext

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

Advanced: Routing Across Query Engines

Route a question to the most suitable query engine automatically:

# script_id: day_039_llamaindex_and_framework_comparison/composable_indices
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector

llm = OpenAI(model="gpt-4o-mini")
embed_model = OpenAIEmbedding()

# Create different query engines for different purposes
docs = SimpleDirectoryReader("./data").load_data()

vector_index = VectorStoreIndex.from_documents(docs, embed_model=embed_model)

# Create tools from query engines with different response modes
detail_tool = QueryEngineTool(
    query_engine=vector_index.as_query_engine(llm=llm, response_mode="compact"),
    metadata=ToolMetadata(
        name="vector_search",
        description="Useful for specific questions about details"
    )
)

summary_tool = QueryEngineTool(
    query_engine=vector_index.as_query_engine(llm=llm, response_mode="tree_summarize"),
    metadata=ToolMetadata(
        name="summary",
        description="Useful for summarization questions"
    )
)

# Router automatically selects the right tool
router_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(llm=llm),
    query_engine_tools=[detail_tool, summary_tool]
)

# Ask questions - router picks the right engine!
response = router_engine.query("Give me a summary")  # Uses tree_summarize mode
response = router_engine.query("What is the exact definition of X?")  # Uses compact mode
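The router's job reduces to: given the tool descriptions and a query, pick one tool and forward the query to it. `LLMSingleSelector` makes that choice with an LLM call; a keyword-based stand-in shows the same control flow (the tool names and keyword lists below are illustrative):

```python
def route(query, tools):
    """Pick the first tool whose keywords match the query; default to the first tool."""
    for name, (keywords, engine) in tools.items():
        if any(kw in query.lower() for kw in keywords):
            return name, engine(query)
    default_name = next(iter(tools))
    return default_name, tools[default_name][1](query)

tools = {
    "summary": (["summary", "summarize", "overview"], lambda q: "high-level answer"),
    "vector_search": (["exact", "definition", "specific"], lambda q: "detailed answer"),
}

name, answer = route("Give me a summary", tools)
name2, answer2 = route("What is the exact definition of X?", tools)
```

An LLM selector handles phrasings that keyword matching misses, which is why the real router pays for that extra call.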

Complete Example: Knowledge Base

# script_id: day_039_llamaindex_and_framework_comparison/knowledge_base
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage
)
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
import os

class KnowledgeBase:
    """A complete knowledge base using LlamaIndex."""

    def __init__(self, data_dir: str, persist_dir: str = "./kb_storage"):
        self.data_dir = data_dir
        self.persist_dir = persist_dir

        # Create model instances (passed directly to constructors, not via Settings)
        self.llm = OpenAI(model="gpt-4o-mini", temperature=0)
        self.embed_model = OpenAIEmbedding()

        # Load or create index
        self.index = self._load_or_create_index()
        self.query_engine = self.index.as_query_engine(llm=self.llm, similarity_top_k=5)
        self.chat_engine = self.index.as_chat_engine(llm=self.llm, chat_mode="condense_plus_context")

    def _load_or_create_index(self):
        """Load existing index or create new one."""
        if os.path.exists(self.persist_dir):
            print("Loading existing index...")
            storage_context = StorageContext.from_defaults(persist_dir=self.persist_dir)
            return load_index_from_storage(storage_context)
        else:
            print("Creating new index...")
            documents = SimpleDirectoryReader(self.data_dir).load_data()
            index = VectorStoreIndex.from_documents(documents, embed_model=self.embed_model)
            index.storage_context.persist(persist_dir=self.persist_dir)
            return index

    def query(self, question: str) -> str:
        """One-off query."""
        response = self.query_engine.query(question)
        return str(response)

    def chat(self, message: str) -> str:
        """Conversational query."""
        response = self.chat_engine.chat(message)
        return str(response)

    def add_document(self, text: str, metadata: dict | None = None):
        """Add a new document to the index."""
        from llama_index.core import Document
        doc = Document(text=text, metadata=metadata or {})
        self.index.insert(doc)
        self.index.storage_context.persist(persist_dir=self.persist_dir)

    def get_sources(self, question: str) -> list:
        """Get source nodes for a query."""
        response = self.query_engine.query(question)
        return [
            {
                "text": node.node.text[:200],
                "score": node.score,
                "metadata": node.node.metadata
            }
            for node in response.source_nodes
        ]

# Usage
kb = KnowledgeBase(data_dir="./documents")

# Query
answer = kb.query("What are the main topics covered?")
print(answer)

# Chat
response1 = kb.chat("Tell me about the first topic")
response2 = kb.chat("How does that relate to the second one?")

# Get sources
sources = kb.get_sources("What is machine learning?")
for source in sources:
    print(f"Score: {source['score']:.3f} - {source['text'][:50]}...")

Summary


Quick Reference

# script_id: day_039_llamaindex_and_framework_comparison/quick_reference
# Quick start
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(docs)
response = index.as_query_engine().query("Question?")

# Persistence
from llama_index.core import StorageContext, load_index_from_storage

index.storage_context.persist(persist_dir="./storage")
index = load_index_from_storage(StorageContext.from_defaults(persist_dir="./storage"))

# Chat
chat = index.as_chat_engine()
chat.chat("First message")
chat.chat("Follow-up")

What's Next?

You've learned both LangChain and LlamaIndex! Next, we'll explore LangGraph for building stateful agent workflows.


When to Use a Framework vs. Vanilla Python

You've learned LangChain, LlamaIndex, and built agents from scratch. Now the important question: when should you use each approach?


The Trade-offs


Decision Matrix

Factor             | Vanilla   | LangChain   | LlamaIndex
Simple chat app    | ✅ Best   | Overkill    | Overkill
Complex chains     | More work | ✅ Best     | Possible
RAG application    | More work | Good        | ✅ Best
Agent with tools   | More work | ✅ Best     | Possible
Custom logic       | ✅ Best   | Harder      | Harder
Speed to prototype | Slower    | ✅ Fast     | ✅ Fast
Production control | ✅ Best   | Less        | Less
Team knowledge     | Universal | Specialized | Specialized

Decision Flowchart


Code Comparison

Simple Chat

# script_id: day_039_llamaindex_and_framework_comparison/simple_chat_comparison
# Vanilla Python - Simple and clear
from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

# LangChain - More setup for simple task
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI()
response = llm.invoke([HumanMessage(content="Hello!")])
print(response.content)

# Verdict: Vanilla wins for simplicity

RAG Application

# script_id: day_039_llamaindex_and_framework_comparison/rag_comparison
# Vanilla Python - Lots of code
from openai import OpenAI
import chromadb

client = OpenAI()
chroma = chromadb.EphemeralClient()
collection = chroma.create_collection("docs")

# Load documents
docs = load_documents()  # You implement this

# Chunk documents
chunks = chunk_documents(docs)  # You implement this

# Embed and store
for chunk in chunks:
    embedding = client.embeddings.create(
        model="text-embedding-3-small",
        input=chunk
    ).data[0].embedding
    collection.add(ids=[...], embeddings=[embedding], documents=[chunk])

# Query
query_emb = client.embeddings.create(...).data[0].embedding
results = collection.query(query_embeddings=[query_emb])
# Build prompt, call LLM...

# LlamaIndex - Few lines
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

index = VectorStoreIndex.from_documents(
    SimpleDirectoryReader("./data").load_data()
)
response = index.as_query_engine().query("Question?")

# Verdict: LlamaIndex wins for RAG
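The part LlamaIndex hides from the vanilla version is the retrieval math itself. At its core it is cosine similarity over embedding vectors; a sketch with toy 2-D vectors standing in for real embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_emb, doc_embs, k=2):
    """Return indices of the k most similar document embeddings."""
    ranked = sorted(range(len(doc_embs)),
                    key=lambda i: cosine(query_emb, doc_embs[i]),
                    reverse=True)
    return ranked[:k]

docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
best = top_k([1.0, 0.05], docs, k=2)
```

Vector stores like Chroma run this same comparison (with approximate-nearest-neighbor indexes) at scale, which is what `index.as_query_engine()` delegates to.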

Complex Agent

# script_id: day_039_llamaindex_and_framework_comparison/complex_agent_comparison
# Vanilla Python - Full control, more code
class Agent:
    def __init__(self):
        self.tools = {}

    def add_tool(self, name, func):
        self.tools[name] = func

    def run(self, task):
        # Implement ReAct loop
        # Parse responses
        # Execute tools
        # Manage state
        pass  # 50+ lines of code

# LangChain - Pre-built components
from langchain.agents import create_react_agent, AgentExecutor
from langchain_openai import ChatOpenAI
from langchain_core.tools import Tool

tools = [Tool(name="search", func=search_fn, description="...")]
# prompt: a ReAct prompt template, e.g. pulled from LangChain Hub
agent = create_react_agent(ChatOpenAI(), tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)
result = executor.invoke({"input": "task"})

# Verdict: LangChain wins for standard agents
# But vanilla wins if you need custom behavior
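For a sense of scale, the vanilla `pass  # 50+ lines` is a dispatch loop like the one below, here with a deterministic stand-in LLM instead of a real model. Parsing free-form model output, retries, and error handling are what push a real implementation past 50 lines:

```python
def run_agent(task, tools, llm_step, max_steps=5):
    """Minimal ReAct-style loop: the LLM proposes actions until it answers."""
    scratchpad = [f"Task: {task}"]
    for _ in range(max_steps):
        action = llm_step(scratchpad)  # ("tool_name", arg) or ("final", answer)
        if action[0] == "final":
            return action[1]
        name, arg = action
        observation = tools[name](arg)
        scratchpad.append(f"Observation: {observation}")
    return "gave up"

# Deterministic stand-in LLM: search once, then answer with the observation.
def fake_llm(scratchpad):
    if len(scratchpad) == 1:
        return ("search", "llamaindex")
    return ("final", scratchpad[-1])

tools = {"search": lambda q: f"results for {q}"}
result = run_agent("look it up", tools, fake_llm)
```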

Hybrid Approach

Often the best solution combines approaches:

# script_id: day_039_llamaindex_and_framework_comparison/hybrid_approach
# Use LlamaIndex for data, vanilla for control

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from openai import OpenAI

# LlamaIndex for the heavy lifting
index = VectorStoreIndex.from_documents(
    SimpleDirectoryReader("./data").load_data()
)
retriever = index.as_retriever(similarity_top_k=5)

# Vanilla Python for custom logic
client = OpenAI()

def custom_rag(question: str) -> str:
    # Retrieve with LlamaIndex
    nodes = retriever.retrieve(question)
    context = "\n".join(n.node.get_content() for n in nodes)

    # Custom prompt logic
    if len(context) < 100:
        return "Not enough information found."

    # Custom LLM call
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Context: {context}"},
            {"role": "user", "content": question}
        ],
        temperature=0.2  # Custom setting
    )

    # Custom post-processing
    answer = response.choices[0].message.content
    if "I don't know" in answer:
        return fallback_response(question)  # your own fallback, e.g. a canned reply

    return answer

Framework Overhead


Production Considerations

Consideration | Vanilla           | Frameworks
Debugging     | Easy - your code  | Harder - framework internals
Upgrades      | You control       | Breaking changes possible
Performance   | Optimized by you  | May have overhead
Hiring        | Any Python dev    | Need framework knowledge
Documentation | Self-documenting  | Depends on framework

Recommendations

Use Vanilla Python When:

  • Building simple chat applications
  • Need maximum control over behavior
  • Want minimal dependencies
  • Team doesn't know frameworks
  • Building for long-term maintenance

Use LangChain When:

  • Building agents with tools
  • Need complex chain compositions
  • Want rapid prototyping
  • Using many third-party integrations
  • Building standard patterns

Use LlamaIndex When:

  • Building RAG applications
  • Working with lots of documents
  • Need advanced retrieval strategies
  • Building knowledge bases
  • Want quick data-to-query setup

Use Hybrid When:

  • Need best of both worlds
  • Want framework convenience with custom control
  • Building production systems that will evolve

Summary


Quick Decision Guide

Simple chat? → Vanilla
RAG app? → LlamaIndex
Agent with tools? → LangChain
Need control? → Vanilla
Quick prototype? → Framework
Production + maintenance? → Consider vanilla or hybrid

The best developers know when to use frameworks and when to write custom code. Master all approaches, then choose wisely!