Welcome to the Llama Stack Tutorial - your comprehensive guide to learning and using Llama Stack.
Llama Stack is an open-source framework that makes it easier to build, run, and experiment with large language models on your own infrastructure. It provides:
- Unified interface for different model providers (Ollama, vLLM, etc.)
- Tools and agents for complex AI workflows
- Built-in safety guardrails and telemetry
- CLI, Python SDK, and web playground for development
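As a first taste of how clients talk to the Llama Stack API, here is a minimal sketch that assembles a single-turn chat request. The endpoint path, field names, and model ID below are assumptions (an OpenAI-style messages shape) and placeholders, not the definitive schema; check your running server or the SDK docs for the exact API.

```python
import json

# Hedged sketch: build a chat-completion request body for the Llama Stack
# API served on port 8321. The endpoint path and field names are
# assumptions; the model ID is a placeholder for one registered with
# your server.
BASE_URL = "http://localhost:8321"
ENDPOINT = "/v1/inference/chat-completion"  # assumed path


def build_chat_request(model_id: str, prompt: str) -> dict:
    """Assemble the JSON body for a single-turn chat completion."""
    return {
        "model_id": model_id,
        "messages": [{"role": "user", "content": prompt}],
    }


body = build_chat_request("llama3.2:3b", "Say hello in one sentence.")
print(json.dumps(body, indent=2))
# To actually send it, POST this body to BASE_URL + ENDPOINT with
# urllib.request, or use the llama-stack-client Python SDK instead.
```

In practice you would rarely build the payload by hand; the Python SDK and CLI wrap this for you, but seeing the raw shape helps when debugging with curl or reading telemetry traces.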
This repository contains hands-on tutorials covering:
- Getting started with CLI and Python SDK
- Using the web playground
- Basic model interactions
- Model Context Protocol (MCP) integration
- Retrieval Augmented Generation (RAG)
- Building interactive agents
- Safety and content filtering
- Telemetry and observability
- Production deployments
- Custom providers and extensions
- Advanced agent workflows
Once the environment is up, the following services are available locally:
- Web Playground: http://localhost:8502
- Llama Stack API: http://localhost:8321
- Jaeger Telemetry: http://localhost:16686
- Tutorial Site: run `./utilities/lab-serve`, then open http://localhost:8080
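To verify that the services above are actually listening, you can probe each port. This is a hedged sketch using only the Python standard library; it hits "/" on each URL rather than any official health endpoint, and the port list simply mirrors this page.

```python
import urllib.error
import urllib.request

# Ports taken from the service list above; "/" is probed as a generic
# liveness check, not an official health endpoint.
SERVICES = {
    "Web Playground": "http://localhost:8502",
    "Llama Stack API": "http://localhost:8321",
    "Jaeger Telemetry": "http://localhost:16686",
}


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if anything answers HTTP at the URL, False otherwise."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True  # the server answered, even if with an error status
    except (urllib.error.URLError, OSError):
        return False


for name, url in SERVICES.items():
    status = "up" if is_reachable(url) else "down"
    print(f"{name}: {status}")
```

If a service shows "down", make sure the corresponding container or process has been started before moving on to the tutorials.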
This tutorial is designed to help developers learn Llama Stack through practical examples. The site is built with Antora; the source pages live in the content/modules/ROOT/pages/ directory.