Action Models for Robot Learning

by Pavan Kumar Kandapagari · open-access draft, updated continuously

A textbook on the policies that turn perception and language into robot behavior. The book traces the lineage that produced today's vision-language-action systems — RT-1, RT-2, OpenVLA, π₀, Helix, and GR00T N1 — and organizes the field around four families of action models: symbolic planners, geometric and inverse-dynamics controllers, value-based policies learned from reward, and learned policies trained from demonstration. We build the modern recipe from first principles, one section at a time, and end with what it actually takes to fine-tune and deploy a VLA on a real robot.

New to the underlying math or machine learning? The six appendices — linear algebra, probability, optimization, and the rest of the toolkit — are written to be read first.


The book at a glance

Part 1 — Foundations and a first taste of VLAs

Part 2 — The lineage that produced VLAs

Part 3 — Modern building blocks

Part 4 — Foundation action models in depth

11 The VLA recipe: from CLIP to RT-1
12 Scaling up: RT-2, OpenVLA, and Octo
13 Smooth control: π₀ and flow-matching action heads
14 Dual-system architectures: Helix and GR00T N1
15 Datasets, benchmarks, and evaluation

Part 5 — Building with action models

16 Fine-tuning a VLA for your robot
17 Evaluation, safety, and deployment
18 Open problems and what comes next

About this book

This is a textbook for engineers and researchers entering the vision-language-action and robot-learning field in 2026, written from first principles and built up — chapter by chapter — to the frontier systems shipping today: RT-2, OpenVLA, π₀, Helix, and GR00T N1. It is being drafted openly, one section at a time; every new section appears on this site the day it is written, and the table of contents shows clearly what is finished and what is still pending. The intent is a single, coherent path from the basics of robot control to fine-tuning your own VLA — no paywalls, no waiting for the manuscript to be done.