Action Models for Robot Learning

by Pavan Kumar Kandapagari · open-access draft, updated continuously

A textbook on the policies that turn perception and language into robot behavior. The book traces the lineage that produced today's vision-language-action (VLA) systems, including RT-1, RT-2, OpenVLA, π0, Helix, and GR00T N1. It organizes the field around four families of action models: symbolic planners, geometric and inverse-dynamics controllers, policies learned from reward, and policies learned from demonstration. We build the modern recipe from first principles, one section at a time, and end with what it actually takes to fine-tune and deploy a VLA on a real robot.



The book at a glance

Part 2 — The lineage that produced VLAs

  4. Classical action models: planning and inverse dynamics
  5. Learning from rewards: MDPs and reinforcement learning
  6. Learning from demonstrations: behavior cloning and imitation learning
  7. Deep RL for control: DQN to SAC and PPO

Part 3 — Modern building blocks

  8. Sequence models meet control
  9. World models and model-based learning
  10. Diffusion and flow models for action generation

Part 4 — Foundation action models in depth

  11. The VLA recipe: from CLIP to RT-1
  12. Scaling up: RT-2, OpenVLA, and Octo
  13. Smooth control: π0 and flow-matching action heads
  14. Dual-system architectures: Helix and GR00T N1
  15. Datasets, benchmarks, and evaluation

Part 5 — Building with action models

  16. Fine-tuning a VLA for your robot
  17. Evaluation, safety, and deployment
  18. Open problems and what comes next

About this book

This is a textbook for engineers and researchers entering the vision-language-action and robot-learning fields in 2026. It is written from first principles and builds up, chapter by chapter, to the frontier systems shipping today: RT-2, OpenVLA, π0, Helix, and GR00T N1. It is being drafted openly, one section at a time; every new section appears on this site the day it is written, and the table of contents shows which sections are finished and which are still pending. The intent is a single, coherent path from the basics of robot control to fine-tuning your own VLA, with no paywalls and no waiting for the finished manuscript.