A textbook on the policies that turn perception and language into robot behavior. The book traces the lineage that produced today's vision-language-action systems — RT-1, RT-2, OpenVLA, π0, Helix, and GR00T N1 — and organizes the field around four families of action models: symbolic planners, geometric and inverse-dynamics controllers, value-based policies learned from reward, and learned policies trained from demonstration. We build the modern recipe from first principles, one section at a time, and end with what it actually takes to fine-tune and deploy a VLA on a real robot.
This is a textbook for engineers and researchers entering the vision-language-action and robot-learning field in 2026, written from first principles and built up, chapter by chapter, to the frontier systems shipping today: RT-2, OpenVLA, π0, Helix, and GR00T N1. It is being drafted openly, one section at a time; every new section appears on this site the day it is written, and the table of contents shows clearly what is finished and what is still pending. The intent is a single, coherent path from the basics of robot control to fine-tuning your own VLA: no paywalls, no waiting for the manuscript to be done.