About

An open-access textbook on action models — the policies that turn perception and language into robot behavior — drafted in public, one section at a time.

The book organizes the field around four families of action models: symbolic planners, geometric and inverse-dynamics controllers, value-based policies learned from reward, and policies learned from demonstration. It builds the modern recipe behind RT-2, OpenVLA, π₀, Helix, and GR00T N1 from first principles, rather than surveying the literature or following any single lab's approach. The table of contents shows what is finished and what is still pending.

I am Pavan Kumar Kandapagari. I lead a foundation-models-for-robotics team in Munich, and I have spent the past three years working toward production vision-language-action policies. That path ran through LLM-as-planner systems, imitation-learning baselines from RT-1, RT-X, and Octo, and diffusion and flow-matching action decoders, before I designed and pretrained the architecture our team now ships. I hired the fifteen-person research, infrastructure, and evaluation team behind that work, and I built the hardware benchmarking suite we use to evaluate our policies on real robots against open-source baselines like π₀ and GR00T. The book is the long-form version of the notes that work produced.

I am writing this in the open because the field moves fast enough that drafts read by strangers are more useful than a manuscript polished in private, and because every section that stands up to a public read is one less section I have to revise later. There is no paywall, and there is no plan to add one.

The book lives on GitHub at github.com/kandapagari/in-action-book — issues and pull requests are welcome. You can also reach me at pavan.kandapagari@gmail.com.