Table of contents
All 18 chapters, drafted openly — pending sections are visibly pending, never hidden behind a fake “coming soon.”
DRAFTED: 24 / 131 sections
CHAPTERS COMPLETE: 3 / 18
PARTS: 5
PART ONE
Foundations and a first taste of VLAs
1
The robot learning problem
- 1.1 Why "action" is the hard part of robotics DRAFTED
- 1.2 Anatomy of an action model: inputs, outputs, training signal DRAFTED
- 1.3 A short history, from STRIPS to π0 DRAFTED
- 1.4 The four families of action models DRAFTED
- 1.5 What you will and will not find in this book DRAFTED
- 1.6 Summary DRAFTED
- 1.x Hands-on exercise + chapter references DRAFTED
2
Your first VLA, end-to-end
- 2.1 What we are going to build, and what is hidden inside DRAFTED
- 2.2 Setting up the environment (OpenVLA weights, LIBERO simulator) DRAFTED
- 2.3 Walking through the inference loop one line at a time DRAFTED
- 2.4 When it works and when it does not DRAFTED
- 2.5 What is left for the rest of the book DRAFTED
- 2.6 Summary DRAFTED
- 2.x Hands-on exercise + chapter references DRAFTED
3
Math and ML prerequisites in 30 minutes
- 3.1 Vectors, matrices, gradients, and why the chain rule rules robotics DRAFTED
- 3.2 Random variables, expectations, KL divergence DRAFTED
- 3.3 A 50-line PyTorch training loop, annotated DRAFTED
- 3.4 Three loss families: supervised, RL, self-supervised DRAFTED
- 3.5 Debugging a model that will not train DRAFTED
- 3.6 Summary DRAFTED
- 3.x Hands-on exercise + chapter references DRAFTED
PART TWO
The lineage that produced VLAs
4
Classical action models: planning and inverse dynamics
- 4.1 Symbolic actions: STRIPS, PDDL, and action schemas DRAFTED
- 4.2 Geometric actions: inverse kinematics and motion planning PENDING
- 4.3 Inverse dynamics and computed-torque control PENDING
- 4.4 Where classical methods are still load-bearing in modern robots PENDING
- 4.5 Summary PENDING
- 4.x Hands-on exercise + chapter references PENDING
5
Learning from rewards: MDPs and reinforcement learning
- 5.1 States, actions, rewards, and policies PENDING
- 5.2 Value iteration and policy iteration PENDING
- 5.3 Q-learning and the role of exploration PENDING
- 5.4 Why reward design is the hardest part PENDING
- 5.5 The MDP-to-robot translation problem PENDING
- 5.6 Summary PENDING
- 5.x Hands-on exercise + chapter references PENDING
6
Learning from demonstrations: behavior cloning and imitation learning
- 6.1 Why imitation is the dominant signal in modern robotics PENDING
- 6.2 Behavior cloning, step by step PENDING
- 6.3 Compounding error and DAgger PENDING
- 6.4 A glance at IRL and adversarial imitation PENDING
- 6.5 Choosing between BC, IRL, and RL PENDING
- 6.6 Summary PENDING
- 6.x Hands-on exercise + chapter references PENDING
7
Deep RL for control: DQN to SAC and PPO
- 7.1 Function approximation: from Q-tables to Q-networks PENDING
- 7.2 Policy gradients and the variance problem PENDING
- 7.3 PPO in 100 lines PENDING
- 7.4 Off-policy actor-critic: DDPG, TD3, SAC PENDING
- 7.5 Sim-to-real: domain randomization in one slide PENDING
- 7.6 Summary PENDING
- 7.x Hands-on exercise + chapter references PENDING
PART THREE
Modern building blocks
8
Sequence models meet control
- 8.1 The transformer in two pages, for control PENDING
- 8.2 Decision Transformer: control as conditional sequence modeling PENDING
- 8.3 Trajectory Transformer and beam-search planning PENDING
- 8.4 What gets tokenized: states, actions, returns, language PENDING
- 8.5 Bridge to foundation action models — and to the SSM alternative (RoboMamba) PENDING
- 8.6 Summary PENDING
- 8.x Hands-on exercise + chapter references PENDING
9
World models and model-based learning
- 9.1 What is a world model, really PENDING
- 9.2 Latent dynamics: RSSM and Dreamer PENDING
- 9.3 Planning in latent space PENDING
- 9.4 Video-prediction world models (Genie, V-JEPA) PENDING
- 9.5 World models vs. VLAs: the architecture debate PENDING
- 9.6 Summary PENDING
- 9.x Hands-on exercise + chapter references PENDING
10
Diffusion and flow models for action generation
- 10.1 A 10-minute introduction to diffusion models PENDING
- 10.2 Diffusion Policy and ACT PENDING
- 10.3 Flow matching and rectified flow for action PENDING
- 10.4 Trade-offs: latency, multimodality, smoothness PENDING
- 10.5 Action-head choices in modern VLAs PENDING
- 10.6 Summary PENDING
- 10.x Hands-on exercise + chapter references PENDING
PART FOUR
Foundation action models in depth
11
The VLA recipe: from CLIP to RT-1
- 11.1 CLIP and the multimodal pretraining moment PENDING
- 11.2 Language-conditioned imitation: BC-Z, RT-1 PENDING
- 11.3 Action tokenization: a small idea with large consequences PENDING
- 11.4 What RT-1 changed and what it did not PENDING
- 11.5 The data side: when does scale start to pay off PENDING
- 11.6 Summary PENDING
- 11.x Hands-on exercise + chapter references PENDING
12
Scaling up: RT-2, OpenVLA, and Octo
- 12.1 RT-2: a VLM that also outputs actions PENDING
- 12.2 OpenVLA: an open-source 7B-parameter VLA PENDING
- 12.3 Octo: a generalist policy with a diffusion head PENDING
- 12.4 Open X-Embodiment: the dataset that made all of this possible PENDING
- 12.5 What "emergent" really means in this context PENDING
- 12.6 Summary PENDING
- 12.x Hands-on exercise + chapter references PENDING
13
Smooth control: π0 and flow-matching action heads
- 13.1 The trouble with discrete action tokens PENDING
- 13.2 π0's architecture, end to end PENDING
- 13.3 Flow matching as a control objective PENDING
- 13.4 What π0 can do that earlier VLAs cannot PENDING
- 13.5 Open questions in continuous-action foundation models PENDING
- 13.6 Summary PENDING
- 13.x Hands-on exercise + chapter references PENDING
14
Dual-system architectures: Helix and GR00T N1
- 14.1 Why a single forward pass is not always enough PENDING
- 14.2 Helix: a high-level VLM and a low-level sensorimotor model PENDING
- 14.3 GR00T N1: humanoid-flavored dual systems PENDING
- 14.4 Latency budgets and real-time control PENDING
- 14.5 Deployment case studies (Figure 02, GR00T-enabled humanoids) PENDING
- 14.6 Summary PENDING
- 14.x Hands-on exercise + chapter references PENDING
15
Datasets, benchmarks, and evaluation
- 15.1 What a robot dataset looks like, by example PENDING
- 15.2 Open X-Embodiment in detail PENDING
- 15.3 Sim benchmarks (LIBERO, CALVIN, RoboCasa, SimplerEnv) PENDING
- 15.4 Real-robot evaluation: variance, success rate, time-to-completion PENDING
- 15.5 Building your own evaluation PENDING
- 15.6 Summary PENDING
- 15.x Hands-on exercise + chapter references PENDING
PART FIVE
Building with action models
16
Fine-tuning a VLA for your robot
- 16.1 Picking a base model PENDING
- 16.2 Building a teleop dataset that does not waste your time PENDING
- 16.3 LoRA vs. full fine-tuning vs. action-head-only PENDING
- 16.4 Sim-to-real fine-tuning loops PENDING
- 16.5 A recipe card for new embodiments PENDING
- 16.6 Summary PENDING
- 16.x Hands-on exercise + chapter references PENDING
17
Evaluation, safety, and deployment
- 17.1 Safety as a layer, not a property PENDING
- 17.2 Runtime monitors and shielding PENDING
- 17.3 A/B evaluation on hardware PENDING
- 17.4 Logging, alerting, and rollback PENDING
- 17.5 What we still cannot certify PENDING
- 17.6 Summary PENDING
- 17.x Hands-on exercise + chapter references PENDING
18
Open problems and what comes next
- 18.1 Generalization across embodiments PENDING
- 18.2 Long-horizon and dexterous tasks PENDING
- 18.3 Video-pretrained action models PENDING
- 18.4 Reasoning + action: LLM chains of thought meet control PENDING
- 18.5 What to read next, and how to contribute PENDING
- 18.6 Summary PENDING
- 18.x Hands-on exercise + chapter references PENDING
APPENDICES
A
Linear algebra refresher PENDING
B
Probability and information theory PENDING
C
PyTorch and JAX primer PENDING
D
Setting up a robotics simulator PENDING
E
Canonical references (in TOC docx) DRAFTED
F