Action Models for Robot Learning

All 18 chapters, drafted in the open: pending sections are marked as pending, never hidden behind a fake “coming soon.”

Drafted: 72 / 131 sections

Chapters complete: 9 / 18

Parts: 5

PART ONE

Foundations and a first taste of VLAs

The robot learning problem

1.1 Why "action" is the hard part of robotics DRAFTED
1.2 Anatomy of an action model: inputs, outputs, training signal DRAFTED
1.3 A short history, from STRIPS to π0 DRAFTED
1.4 The four families of action models DRAFTED
1.5 What you will and will not find in this book DRAFTED
1.6 Summary DRAFTED
1.x Hands-on exercise + chapter references DRAFTED

Your first VLA, end-to-end

2.1 What we are going to build, and what is hidden inside DRAFTED
2.2 Setting up the environment (OpenVLA weights, LIBERO simulator) DRAFTED
2.3 Walking through the inference loop one line at a time DRAFTED
2.4 When it works and when it does not DRAFTED
2.5 What is left for the rest of the book DRAFTED
2.6 Summary DRAFTED
2.x Hands-on exercise + chapter references DRAFTED

Math and ML prerequisites in 30 minutes

3.1 Vectors, matrices, gradients, and why the chain rule rules robotics DRAFTED
3.2 Random variables, expectations, KL divergence DRAFTED
3.3 A 50-line PyTorch training loop, annotated DRAFTED
3.4 Three loss families: supervised, RL, self-supervised DRAFTED
3.5 Debugging a model that will not train DRAFTED
3.6 Summary DRAFTED
3.x Hands-on exercise + chapter references DRAFTED

PART TWO

The lineage that produced VLAs

Classical action models: planning and inverse dynamics

4.1 Symbolic actions: STRIPS, PDDL, and action schemas DRAFTED
4.2 Geometric actions: inverse kinematics and motion planning DRAFTED
4.3 Inverse dynamics and computed-torque control DRAFTED
4.4 Where classical methods are still load-bearing in modern robots DRAFTED
4.5 Summary DRAFTED
4.x Hands-on exercise + chapter references DRAFTED

Learning from rewards: MDPs and reinforcement learning

5.1 States, actions, rewards, and policies DRAFTED
5.2 Value iteration and policy iteration DRAFTED
5.3 Q-learning and the role of exploration DRAFTED
5.4 Why reward design is the hardest part DRAFTED
5.5 The MDP-to-robot translation problem DRAFTED
5.6 Summary DRAFTED
5.x Hands-on exercise + chapter references DRAFTED

Learning from demonstrations: behavior cloning and imitation learning

6.1 Why imitation is the dominant signal in modern robotics DRAFTED
6.2 Behavior cloning, step by step DRAFTED
6.3 Compounding error and DAgger DRAFTED
6.4 A glance at IRL and adversarial imitation DRAFTED
6.5 Choosing between BC, IRL, and RL DRAFTED
6.6 Summary DRAFTED
6.x Hands-on exercise + chapter references DRAFTED

Deep RL for control: DQN to SAC and PPO

7.1 Function approximation: from Q-tables to Q-networks DRAFTED
7.2 Policy gradients and the variance problem DRAFTED
7.3 PPO in 100 lines DRAFTED
7.4 Off-policy actor-critic: DDPG, TD3, SAC DRAFTED
7.5 Sim-to-real: domain randomization in one slide DRAFTED
7.6 Summary DRAFTED
7.x Hands-on exercise + chapter references DRAFTED

PART THREE

Modern building blocks

Sequence models meet control

8.1 The transformer in two pages, for control DRAFTED
8.2 Decision Transformer: control as conditional sequence modeling DRAFTED
8.3 Trajectory Transformer and beam-search planning DRAFTED
8.4 What gets tokenized: states, actions, returns, language DRAFTED
8.5 Bridge to foundation action models, and to the SSM alternative (RoboMamba) DRAFTED
8.6 Summary DRAFTED
8.x Hands-on exercise + chapter references DRAFTED

World models and model-based learning

9.1 What is a world model, really DRAFTED
9.2 Latent dynamics: RSSM and Dreamer DRAFTED
9.3 Planning in latent space DRAFTED
9.4 Video-prediction world models (Genie, V-JEPA) DRAFTED
9.5 World models vs. VLAs: the architecture debate DRAFTED
9.6 Summary DRAFTED
9.x Hands-on exercise + chapter references DRAFTED

Diffusion and flow models for action generation

10.1 A 10-minute introduction to diffusion models DRAFTED
10.2 Diffusion Policy and ACT DRAFTED
10.3 Flow matching and rectified flow for action DRAFTED
10.4 Trade-offs: latency, multimodality, smoothness DRAFTED
10.5 Action-head choices in modern VLAs PENDING
10.6 Summary PENDING
10.x Hands-on exercise + chapter references PENDING

PART FOUR

Foundation action models in depth

The VLA recipe: from CLIP to RT-1

11.1 CLIP and the multimodal pretraining moment PENDING
11.2 Language-conditioned imitation: BC-Z, RT-1 PENDING
11.3 Action tokenization: a small idea with large consequences PENDING
11.4 What RT-1 changed and what it did not PENDING
11.5 The data side: when does scale start to pay off PENDING
11.6 Summary PENDING
11.x Hands-on exercise + chapter references PENDING

Scaling up: RT-2, OpenVLA, and Octo

12.1 RT-2: a VLM that also outputs actions PENDING
12.2 OpenVLA: an open-source 7B-parameter VLA PENDING
12.3 Octo: a generalist policy with a diffusion head PENDING
12.4 Open X-Embodiment: the dataset that made all of this possible PENDING
12.5 What "emergent" really means in this context PENDING
12.6 Summary PENDING
12.x Hands-on exercise + chapter references PENDING

Smooth control: π0 and flow-matching action heads

13.1 The trouble with discrete action tokens PENDING
13.2 π0's architecture, end to end PENDING
13.3 Flow matching as a control objective PENDING
13.4 What π0 can do that earlier VLAs cannot PENDING
13.5 Open questions in continuous-action foundation models PENDING
13.6 Summary PENDING
13.x Hands-on exercise + chapter references PENDING

Dual-system architectures: Helix and GR00T N1

14.1 Why a single forward pass is not always enough PENDING
14.2 Helix: a high-level VLM and a low-level sensorimotor model PENDING
14.3 GR00T N1: humanoid-flavored dual systems PENDING
14.4 Latency budgets and real-time control PENDING
14.5 Deployment case studies (Figure 02, GR00T-enabled humanoids) PENDING
14.6 Summary PENDING
14.x Hands-on exercise + chapter references PENDING

Datasets, benchmarks, and evaluation

15.1 What a robot dataset looks like, by example PENDING
15.2 Open X-Embodiment in detail PENDING
15.3 Sim benchmarks (LIBERO, CALVIN, RoboCasa, SimplerEnv) PENDING
15.4 Real-robot evaluation: variance, success rate, time-to-completion PENDING
15.5 Building your own evaluation PENDING
15.6 Summary PENDING
15.x Hands-on exercise + chapter references PENDING

PART FIVE

Building with action models

Fine-tuning a VLA for your robot

16.1 Picking a base model PENDING
16.2 Building a teleop dataset that does not waste your time PENDING
16.3 LoRA vs. full fine-tuning vs. action-head-only PENDING
16.4 Sim-to-real fine-tuning loops PENDING
16.5 A recipe card for new embodiments PENDING
16.6 Summary PENDING
16.x Hands-on exercise + chapter references PENDING

Evaluation, safety, and deployment

17.1 Safety as a layer, not a property PENDING
17.2 Runtime monitors and shielding PENDING
17.3 A/B evaluation on hardware PENDING
17.4 Logging, alerting, and rollback PENDING
17.5 What we still cannot certify PENDING
17.6 Summary PENDING
17.x Hands-on exercise + chapter references PENDING

Open problems and what comes next

18.1 Generalization across embodiments PENDING
18.2 Long-horizon and dexterous tasks PENDING
18.3 Video-pretrained action models PENDING
18.4 Reasoning + action: LLM chains of thought meet control PENDING
18.5 What to read next, and how to contribute PENDING
18.6 Summary PENDING
18.x Hands-on exercise + chapter references PENDING