Table of contents

All 18 chapters, drafted in the open. Pending sections are marked PENDING, never hidden behind a fake “coming soon.”

DRAFTED 24 / 131 sections
CHAPTERS COMPLETE 3 / 18
PARTS 5

PART ONE

Foundations and a first taste of VLAs

PART TWO

The lineage that produced VLAs

4

Classical action models: planning and inverse dynamics

  1. 4.1 Symbolic actions: STRIPS, PDDL, and action schemas DRAFTED
  2. 4.2 Geometric actions: inverse kinematics and motion planning PENDING
  3. 4.3 Inverse dynamics and computed-torque control PENDING
  4. 4.4 Where classical methods are still load-bearing in modern robots PENDING
  5. 4.5 Summary PENDING
  6. 4.x Hands-on exercise + chapter references PENDING
5

Learning from rewards: MDPs and reinforcement learning

  1. 5.1 States, actions, rewards, and policies PENDING
  2. 5.2 Value iteration and policy iteration PENDING
  3. 5.3 Q-learning and the role of exploration PENDING
  4. 5.4 Why reward design is the hardest part PENDING
  5. 5.5 The MDP-to-robot translation problem PENDING
  6. 5.6 Summary PENDING
  7. 5.x Hands-on exercise + chapter references PENDING
6

Learning from demonstrations: behavior cloning and imitation learning

  1. 6.1 Why imitation is the dominant signal in modern robotics PENDING
  2. 6.2 Behavior cloning, step by step PENDING
  3. 6.3 Compounding error and DAgger PENDING
  4. 6.4 A glance at IRL and adversarial imitation PENDING
  5. 6.5 Choosing between BC, IRL, and RL PENDING
  6. 6.6 Summary PENDING
  7. 6.x Hands-on exercise + chapter references PENDING
7

Deep RL for control: DQN to SAC and PPO

  1. 7.1 Function approximation: from Q-tables to Q-networks PENDING
  2. 7.2 Policy gradients and the variance problem PENDING
  3. 7.3 PPO in 100 lines PENDING
  4. 7.4 Off-policy actor-critic: DDPG, TD3, SAC PENDING
  5. 7.5 Sim-to-real: domain randomization in one slide PENDING
  6. 7.6 Summary PENDING
  7. 7.x Hands-on exercise + chapter references PENDING

PART THREE

Modern building blocks

8

Sequence models meet control

  1. 8.1 The transformer in two pages, for control PENDING
  2. 8.2 Decision Transformer: control as conditional sequence modeling PENDING
  3. 8.3 Trajectory Transformer and beam-search planning PENDING
  4. 8.4 What gets tokenized: states, actions, returns, language PENDING
  5. 8.5 Bridge to foundation action models — and to the SSM alternative (RoboMamba) PENDING
  6. 8.6 Summary PENDING
  7. 8.x Hands-on exercise + chapter references PENDING
9

World models and model-based learning

  1. 9.1 What is a world model, really PENDING
  2. 9.2 Latent dynamics: RSSM and Dreamer PENDING
  3. 9.3 Planning in latent space PENDING
  4. 9.4 Video-prediction world models (Genie, V-JEPA) PENDING
  5. 9.5 World models vs. VLAs: the architecture debate PENDING
  6. 9.6 Summary PENDING
  7. 9.x Hands-on exercise + chapter references PENDING
10

Diffusion and flow models for action generation

  1. 10.1 A 10-minute introduction to diffusion models PENDING
  2. 10.2 Diffusion Policy and ACT PENDING
  3. 10.3 Flow matching and rectified flow for action PENDING
  4. 10.4 Trade-offs: latency, multimodality, smoothness PENDING
  5. 10.5 Action-head choices in modern VLAs PENDING
  6. 10.6 Summary PENDING
  7. 10.x Hands-on exercise + chapter references PENDING

PART FOUR

Foundation action models in depth

11

The VLA recipe: from CLIP to RT-1

  1. 11.1 CLIP and the multimodal pretraining moment PENDING
  2. 11.2 Language-conditioned imitation: BC-Z, RT-1 PENDING
  3. 11.3 Action tokenization: a small idea with large consequences PENDING
  4. 11.4 What RT-1 changed and what it did not PENDING
  5. 11.5 The data side: when does scale start to pay off PENDING
  6. 11.6 Summary PENDING
  7. 11.x Hands-on exercise + chapter references PENDING
12

Scaling up: RT-2, OpenVLA, and Octo

  1. 12.1 RT-2: a VLM that also outputs actions PENDING
  2. 12.2 OpenVLA: an open-source 7B-parameter VLA PENDING
  3. 12.3 Octo: a generalist policy with a diffusion head PENDING
  4. 12.4 Open X-Embodiment: the dataset that made all of this possible PENDING
  5. 12.5 What "emergent" really means in this context PENDING
  6. 12.6 Summary PENDING
  7. 12.x Hands-on exercise + chapter references PENDING
13

Smooth control: π0 and flow-matching action heads

  1. 13.1 The trouble with discrete action tokens PENDING
  2. 13.2 π0's architecture, end to end PENDING
  3. 13.3 Flow matching as a control objective PENDING
  4. 13.4 What π0 can do that earlier VLAs cannot PENDING
  5. 13.5 Open questions in continuous-action foundation models PENDING
  6. 13.6 Summary PENDING
  7. 13.x Hands-on exercise + chapter references PENDING
14

Dual-system architectures: Helix and GR00T N1

  1. 14.1 Why a single forward pass is not always enough PENDING
  2. 14.2 Helix: a high-level VLM and a low-level sensorimotor model PENDING
  3. 14.3 GR00T N1: humanoid-flavored dual systems PENDING
  4. 14.4 Latency budgets and real-time control PENDING
  5. 14.5 Deployment case studies (Figure 02, GR00T-enabled humanoids) PENDING
  6. 14.6 Summary PENDING
  7. 14.x Hands-on exercise + chapter references PENDING
15

Datasets, benchmarks, and evaluation

  1. 15.1 What a robot dataset looks like, by example PENDING
  2. 15.2 Open X-Embodiment in detail PENDING
  3. 15.3 Sim benchmarks (LIBERO, CALVIN, RoboCasa, SimplerEnv) PENDING
  4. 15.4 Real-robot evaluation: variance, success rate, time-to-completion PENDING
  5. 15.5 Building your own evaluation PENDING
  6. 15.6 Summary PENDING
  7. 15.x Hands-on exercise + chapter references PENDING

PART FIVE

Building with action models

16

Fine-tuning a VLA for your robot

  1. 16.1 Picking a base model PENDING
  2. 16.2 Building a teleop dataset that does not waste your time PENDING
  3. 16.3 LoRA vs. full fine-tuning vs. action-head-only PENDING
  4. 16.4 Sim-to-real fine-tuning loops PENDING
  5. 16.5 A recipe card for new embodiments PENDING
  6. 16.6 Summary PENDING
  7. 16.x Hands-on exercise + chapter references PENDING
17

Evaluation, safety, and deployment

  1. 17.1 Safety as a layer, not a property PENDING
  2. 17.2 Runtime monitors and shielding PENDING
  3. 17.3 A/B evaluation on hardware PENDING
  4. 17.4 Logging, alerting, and rollback PENDING
  5. 17.5 What we still cannot certify PENDING
  6. 17.6 Summary PENDING
  7. 17.x Hands-on exercise + chapter references PENDING
18

Open problems and what comes next

  1. 18.1 Generalization across embodiments PENDING
  2. 18.2 Long-horizon and dexterous tasks PENDING
  3. 18.3 Video-pretrained action models PENDING
  4. 18.4 Reasoning + action: LLM chains of thought meet control PENDING
  5. 18.5 What to read next, and how to contribute PENDING
  6. 18.6 Summary PENDING
  7. 18.x Hands-on exercise + chapter references PENDING

APPENDICES

A

Linear algebra refresher PENDING

B

Probability and information theory PENDING

C

PyTorch and JAX primer PENDING

D

Setting up a robotics simulator PENDING

E

Canonical references (in TOC docx) DRAFTED

F

VLA model zoo (in TOC docx) DRAFTED