Walking robots fall down. Not because their hardware is bad, but because nobody has figured out how to prove, mathematically, that the AI running them will stay upright when something unexpected happens. A reinforcement learning controller can learn to walk in simulation. Getting it to keep walking when the terrain changes, or a door opens, or someone steps in front of it, is a different problem. The robot either adapts or it does not, and until now, the only way to know was to watch it happen.
A team at Caltech and North Carolina State University has published what they describe as a solution, in a preprint posted to arXiv. Their system, called HALO, learns a compact representation of a robot's dynamics directly from motion data, then analyzes that representation for stability boundaries. The key move is using Poincaré maps, a technique borrowed from dynamical systems theory, to study the robot's behavior at one discrete moment per step rather than tracking every millisecond of joint motion. That compression makes the math tractable. From that analysis, HALO can predict the region of attraction: the set of perturbations the robot can recover from. If the robot starts inside that region, it stays up. Outside it, it falls.
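The return-map idea can be sketched in a few lines of Python. The toy below is illustrative only: step_map is a made-up one-dimensional apex-height recursion, not the paper's learned model. But it shows the workflow HALO applies to a far higher-dimensional system: find the fixed point of the once-per-step map, then sweep initial conditions to see which ones converge back to it.

```python
# Toy Poincare (return) map analysis -- a hypothetical sketch, not HALO's model.
# Continuous hopping dynamics are collapsed to one sample per step: the apex
# height h_n. A stable gait is a stable fixed point of the step map f, and the
# region of attraction is the set of initial heights that converge to it.

def step_map(h, r=2.8):
    """Hypothetical apex-height return map (logistic form, chosen only
    because it has a nontrivial fixed point and a bounded basin)."""
    return r * h * (1.0 - h)

def fixed_point(r=2.8):
    # Solve h = r*h*(1-h) for the nonzero root: h* = 1 - 1/r.
    return 1.0 - 1.0 / r

def in_region_of_attraction(h0, h_star, n_steps=200, tol=1e-6):
    """Iterate the return map and test convergence to the fixed point."""
    h = h0
    for _ in range(n_steps):
        h = step_map(h)
        if not (-10.0 < h < 10.0):  # diverged: the toy "robot" falls
            return False
    return abs(h - h_star) < tol

h_star = fixed_point()
samples = [i / 100.0 for i in range(-20, 121)]  # sweep initial heights
basin = [h0 for h0 in samples if in_region_of_attraction(h0, h_star)]
print(f"fixed point h* = {h_star:.3f}")
print(f"estimated region of attraction: [{min(basin):.2f}, {max(basin):.2f}]")
```

For a one-dimensional toy, brute-force sweeping works; the point of HALO's learned low-dimensional representation is to make a comparable analysis feasible when the underlying state space has dozens of dimensions.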
The researchers demonstrated the approach on a simulated hopping robot and a full-body Unitree G1 humanoid. They trained a reinforcement learning policy for the G1 — a 23-degree-of-freedom humanoid that costs roughly $30,000 and is sold as a research platform — using the learned model to guide the policy toward stable walking. The policy outputs joint position commands at 50 hertz, tracked by lower-level controllers, the paper states. In simulation, the learned model correctly predicted when the full robot would fall: stability properties inferred in the compressed latent space transferred back to the actual state space, even with an RL policy in the loop.
The code is open source on GitHub, implemented in JAX and Flax with MuJoCo physics simulation. The paper appeared on arXiv on April 20, 2026.
That is the claim. The caveat is significant: everything has been tested in simulation, not on physical hardware. The researchers have not yet run HALO on an actual robot and measured whether the predicted region of attraction matches what happens in the real world. Aaron Ames, a co-author at Caltech with a track record of building walking controllers for real robots, including the AMBER robot and earlier work on leader-follower locomotion, is the person most likely to close that gap. He did not respond to a request for comment by publication time.
The industrial context matters. Reinforcement learning controllers for legged robots are not theoretical. Companies including Boston Dynamics, Figure, and 1X have deployed or are deploying RL-based locomotion systems in their products. The controllers work. What nobody has had — until now, in simulation — is a way to certify that they do. Lyapunov stability analysis, the mathematical framework HALO draws on, is the tool control engineers reach for when they need formal guarantees about a system's behavior. The problem is that traditional Lyapunov analysis does not scale to high-dimensional, contact-rich dynamics like a humanoid robot making and breaking contact with the ground at every step. HALO's contribution is a data-driven workaround: learn a low-dimensional model where the analysis is possible, then lift the results back to the full system.
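In the discrete, step-to-step setting, the Lyapunov recipe amounts to finding a function V that strictly decreases across every step. The sketch below is a toy, not HALO's method: the step matrix A is hypothetical, and the system is linear, where a quadratic certificate V(x) = x^T P x can be constructed directly.

```python
# Toy Lyapunov check for a discrete-time linear step model x_{k+1} = A x.
# The matrix A is hypothetical, standing in for learned step dynamics.
# We build P = sum_k (A^T)^k A^k, which (for stable A) solves the discrete
# Lyapunov equation A^T P A - P = -I, then verify that V(x) = x^T P x
# strictly decreases across a step on sampled states.

def mm(X, Y):  # 2x2 matrix product
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mt(X):  # transpose
    return [[X[j][i] for j in range(2)] for i in range(2)]

def mv(X, v):  # matrix-vector product
    return [sum(X[i][k] * v[k] for k in range(2)) for i in range(2)]

A = [[0.9, 0.2], [-0.2, 0.7]]  # hypothetical stable step dynamics

# Truncated series for P; each term is (A^T)^k A^k, starting at k = 0.
P = [[0.0, 0.0], [0.0, 0.0]]
term = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(200):
    for i in range(2):
        for j in range(2):
            P[i][j] += term[i][j]
    term = mm(mt(A), mm(term, A))

def V(x):
    """Candidate Lyapunov function V(x) = x^T P x."""
    Px = mv(P, x)
    return sum(x[i] * Px[i] for i in range(2))

# Decrease condition: V(Ax) < V(x) for nonzero states.
samples = [[0.5, -1.0], [2.0, 0.3], [-1.5, -1.5], [0.01, 0.02]]
certified = all(V(mv(A, x)) < V(x) for x in samples)
print("V decreases on all samples:", certified)
```

For a learned, nonlinear model the same decrease condition applies, but constructing a valid V is the hard part; HALO's bet, per the preprint, is that the search stays tractable in a low-dimensional latent space.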
Whether the robotics industry actually adopts formal stability certification for learned controllers is an open question. There is no regulatory requirement for it. Commercial deployments of legged robots have proceeded on the basis of testing and simulation, not mathematical proof. But as robots move out of labs and into environments where people work alongside them — warehouses, construction sites, domestic settings — the question of what "safe enough" means becomes harder to avoid. A robot that falls is a liability. A robot whose stability boundaries are known and predictable is a different product.
The next step is hardware. The authors have built real robots before. The code ships. The math is on arXiv. What the field is waiting for is someone to run the experiment that either confirms or refutes HALO's predictions on a machine that can actually be touched.