A humanoid robot built by a Beijing robotics startup can now sustain a multi-shot tennis rally against a human opponent. The trick isn't better hardware; it's how you teach it.
Researchers from Tsinghua University, Peking University, Galbot, and Shanghai AI Laboratory published a preprint on arXiv on March 13 describing LATENT, a system that learns athletic skills from imperfect human motion capture data rather than requiring precise kinematic records or hours of court-side filming. The approach: capture five hours of primitive tennis movements from five amateur players in a space roughly the size of a bedroom, let a reinforcement learning policy experiment at speed in simulation, and deploy the resulting skill set on a Unitree G1 humanoid robot. The result, per the researchers: around 90 percent forehand accuracy and 78 percent backhand accuracy in real-world play, with the robot able to sustain multi-shot rallies with human opponents.
The data efficiency is the headline. Earlier approaches to robotic tennis required motion capture across full court dimensions or relied on AI systems like Nvidia's Vid2Player3D to extract technique from multi-camera TV footage — what the LATENT paper calls a pipeline requiring substantial expertise and engineering effort. LATENT compresses the problem: give the robot the building blocks of tennis motion (forehands, backhands, shuffle steps, crossover steps) and let a high-level policy figure out how to deploy them against an incoming ball. The policy operates in a latent action space — it doesn't directly command motor torques, it selects and adjusts from a learned repertoire of human-like motions. Two novel designs in that latent space allow the policy to correct and compose imperfect primitives in real time.
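The paper's architectural idea — a high-level policy that emits latent skill codes rather than motor torques, with a learned decoder turning those codes into whole-body motion — can be illustrated with a minimal sketch. Everything below is hypothetical: the dimensions, the random weights standing in for trained networks, and the function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch of a latent-action controller. The high-level policy
# never commands joint torques directly: it maps the incoming-ball state to
# a point in a learned latent skill space, and a frozen "motion decoder"
# (trained on human motion-capture primitives) turns that latent code into
# whole-body joint targets. Dimensions and weights are illustrative only.

rng = np.random.default_rng(0)

N_JOINTS = 26   # joint count reported for the platform
LATENT_DIM = 8  # size of the latent skill space (assumed for illustration)
OBS_DIM = 6     # e.g. ball position + velocity (assumed for illustration)

# Stand-ins for learned weights; in a real system these come from training
# (the decoder from human primitives, the policy from RL in simulation).
W_policy = rng.standard_normal((LATENT_DIM, OBS_DIM)) * 0.1
W_decoder = rng.standard_normal((N_JOINTS, LATENT_DIM)) * 0.1

def high_level_policy(obs: np.ndarray) -> np.ndarray:
    """Map a ball observation to a bounded latent code that selects and
    adjusts a motion primitive (forehand, backhand, footwork, ...)."""
    return np.tanh(W_policy @ obs)

def motion_decoder(z: np.ndarray) -> np.ndarray:
    """Decode a latent code into normalized joint-position targets,
    clipped to a safe range before being sent to low-level control."""
    return np.clip(W_decoder @ z, -1.0, 1.0)

ball_obs = np.array([2.0, 0.5, 1.2, -3.0, 0.1, -0.5])  # toy ball state
z = high_level_policy(ball_obs)
targets = motion_decoder(z)
print(targets.shape)  # one target per joint
```

The point of the indirection is that exploration happens in a small, structured space where every action already "looks human," so the RL policy can correct or blend imperfect primitives instead of rediscovering motor control from scratch.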
The Unitree G1, a 26-joint humanoid platform, ran the resulting policy in the real world. In simulation, LATENT achieved 96.5 percent accuracy; in testing against live humans, the numbers dropped but held: roughly 90 percent forehand, 78 percent backhand. Galbot posted video of the system in action on March 16, showing sustained rallies with natural whole-body motion and what the team describes as millisecond-order reaction times.
The preprint has not yet been peer reviewed — a caveat worth naming before the demo video racks up views. The gap between controlled demo and chaotic real-court conditions (wind, lighting variance, ball spin) remains untested. And five amateur players, even if they provided enough variety to generalize, is not a large training set.
But the approach could transfer. The LATENT team writes that the framework generalizes to other athletic skills beyond tennis — football and badminton get mentions in the paper. Galbot, which has raised roughly $150 million in funding and counts NVIDIA's Jetson Thor platform as a hardware partner, has open-sourced the code on GitHub. If the method holds across sports and labs, it could lower the data barrier for teaching humanoid robots dynamic physical skills that have historically required either exhaustive motion capture or elaborate simulation pipelines.
For context: UBTech's Walker S2 robot demonstrated tennis capability in January 2026, but with a more constrained rally setup. The LATENT result is qualitatively different — multi-shot, open-ended rallies rather than scripted exchanges — and the path to getting there used less data. Whether that translates to a robot that could hold its own on a real court, let alone against a player with any actual skill, is a question for the next round of experiments.