Brown Team Uses Dog-Inspired Cues to Boost Robot Fetch Accuracy to 89%

Brown University researchers say they’ve improved robot object-finding by combining two human signals most systems treat separately: language and pointing gestures.
In work led by Ivy Xiao He, the team reports an 89% success rate in complex object-retrieval environments using a new planning framework called LEGS-POMDP (Language and Gesture-Guided Object Search in Partially Observable Environments), according to Brown’s announcement and the arXiv preprint.
The system uses a partially observable Markov decision process (POMDP) planner to handle uncertainty (for example, when objects are partially hidden, duplicated, or visually ambiguous) and updates its belief as the robot gathers more evidence. Critically, it fuses natural-language instructions with a gesture model informed by canine cognition research from Brown's Dog Lab.
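To make the belief-update idea concrete, here is a minimal sketch of the Bayesian filtering a POMDP-style searcher performs over candidate object locations. This is our illustration, not code from the paper; the uniform reset when an observation contradicts every hypothesis is one common convention, assumed here.

```python
import numpy as np

def update_belief(belief: np.ndarray, likelihood: np.ndarray) -> np.ndarray:
    """Bayesian belief update over candidate object locations.

    belief:     prior probability per location (sums to 1)
    likelihood: P(observation | object at location) per location
    """
    posterior = belief * likelihood
    total = posterior.sum()
    if total == 0:
        # Observation ruled out every hypothesis; fall back to uniform.
        return np.full_like(belief, 1.0 / belief.size)
    return posterior / total

# Example: three candidate locations; the camera weakly favors location 1.
belief = np.array([1/3, 1/3, 1/3])
belief = update_belief(belief, np.array([0.2, 0.7, 0.1]))
print(belief)  # probability mass shifts toward location 1
```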
The gesture model treats pointing as a probability cone rather than a single exact target, based on human eye-gaze and arm geometry (eye-elbow-wrist alignment). That gives the robot a more realistic estimate of what a person likely means when they point.
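As an illustration of the cone idea, the sketch below scores a candidate object by its angular deviation from a ray cast from the eye through the wrist. The eye-to-wrist ray, the Gaussian falloff, and the 15-degree cone width are our assumptions for the example; the paper's exact geometry and distribution may differ.

```python
import numpy as np

def pointing_likelihood(eye: np.ndarray, wrist: np.ndarray,
                        obj: np.ndarray, sigma_deg: float = 15.0) -> float:
    """Score an object under a probabilistic pointing cone.

    The cone axis runs from the eye through the wrist (one common
    ray model). Likelihood falls off with angular deviation via a
    Gaussian with standard deviation sigma_deg (an assumed width).
    """
    axis = wrist - eye
    axis /= np.linalg.norm(axis)
    to_obj = obj - eye
    to_obj /= np.linalg.norm(to_obj)
    angle = np.degrees(np.arccos(np.clip(axis @ to_obj, -1.0, 1.0)))
    return float(np.exp(-0.5 * (angle / sigma_deg) ** 2))

# Example positions in meters: a person pointing roughly at a mug.
eye   = np.array([0.0, 0.0, 1.6])
wrist = np.array([0.4, 0.1, 1.2])
mug   = np.array([2.0, 0.4, 0.8])
print(pointing_likelihood(eye, wrist, mug))  # near 1.0 if well aligned
```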
The team then combines that gesture signal with a vision-language model, so the robot can reason jointly over what a user says and where they appear to indicate. In lab tests on a quadruped robot, multimodal fusion outperformed language-only or gesture-only approaches.
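A back-of-the-envelope version of that fusion: if each cue yields a per-object score, treating the cues as independent and multiplying is the simplest combination. The scores below are invented, and the independence assumption is ours, not necessarily the paper's.

```python
import numpy as np

# Hypothetical per-object scores for the instruction "the red mug":
language_score = np.array([0.6, 0.3, 0.1])  # e.g., from a vision-language model
gesture_score  = np.array([0.1, 0.8, 0.1])  # from a pointing-cone model

# Naive fusion: multiply the cues, then renormalize to a distribution.
fused = language_score * gesture_score
fused /= fused.sum()
print(fused)  # object 2 wins: plausible by language AND pointed at
```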
The paper is scheduled for presentation at the ACM/IEEE International Conference on Human-Robot Interaction (HRI 2026) in Edinburgh.
What this means
This isn’t a flashy new foundation model. It’s something more useful: better interaction design for real robots. The notable step is not “robots understand language now” — we’ve heard that for years — but that the team formalizes messy human communication (words + pointing + ambiguity) into a deployable planning loop. If this transfers beyond lab setups, it could materially improve assistant robots in homes, hospitals, and warehouses.
