Learning Agile Striker Skills for Humanoid Soccer Robots from Noisy Sensory Input

Zifan Xu1, Myoungkyu Seo1, Dongmyeong Lee1, Hao Fu1, Jiaheng Hu1, Jiaxun Cui1, Yuqian Jiang1, Zhihan Wang1, Anastasiia Brund1, Joydeep Biswas1, Peter Stone1,2
1The University of Texas at Austin, 2Sony AI

A humanoid robot learns robust continual ball-kicking skills under noisy perception, trained via a four-stage teacher–student reinforcement learning pipeline.

Abstract

Learning fast and robust ball-kicking skills is a critical capability for humanoid soccer robots, yet it remains a challenging problem due to the need for rapid leg swings, postural stability on a single support foot, and robustness under noisy sensory input and external perturbations (e.g., opponents).

This paper presents a reinforcement learning (RL)–based training pipeline that enables humanoid robots to execute robust continual ball-kicking with adaptability to different ball–goal configurations. The pipeline extends a typical teacher–student training framework—in which a teacher policy is trained with ground truth state information and the student learns to mimic it with noisy, imperfect sensing—by including four training stages: (1) long-distance ball chasing (teacher); (2) directional kicking (teacher); (3) teacher policy distillation (student); and (4) student adaptation and refinement (student). Key design elements—including tailored reward functions, realistic noise modeling, and online constrained RL for adaptation and refinement—are critical for closing the sim-to-real gap and sustaining performance under perceptual uncertainty.

Extensive evaluations in both simulation and on a real robot demonstrate strong kicking accuracy and goal-scoring success across diverse ball–goal configurations. Ablation studies further highlight the necessity of the constrained RL, noise modeling, and the adaptation stage. This work presents a training pipeline for robust continual humanoid ball-kicking under imperfect perception, establishing a benchmark task for visuomotor skill learning in humanoid whole-body control.

Continual Ball-Kicking

A continual ball-kicking rollout on the real Booster T1 robot. Each cycle integrates long-distance chasing, a directional kick, and reorienting to locate the ball for the next attempt.

Method Overview

We train a control policy that maps a history of proprioceptive measurements together with noisy ball and goal position estimates to joint position targets on a humanoid robot at 50 Hz. A complete kicking cycle integrates three key phases: (i) approaching the ball from long distance, (ii) performing a kick motion that directs the ball toward the goal, and (iii) reorienting to locate the ball and seamlessly initiating the next kicking attempt.
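The control interface above can be sketched as a simple observation-builder plus policy call. The dimensions here (6 proprioceptive features per step, a 10-step history, 23 joint targets) are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

HISTORY_LEN = 10   # assumed proprioceptive history length
NUM_JOINTS = 23    # Booster T1 degrees of freedom

def build_observation(proprio_history, ball_xy, goal_xy):
    """Stack the proprioceptive history with the latest (noisy) ball and
    goal position estimates into a single flat observation vector."""
    return np.concatenate([np.concatenate(proprio_history), ball_xy, goal_xy])

def control_step(policy, proprio_history, ball_xy, goal_xy):
    """One 50 Hz control tick: observation in, joint position targets out."""
    obs = build_observation(proprio_history, ball_xy, goal_xy)
    return policy(obs)  # targets are tracked by low-level PD controllers
```

At deployment this function would be called every 20 ms, with `policy` being the trained student network.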

The policy is learned through a four-stage training framework:

  1. Long-Distance Chasing. A privileged teacher policy, with access to ground-truth ball position, learns a robust walking gait to approach the ball from diverse initial configurations.
  2. Directional Kicking. The teacher policy acquires precise and robust kick motions using privileged ball and goal positions, with domain randomization encouraging recovery from imperfect states (e.g., missed kicks, tilted postures).
  3. Teacher Policy Distillation. The privileged teacher is distilled into a student policy via DAgger. Imperfect perception is modeled with three components: a velocity-dependent noise model, delayed and asynchronous updates, and frame drops caused by occlusion.
  4. Student Adaptation and Refinement. An online constrained RL algorithm (N-P3O) adapts the student policy to partially observed states introduced by noisy perception, and mitigates inhomogeneous credit assignment that causes jittery leg motions and unsafe sharp turns.
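The three perception-noise components used in the distillation stage (velocity-dependent noise, delayed updates, and occlusion-induced frame drops) can be sketched as follows. All constants and the class interface are illustrative assumptions, not the paper's calibrated parameters:

```python
from collections import deque
import numpy as np

class NoisyBallPerception:
    """Sketch of a perception-noise model for training: velocity-dependent
    Gaussian noise, a fixed update delay, and random frame drops."""

    def __init__(self, delay_steps=5, drop_prob=0.1,
                 noise_scale=0.02, vel_noise_gain=0.05, rng=None):
        self.rng = rng or np.random.default_rng(0)
        self.buffer = deque(maxlen=delay_steps + 1)  # models update latency
        self.drop_prob = drop_prob
        self.noise_scale = noise_scale
        self.vel_noise_gain = vel_noise_gain
        self.last_estimate = None

    def observe(self, true_ball_xy, ball_speed):
        # Velocity-dependent noise: a fast-moving ball is harder to localize.
        sigma = self.noise_scale + self.vel_noise_gain * ball_speed
        noisy = true_ball_xy + self.rng.normal(0.0, sigma, size=2)
        self.buffer.append(noisy)
        # Frame drop (e.g., occlusion): repeat the stale estimate.
        if self.last_estimate is not None and self.rng.random() < self.drop_prob:
            return self.last_estimate
        # Delayed update: return the oldest measurement still in the buffer.
        self.last_estimate = self.buffer[0]
        return self.last_estimate
```

During DAgger distillation, the student would receive `observe(...)` outputs while the teacher is queried with the ground-truth ball state.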

Real-World Deployment

We deploy the learned policy on a Booster T1 humanoid robot (1.18 m tall, 23 DoF), using a ZED 2i stereo camera and onboard NVIDIA AGX Orin GPU. A YOLOv8-based pipeline detects the ball in the RGB-D stream and unprojects it into the robot's base frame. Goal localization relies on a lightweight data-driven legged-inertial odometry module that fuses IMU signals with leg kinematics through a 1D temporal convolutional network predicting relative SE(2) pose changes.
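The unprojection step in the detection pipeline is a standard pinhole back-projection followed by a rigid transform into the base frame. A minimal sketch, where the intrinsics (`fx`, `fy`, `cx`, `cy`) and the camera-to-base calibration `T_base_cam` are assumed inputs:

```python
import numpy as np

def unproject_ball(u, v, depth, fx, fy, cx, cy, T_base_cam):
    """Back-project a pixel detection (u, v) with metric depth into the
    robot base frame. T_base_cam is a 4x4 homogeneous transform from the
    camera frame to the base frame (from extrinsic calibration)."""
    x = (u - cx) * depth / fx          # pinhole model: pixel -> camera frame
    y = (v - cy) * depth / fy
    p_cam = np.array([x, y, depth, 1.0])
    return (T_base_cam @ p_cam)[:3]    # camera frame -> base frame
```

With an identity extrinsic and a detection at the principal point, the ball lies straight ahead at the measured depth.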

Real-World Experiments

Across five ball positions on a RoboCup Kid-Size field, the robot is initialized 6.5 m in front of the goal and executes three trials per position. The policy achieves an overall success rate of 66.7%.

For example, from the center spot (x = 4.5 m, y = 0.0 m), the robot scored on 3 of 3 trials.

Results

In simulation, across a 9 × 9 grid of initial ball positions with 50 trials each, the policy achieves an average success rate of 79.5% and an average kick accuracy of 0.956, with a maximum ball velocity of 4.13 m/s. On the real Booster T1 robot, the policy attains an overall success rate of 66.7% across five ball–goal configurations on a RoboCup Kid-Size field.

Ablations confirm the necessity of the core design choices:

  • Constrained RL (N-P3O) vs. PPO. N-P3O achieves 79.5% success with energy cost 108.6 J/s, compared to 64.8% / 255.8 J/s for PPO with fixed regularization, while producing noticeably smoother, safer motions.
  • Online Student Adaptation. Adaptation improves success rate from 52.3% to 79.5% and kick accuracy from 0.807 to 0.956, while reducing energy consumption by more than half, nearly matching the privileged teacher's performance (81.1%).
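The benefit of constrained RL over fixed regularization comes from treating energy and safety terms as constraints rather than reward penalties. A simplified P3O-style hinge penalty (a sketch of the idea, not the exact N-P3O objective from the paper) illustrates this:

```python
def p3o_penalty_loss(reward_surrogate_loss, cost_estimate, cost_limit, kappa=1.0):
    """P3O-style penalized objective: keep the standard clipped PPO loss,
    and add a hinge penalty only when the estimated cost (e.g., energy in
    J/s) exceeds its limit. Below the limit, the penalty vanishes, so the
    policy is free to optimize task reward."""
    violation = max(0.0, cost_estimate - cost_limit)
    return reward_surrogate_loss + kappa * violation
```

Because the penalty is zero whenever the constraint is satisfied, the agent is not pushed toward overly conservative motions the way a fixed regularization weight would push it.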

BibTeX

@inproceedings{xu2026humanoidsoccer,
  author    = {Xu, Zifan and Seo, Myoungkyu and Lee, Dongmyeong and Fu, Hao and
               Hu, Jiaheng and Cui, Jiaxun and Jiang, Yuqian and Wang, Zhihan and
               Brund, Anastasiia and Biswas, Joydeep and Stone, Peter},
  title     = {Learning Agile Striker Skills for Humanoid Soccer Robots from Noisy Sensory Input},
  booktitle = {IEEE International Conference on Robotics and Automation (ICRA)},
  year      = {2026},
}