Fast and robust ball-kicking is a critical skill for humanoid soccer robots, yet learning it remains challenging: it demands rapid leg swings, postural stability on a single support foot, and robustness to noisy sensory input and external perturbations (e.g., opponents).
This paper presents a reinforcement learning (RL)–based training pipeline that enables humanoid robots to execute robust continual ball-kicking with adaptability to different ball–goal configurations. The pipeline extends a typical teacher–student training framework—in which a teacher policy is trained with ground truth state information and the student learns to mimic it with noisy, imperfect sensing—by including four training stages: (1) long-distance ball chasing (teacher); (2) directional kicking (teacher); (3) teacher policy distillation (student); and (4) student adaptation and refinement (student). Key design elements—including tailored reward functions, realistic noise modeling, and online constrained RL for adaptation and refinement—are critical for closing the sim-to-real gap and sustaining performance under perceptual uncertainty.
Extensive evaluations in both simulation and on a real robot demonstrate strong kicking accuracy and goal-scoring success across diverse ball–goal configurations. Ablation studies further highlight the necessity of the constrained RL, noise modeling, and the adaptation stage. This work presents a training pipeline for robust continual humanoid ball-kicking under imperfect perception, establishing a benchmark task for visuomotor skill learning in humanoid whole-body control.
A continuous ball-kicking rollout on the real Booster T1 robot. Each cycle integrates long-distance chasing, a directional kick, and reorienting to locate the ball for the next attempt.
We train a control policy that maps a history of proprioceptive measurements together with noisy ball and goal position estimates to joint position targets on a humanoid robot at 50 Hz. A complete kicking cycle integrates three key phases: (i) approaching the ball from long distance, (ii) performing a kick motion that directs the ball toward the goal, and (iii) reorienting to locate the ball and seamlessly initiating the next kicking attempt.
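The observation-to-action interface described above can be sketched as a 50 Hz loop that stacks a short proprioceptive history with the latest ball and goal estimates. This is a minimal sketch; the history length, observation layout, and the `policy` stand-in are assumptions for illustration, not the paper's actual network:

```python
from collections import deque
import numpy as np

CONTROL_HZ = 50        # policy rate stated in the text
NUM_JOINTS = 23        # Booster T1 DoF count
HIST_LEN = 10          # length of the proprioceptive history (assumed)
PROPRIO_DIM = NUM_JOINTS * 2   # e.g., joint positions + velocities (assumed layout)

history = deque(maxlen=HIST_LEN)

def policy(obs):
    """Stand-in for the learned network; returns joint position targets."""
    return np.zeros(NUM_JOINTS)

def control_step(proprio, ball_xy, goal_xy):
    """One 50 Hz tick: buffer proprioception, build the observation, query the policy."""
    history.append(np.asarray(proprio))
    frames = list(history)
    frames = [frames[0]] * (HIST_LEN - len(frames)) + frames  # pad a cold start
    obs = np.concatenate(frames + [np.asarray(ball_xy), np.asarray(goal_xy)])
    return policy(obs)  # joint position targets, tracked by low-level PD control
```

The returned targets would be consumed by the robot's joint-level PD controllers; the sketch only fixes the data flow, not the network architecture.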
The policy is learned through a four-stage training framework: (1) long-distance ball chasing, trained as a teacher policy with ground-truth state; (2) directional kicking (teacher); (3) distillation of the teacher into a student policy that operates on noisy, imperfect sensing; and (4) online adaptation and refinement of the student with constrained RL.
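Stage (3), teacher policy distillation, amounts to behavior cloning: the student is regressed onto the teacher's actions while seeing only corrupted observations. The toy sketch below uses linear "policies" and Gaussian sensor noise purely for illustration; every name, dimension, and hyperparameter here is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM = 8, 4   # toy dimensions, not the real observation/action sizes

# Linear stand-ins for the teacher and student networks.
W_teacher = rng.normal(size=(ACT_DIM, OBS_DIM))
W_student = np.zeros((ACT_DIM, OBS_DIM))

def corrupt(obs, std=0.05):
    """Student-side sensing: ground truth plus Gaussian noise (toy noise model)."""
    return obs + rng.normal(scale=std, size=obs.shape)

lr = 0.05
for _ in range(500):
    obs = rng.normal(size=OBS_DIM)
    a_teacher = W_teacher @ obs          # teacher acts on the clean state
    obs_noisy = corrupt(obs)
    a_student = W_student @ obs_noisy    # student acts on noisy sensing
    # MSE distillation loss 0.5 * ||a_student - a_teacher||^2, one SGD step:
    W_student -= lr * np.outer(a_student - a_teacher, obs_noisy)
```

Because the regression targets come from a teacher trained on privileged state, the student inherits the teacher's behavior while remaining deployable on imperfect sensing, which is the point of the teacher-student split.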
We deploy the learned policy on a Booster T1 humanoid robot (1.18 m tall, 23 DoF), using a ZED 2i stereo camera and onboard NVIDIA AGX Orin GPU. A YOLOv8-based pipeline detects the ball in the RGB-D stream and unprojects it into the robot's base frame. Goal localization relies on a lightweight data-driven legged-inertial odometry module that fuses IMU signals with leg kinematics through a 1D temporal convolutional network predicting relative SE(2) pose changes.
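The detection-to-base-frame step amounts to pinhole unprojection followed by a rigid transform. The sketch below assumes hypothetical intrinsics and extrinsics; the real values would come from the ZED 2i calibration:

```python
import numpy as np

# Hypothetical pinhole intrinsics; real values come from camera calibration.
FX, FY, CX, CY = 700.0, 700.0, 640.0, 360.0

def unproject(u, v, depth):
    """Pixel (u, v) plus metric depth -> 3D point in the camera frame."""
    x = (u - CX) * depth / FX
    y = (v - CY) * depth / FY
    return np.array([x, y, depth])

def camera_to_base(p_cam, R_base_cam, t_base_cam):
    """Express a camera-frame point in the robot's base frame."""
    return R_base_cam @ p_cam + t_base_cam

# A ball detected at the principal point, 2 m away, with identity extrinsics,
# lands 2 m straight ahead along the optical axis.
p = camera_to_base(unproject(CX, CY, 2.0), np.eye(3), np.zeros(3))
```

On the real robot the extrinsics `R_base_cam`, `t_base_cam` would additionally account for head/torso kinematics at the time of the detection.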
Across five ball positions on a RoboCup Kid-Size field, the robot is initialized 6.5 m in front of the goal and executes three trials per position. The policy achieves an overall success rate of 66.7%.
Center spot (x = 4.5 m, y = 0.0 m): 3/3 successful trials
In simulation, across a 9 × 9 grid of initial ball positions with 50 trials each, the policy achieves an average success rate of 79.5% and an average kick accuracy of 0.956, with a maximum ball velocity of 4.13 m/s. On the real Booster T1 robot, the policy attains an overall success rate of 66.7% across five ball–goal configurations on a RoboCup Kid-Size field.
Ablations confirm the necessity of the core design choices: removing any one of the online constrained RL, the realistic noise modeling, or the student adaptation stage degrades performance.
@inproceedings{xu2026humanoidsoccer,
author = {Xu, Zifan and Seo, Myoungkyu and Lee, Dongmyeong and Fu, Hao and
Hu, Jiaheng and Cui, Jiaxun and Jiang, Yuqian and Wang, Zhihan and
Brund, Anastasiia and Biswas, Joydeep and Stone, Peter},
title = {Learning Agile Striker Skills for Humanoid Soccer Robots from Noisy Sensory Input},
booktitle = {IEEE International Conference on Robotics and Automation (ICRA)},
year = {2026},
}