Bipedal walking robot using Deep Deterministic Policy Gradient
Computational Intelligence Laboratory, Aerospace Engineering Division, IISc.
Machine learning algorithms have found several applications in robotics and control systems. The control systems community has shown growing interest in algorithms from sub-domains such as supervised learning, imitation learning, and reinforcement learning to achieve autonomous control and intelligent decision making. Among these complex control problems, stable bipedal walking is one of the most challenging.
Autonomous stable walking of a planar bipedal walker is achieved in a simulation environment using the Deep Deterministic Policy Gradient (DDPG) algorithm, an actor-critic learning framework for continuous action spaces.
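The core of a DDPG update can be sketched as follows. This is a deliberately minimal illustration with linear actor and critic functions and hypothetical dimensions; a practical implementation (as in Lillicrap et al.) would use neural networks, target networks, and a replay buffer, all of which are omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM = 4, 2  # illustrative sizes, not the walker's actual dimensions

# Linear actor: a = W_a @ s; linear critic: Q(s, a) = w_c @ [s, a]
W_a = rng.normal(scale=0.1, size=(ACTION_DIM, STATE_DIM))
w_c = rng.normal(scale=0.1, size=STATE_DIM + ACTION_DIM)

GAMMA, LR = 0.99, 1e-2  # discount factor and learning rate (assumed values)

def actor(s):
    return W_a @ s

def critic(s, a):
    return w_c @ np.concatenate([s, a])

def ddpg_update(s, a, r, s_next):
    """One DDPG step: TD-error update for the critic, then
    deterministic policy gradient ascent for the actor."""
    global W_a, w_c
    # Critic: regress Q(s, a) toward the TD target r + gamma * Q(s', pi(s'))
    target = r + GAMMA * critic(s_next, actor(s_next))
    td_err = target - critic(s, a)
    w_c = w_c + LR * td_err * np.concatenate([s, a])
    # Actor: chain rule dQ/da * da/dW_a; for this linear critic,
    # dQ/da is the action slice of w_c, and da/dW_a is an outer product with s
    dq_da = w_c[STATE_DIM:]
    W_a = W_a + LR * np.outer(dq_da, s)

# One illustrative transition
s = np.ones(STATE_DIM)
ddpg_update(s, actor(s), 1.0, np.ones(STATE_DIM))
```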
The state space includes the robot's joint angles, joint angular velocities, and waist height, among other quantities; the action space consists of the predicted joint torques.
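A state observation of this kind is typically flattened into a single vector before being fed to the actor and critic networks. The sketch below assumes a hypothetical six-joint planar biped; the joint count and layout are illustrative, not taken from the paper.

```python
import numpy as np

N_JOINTS = 6  # assumed number of actuated joints, for illustration only

def make_state(joint_angles, joint_velocities, waist_height):
    """Concatenate the observed quantities into one flat state vector,
    as is conventional for actor-critic inputs."""
    return np.concatenate([joint_angles, joint_velocities, [waist_height]])

state = make_state(np.zeros(N_JOINTS), np.zeros(N_JOINTS), 1.0)
# The action is one torque command per actuated joint
action = np.zeros(N_JOINTS)
```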
The robot learns a successful walking behaviour through trial and error, without any prior knowledge of its own dynamics or those of the world. After training in simulation, it was observed that, with a properly shaped reward function, the robot achieved faster walking and even rendered a running gait with an average speed of 0.83 m/s [see Fig. 2]. The gait pattern of the bipedal walker was compared with an actual human walking pattern, and the results show that the two share similar characteristics.
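A shaped reward for forward locomotion is often built from a forward-velocity term, a posture (waist-height) penalty, and a torque-effort penalty. The weights and terms below are assumptions for illustration, not the reward function actually used in this work.

```python
import numpy as np

def shaped_reward(forward_velocity, waist_height, torques,
                  target_height=1.0, alive_bonus=0.1):
    """Illustrative shaped reward: reward forward progress, penalize
    deviation from an upright waist height and large joint torques.
    All coefficients here are assumed, not the paper's values."""
    velocity_term = 1.0 * forward_velocity
    posture_term = -2.0 * abs(waist_height - target_height)
    effort_term = -1e-3 * float(np.sum(np.square(torques)))
    return alive_bonus + velocity_term + posture_term + effort_term
```

Under this shaping, moving forward at the reported average speed scores higher than standing still, which is the gradient signal that pushes the learned gait toward faster walking.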
References:
- Lillicrap, Timothy P., et al. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015).
- Silver, David, et al. "Deterministic Policy Gradient Algorithms." ICML (2014).