Optimizing Flappy Bird Game Agents

In this game, the goal is to fly through a course of gates (or pipes) for two minutes without crashing. The horizontal speed is constant and the pipes are equally spaced; a discrete flapping action pushes the bird up with a constant force, while gravity pulls it down continuously.
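To make the dynamics concrete, here is a minimal sketch of the bird physics described above: constant horizontal speed, gravity pulling down each frame, and a discrete flap applying a fixed upward impulse. All constants and names are illustrative assumptions, not the demo's actual values.

```python
from dataclasses import dataclass

GRAVITY = 0.5          # downward acceleration per frame (assumed; screen y grows downward)
FLAP_IMPULSE = -8.0    # fixed upward velocity applied when flapping (assumed)
SCROLL_SPEED = 3.0     # constant horizontal speed of the bird (assumed)

@dataclass
class BirdState:
    x: float = 0.0
    y: float = 200.0
    vy: float = 0.0

def step(state: BirdState, flap: bool) -> BirdState:
    """Advance the bird one frame given a discrete flap/no-flap action."""
    vy = FLAP_IMPULSE if flap else state.vy + GRAVITY
    return BirdState(x=state.x + SCROLL_SPEED, y=state.y + vy, vy=vy)
```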

The first video shows the AI-controlled bird flying not in the actual game, but against a surrogate. The surrogate rewards approximate behaviors so that the agent can improve gradually: finding the opening in the pipes, getting through it, anticipating the pipes that follow, and expanding to a wider variation of gap locations. The second video shows the entire population of agents going through one trial. One after another the candidates hit pipes and are eliminated, until only the best ones remain, representing the best approaches found so far. The third video shows the final agent running through the entire course. Its behavior is highly sophisticated, better than that of most humans.
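The staged shaping described above could be written out as an explicit fitness function like the sketch below. This is purely illustrative: in the demo the shaping comes from a learned surrogate model, not hand-coded terms, and all names and weights here are assumptions.

```python
def shaped_fitness(alignment_with_gap: float,
                   pipes_cleared: int,
                   anticipated_next_gap: bool,
                   gap_variation_seen: int) -> float:
    """Combine progressively harder sub-goals into a single scalar score."""
    score = 0.0
    score += 1.0 * alignment_with_gap      # reward finding the opening in the pipes
    score += 10.0 * pipes_cleared          # reward actually getting through
    if anticipated_next_gap:               # reward lining up for the pipe that follows
        score += 5.0
    score += 2.0 * gap_variation_seen      # reward coping with varied gap locations
    return score
```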

The demo illustrates how ESP uses the surrogate to make learning easier and to regularize the behaviors, allowing them to generalize better. In contrast, direct evolution would overfit to the nonlinearities of the game and would not discover strategies as proficient.
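The sketch below gives a rough, self-contained picture of such a surrogate-assisted loop: candidates are ranked cheaply by a surrogate model, the elites are periodically validated in the real game, and those real outcomes refit the surrogate. Agents are plain parameter vectors, and both the "surrogate" and the "real game" are stand-in functions; this is an assumed illustration of the general idea, not the actual ESP implementation.

```python
import random

def real_game_score(agent):
    # Stand-in for running a full Flappy Bird trial (assumed toy objective).
    return -sum((w - 0.5) ** 2 for w in agent) + random.gauss(0, 0.01)

class Surrogate:
    """Stand-in learned model of the real score (nearest-neighbour lookup)."""
    def __init__(self):
        self.data = []
    def fit(self, pairs):
        self.data = list(pairs)
    def predict(self, agent):
        if not self.data:
            return 0.0
        nearest = min(self.data,
                      key=lambda p: sum((a - b) ** 2 for a, b in zip(agent, p[0])))
        return nearest[1]

def mutate(agent, sigma=0.1):
    return [w + random.gauss(0, sigma) for w in agent]

def surrogate_assisted_loop(pop_size=20, dims=4, generations=50, validate_every=5):
    population = [[random.random() for _ in range(dims)] for _ in range(pop_size)]
    surrogate, history = Surrogate(), []
    for gen in range(generations):
        # Cheap evaluation: rank every candidate with the surrogate.
        scored = sorted(population, key=surrogate.predict, reverse=True)
        elites = scored[: pop_size // 4]
        # Periodically validate elites in the real game and refit the surrogate,
        # keeping its predictions anchored to actual outcomes.
        if gen % validate_every == 0:
            history += [(tuple(a), real_game_score(a)) for a in elites]
            surrogate.fit(history)
        # Next generation comes from mutated elites (mutation only, for brevity).
        population = [mutate(random.choice(elites)) for _ in range(pop_size)]
    return max(history, key=lambda p: p[1])[0]

best_agent = surrogate_assisted_loop()
```

Because most evaluations run against the smooth surrogate rather than the full game, the search is cheaper per candidate and less prone to exploiting quirks of any single trial, which is the regularizing effect the paragraph above describes.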