ESP Visualization

This video compares ESP with direct evolution and with PPO, a standard reinforcement learning method, in a function approximation domain where progress is easy to visualize. The ground truth is a sine function: the optimal action (on the y-axis) is the sine of the context (on the x-axis). The Predictor learns to approximate the ground truth, but its approximation is rather irregular. Nevertheless, the Prescriptor already finds actions that regularize this irregular approximation and are better than could be expected from the Predictor alone. In contrast, direct evolution and PPO do not find solutions as good, even when they use 10x as many samples.
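
As a rough illustration of the setup described above, the following Python sketch builds a toy version of the domain: contexts x, optimal actions sin(x), a surrogate Predictor fitted to sampled data, and a Prescriptor searched against that surrogate. It is not the ESP implementation shown in the video. The outcome function, the k-nearest-neighbour surrogate, the piecewise-linear Prescriptor, the simple elitist search, and all function names and parameter values are assumptions chosen to keep the example short.

```python
import math
import random


def ground_truth(x):
    """Optimal action for context x: the sine function from the text."""
    return math.sin(x)


def true_outcome(x, a):
    """Assumed outcome: higher is better, zero when the action is optimal."""
    return -abs(a - ground_truth(x))


def collect_samples(n, rng):
    """Draw random (context, action, outcome) triples from the toy domain."""
    samples = []
    for _ in range(n):
        x = rng.uniform(0.0, 2.0 * math.pi)
        a = rng.uniform(-1.5, 1.5)
        samples.append((x, a, true_outcome(x, a)))
    return samples


def make_predictor(samples, k=3):
    """Assumed surrogate: average outcome of the k nearest samples."""
    def predict(x, a):
        nearest = sorted(
            samples, key=lambda s: (s[0] - x) ** 2 + (s[1] - a) ** 2
        )[:k]
        return sum(s[2] for s in nearest) / len(nearest)
    return predict


def prescribe(knots, x):
    """Assumed Prescriptor: piecewise-linear curve over evenly spaced knots."""
    t = min(max(x / (2.0 * math.pi), 0.0), 1.0) * (len(knots) - 1)
    i = min(int(t), len(knots) - 2)
    frac = t - i
    return knots[i] * (1.0 - frac) + knots[i + 1] * frac


def evolve_prescriptor(predict, contexts, rng, n_knots=7,
                       generations=30, offspring=8, sigma=0.2):
    """Elitist mutation search that only ever queries the Predictor."""
    def fitness(knots):
        total = sum(predict(x, prescribe(knots, x)) for x in contexts)
        return total / len(contexts)

    best = [0.0] * n_knots
    best_fit = fitness(best)
    initial_fit = best_fit
    for _ in range(generations):
        for _ in range(offspring):
            child = [k + rng.gauss(0.0, sigma) for k in best]
            child_fit = fitness(child)
            if child_fit >= best_fit:  # keep the child only if it is no worse
                best, best_fit = child, child_fit
    return best, initial_fit, best_fit


rng = random.Random(0)
samples = collect_samples(200, rng)
predict = make_predictor(samples)
contexts = [2.0 * math.pi * i / 19 for i in range(20)]
knots, initial_fit, final_fit = evolve_prescriptor(predict, contexts, rng)

# Compare the prescribed actions with the ground truth, which the search never saw.
true_error = sum(
    abs(prescribe(knots, x) - ground_truth(x)) for x in contexts
) / len(contexts)
print("predicted fitness before/after:", initial_fit, final_fit)
print("mean distance from sine curve:", true_error)
```

The Prescriptor here is scored only through the Predictor, never through the true outcome, which mirrors the surrogate-assisted idea in the text. Because the Prescriptor has few parameters, it produces a smoother curve than the nearest-neighbour surrogate it is trained against. How closely that matches the behaviour in the video depends on the assumptions listed above.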