The ESP Framework for Optimizing Decision Making

ESP consists of two models. The Predictor can be any machine learning model, such as a random forest or a neural network. It is trained with supervised learning (for example, gradient descent in the case of a neural network) on historical data to map contexts and actions to outcomes. The Prescriptor is a neural network that is constructed through evolution: because the optimal actions are not known, there are no target actions for gradient descent to train toward. The Prescriptor maps contexts to actions that lead to desirable (optimized) outcomes.
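The division of labor between the two models can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than part of ESP itself: the toy outcome function -(action - context)^2, the quadratic-feature linear Predictor, the one-neuron Prescriptor, the simple truncation-selection evolution, and all function names and hyperparameters.

```python
import math
import random

def features(context, action):
    # Quadratic features, so a linear model can represent the toy outcome.
    return [1.0, context, action, context * action, context ** 2, action ** 2]

def predict(weights, context, action):
    """Predictor: maps (context, action) to an estimated outcome."""
    return sum(w * f for w, f in zip(weights, features(context, action)))

def train_predictor(data, epochs=1000, lr=0.3):
    """Fit the Predictor by batch gradient descent on mean squared error."""
    weights = [0.0] * 6
    for _ in range(epochs):
        grad = [0.0] * 6
        for context, action, outcome in data:
            feats = features(context, action)
            err = sum(w * f for w, f in zip(weights, feats)) - outcome
            for i, f in enumerate(feats):
                grad[i] += err * f
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad)]
    return weights

def prescribe(genome, context):
    """Prescriptor: a one-neuron network mapping a context to an action in [-1, 1]."""
    w, b = genome
    return math.tanh(w * context + b)

def fitness(genome, weights, contexts):
    """Mean outcome the Predictor expects if this Prescriptor chooses the actions."""
    total = sum(predict(weights, c, prescribe(genome, c)) for c in contexts)
    return total / len(contexts)

def evolve_prescriptor(weights, contexts, rng, generations=40, pop_size=30, sigma=0.3):
    """Evolve Prescriptor weights; no target actions are needed, only predicted outcomes."""
    population = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda g: fitness(g, weights, contexts), reverse=True)
        parents = population[: pop_size // 5]  # keep the best fifth unchanged
        children = [
            tuple(x + rng.gauss(0, sigma) for x in rng.choice(parents))
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=lambda g: fitness(g, weights, contexts))

def run_demo(seed=0):
    rng = random.Random(seed)
    # Hypothetical historical data: random actions taken in random contexts.
    data = []
    for _ in range(100):
        c, a = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append((c, a, -(a - c) ** 2))  # toy outcome, unknown to the models
    weights = train_predictor(data)
    contexts = [i / 10 - 1 for i in range(21)]  # evaluation contexts in [-1, 1]
    genome = evolve_prescriptor(weights, contexts, rng)
    return weights, genome, contexts

if __name__ == "__main__":
    weights, genome, contexts = run_demo()
    print("predicted mean outcome:", fitness(genome, weights, contexts))
    print("action prescribed for context 0.5:", prescribe(genome, 0.5))
```

Note that the Prescriptor never sees a "correct" action in this sketch. Its only training signal is the outcome the Predictor estimates for the actions it proposes, which is why evolution stands in for gradient descent.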

If a static dataset is available, the Predictor can be trained on it first and the Prescriptor then evolved against it. It is also possible to embed the system in an outer loop with the real world: from time to time the prescriptions are implemented in the world, and the outcomes observed there become additional training data for the Predictor.
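The outer loop can be sketched in a few lines. This is a toy illustration under stated assumptions, not the system's actual implementation: the three-context, three-action `real_world` function is hypothetical, the Predictor is replaced by a table of mean observed outcomes (with a neutral guess of 0.5 for unseen pairs), and the Prescriptor is a lookup table rather than a neural network, so that the loop itself stays in focus.

```python
import random

ACTIONS = [0, 1, 2]
CONTEXTS = [0, 1, 2]

def real_world(context, action):
    """Hypothetical environment: the outcome is 1.0 only when the action matches the context."""
    return 1.0 if action == context else 0.0

def train_predictor(data):
    """Predictor stand-in: mean observed outcome per (context, action) pair."""
    totals = {}
    for context, action, outcome in data:
        s, n = totals.get((context, action), (0.0, 0))
        totals[(context, action)] = (s + outcome, n + 1)
    means = {key: s / n for key, (s, n) in totals.items()}
    # Unseen pairs get a neutral 0.5 guess (an assumption of this sketch).
    return lambda context, action: means.get((context, action), 0.5)

def evolve_prescriptor(predictor, rng, generations=50, pop_size=30):
    """Evolve a context -> action table, scored by the Predictor only."""
    def fitness(genome):
        return sum(predictor(c, genome[c]) for c in CONTEXTS)

    population = [[rng.choice(ACTIONS) for _ in CONTEXTS] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 6]
        children = []
        for _ in range(pop_size - len(parents)):
            child = list(rng.choice(parents))
            child[rng.randrange(len(child))] = rng.choice(ACTIONS)  # point mutation
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

def esp_outer_loop(initial_data, rounds, seed=0):
    """Alternate between learning from the data and acting in the world."""
    rng = random.Random(seed)
    data = list(initial_data)
    history = []  # mean real-world outcome obtained in each round
    for _ in range(rounds):
        predictor = train_predictor(data)                 # 1. retrain the Predictor
        prescriptor = evolve_prescriptor(predictor, rng)  # 2. evolve against it
        outcomes = []
        for context in CONTEXTS:                          # 3. implement prescriptions
            action = prescriptor[context]
            outcome = real_world(context, action)
            data.append((context, action, outcome))       # 4. collect new training data
            outcomes.append(outcome)
        history.append(sum(outcomes) / len(outcomes))
    return data, history

if __name__ == "__main__":
    # One hypothetical historical record: action 1 in context 0 gave outcome 0.
    data, history = esp_outer_loop([(0, 1, 0.0)], rounds=5)
    print("mean real-world outcome per round:", history)
```

Each round adds one real observation per context, so the Predictor's estimates improve exactly where the Prescriptor is currently acting. In a real deployment the world would be consulted far less often than the Predictor, which is the reason for using a surrogate at all.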

The ESP framework and its components have been applied to several machine learning benchmarks, as well as to real-world applications including optimizing growth recipes for agriculture and designing webpages that maximize conversions. On this site, it is demonstrated on sequential decision-making benchmarks as well as on a topical real-world application: optimizing non-pharmaceutical interventions in the COVID-19 pandemic.