How Asynchronous Multi-Armed Bandit Algorithms Optimize Performance Over the Entire Campaign

In some cases of evolutionary optimization in uncertain domains, the goal is to maintain a high overall conversion rate during the optimization process rather than to identify a single best candidate at the end. These cases can be seen as limited-duration campaigns, such as those in advertising. The new asynchronous multi-armed bandit (MAB) algorithm can be instantiated in Campaign Mode to optimize overall performance across the entire campaign.
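The distinction can be made concrete with a small simulation. The sketch below (candidate conversion rates and the allocator interface are illustrative assumptions, not from the paper) measures the campaign-mode objective: the fraction of all impressions that convert while candidates are still being evaluated, rather than the rate of whichever candidate looks best at the end.

```python
import random

def overall_conversion_rate(allocate, rates, n_impressions=10_000, seed=0):
    """Campaign-mode objective: conversions accumulated over ALL impressions,
    including the traffic spent evaluating weak candidates."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_impressions):
        arm = allocate(rng)           # which candidate serves this impression
        hits += rng.random() < rates[arm]
    return hits / n_impressions

# Hypothetical candidate conversion rates, for illustration only.
rates = [0.02, 0.05, 0.11]

# Uniform traffic allocation (the traditional EA evaluation) earns roughly
# the mean of the candidate rates...
uniform = overall_conversion_rate(lambda rng: rng.randrange(len(rates)), rates)

# ...while an oracle that always serves the best candidate earns its rate,
# an upper bound on what any adaptive allocation can achieve.
oracle = overall_conversion_rate(lambda rng: 2, rates)
```

The gap between the two is the traffic a campaign wastes on weak candidates, and it is exactly what adaptive MAB allocation aims to recover.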

This demo illustrates the effectiveness of the asynchronous MAB algorithm presented in Paper 2 in increasing the overall conversion rate over the entire campaign in an Ascend simulation. The results are averaged over 500 independent runs. Thompson Sampling (TS), a classical MAB algorithm, is modified to implement an asynchronous version. The overall conversion rates of asynchronous TS, original TS, and the traditional approach (i.e., a standard EA with uniform traffic allocation) are compared over generations.

As can be seen from the graph, asynchronous TS achieves a significantly higher overall conversion rate than both original TS and the traditional approach. The performance of asynchronous TS starts to diverge from that of original TS at generation 2. This pattern demonstrates the effectiveness of statistic preservation in the new asynchronous mechanism: the algorithm allocates more traffic to the existing elites without reevaluating them from scratch, thus improving the overall conversion rate.
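The statistic-preservation idea can be sketched as follows. This is a minimal illustration, not the algorithm from Paper 2: it assumes Beta-Bernoulli Thompson Sampling, a toy EA step that keeps two elites per generation, and hidden conversion rates drawn uniformly; it models only the carry-over of elite statistics, not the full asynchronous mechanism.

```python
import random

def thompson_pick(stats, rng):
    # Draw one sample from each candidate's Beta(1+s, 1+f) posterior and
    # route the next impression to the candidate with the highest draw.
    draws = [rng.betavariate(1 + s, 1 + f) for s, f in stats]
    return max(range(len(draws)), key=draws.__getitem__)

def campaign(preserve_elite_stats, n_gens=4, pop=8, n_elites=2,
             rounds_per_gen=1500, seed=0):
    """Multi-generation campaign with TS traffic allocation.

    preserve_elite_stats=True: elites carried into the next generation keep
    their accumulated success/failure counts (the statistic-preservation
    sketch); False: every candidate is re-evaluated from scratch.
    """
    rng = random.Random(seed)
    rates = [rng.uniform(0.02, 0.20) for _ in range(pop)]  # hidden true rates
    stats = [[0, 0] for _ in range(pop)]                   # [successes, failures]
    hits = 0
    for _ in range(n_gens):
        for _ in range(rounds_per_gen):
            arm = thompson_pick(stats, rng)
            if rng.random() < rates[arm]:
                stats[arm][0] += 1
                hits += 1
            else:
                stats[arm][1] += 1
        # Toy EA step: keep the empirically best candidates, replace the
        # rest with fresh ones (new hidden rates, empty statistics).
        order = sorted(range(pop),
                       key=lambda i: stats[i][0] / max(1, sum(stats[i])),
                       reverse=True)
        elites = set(order[:n_elites])
        for i in range(pop):
            if i in elites:
                if not preserve_elite_stats:
                    stats[i] = [0, 0]   # re-evaluate elites from scratch
            else:
                rates[i] = rng.uniform(0.02, 0.20)
                stats[i] = [0, 0]
    return hits / (n_gens * rounds_per_gen)
```

Averaged over independent seeds, the preserving variant earns a higher overall conversion rate: its posteriors for the elites stay concentrated across generation boundaries, so traffic snaps back to strong candidates as soon as the fresh candidates prove weak, mirroring the divergence seen in the demo graph.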