How Asynchronous Multi-Arm Bandit Algorithms Make Candidate Evaluations Reliable and Cost-Effective.

In many real-world domains, such as designing web interfaces to maximize conversions, the candidates need to be evaluated through sampling. The standard evaluation mechanism is to allocate samples uniformly to all the candidates. There are two problems with this approach: First, a fair amount of traffic may be wasted on testing bad designs, increasing the overall cost for candidate evaluations. Second, the performance estimations of candidates are unreliable because the traffic budget is limited. To address these issues, a new mechanism called asynchronous multi-armed bandit (MAB) algorithm was developed in Paper 2 for allocating traffic during optimization. The demos on this page illustrate this algorithm and show how it makes candidate evaluations more reliable and cost-effective. Demos 2.2 and 2.3 then demonstrate how it can be instantiated to identify the winner of evolution more reliably, and how it can be used to improve overall performance in campaign mode (where every evaluation counts).

At the start of optimization, no prior knowledge is available for the asynchronous MAB allocation. As an initial exploration, the method therefore assigns traffic randomly to the five candidates (A..E) shown above. At this very early stage, the behavior of the new mechanism is similar as that in standard uniform allocation. Since only a small amount of traffic is used, the performance estimations are inaccurate for all the candidates.

As the candidates receive more traffic, their performance estimations become more reliable. In contrast with the standard uniform allocation, the asynchronous MAB method assigns more traffic to the more promising candidates (A, B, and C) while occasionally evaluating less promising ones (D and E). A reasonable balance between exploration and exploitation is achieved in this way. The overall conversion rates of the two methods start to diverge.

At the end of Generation 1, an obvious difference between the two methods can be seen: The asynchronous MAB method has assigned most traffic to the best candidate while the standard allocation mechanism has simply allocated traffic uniformly to each candidate. Two significant advantages of asynchronous MAB are evident at this point: First, the overall conversion rate of the optimization process has been significantly increased, because more traffic is allocated to the good candidates. Second, the performance estimations of the good candidates are more reliable since they have received more traffic.  Such reliability is beneficial in evolutionary algorithms, because the best candidates play an important role in creating the next generation of candidates. To illustrate:

In order to generate the second generation of candidates, crossover and mutation operations are performed on the top candidates A, B and C.  Candidates A and B perform better than others, so they survive to next generation as well. The worst candidates C, D, and E are discarded to make room for the new candidates.

At the start of Generation 2, candidates A and B will preserve their statistics in generation 1, and use this information to initialize their reward models in the MAB algorithm. The reward models for the  newly generated candidates AB, BA and AC will be built from scratch. This special initialization mechanism makes the MAB allocation run in an asynchronous manner.

Early during Generation 2, all the candidates receive some amount of traffic except for candidate B. Candidate A preserves its reward model that performed best in Generation 1, so the asynchronous MAB algorithm continues to allocate traffic to A before any other candidate outperforms it. The newly generated candidates BA, AC and AB also have a good chance to receive traffic because the asynchronous MAB algorithm needs to explore them. Since candidate B performs significantly worse than candidate A in Generation 1, the asynchronous MAB algorithm remembers this information and does not spend too much traffic on further exploring it. Once the actual best candidate AB shows promising performance, most traffic is allocated to it, leading to an increase in overall conversion rate and a reliable performance estimation of AB. Candidate A and B also maintain their accurate performance estimations due to preservation of reward models. In contrast to asynchronous MAB algorithm, standard uniform allocation overestimates the performance of A while at the same time it underestimates that of AB. The optimization performance therefore decreases. As in Generation 1, a big difference in overall conversion rate between the two methods can be observed.

In this manner, the Asynchronous MAB method makes it possible to identify good candidates more reliably, to estimate their performance more accurately, and to increase performance during optimization.