How Best Arm Identification Mode Selects Good Solutions Reliably

One traditional goal of optimization in uncertain domains, such as designing web interfaces to maximize conversion rate, is to identify a single best design that can then be deployed for long-term use. It is therefore critical that this design be chosen correctly (so that it performs well in the future) and that its performance be estimated reliably (so that its deployed performance does not disappoint). Based on the new asynchronous multi-armed bandit (MAB) algorithm presented in Paper 2, a Best Arm Identification (BAI) Mode was developed to increase the reliability and quality of the optimization result. In BAI Mode, an elite pool is maintained by collecting good candidates throughout the optimization process. At the end of optimization, an additional BAI phase is conducted by running a pure-exploration MAB algorithm on the elite pool. Candidates in the elite pool are eliminated one by one until only one survives. This final winner is returned after the BAI phase, with a stronger guarantee of reliability and quality. The demo below illustrates the effectiveness of BAI Mode.
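
The one-by-one elimination described above can be sketched as follows. This is only an illustration under stated assumptions, not the algorithm from Paper 2: the text does not name the pure-exploration algorithm used, so the sketch substitutes a simple round-based scheme in which every surviving candidate receives an equal batch of simulated visitors and the candidate with the lowest empirical conversion rate is dropped. The function name `bai_phase`, the `budget` parameter, and the Bernoulli visitor model are all hypothetical.

```python
import random


def bai_phase(true_rates, budget, seed=0):
    """Eliminate elite-pool candidates one by one (illustrative scheme only).

    true_rates: hypothetical true conversion rate of each elite candidate.
    budget: total number of simulated visitors available for the BAI phase.
    Returns (index of the surviving candidate, its empirical conversion rate).
    """
    rng = random.Random(seed)
    k = len(true_rates)
    alive = list(range(k))
    conversions = [0] * k
    visits = [0] * k

    rounds = k - 1  # one candidate is eliminated per round
    per_round = budget // max(rounds, 1)

    for _ in range(rounds):
        # Split this round's traffic evenly among the surviving candidates.
        per_arm = max(per_round // len(alive), 1)
        for arm in alive:
            for _ in range(per_arm):
                # Each visitor converts with the candidate's true rate.
                conversions[arm] += rng.random() < true_rates[arm]
            visits[arm] += per_arm
        # Drop the survivor with the lowest empirical conversion rate so far.
        worst = min(alive, key=lambda a: conversions[a] / visits[a])
        alive.remove(worst)

    winner = alive[0]
    estimate = conversions[winner] / visits[winner] if visits[winner] else 0.0
    return winner, estimate


# Hypothetical elite pool of four candidates.
winner, estimate = bai_phase([0.05, 0.06, 0.07, 0.08], budget=60000)
print(winner, estimate)
```

Because statistics accumulate across rounds, candidates that survive longer are measured with more visitors, so the estimate reported for the final winner rests on the largest sample in the pool.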

This demo shows the performance difference between BAI Mode and the traditional approach (i.e., the standard EA with uniform traffic allocation) in a simulation of an Ascend conversion rate optimization task. Performance is measured by the best conversion rate, defined as the actual conversion rate of the candidate that the algorithm estimates to be best. The box-and-whisker plots show the distributions of the best conversion rates over 500 independent runs for both approaches. Each box extends from the first quartile (25th percentile) to the third quartile (75th percentile) of the data, with a line at the median. The whiskers extend from the box to show the range of the data. Flier points are those past the ends of the whiskers and indicate outliers. The solid lines trace the average across generations. The plot makes clear that BAI Mode consistently outperforms the traditional approach in every respect: worst-case, median, average, and best-case performance. The theoretical optimal performance for the testing environment is 0.08494, and only BAI Mode finds it. Even though the traditional approach starts to converge after generation 6, BAI Mode continues to improve the quality of candidates. Together, these results demonstrate that BAI Mode can substantially improve the reliability of optimization.
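
To make the metric concrete, the sketch below computes the best conversion rate for a uniform-allocation baseline over 500 simulated runs and summarizes the distribution with the statistics a box plot displays. It is a toy illustration, not the demo's simulator: the candidate rates, the number of visitors per candidate, and the helper names `best_conversion_rate` and `summarize` are assumptions made for this example.

```python
import random
import statistics


def best_conversion_rate(true_rates, visitors_per_candidate, rng):
    """Return the TRUE rate of the candidate whose ESTIMATED rate is highest."""
    estimates = []
    for p in true_rates:
        # Uniform allocation: every candidate gets the same number of visitors.
        hits = sum(rng.random() < p for _ in range(visitors_per_candidate))
        estimates.append(hits / visitors_per_candidate)
    picked = max(range(len(true_rates)), key=lambda i: estimates[i])
    return true_rates[picked]


def summarize(values):
    """Statistics shown by a box plot, plus the mean."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "mean": statistics.fmean(values),
        "max": max(values),
    }


rng = random.Random(0)
true_rates = [0.03, 0.04, 0.05, 0.06]  # hypothetical candidate pool
results = [best_conversion_rate(true_rates, 200, rng) for _ in range(500)]
summary = summarize(results)
print(summary)
```

The metric penalizes selection mistakes directly: whenever noise makes a weaker candidate look best, the run records that candidate's lower true rate, which pulls down the lower quartile and the worst case.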