Z. Wang, "Practical and theoretical advances in Bayesian optimization", Oxford Research Archive, ID: 10670/1.559801...
Bayesian optimization provides a set of powerful tools for efficient black-box optimization and has applications in a wide variety of fields. In this work we seek to advance Bayesian optimization on both the theoretical and the practical fronts, and to apply it to novel and difficult problems in order to advance the state of the art. Chapter 1 gives a broad overview of Bayesian optimization. We start by covering the published applications of Bayesian optimization, and the chapter then introduces the essential ingredients of Bayesian optimization in depth. After discussing practical considerations and the theory and history of Bayesian optimization, we end the chapter with a discussion of the latest extensions and open problems.

In Chapters 2-4, we solve three outstanding problems in the Bayesian optimization literature. Traditional Bayesian optimization approaches must solve an auxiliary non-convex global optimization problem in the inner loop. The difficulty of solving this auxiliary problem not only breaks the assumptions of most theoretical work in the area but also leads to computationally inefficient solutions. In Chapter 2, we propose the first Bayesian optimization algorithm that does not need to solve auxiliary optimization problems, and we prove its convergence. In Bayesian optimization, it is often important to tune the hyper-parameters of the underlying Gaussian processes. Previously, no theoretical results allowed both noisy observations and varying hyper-parameters; Chapter 3 proves the first such result. Bayesian optimization is very effective when the dimensionality of the problem is low, but scaling it to high dimensions has been a long-standing open problem in the field. In Chapter 4, we develop an algorithm that extends Bayesian optimization to very high-dimensional problems whose underlying structure has low intrinsic dimensionality, and we prove theoretical guarantees for the proposed algorithm.

In Chapter 5, we turn our attention to improving an essential component of Bayesian optimization: the acquisition function. Acquisition functions are critical to Bayesian optimization, yet no easily computable optimal acquisition function exists. Instead of relying on a single acquisition function, we develop a new information-theoretic portfolio of acquisition functions and show empirically that our approach is more effective than any single acquisition function in the portfolio.

Last but not least, in Chapter 6 we adapt Bayesian optimization to derive an adaptive Hamiltonian Monte Carlo sampler. Hamiltonian Monte Carlo is one of the most effective MCMC algorithms, but it is notoriously difficult to tune. In this chapter, we follow the approach of adapting Markov chains to improve their convergence, with an adaptive strategy based on Bayesian optimization. We provide theoretical analysis as well as a comprehensive set of experiments demonstrating the effectiveness of the proposed algorithm.
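To make the inner-loop issue mentioned in the abstract concrete: standard Bayesian optimization fits a probabilistic surrogate (typically a Gaussian process) to the evaluations gathered so far and, at every iteration, maximizes an acquisition function over the search space to choose the next query point; that maximization is itself a non-convex global optimization problem. The sketch below illustrates this generic loop, not the algorithm proposed in Chapter 2: it uses an RBF-kernel Gaussian process, an upper confidence bound acquisition, and random search as a stand-in for the auxiliary maximization. All function names and settings are illustrative assumptions.

```python
# Minimal sketch of a generic Bayesian optimization loop (not the algorithm
# proposed in Chapter 2): a Gaussian-process surrogate with a fixed RBF kernel
# and an upper confidence bound acquisition, whose maximization in the inner
# loop is approximated here by dense random search over the box.
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_posterior(X, y, Xstar, noise=1e-4):
    """Posterior mean and variance of the GP at the candidate points Xstar."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    Ks = rbf_kernel(X, Xstar)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks)
    mean = Ks.T @ alpha
    var = 1.0 - np.sum(v ** 2, axis=0)       # prior variance of this kernel is 1
    return mean, np.maximum(var, 1e-12)

def bayes_opt(f, bounds, n_init=5, n_iter=20, beta=2.0, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    X = lo + (hi - lo) * rng.random((n_init, len(bounds)))
    y = np.array([f(x) for x in X])
    for _ in range(n_iter):
        # Inner loop: the auxiliary non-convex problem of maximizing the
        # acquisition function, approximated by scoring random candidates.
        cand = lo + (hi - lo) * rng.random((1000, len(bounds)))
        mean, var = gp_posterior(X, y, cand)
        ucb = mean + beta * np.sqrt(var)      # upper confidence bound acquisition
        x_next = cand[np.argmax(ucb)]
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next))
    best = np.argmax(y)
    return X[best], y[best]

if __name__ == "__main__":
    f = lambda x: -np.sum((x - 0.3) ** 2)     # toy black-box objective (maximize)
    bounds = np.array([[0.0, 1.0], [0.0, 1.0]])
    x_best, y_best = bayes_opt(f, bounds)
    print("best point:", x_best, "best value:", y_best)
```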
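The high-dimensional setting of Chapter 4 assumes the objective effectively varies only along a low-dimensional subspace. One common way to exploit such low intrinsic dimensionality, shown below purely as an illustration and not as the exact algorithm analysed in the thesis, is to draw a random linear embedding, optimize in the small embedded space, and map candidate points back into the original high-dimensional box before evaluating the objective. The dimensions, bounds, and helper names here are assumptions.

```python
# Illustrative sketch of exploiting low intrinsic dimensionality with a
# random linear embedding; the concrete choices below are assumptions,
# not the exact algorithm analysed in Chapter 4.
import numpy as np

def embedded_objective(f_high, A, lo, hi):
    """Wrap a D-dimensional objective so it can be optimized in d dimensions."""
    def f_low(z):
        x = A @ z                          # lift the d-dim point to D dimensions
        x = np.clip(x, lo, hi)             # project back into the original box
        return f_high(x)
    return f_low

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D, d = 1000, 2                         # ambient vs. effective dimensionality
    hidden = rng.choice(D, size=d, replace=False)

    def f_high(x):                         # depends only on d hidden coordinates
        return -np.sum((x[hidden] - 0.3) ** 2)

    A = rng.normal(size=(D, d))            # random embedding matrix
    lo, hi = -np.ones(D), np.ones(D)
    f_low = embedded_objective(f_high, A, lo, hi)

    # Any low-dimensional optimizer (for example, the Bayesian optimization
    # sketch above) can now be run on f_low; plain random search stands in here.
    Z = -2.0 + 4.0 * rng.random((500, d))
    values = np.array([f_low(z) for z in Z])
    print("best value found in the embedded space:", values.max())
```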