Linearly Parameterized Bandits
<-- Return to the list
Date: 12-02-2008
Start Time:
1:00pm
End Time: 2:00pm
Speaker: Paat Rusmevichientong, School of Operations Research and Information Engineering: Cornell University
Location: Uris 333
ABSTRACT
Motivated by applications in e-commerce and revenue management, we consider bandit problems involving a large (possibly infinite) collection of arms, in which the expected reward of each arm is a linear function of a multivariate random vector. The objective is to choose a sequence of arms to minimize the cumulative regret and Bayes risk. We propose a policy based on least squares estimation and uncertainty ellipsoids, which generalizes the upper confidence index approach pioneered by Lai and Robbins (1985). The cumulative regret and Bayes risk under our proposed policy are independent of the number of arms. We also establish lower bounds on the regret and risk, showing that our proposed policy is nearly optimal.
This is a joint work with John Tsitsiklis (MIT).
BIO
Paat Rusmevichientong is an Assistant Professor in the School of Operations Research and Information Engineering at Cornell University. Before joining the faculty at Cornell, he worked in the data mining and personalization group at Amazon.com. His research interests include data mining, information technology, and stochastic optimization, with applications to supply chain and revenue management. He has received the NSF CAREER Award and the Cornell Sonny Yau '72 Excellence in Teaching Award.