Press "Enter" to skip to content

Saddle-Point Perspective of Reinforcement Learning (Dr. Donghwan Lee)

Speaker

Donghwan Lee (Post-doc, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign)

Time & Location

Mar. 22 (Fri) 15:00 / Building 133 Room 316-1

Abstract

Reinforcement learning (RL) is a subfield of machine learning that addresses how a decision maker can learn an optimal decision-making rule (policy) to maximize a long-term cumulative reward (value function) while interacting with an unknown environment. Lying at the core of many sequential decision problems, such as those in robotics and artificial intelligence (AI), RL has recently attracted significant attention in the AI and control communities for outperforming humans in several challenging tasks, such as Atari video games and the game of Go (AlphaGo's human-versus-computer matches).

In the first part of the talk, we present a new RL algorithm that is capable of finding a near-optimal policy from state-action observations generated by arbitrary behavior policies. Our algorithm, called stochastic primal-dual Q-learning (SPD Q-learning), hinges on a new linear programming formulation and a dual perspective of standard Q-learning. We prove a first-of-its-kind result: SPD Q-learning guarantees a certain convergence rate even when the state-action distribution is time-varying, provided it converges sub-linearly to a stationary distribution.

In the second part of the talk, we focus on a multi-agent RL algorithm in which each agent receives a local reward through local processing, while information exchange over a sparse stochastic communication network allows all agents to learn the global value function corresponding to the global reward, defined as the sum of the local rewards. Applications of this multi-agent RL include big data and the Internet of Things (IoT), where an AI determines personalized advertisement policies to maximize profit based on internet users' consumption patterns and poll responses, which are private. We prove that our algorithm, called stochastic primal-dual distributed RL, guarantees a certain convergence rate, and we provide numerical experiments to demonstrate its learning performance.
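As background for the title's saddle-point view: the optimal value function of a discounted Markov decision process solves a classical linear program, and the Lagrangian of that program turns optimal control into a convex-concave saddle-point problem. The sketch below uses textbook notation (states \(s\), actions \(a\), transition kernel \(P\), reward \(r\), discount factor \(\gamma \in (0,1)\), and an arbitrary strictly positive weight vector \(c\)); it illustrates the general perspective only and is not necessarily the exact formulation behind SPD Q-learning.

\[
\min_{V} \; c^\top V
\quad \text{subject to} \quad
V(s) \;\ge\; r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V(s') \quad \text{for all } (s,a),
\]

whose Lagrangian, with multipliers \(\lambda(s,a) \ge 0\), is

\[
L(V, \lambda) \;=\; c^\top V \;+\; \sum_{s,a} \lambda(s,a) \left( r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V(s') - V(s) \right).
\]

The optimal value function and the discounted state-action occupation measure of an optimal policy form a saddle point of \(L\), so stochastic primal-dual iterations on \(L\), driven by sampled transitions rather than the unknown kernel \(P\), yield RL algorithms whose convergence rates are inherited from saddle-point optimization.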
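For the multi-agent part, one common way to pose such problems, again only a hedged sketch with notation introduced here for illustration (agents \(i = 1, \dots, N\), local rewards \(r_i\), communication graph edge set \(\mathcal{E}\), and \(L_i\) denoting the Lagrangian above with \(r\) replaced by \(r_i\)), is to give each agent a local primal-dual pair coupled only through consensus constraints; the specific formulation in the talk may differ.

\[
\min_{V_1, \dots, V_N} \; \max_{\lambda_1, \dots, \lambda_N \ge 0} \; \sum_{i=1}^{N} L_i(V_i, \lambda_i)
\quad \text{subject to} \quad
V_i = V_j, \;\; \lambda_i = \lambda_j \quad \text{for all } (i,j) \in \mathcal{E}.
\]

At a consensus point the objective equals \(N\) times the Lagrangian for the averaged reward \(\tfrac{1}{N}\sum_i r_i\), which admits the same optimal policy as the summed reward; each agent can therefore alternate local stochastic primal-dual steps with averaging over its (possibly time-varying, stochastic) neighbors and still learn the global value function without revealing its private local reward.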

Biography

Donghwan Lee is a postdoctoral research associate in the Coordinated Science Laboratory at the University of Illinois at Urbana-Champaign. He received his B.S. in Electrical Engineering and Computer Sciences from Konkuk University in 2008, his M.S. in Electrical Engineering and Computer Sciences from Yonsei University in 2010, and his M.S. in Mathematics and Ph.D. in Electrical and Computer Engineering from Purdue University in 2017. He was an Associate Editor of the IEEE Transactions on Fuzzy Systems in 2015. His research interests lie broadly in the areas of optimization and control theory. His current research focuses on developing rigorous frameworks for dynamic programming and reinforcement learning that enable the efficient operation of stochastic systems, such as building control with human interactions and multi-agent systems. For more information, see his website: https://sites.google.com/site/donghwanleehome
