Department of Computer Science and Engineering,

Sogang University, Seoul, Korea.

*Markov Decision Process*,*Stochastic Game Theory*,*Stochastic Optimization*,*Computational Intelligence & Learning Theory*

**H. S. Chang**, J. Hu, M. Fu, and S. I. Marcus, "Simulation-based Algorithms for Markov Decision Processes,"

*Springer*, 2nd Edition, ISBN: 978-1-4471-5021-3, 2013. (1st Edition: ISBN: 978-1-84628-689-6, 2007).

- R. L. Givan, E. K. P. Chong, and
**H. S. Chang**, "Scheduling Multiclass Packet Streams to Minimize Weighted Loss,"*Queueing Systems*, Vol. 41, No.3, 2002, pp. 241-270. **H. S. Chang**, P. Fard, S. I. Marcus, and M. Shayman, "Multi-time Scale Markov Decision Processes,"*IEEE Trans. on Automatic Control*, Vol. 48, No. 6, 2003, pp. 976-987.**H. S. Chang**and S. I. Marcus, "Approximate Receding Horizon Approach for Markov Decision Processes: Average Reward Case,"*J. of Mathematical Analysis and Applications*, Vol. 286, No. 2, 2003, pp. 636-651.**H. S. Chang**, "Localization and A Distributed Local Optimal Solution Algorithm for a Class of Multi-agent Markov Decision Processes,"*Int. J. of Control, Automation, and Systems*, Vol. 1, No. 3, 2003, pp. 358-367.**H. S. Chang**and S. I. Marcus, "Two-Person Zero-Sum Markov Games: Receding Horizon Approach,"*IEEE Trans. on Automatic Control*, Vol. 48, No. 11, 2003, pp. 1951-1961.**H. S. Chang**, R. L. Givan, and E. K. P. Chong, "Parallel Rollout for Online Solution of Partially Observable Markov Decision Processes,"*Discrete Event Dynamic Systems*, Vol. 14, No. 3, 2004, pp. 309-341.**H. S. Chang**, "On Ordinal Comparison of Policies in Markov Reward Processes,"*J. of Optimization Theory and Applications*, Vol. 122, No. 1, 2004, pp. 207-217.**H. S. Chang**, "Multi-policy Iteration with a Distributed Voting,"*Mathematical Methods of Operations Research*, Vol. 60, No. 2, 2004.11, pp. 299-310.**H. S. Chang**, "A Model for Multi-time Scaled Sequential Decision Making Processes with Adversary,"*Mathematical and Computer Modeling of Dynamical Systems*, Vol. 10, No. 3-4, 2004.12, pp. 287-302.**H. S. Chang**, M. Fu, J. Hu, and S. I. Marcus, "An Adaptive Sampling Algorithm for Solving Markov Decision Processes,"*Operations Research*, Vol. 53, No. 1, 2005.1, pp. 126-139.**ORMS Today**, Vol. 43, No. 5, 2016.10, pp. 24-29:*Google DeepMind's Alphago**O.R.'s unheralded role in the path-breaking achievement***UMD News:***Contribution to Google's AlphaGo AI system*|*Researchers write about Google's Alphago for OR/MS Today***WIKIPEDIA:***Monte Carlo Tree Search*(MCTS) (cf., history of MCTS and exploration and exploitation)- The paper, "
*AlphaGo and Monte Carlo Tree Search: The Simulation Optimization Perspective*," by Michael Fu,

*in the Proceedings of the 2016 Winter Simulation Conf.***Abstract:**In March of 2016, Google DeepMind's AlphaGo, a computer Go-playing program, defeated the reigning human world champion Go player, 4-1, a feat far more impressive than previous victories by computer programs in chess (IBM's Deep Blue) and Jeopardy (IBM's Watson). AlphaGo combines machine learning approaches, specifically deep neural networks and reinforcement learning, with a technique called Monte Carlo tree search. Current versions of Monte Carlo tree search used in Go-playing algorithms are based on a version developed for games that traces its roots back to the adaptive multi-stage sampling simulation optimization algorithm for estimating the value function in finite-horizon Markov decision processes (MDPs) introduced by Chang et al. (2005), which was the first to use Upper Confidence Bounds (UCBs) for Monte Carlo simulation-based solution of MDPs. We illustrate Monte Carlo tree search by connecting it to simulation optimization through the use of two simple examples: a decision tree and tic-tac-toe.

**H. S. Chang**, "Multi-policy Improvement in Stochastic Optimization with Forward Recursive Function Criteria,"*J. of Mathematical Analysis and Applications*, Vol. 305, No. 1, 2005.5, pp. 130-139.**H. S. Chang**, "On the Probability of Correct Selection by Distributed Voting in Stochastic Optimization,"*J. of Optimization Theory and Applications*, Vol. 125, No. 1, 2005.4, pp. 231-240.**H. S. Chang**, "Error Bounds for Finite Step Approximations for Solving Infinite Horizon Controlled Markov Set-Chains,"*IEEE Trans. on Automatic Control*, Vol. 50, No. 9, 2005.9, pp. 1413-1418.**H. S. Chang**, H-G. Lee, M. Fu, and S. I. Marcus, "Evolutionary Policy Iteration for Solving Markov Decision Processes,"*IEEE Trans. on Automatic Control*, Vol. 50, No. 11, 2005.11, pp. 1804-1808.**H. S. Chang**, "Converging Marriage in Honey-Bees Optimization and Application to Stochastic Dynamic Programming,"*J. of Global Optimization*, Vol. 35, No. 3, 2006.7, pp. 423-441.**H. S. Chang**, "On Convergence Rate of the Shannon Entropy Rate of Ergodic Markov Chains via Sample-path Simulation,"*Statistics & Probability Letters*, Vol. 76, No. 12, 2006.7, pp. 1261-1264.**H. S. Chang**, "Perfect Information Two-Person Zero-Sum Markov Games with Imprecise Transition Probabilities,"*Mathematical Methods of Operations Research*, Vol. 64, No. 2, 2006.10, pp. 335-351.**H. S. Chang**, "A Policy Improvement Method in Constrained Stochastic Dynamic Programming,"*IEEE Trans. on Automatic Control*, Vol. 51, No. 9, 2006.9, pp. 1523-1526.**H. S. Chang**, M. Fu, J. Hu, and S. I. Marcus, "A Survey of Some Simulation-Based Algorithms for Markov Decision Processes,"*Communications in Information and Systems*, Vol. 7, No. 1, 2007.7, pp. 59-92.**H. S. Chang**, "A Policy Improvement Method for Constrained Average Markov Decision Processes,"*Operations Research Letters*, Vol. 35, No. 4, 2007.7, pp. 434-438.**H. S. Chang**, M. Fu, J. Hu, and S. I. Marcus, "An Asymptotically Efficient Simulation-Based Algorithm for Finite Horizon Stochastic Dynamic Programming,"*IEEE Trans. on Automatic Control*, Vol. 52, No.1, 2007.1, pp. 89-94.**H. S. Chang**and E. K. P. Chong, "Solving Controlled Markov Set-Chains with Discounting via Multi-policy Improvement,"*IEEE Trans. on Automatic Control*, Vol. 52, No.3, 2007.3, pp. 564-569.**H. S. Chang**, M. Fu, J. Hu, and S. I. Marcus, "Recursive Learning Automata Approach to Markov Decision Processes,"*IEEE Trans. on Automatic Control*, Vol. 52, No.7, 2007.7, pp. 1349-1355.**H. S. Chang**, "Finite Step Approximation Error Bounds for Solving Average Reward Controlled Markov Set-Chains,"*IEEE Trans. on Automatic Control*, Vol. 53, No.1, 2008.2, pp. 350-355.**H. S. Chang**, "Converging Co-Evolutionary Algorithm for Two-Person Zero-Sum Discounted Markov Games with Perfect Information,"*IEEE Trans. on Automatic Control*, Vol. 53, No.2, 2008.3, pp. 596-601.**H. S. Chang**, "Decentralized Learning in Finite Markov Chains: Revisited,"*IEEE Trans. on Automatic Control*, Vol. 54, No.7, 2009.7, pp. 1648-1653.**H. S. Chang**, J. Hu, M. Fu, and S. I. Marcus, "Adaptive Adversarial Multi-Armed Bandit Approach to Two-Person Zero-Sum Markov Games,"*IEEE Trans. on Automatic Control*, Vol. 55, No. 2, 2010.2, pp. 463-468.- J. Hu,
**H. S. Chang**, M. Fu, and S. I. Marcus, "Dynamic Sample Budget Allocation in Model-Based Optimization,"*J. of Global Optimization*, Vol. 50, No. 4, 2011.8, pp. 575-596. - J. Hu, P. Hu, and
**H. S. Chang**, "A Stochastic Approximation Framework for a Class of Randomized Optimization Algorithms,"*IEEE Trans. on Automatic Control*, Vol. 57, No. 1, 2012.1, pp. 165-178. **H. S. Chang**, "A Policy Iteration Heuristic for Constrained Discounted Controlled Markov Chains,"*Optimization Letters*, Vol. 6, No. 7, 2012.10, pp. 1573-1577.- J. Hu and
**H. S. Chang**, "Approximate Stochastic Annealing for Online Control of Infinite Horizon Markov Decision Processes,"*Automatica*, Vol. 48, No. 9, 2012.9, pp. 2182-2188. **H. S. Chang**and J. Hu, "On the Probability of Correct Selection in Ordinal Comparison over Dynamic Networks,"*J. of Optimization Theory and Applications*, Vol. 155, No. 2, 2012.11, pp. 594-604.**H. S. Chang**, "On Functional Equations for*K*th Best Policies in Markov Decision Processes,"*Automatica*, Vol. 49, No. 1, 2013.1, pp. 297-300.**H. S. Chang**, "Policy Set Iteration for Markov Decision Processes,"*Automatica*, Vol. 49, No. 12, 2013.12, pp. 3687-3689.**H. S. Chang**, "A Necessary Condition for Nash Equilibrium in Two-Person Zero-Sum Constrained Stochastic Games,"*Game Theory*, Vol. 2013, Article ID 290427, 5 pages, 2013.12.**H. S. Chang**, "On Modification of Population-based Search Algorithms for Convergence in Stochastic Combinatorial Optimization,"*Optimization*, Vol. 64, No. 7, 2015.7, pp. 1647-1655.**H. S. Chang**, "An Exact Iterative Search Algorithm for Constrained Markov Decision Processes,"*Automatica*, Vol. 50, No. 5, 2014.5, pp. 1531-1534.**H. S. Chang**, "Value Set Iteration for Markov Decision Processes,"*Automatica*, Vol. 50, No. 7, 2014.7, pp. 1940-1943. Corrigendum**H. S. Chang**, "Sleeping Experts and Bandits Approach to Constrained Markov Decision Processes,"*Automatica*, Vol. 63, No. 1, 2016.1, pp. 182-186.**H. S. Chang**and S. Choe, "Combining Multiple Strategies for Multi-armed Bandit Problems and Asymptotic Optimality,"*Journal of Control Science and Engineering*, Vol. 2015, Article ID 264953, 7 pages, 2015.3.**H. S. Chang**, "Random Search for Constrained Markov Decision Processes with Multi-policy Improvement,"*Automatica*, Vol. 58, No. 8, 2015.8, pp. 127-130.**H. S. Chang**, "Value Set Iteration for Two-Person Zero-Sum Markov Games,"*Automatica*, Vol. 76, No. 2, 2017.2, pp. 61-64.

- Associate Editor,
*IEEE Transactions on Automatic Control*, 2010.11 ~ 2015.12. - Associate Editor,
*Automatica*, 2014.11 ~. - Senior Member,
*IEEE*, 2007.4 ~. - Member of the Technical Program Committee, the 46th/50th
*IEEE Conf. on Decision and Control*, 2007, 2011. - Member of the Technical Program Committee, the 20th
*Int. Conf. on Methods and Models in Automation and Robotics*, 2015. - Member of the Best Student Paper Award Committee, the 54th
*IEEE Conf. on Decision and Control*, 2015. - Visiting Assoc. Prof., the Institute for Systems Research and the Dept of ECE, Univ. of Maryland, 2009.
- Other Papers

- Quotes || The Life and Work of
A. Markov in
*Linear Algebra and its Applications*, Vol. 386, 2004. - An Appreciation of Prof. R. Bellman in
*J. of Optimization Theory and Applications*, Vol. 32, 1980. - On the Birth of Dynamic Programming by S. Dreyfus, in the 50th Anniversary issue of
*Operations Research*. - The Theory of Dynamic Programming by R. Bellman, The RAND Corp., 1954.
- On the Origin and Application of MDPs by R. Howard, in the 50th Anniversary issue of
*Operations Research*. - Non-Cooperative Games by J. Nash, Ph.D. Thesis, Princeton Univ., 1950.
- Stochastic Games by L. Shapley, in the
*Proc. of National Academy of Science*, 1953.