Department of Computer Science and Engineering,

Sogang University, Seoul, Korea.

*Markov Decision Processes*, *Stochastic Game Theory*, *Stochastic Optimization*, *Computational Intelligence & Learning Theory*

- **H. S. Chang**, J. Hu, M. Fu, and S. I. Marcus, *Simulation-based Algorithms for Markov Decision Processes*, Springer, 2nd Edition, ISBN: 978-1-4471-5021-3, 2013 (1st Edition: ISBN: 978-1-84628-689-6, 2007).

- R. L. Givan, E. K. P. Chong, and **H. S. Chang**, "Scheduling Multiclass Packet Streams to Minimize Weighted Loss," *Queueing Systems*, Vol. 41, No. 3, 2002, pp. 241-270.
- **H. S. Chang**, P. Fard, S. I. Marcus, and M. Shayman, "Multi-time Scale Markov Decision Processes," *IEEE Trans. on Automatic Control*, Vol. 48, No. 6, 2003, pp. 976-987.
- **H. S. Chang** and S. I. Marcus, "Approximate Receding Horizon Approach for Markov Decision Processes: Average Reward Case," *J. of Mathematical Analysis and Applications*, Vol. 286, No. 2, 2003, pp. 636-651.
- **H. S. Chang**, "Localization and a Distributed Local Optimal Solution Algorithm for a Class of Multi-agent Markov Decision Processes," *Int. J. of Control, Automation, and Systems*, Vol. 1, No. 3, 2003, pp. 358-367.
- **H. S. Chang** and S. I. Marcus, "Two-Person Zero-Sum Markov Games: Receding Horizon Approach," *IEEE Trans. on Automatic Control*, Vol. 48, No. 11, 2003, pp. 1951-1961.
- **H. S. Chang**, R. L. Givan, and E. K. P. Chong, "Parallel Rollout for Online Solution of Partially Observable Markov Decision Processes," *Discrete Event Dynamic Systems*, Vol. 14, No. 3, 2004, pp. 309-341.
- **H. S. Chang**, "On Ordinal Comparison of Policies in Markov Reward Processes," *J. of Optimization Theory and Applications*, Vol. 122, No. 1, 2004, pp. 207-217.
- **H. S. Chang**, "Multi-policy Iteration with a Distributed Voting," *Mathematical Methods of Operations Research*, Vol. 60, No. 2, 2004.11, pp. 299-310.
- **H. S. Chang**, "A Model for Multi-time Scaled Sequential Decision Making Processes with Adversary," *Mathematical and Computer Modeling of Dynamical Systems*, Vol. 10, No. 3-4, 2004.12, pp. 287-302.
- **H. S. Chang**, M. Fu, J. Hu, and S. I. Marcus, "An Adaptive Sampling Algorithm for Solving Markov Decision Processes," *Operations Research*, Vol. 53, No. 1, 2005.1, pp. 126-139.
  - *Google DeepMind's AlphaGo: O.R.'s unheralded role in the path-breaking achievement*, **ORMS Today**, Vol. 43, No. 5, 2016.10, pp. 24-29.
  - **UMD News:** *Contribution to Google's AlphaGo AI system* | *Researchers write about Google's AlphaGo for OR/MS Today*
  - **WIKIPEDIA:** *Monte Carlo Tree Search* (MCTS) (cf. the history of MCTS, and exploration and exploitation)
  - M. Fu, "*Markov Decision Processes, AlphaGo, and Monte Carlo Tree Search: Back to the Future*," Leading Developments from INFORMS Communities: *INFORMS Tutorials in Operations Research*, 2017.10, pp. 68-88.
    **Abstract:** In 2016, a computer Go-playing program called AlphaGo stunned the (human) world by winning a match (4 games to 1) against the reigning human world champion, a feat more impressive than previous victories by computer programs in chess (Deep Blue) and the TV game show Jeopardy! (Watson). The main engine behind AlphaGo combines machine learning approaches in the form of deep neural networks with a technique called Monte Carlo tree search, whose roots can be traced back to an adaptive multistage sampling simulation-based algorithm for Markov decision processes (MDPs) published in *Operations Research* in 2005 [H. S. Chang, M. C. Fu, J. Hu, and S. I. Marcus. An adaptive sampling algorithm for solving Markov decision processes. *Operations Research* 53(1):126-139, 2005] (and introduced even earlier, in 2002). This tutorial describes AlphaGo and the simulation-based MDP algorithm, as well as providing contextual and historical background material for both, and uses simple examples to illustrate the main ideas behind Monte Carlo tree search.
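The core idea of adaptive multistage sampling described in the abstract above can be illustrated in a few lines: at each stage, samples are allocated to actions via an upper-confidence-bound (UCB) rule, and each sampled action's Q-value is estimated by recursive sampling of the next stage. The sketch below is a minimal illustration of that idea, not the paper's exact algorithm; the toy MDP interface (`sample_next`, `reward`) and the sample-budget schedule are assumptions chosen for the example.

```python
import math

def adaptive_sampling_value(state, horizon, actions, sample_next, reward, n_samples=64):
    """UCB-guided adaptive sampling estimate of the optimal value of `state`
    over `horizon` stages (illustrative sketch, not the published algorithm).
    `sample_next(s, a)` draws a successor state; `reward(s, a)` is the
    one-stage reward; both are caller-supplied placeholders."""
    if horizon == 0:
        return 0.0
    counts = {a: 0 for a in actions}
    totals = {a: 0.0 for a in actions}
    for t in range(1, n_samples + 1):
        # Sample each action once, then pick the action maximizing the UCB index.
        untried = [a for a in actions if counts[a] == 0]
        if untried:
            a = untried[0]
        else:
            a = max(actions, key=lambda a: totals[a] / counts[a]
                    + math.sqrt(2.0 * math.log(t) / counts[a]))
        # One sampled transition, with the tail value estimated recursively
        # on a smaller budget (an assumed schedule, for tractability).
        next_state = sample_next(state, a)
        q = reward(state, a) + adaptive_sampling_value(
            next_state, horizon - 1, actions, sample_next, reward,
            n_samples=max(4, n_samples // 4))
        counts[a] += 1
        totals[a] += q
    # The stage value estimate is the count-weighted average of the Q estimates.
    return sum(totals[a] for a in actions) / n_samples
```

On a toy problem with one "good" action (reward 1) and one "bad" action (reward 0), the estimate concentrates samples on the good action, so the returned value approaches the optimal value (the horizon) from below; the residual gap is the price of exploration, which is exactly the exploration-exploitation trade-off that MCTS inherits.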

- **H. S. Chang**, "Multi-policy Improvement in Stochastic Optimization with Forward Recursive Function Criteria," *J. of Mathematical Analysis and Applications*, Vol. 305, No. 1, 2005.5, pp. 130-139.
- **H. S. Chang**, "On the Probability of Correct Selection by Distributed Voting in Stochastic Optimization," *J. of Optimization Theory and Applications*, Vol. 125, No. 1, 2005.4, pp. 231-240.
- **H. S. Chang**, "Error Bounds for Finite Step Approximations for Solving Infinite Horizon Controlled Markov Set-Chains," *IEEE Trans. on Automatic Control*, Vol. 50, No. 9, 2005.9, pp. 1413-1418.
- **H. S. Chang**, H-G. Lee, M. Fu, and S. I. Marcus, "Evolutionary Policy Iteration for Solving Markov Decision Processes," *IEEE Trans. on Automatic Control*, Vol. 50, No. 11, 2005.11, pp. 1804-1808.
- **H. S. Chang**, "Converging Marriage in Honey-Bees Optimization and Application to Stochastic Dynamic Programming," *J. of Global Optimization*, Vol. 35, No. 3, 2006.7, pp. 423-441.
- **H. S. Chang**, "On Convergence Rate of the Shannon Entropy Rate of Ergodic Markov Chains via Sample-path Simulation," *Statistics & Probability Letters*, Vol. 76, No. 12, 2006.7, pp. 1261-1264.
- **H. S. Chang**, "Perfect Information Two-Person Zero-Sum Markov Games with Imprecise Transition Probabilities," *Mathematical Methods of Operations Research*, Vol. 64, No. 2, 2006.10, pp. 335-351.
- **H. S. Chang**, "A Policy Improvement Method in Constrained Stochastic Dynamic Programming," *IEEE Trans. on Automatic Control*, Vol. 51, No. 9, 2006.9, pp. 1523-1526.
- **H. S. Chang**, M. Fu, J. Hu, and S. I. Marcus, "A Survey of Some Simulation-Based Algorithms for Markov Decision Processes," *Communications in Information and Systems*, Vol. 7, No. 1, 2007.7, pp. 59-92.
- **H. S. Chang**, "A Policy Improvement Method for Constrained Average Markov Decision Processes," *Operations Research Letters*, Vol. 35, No. 4, 2007.7, pp. 434-438.
- **H. S. Chang**, M. Fu, J. Hu, and S. I. Marcus, "An Asymptotically Efficient Simulation-Based Algorithm for Finite Horizon Stochastic Dynamic Programming," *IEEE Trans. on Automatic Control*, Vol. 52, No. 1, 2007.1, pp. 89-94.
- **H. S. Chang** and E. K. P. Chong, "Solving Controlled Markov Set-Chains with Discounting via Multi-policy Improvement," *IEEE Trans. on Automatic Control*, Vol. 52, No. 3, 2007.3, pp. 564-569.
- **H. S. Chang**, M. Fu, J. Hu, and S. I. Marcus, "Recursive Learning Automata Approach to Markov Decision Processes," *IEEE Trans. on Automatic Control*, Vol. 52, No. 7, 2007.7, pp. 1349-1355.
- **H. S. Chang**, "Finite Step Approximation Error Bounds for Solving Average Reward Controlled Markov Set-Chains," *IEEE Trans. on Automatic Control*, Vol. 53, No. 1, 2008.2, pp. 350-355.
- **H. S. Chang**, "Converging Co-Evolutionary Algorithm for Two-Person Zero-Sum Discounted Markov Games with Perfect Information," *IEEE Trans. on Automatic Control*, Vol. 53, No. 2, 2008.3, pp. 596-601.
- **H. S. Chang**, "Decentralized Learning in Finite Markov Chains: Revisited," *IEEE Trans. on Automatic Control*, Vol. 54, No. 7, 2009.7, pp. 1648-1653.
- **H. S. Chang**, J. Hu, M. Fu, and S. I. Marcus, "Adaptive Adversarial Multi-Armed Bandit Approach to Two-Person Zero-Sum Markov Games," *IEEE Trans. on Automatic Control*, Vol. 55, No. 2, 2010.2, pp. 463-468.
- J. Hu, **H. S. Chang**, M. Fu, and S. I. Marcus, "Dynamic Sample Budget Allocation in Model-Based Optimization," *J. of Global Optimization*, Vol. 50, No. 4, 2011.8, pp. 575-596.
- J. Hu, P. Hu, and **H. S. Chang**, "A Stochastic Approximation Framework for a Class of Randomized Optimization Algorithms," *IEEE Trans. on Automatic Control*, Vol. 57, No. 1, 2012.1, pp. 165-178.
- **H. S. Chang**, "A Policy Iteration Heuristic for Constrained Discounted Controlled Markov Chains," *Optimization Letters*, Vol. 6, No. 7, 2012.10, pp. 1573-1577.
- J. Hu and **H. S. Chang**, "Approximate Stochastic Annealing for Online Control of Infinite Horizon Markov Decision Processes," *Automatica*, Vol. 48, No. 9, 2012.9, pp. 2182-2188.
- **H. S. Chang** and J. Hu, "On the Probability of Correct Selection in Ordinal Comparison over Dynamic Networks," *J. of Optimization Theory and Applications*, Vol. 155, No. 2, 2012.11, pp. 594-604.
- **H. S. Chang**, "On Functional Equations for *K*th Best Policies in Markov Decision Processes," *Automatica*, Vol. 49, No. 1, 2013.1, pp. 297-300.
- **H. S. Chang**, "Policy Set Iteration for Markov Decision Processes," *Automatica*, Vol. 49, No. 12, 2013.12, pp. 3687-3689.
- **H. S. Chang**, "A Necessary Condition for Nash Equilibrium in Two-Person Zero-Sum Constrained Stochastic Games," *Game Theory*, Vol. 2013, Article ID 290427, 5 pages, 2013.12.
- **H. S. Chang**, "On Modification of Population-based Search Algorithms for Convergence in Stochastic Combinatorial Optimization," *Optimization*, Vol. 64, No. 7, 2015.7, pp. 1647-1655.
- **H. S. Chang**, "An Exact Iterative Search Algorithm for Constrained Markov Decision Processes," *Automatica*, Vol. 50, No. 5, 2014.5, pp. 1531-1534.
- **H. S. Chang**, "Value Set Iteration for Markov Decision Processes," *Automatica*, Vol. 50, No. 7, 2014.7, pp. 1940-1943 (with Corrigendum).
- **H. S. Chang**, "Sleeping Experts and Bandits Approach to Constrained Markov Decision Processes," *Automatica*, Vol. 63, No. 1, 2016.1, pp. 182-186.
- **H. S. Chang** and S. Choe, "Combining Multiple Strategies for Multi-armed Bandit Problems and Asymptotic Optimality," *Journal of Control Science and Engineering*, Vol. 2015, Article ID 264953, 7 pages, 2015.3.
- **H. S. Chang**, "Random Search for Constrained Markov Decision Processes with Multi-policy Improvement," *Automatica*, Vol. 58, No. 8, 2015.8, pp. 127-130.
- **H. S. Chang**, "Value Set Iteration for Two-Person Zero-Sum Markov Games," *Automatica*, Vol. 76, No. 2, 2017.2, pp. 61-64.

- Associate Editor, *IEEE Transactions on Automatic Control*, 2010.11 ~ 2015.12.
- Associate Editor, *Automatica*, 2014.11 ~.
- Senior Member, *IEEE*, 2007.4 ~.
- Member of the Technical Program Committee, the 46th/50th *IEEE Conf. on Decision and Control*, 2007, 2011.
- Member of the Technical Program Committee, the 20th *Int. Conf. on Methods and Models in Automation and Robotics*, 2015.
- Member of the Best Student Paper Award Committee, the 54th *IEEE Conf. on Decision and Control*, 2015.
- Visiting Assoc. Prof., the Institute for Systems Research and the Dept. of ECE, Univ. of Maryland, 2009.
- Other Papers

- Quotes
  - The Life and Work of A. Markov, in *Linear Algebra and its Applications*, Vol. 386, 2004.
  - An Appreciation of Prof. R. Bellman, in *J. of Optimization Theory and Applications*, Vol. 32, 1980.
  - On the Birth of Dynamic Programming, by S. Dreyfus, in the 50th Anniversary issue of *Operations Research*.
  - The Theory of Dynamic Programming, by R. Bellman, The RAND Corp., 1954.
  - On the Origin and Application of MDPs, by R. Howard, in the 50th Anniversary issue of *Operations Research*.
  - Non-Cooperative Games, by J. Nash, Ph.D. Thesis, Princeton Univ., 1950.
  - Stochastic Games, by L. Shapley, in the *Proc. of the National Academy of Sciences*, 1953.