Decision Making and Reinforcement Learning
About this Course
This course is an introduction to sequential decision making and reinforcement learning. We start with a discussion of utility theory to learn how preferences can be represented and modeled for decision making. We then model simple decision problems as multi-armed bandit problems and discuss several approaches to learning from evaluative feedback. Next, we model decision problems as finite Markov decision processes (MDPs) and discuss their solution via dynamic programming algorithms. We touch on the notion of partial observability in real problems, modeled by POMDPs and solved by online planning methods. Finally, we introduce the reinforcement learning problem and discuss two paradigms: Monte Carlo methods and temporal difference learning. We conclude by noting how the two paradigms lie on a spectrum of n-step temporal difference methods. An emphasis on algorithms and examples is a key part of this course.

Created by: Columbia University
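As a taste of the bandit material covered early in the course, here is a minimal sketch, not taken from the course itself, of an epsilon-greedy agent balancing exploration and exploitation on a stationary multi-armed bandit. The arm means, step count, and epsilon value are illustrative assumptions.

```python
import random

def epsilon_greedy_bandit(true_means, steps=1000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy agent on a stationary multi-armed bandit.

    true_means: expected reward of each arm (hidden from the agent).
    Returns per-arm sample-average value estimates and pull counts.
    """
    rng = random.Random(seed)
    k = len(true_means)
    estimates = [0.0] * k  # sample-average reward estimate per arm
    counts = [0] * k       # number of times each arm was pulled
    for _ in range(steps):
        # Explore a random arm with probability epsilon; otherwise
        # exploit the arm with the highest current estimate.
        if rng.random() < epsilon:
            arm = rng.randrange(k)
        else:
            arm = max(range(k), key=lambda a: estimates[a])
        # Observed reward: the arm's true mean plus unit Gaussian noise.
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental sample-average update of the value estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

if __name__ == "__main__":
    est, cnt = epsilon_greedy_bandit([0.2, 0.8, 0.5])
    print(est, cnt)
```

With small epsilon the agent spends most pulls on the arm it currently believes is best; setting `epsilon=1.0` recovers pure exploration, under which the sample averages converge to the true arm means.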