Decision Making Under Uncertainty Pdf 180608 | Book Item Download 2023-01-30 14-20-14

Partial capture of text on file.

                                                                                   1
                         Decision Making Under Uncertainty and
                                    Reinforcement Learning
                               Christos Dimitrakakis      Ronald Ortner
                                             April 8, 2021
                2
                           Contents
                           1 Introduction                                                                          9
                               1.1   Uncertainty and probability . . . . . . . . . . . . . . . . . . . . .        10
                               1.2   The exploration-exploitation trade-oﬀ        . . . . . . . . . . . . . . .   11
                               1.3   Decision theory and reinforcement learning         . . . . . . . . . . . .   12
                               1.4   Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . .         14
                           2 Subjective probability and utility                                                   15
                               2.1   Subjective probability . . . . . . . . . . . . . . . . . . . . . . . .       16
                                     2.1.1   Relative likelihood . . . . . . . . . . . . . . . . . . . . . .      16
                                     2.1.2   Subjective probability assumptions . . . . . . . . . . . . .         17
                                     2.1.3   Assigning unique probabilities* . . . . . . . . . . . . . . .        18
                                     2.1.4   Conditional likelihoods . . . . . . . . . . . . . . . . . . . .      19
                                     2.1.5   Probability elicitation . . . . . . . . . . . . . . . . . . . .      20
                               2.2   Updating beliefs: Bayes’ theorem . . . . . . . . . . . . . . . . . .         21
                               2.3   Utility theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     22
                                     2.3.1   Rewards and preferences . . . . . . . . . . . . . . . . . . .        22
                                     2.3.2   Preferences among distributions        . . . . . . . . . . . . . .   23
                                     2.3.3   Utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    24
                                     2.3.4   Measuring utility* . . . . . . . . . . . . . . . . . . . . . .       26
                                     2.3.5   Convex and concave utility functions . . . . . . . . . . . .         27
                                     2.3.6   Decision diagrams . . . . . . . . . . . . . . . . . . . . . .        28
                               2.4   Exercises    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   30
                           3 Decision problems                                                                    33
                               3.1   Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     34
                               3.2   Rewards that depend on the outcome of an experiment . . . . . .              34
                                     3.2.1   Formalisation of the problem setting . . . . . . . . . . . .         35
                                     3.2.2   Decision diagrams . . . . . . . . . . . . . . . . . . . . . .        37
                                     3.2.3   Statistical estimation* . . . . . . . . . . . . . . . . . . . .      38
                               3.3   Bayes decisions . . . . . . . . . . . . . . . . . . . . . . . . . . . .      39
                                     3.3.1   Convexity of the Bayes-optimal utility* . . . . . . . . . .          40
                               3.4   Statistical and strategic decision making . . . . . . . . . . . . . .        43
                                     3.4.1   Alternative notions of optimality . . . . . . . . . . . . . .        44
                                     3.4.2   Solving minimax problems* . . . . . . . . . . . . . . . . .          45
                                     3.4.3   Two-player games . . . . . . . . . . . . . . . . . . . . . .         47
                               3.5   Decision problems with observations . . . . . . . . . . . . . . . .          49
                                     3.5.1   Decision problems in classiﬁcation . . . . . . . . . . . . .         53
                                     3.5.2   Calculating posteriors . . . . . . . . . . . . . . . . . . . .       56
                                                                       3
                                        4                                                                          CONTENTS
                                            3.6   Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .        56
                                            3.7   Exercises    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   58
                                                  3.7.1   Problems with no observations . . . . . . . . . . . . . . .          58
                                                  3.7.2   Problems with observations . . . . . . . . . . . . . . . . .         58
                                                  3.7.3   An insurance problem . . . . . . . . . . . . . . . . . . . .         59
                                                  3.7.4   Medical diagnosis . . . . . . . . . . . . . . . . . . . . . . .      61
                                        4 Estimation                                                                          65
                                            4.1   Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     66
                                            4.2   Suﬃcient statistics . . . . . . . . . . . . . . . . . . . . . . . . . .      66
                                                  4.2.1   Suﬃcient statistics . . . . . . . . . . . . . . . . . . . . . .      67
                                                  4.2.2   Exponential families . . . . . . . . . . . . . . . . . . . . .       68
                                            4.3   Conjugate priors . . . . . . . . . . . . . . . . . . . . . . . . . . .       69
                                                  4.3.1   Bernoulli-Beta conjugate pair . . . . . . . . . . . . . . . .        69
                                                  4.3.2   Conjugates for the normal distribution . . . . . . . . . . .         73
                                                  4.3.3   Conjugates for multivariate distributions . . . . . . . . . .        78
                                            4.4   Credible intervals . . . . . . . . . . . . . . . . . . . . . . . . . . .     81
                                            4.5   Concentration inequalities . . . . . . . . . . . . . . . . . . . . . .       84
                                                  4.5.1   Chernoﬀ-Hoeﬀding bounds . . . . . . . . . . . . . . . . .            86
                                            4.6   Approximate Bayesian approaches . . . . . . . . . . . . . . . . .            87
                                                  4.6.1   Monte-Carlo inference . . . . . . . . . . . . . . . . . . . .        87
                                                  4.6.2   Approximate Bayesian Computation . . . . . . . . . . . .             88
                                                  4.6.3   Analytic approximations of the posterior . . . . . . . . . .         89
                                                  4.6.4   Maximum Likelihood and Empirical Bayes methods . . .                 90
                                        5 Sequential sampling                                                                 91
                                            5.1   Gains from sequential sampling . . . . . . . . . . . . . . . . . . .         92
                                                  5.1.1   An example: sampling with costs . . . . . . . . . . . . . .          93
                                            5.2   Optimal sequential sampling procedures . . . . . . . . . . . . . .           96
                                                  5.2.1   Multi-stage problems . . . . . . . . . . . . . . . . . . . . .       99
                                                  5.2.2   Backwards induction for bounded procedures . . . . . . .             99
                                                  5.2.3   Unbounded sequential decision procedures . . . . . . . . . 100
                                                  5.2.4   The sequential probability ratio test . . . . . . . . . . . . 101
                                                  5.2.5   Wald’s theorem . . . . . . . . . . . . . . . . . . . . . . . . 104
                                            5.3   Martingales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
                                            5.4   Markov processes . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
                                            5.5   Exercises. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
                                        6 Experiment design and Markov decision processes                                    109
                                            6.1   Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
                                            6.2   Bandit problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
                                                  6.2.1   An example: Bernoulli bandits . . . . . . . . . . . . . . . 112
                                                  6.2.2   Decision-theoretic bandit process . . . . . . . . . . . . . . 113
                                            6.3   Markov decision processes and reinforcement learning           . . . . . . 115
                                                  6.3.1   Value functions . . . . . . . . . . . . . . . . . . . . . . . . 118
                                            6.4   Finite horizon, undiscounted problems . . . . . . . . . . . . . . . 119
                                                  6.4.1   Policy evaluation . . . . . . . . . . . . . . . . . . . . . . . 119
                                                  6.4.2   Backwards induction policy evaluation . . . . . . . . . . . 120
                                                  6.4.3   Backwards induction policy optimisation . . . . . . . . . . 121
                                            6.5   Inﬁnite-horizon . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

The words contained in this file might help you see if this file matches what you are looking for:

...Decision making under uncertainty and reinforcement learning christos dimitrakakis ronald ortner april contents introduction probability the exploration exploitation trade o theory acknowledgements subjective utility relative likelihood assumptions assigning unique probabilities conditional likelihoods elicitation updating beliefs bayes theorem rewards preferences among distributions measuring convex concave functions diagrams exercises problems that depend on outcome of an experiment formalisation problem setting statistical estimation decisions convexity optimal strategic alternative notions optimality solving minimax two player games with observations in classication calculating posteriors summary no insurance medical diagnosis sucient statistics exponential families conjugate priors bernoulli beta pair conjugates for normal distribution multivariate credible intervals concentration inequalities cherno hoeding bounds approximate bayesian approaches monte carlo inference computation ...

Related files

Share

Help

Related files

Share

Share to social media

Help

Login Area