Record: 000729036 (CZ-PrVSE), last modified 2026-01-20
Local collection: X-LITERATURA V SYLABECH (literature in syllabi)
ISBN: 978-1-886529-39-7 (hardcover)
Cataloging source: ABA006; language of cataloging: Czech; rules: RDA
Language of text: English
Konspekt group 13: 519.1/.8 Combinatorics. Graph theory. Mathematical statistics. Operations research. Mathematical modelling
Conspectus group 13: 519 Probabilities and applied mathematics
UDC: 519.816-022.218 (MRF_2003); 519.85 (MRF_2003); 004.055 (MRF_2003); 004.825 (MRF_2014); (048.8) (MRF_2003)
Call number: 519.8BER
Author: Bertsekas, Dimitri P. (authority ID vut2010439815)
Title: Reinforcement learning and optimal control / by Dimitri P. Bertsekas
Publication: Belmont : Athena Scientific, [2019], ©2019
Extent: xiv, 373 pages : illustrations
Content type: text; media type: unmediated; carrier type: volume
Note: Includes bibliography and index

Abstract: In this book we consider large and challenging multistage decision problems, which can be solved in principle by dynamic programming (DP for short), but their exact solution is computationally intractable. We discuss solution methods that rely on approximations to produce suboptimal policies with adequate performance. These methods are collectively referred to as reinforcement learning, and also by alternative names such as approximate dynamic programming, and neuro-dynamic programming. Our subject has benefited greatly from the interplay of ideas from optimal control and from artificial intelligence. One of the aims of the book is to explore the common boundary between these two fields and to form a bridge that is accessible by workers with background in either field. Our primary focus will be on approximation in value space. Here, the control at each state is obtained by optimization of the cost over a limited horizon, plus an approximation of the optimal future cost, starting from the end of this horizon. The latter cost, which we generally denote by J̃, is a function of the state where we may be at the end of the horizon.
It may be computed by a variety of methods, possibly involving simulation and/or some given or separately derived heuristic/suboptimal policy. The use of simulation often allows for implementations that do not require a mathematical model, a major idea that has allowed the use of DP beyond its classical boundaries.

Online draft: https://eclass.uoa.gr/modules/document/file.php/DI437/Reinforcement_Learning_Bertsekas_Draft.pdf

Summary: The publication deals with solving large and complex multistage decision problems for which an exact solution by dynamic programming is computationally infeasible. The author presents approximation-based methods that make it possible to find suboptimal but practically usable control policies. These approaches are collectively referred to as reinforcement learning, or alternatively as approximate or neuro-dynamic programming. The book systematically connects optimal control theory with concepts from artificial intelligence and builds an accessible bridge between the two fields. The main emphasis is on approximation of the value function and on simulation-based methods, which often do not require an explicit mathematical model of the system.

Subjects (czenas / eczenas): multicriteria decision making (ph127397); mathematical optimization (ph122672); optimization methods (ph171359); generative artificial intelligence (ph1268083)
Genre/form (czenas / eczenas): monographs (fd132842)
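
Illustrative note: the abstract describes approximation in value space, where the control at a state minimizes the cost over a short horizon plus an approximate future cost J̃. The following is a minimal Python sketch of the simplest case, a one-step horizon. It is not taken from the book: the function names, the toy random-walk problem, and the choice of J̃ as distance to the goal are all assumptions made for illustration.

```python
from typing import Callable, Hashable, Iterable, List, Tuple

# One outcome of applying a control: (probability, next_state, stage_cost).
Outcome = Tuple[float, Hashable, float]


def one_step_lookahead(
    state: Hashable,
    controls: Iterable[Hashable],
    transitions: Callable[[Hashable, Hashable], List[Outcome]],
    j_tilde: Callable[[Hashable], float],
) -> Tuple[Hashable, float]:
    """Return the control minimizing expected stage cost plus j_tilde(next state)."""
    best_control, best_value = None, float("inf")
    for u in controls:
        # Expected cost of the first stage plus the approximate cost-to-go.
        q = sum(p * (cost + j_tilde(nxt)) for p, nxt, cost in transitions(state, u))
        if q < best_value:
            best_control, best_value = u, q
    return best_control, best_value


# Toy problem (hypothetical): walk on the integers toward 0, paying 1 per step.
# A move succeeds with probability 0.8; otherwise the state stays where it is.
def demo_transitions(x: int, u: int) -> List[Outcome]:
    return [(0.8, x + u, 1.0), (0.2, x, 1.0)]


# Heuristic stand-in for the optimal future cost: distance to the goal.
def demo_j_tilde(x: int) -> float:
    return float(abs(x))


control, value = one_step_lookahead(3, [-1, +1], demo_transitions, demo_j_tilde)
print(control, round(value, 3))
```

In this toy setting the lookahead at state 3 prefers the control -1, whose expected cost is about 3.2 compared with about 4.8 for +1. Replacing `demo_j_tilde` with a simulation-based estimate would correspond to the model-free variants the abstract mentions, and a longer horizon would require minimizing over sequences of controls rather than a single one.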