03660ntm a22004937i 4500 000723731 CZ-PrVSE 20250621144446.0 m d cr n|||||||||| 250621s2025 xr fsbm 000 0 eng d
UNPROCESSED IMPORT
ABA006 cze ABA006 ABA006 rda
Kolomazník, Vojtěch ISIS:162128 dis
Deep Reinforcement Learning pro karetní hru Dominion eng Deep Reinforcement Learning for the Card Game Dominion / Vojtěch Kolomazník
2025
?? pages : digital, PDF file
Thesis supervisor: Ondřej Zamazal
Bachelor's thesis (Bc.), Vysoká škola ekonomická v Praze. Fakulta informatiky a statistiky, 2025
Includes bibliography
Text (university qualification thesis)
Year of defense: 2025
This thesis proposes an approach to the card game Dominion based on deep reinforcement learning (deep RL). Existing deep RL approaches are studied and compared so that a deep RL agent can be implemented. The thesis then attempts to determine whether a properly trained agent can perform above the level of a simple heuristic, to find out which adjustments allow the agent to perform better in a partially observable environment, and to study the choices made by the agent. Dominion is a deck-building card game that presents many challenges to the RL community. It violates the Markov property because the state is not fully observable to the agent. Moreover, there is inherent stochasticity because the agent's deck is shuffled between turns. Current academic research has attempted to apply deep Q-networks and policy gradients to solve the game, with varying success. One of the biggest unanswered questions concerns the calculation of the rewards: both Monte Carlo and temporal-difference methods yield sufficient performance in different scenarios. For the purpose of this thesis, several simplifications were made to decrease the complexity of the problem. The selection of cards used in the game was frozen between games, and the action phase of an agent's turn was handled with a heuristic. Only the buying phase was controlled directly by the agent.
In this thesis, the REINFORCE method, which relies on policy gradients and Monte Carlo returns, displayed superior performance compared to deep Q-networks and actor-critic architectures using temporal-difference returns. It obtained a 95 percent win rate against the Big Money heuristic and an 88 percent win rate against the Smithy heuristic after only 20 thousand simulated games.
Mode of access: Internet
data analytics [bachelor's thesis field of study]
bakalářské práce fd132403 czenas
bachelor's theses eczenas
reinforcement learning
neural networks
markov decision process
Zamazal, Ondřej ISIS:7282 ths
Máša, Petr ISIS:17194 opn
Vysoká škola ekonomická v Praze. Fakulta informatiky a statistiky kn20010709399 dgg
https://insis.vse.cz/zp/90633/podrobnosti Thesis record in InSIS
https://insis.vse.cz/zp/90633 Main thesis
https://insis.vse.cz/zp/90633/posudek/vedouci Supervisor's evaluation
https://insis.vse.cz/zp/90633/posudek/oponent/85703 Opponent's review
https://insis.vse.cz/zp/90633/podrobnosti dc:identifier
DO NOT SEND
VSKP vse90633 250617 90633