Master's thesis topics
Supervisor: Mike Preuss
Summary: The success of AlphaStar for StarCraft has shown that competitive agents learning “together” can enhance the generality of a solution immensely. Can we also use this scheme to improve strategy learning in much simpler settings, e.g., with MCTS or similar approaches, on the much smaller game microRTS (https://github.com/santiontanon/microrts)?
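To make the MCTS side of the topic concrete, here is a minimal UCT sketch on the toy game of Nim (players alternately take 1-3 stones; whoever takes the last stone wins). Everything here is an illustrative assumption for the sketch, not part of microRTS or its API:

```python
import math
import random

class Node:
    def __init__(self, stones, parent=None):
        self.stones = stones      # stones left; it is this node's player's turn
        self.parent = parent
        self.children = {}        # move (stones taken) -> child Node
        self.visits = 0
        self.value = 0.0          # summed results, from this node's player's view

def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

def rollout(stones):
    """Random playout; +1 if the player to move eventually wins, else -1."""
    if stones == 0:
        return -1                 # the previous player took the last stone and won
    return -rollout(stones - random.choice(legal_moves(stones)))

def best_child(node, c=1.4):
    # UCT: a child's value is from the opponent's view, hence the minus sign
    return max(node.children.values(),
               key=lambda ch: -ch.value / ch.visits
                              + c * math.sqrt(math.log(node.visits) / ch.visits))

def mcts(root_stones, iters=1000):
    root = Node(root_stones)
    for _ in range(iters):
        node = root
        # 1. selection: descend while fully expanded and non-terminal
        while node.stones > 0 and len(node.children) == len(legal_moves(node.stones)):
            node = best_child(node)
        # 2. expansion: add one untried child, if the node is non-terminal
        if node.stones > 0:
            move = random.choice([m for m in legal_moves(node.stones)
                                  if m not in node.children])
            node.children[move] = Node(node.stones - move, parent=node)
            node = node.children[move]
        # 3. simulation, from the new node's player's perspective
        result = rollout(node.stones)
        # 4. backpropagation, flipping the perspective at each level
        while node is not None:
            node.visits += 1
            node.value += result
            result = -result
            node = node.parent
    # recommend the most-visited root move
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

In a self-play setting, two such searchers (or a searcher and a learned policy) would repeatedly play against each other, with the game replaced by microRTS states and moves.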
Single-agent Curriculum Learning
Supervisor: Aske Plaat
Summary: AlphaZero has taught itself to play world-class Go from scratch using a form of reinforcement learning called curriculum learning. Can we apply curriculum learning to single-agent puzzles such as Sokoban, and thus, in principle, to some of the world’s most important optimization problems?
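A minimal sketch of the idea, assuming a deliberately trivial stand-in for Sokoban: a tabular Q-learner must walk to the end of a 1-D corridor, and the curriculum lengthens the corridor once the agent solves the current length reliably. All names and numbers are illustrative assumptions:

```python
import random

def episode(q, length, eps=0.3, alpha=0.5, gamma=0.95):
    """One Q-learning episode on a corridor of the given length; True if solved."""
    pos, steps = 0, 0
    while pos < length and steps < 4 * length:
        if random.random() < eps:                      # explore
            act = random.choice((0, 1))
        else:                                          # exploit: 0 = left, 1 = right
            act = max((0, 1), key=lambda b: q.get((pos, b), 0.0))
        nxt = min(pos + 1, length) if act == 1 else max(pos - 1, 0)
        reward = 1.0 if nxt == length else -0.01       # small step penalty
        best_next = max(q.get((nxt, b), 0.0) for b in (0, 1))
        old = q.get((pos, act), 0.0)
        q[(pos, act)] = old + alpha * (reward + gamma * best_next - old)
        pos, steps = nxt, steps + 1
    return pos == length

def curriculum(max_length=8, rounds=200):
    """Raise the task difficulty whenever the agent is reliably successful."""
    q, length = {}, 2
    for _ in range(rounds):
        solved = sum(episode(q, length) for _ in range(20))
        if solved >= 18 and length < max_length:
            length += 1                                # graduate to a harder corridor
    return q, length
```

The thesis question is whether this "graduate when reliable" scheme, which AlphaZero obtains for free from self-play, can be engineered for single-agent puzzles with meaningful difficulty parameters.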
Planning through Inpainting
Supervisor: Thomas Moerland
Summary: Standard planning approaches use forward methods, i.e., they start at the beginning of a problem and repeatedly unfold a search tree in the forward direction. However, that seems counter to the way humans tend to plan (outside of board games like Chess). Instead, we first sample a distant goal, and then repeatedly inpaint different trajectories between the start and goal state, for example by first fixing a subgoal halfway. In this project, we will look at this new approach to hierarchical planning (see, e.g., https://arxiv.org/pdf/2006.13205.pdf for a first attempt).
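The divide-and-conquer flavour of inpainting can be sketched in a few lines, assuming states are simply integers on a line and the subgoal predictor is the exact midpoint (in the real setting both would be learned):

```python
def plan(start, goal, subgoal_of):
    """Recursively 'inpaint' a trajectory between start and goal."""
    if start == goal:
        return [start]
    if abs(goal - start) <= 1:          # adjacent states: one primitive step
        return [start, goal]
    mid = subgoal_of(start, goal)       # e.g. a learned subgoal predictor
    left = plan(start, mid, subgoal_of)
    return left[:-1] + plan(mid, goal, subgoal_of)

# Illustrative subgoal predictor: fix a state halfway between start and goal.
path = plan(0, 8, lambda s, g: (s + g) // 2)
```

Note that, unlike forward search, the trajectory is refined top-down: the halfway subgoal is fixed first, and the two halves are filled in independently.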
Competence-based Intrinsic Motivation
Supervisor: Thomas Moerland
Summary: Exploration is a key topic in reinforcement learning. A popular approach is intrinsic motivation, where the agent, much like a child, explores based on curiosity and novelty. However, few of these methods are goal-conditioned and based on learning progress, better known as competence-based intrinsic motivation (https://arxiv.org/pdf/2012.09830.pdf). In this project, we will investigate new approaches to competence-based exploration, for example extending http://proceedings.mlr.press/v87/laversanne-finot18a/laversanne-finot18a.pdf.
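The core mechanism can be sketched as follows, under the assumption that "competence" is a per-goal success rate and "learning progress" is its recent change; the class and window size are illustrative, not taken from the linked papers:

```python
import random

class GoalSelector:
    """Sample goals in proportion to absolute learning progress."""

    def __init__(self, goals, window=10):
        self.history = {g: [] for g in goals}   # goal -> list of 0/1 outcomes
        self.window = window

    def record(self, goal, success):
        self.history[goal].append(float(success))

    def learning_progress(self, goal):
        h = self.history[goal]
        if len(h) < 2 * self.window:
            return 1.0                           # unexplored goals stay attractive
        recent = sum(h[-self.window:]) / self.window
        older = sum(h[-2 * self.window:-self.window]) / self.window
        return abs(recent - older)               # competence change, either direction

    def sample(self):
        lps = {g: self.learning_progress(g) for g in self.history}
        total = sum(lps.values()) or 1.0
        r, acc = random.random() * total, 0.0
        for g, lp in lps.items():
            acc += lp
            if r <= acc:
                return g
        return g
```

Goals on which competence is flat (already mastered, or hopeless) receive little probability mass, so the agent concentrates practice where it is currently improving.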
Beware what you wish for: specification gaming and value alignment in AI and RL
Supervisor: Peter van der Putten, Aske Plaat, Mike Preuss
Summary: AI techniques such as reinforcement learning can be very powerful methods to optimize given objectives, but unfortunately humans are notably bad at stating their objectives and constraints well enough, or at foreseeing the side effects of maximizing these objectives. In reinforcement learning this problem is known as specification gaming – which is actually a misnomer, as the AI is simply, blindly trying to optimize the given objective. A more general term is value alignment: how can we make sure that the AI aligns its values with ours? In this project we want to provide a compelling example of specification gaming, to make the public further aware of this key existential risk of AI. Optionally, we can look into finding solutions, for example by accepting that objective specifications are initially flawed, but that humans can adapt them along the way.
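A classic toy illustration of the phenomenon (all details here are assumptions made up for the sketch): a cleaning robot is rewarded for dirt that disappears from its observations, so an agent that simply covers its own sensor earns maximal reward while cleaning nothing:

```python
def proxy_reward(before, after):
    """The reward as (mis)specified: count dirt cells no longer observed."""
    return sum(1 for cell in before if cell not in after)

# Dirt cells the robot observes before acting.
before = {(0, 0), (1, 2), (3, 1)}

# Honest policy: actually cleans one dirt cell and keeps observing the rest.
honest = proxy_reward(before, {(1, 2), (3, 1)})

# Gaming policy: covers the camera, so no dirt is observed at all.
gamed = proxy_reward(before, set())

# The proxy strictly prefers the gaming policy (3 > 1), although the true
# amount of dirt cleaned by it is zero.
```

The thesis would construct a comparably crisp but more compelling demonstration in a learned setting, where the gaming behaviour emerges from optimization rather than being written by hand.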