Master's thesis topics

If you are a student ready to start your master's thesis, feel free to contact any member of the RL group, whom you can find under Team. Below is a list of open master's thesis topics.


Supervisor: Mike Preuss

Summary: The success of AlphaStar for StarCraft has shown that competitive agents learning “together” can greatly enhance the generality of a solution. Can we also use this scheme to improve strategy learning in much simpler settings, such as MCTS or similar approaches on the much smaller game microRTS?

Single-agent Curriculum Learning

Supervisor: Aske Plaat

Summary: AlphaZero has taught itself to play world-class Go from scratch using a form of reinforcement learning called curriculum learning. Can we apply curriculum learning in single-agent puzzles such as Sokoban, and thus, in principle, to some of the world’s most important optimization problems?
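A minimal sketch of the curriculum idea, using a hypothetical toy in which the agent is just a skill counter and a task is solvable once skill reaches its difficulty (none of these names come from AlphaZero itself): training on the easiest unmastered task first bootstraps the harder ones.

```python
def run_curriculum(difficulties, max_steps=100):
    """Toy curriculum loop: train on tasks in order of difficulty,
    advancing only once the current task is mastered."""
    skill = 0          # stand-in for the agent's learned competence
    solved = []
    for d in sorted(difficulties):     # curriculum: easiest first
        while skill < d and max_steps > 0:
            skill += 1                 # 'training' on task d improves skill
            max_steps -= 1
        if skill >= d:
            solved.append(d)
    return solved

# Easy puzzles are mastered first, which makes the harder ones reachable.
print(run_curriculum([3, 1, 2]))  # [1, 2, 3]
```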

Planning through Inpainting

Supervisor: Thomas Moerland

Summary: Standard planning approaches use forward methods, i.e., they start at the beginning of a problem and repeatedly unfold a search tree in the forward direction. However, that seems contrary to the way humans tend to plan (outside of board games like Chess). Instead, we first sample a distant goal, and then repeatedly inpaint different trajectories between start and goal state, for example starting by fixing a subgoal halfway. In this project, we will look at this new approach towards hierarchical planning.
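The halfway-inpainting idea can be sketched as a divide-and-conquer planner on a toy 1-D state space (a deliberately simplified illustration; in a real setting the midpoint would be a sampled subgoal, not a deterministic average):

```python
def inpaint_plan(start, goal, depth=3):
    """Recursively 'inpaint' a trajectory: fix a midpoint subgoal,
    then fill in start->mid and mid->goal the same way."""
    if depth == 0 or abs(goal - start) <= 1:
        return [start, goal]           # base case: adjacent states
    mid = (start + goal) // 2          # toy stand-in for a sampled subgoal
    left = inpaint_plan(start, mid, depth - 1)
    right = inpaint_plan(mid, goal, depth - 1)
    return left[:-1] + right           # splice, dropping duplicate midpoint

# The trajectory is refined from coarse to fine, not unrolled forward.
print(inpaint_plan(0, 8))  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```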

Competence-based Intrinsic Motivation

Supervisor: Thomas Moerland

Summary: Exploration is a key topic in reinforcement learning. A popular approach is through the use of intrinsic motivation, which for example explores like children do, based on curiosity and novelty. However, few of these methods have been goal-conditioned and based on learning progress, better known as competence-based intrinsic motivation. In this project, we will investigate new approaches for competence-based exploration.
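A minimal sketch of the core mechanism, under the assumption that learning progress is measured as the change in success rate per goal (the goal names and the two-sample window are illustrative choices, not a fixed recipe): the agent prefers goals where its competence is changing fastest, rather than goals that are merely novel.

```python
def learning_progress(history):
    """Absolute change between recent and older success rates for one goal."""
    recent = history[-2:]
    older = history[:-2] or history    # fall back to full history if short
    return abs(sum(recent) / len(recent) - sum(older) / len(older))

def select_goal(competence):
    """Pick the goal whose competence is improving (or collapsing) fastest."""
    return max(competence, key=lambda g: learning_progress(competence[g]))

# Mastered goals ('near') and out-of-reach goals ('far') show no progress;
# the agent focuses on the frontier where competence is changing ('mid').
history = {"near": [1, 1, 1, 1], "mid": [0, 0, 1, 1], "far": [0, 0, 0, 0]}
print(select_goal(history))  # mid
```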

Hierarchical reinforcement learning

Supervisor: Thomas Moerland

Summary: Hierarchical RL attempts to learn a high-level, hierarchical action space that extends over multiple timesteps. Humans clearly have this ability as well: we decide to go to the supermarket (on a high level), and then execute all the subroutines that belong to this high-level action, without re-deciding on the higher-level action. However, it is hard to learn a good higher-level space, which should, for example, cover the entire state space, respect the structure of the task, and set goals that are within reach at each level. We will investigate new hierarchical RL methods.
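The two-level decomposition can be sketched on a toy 1-D task (a hypothetical illustration; the functions and the subgoal-distance parameter are our own naming, not from any particular method): the high-level policy commits to a subgoal, and the low-level policy executes primitive steps without the high level re-deciding in between.

```python
def low_level(state, subgoal):
    """Primitive policy: take one step toward the current subgoal."""
    return state + (1 if subgoal > state else -1)

def hierarchical_rollout(state, goal, subgoal_dist=2):
    """High-level policy sets subgoals at most `subgoal_dist` away;
    the low-level policy runs until each subgoal is reached."""
    trajectory = [state]
    while state != goal:
        # high-level decision: a reachable subgoal in the direction of the goal
        subgoal = state + max(-subgoal_dist, min(subgoal_dist, goal - state))
        while state != subgoal:        # low-level execution, no re-deciding
            state = low_level(state, subgoal)
            trajectory.append(state)
    return trajectory

print(hierarchical_rollout(0, 5))  # [0, 1, 2, 3, 4, 5]
```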

Beware what you wish for: specification gaming and value alignment in AI and RL

Supervisor: Peter van der Putten, Aske Plaat, Mike Preuss

Summary: AI techniques such as reinforcement learning can be very powerful methods to optimize given objectives, but unfortunately humans are notoriously bad at stating their objectives and constraints well enough, or at foreseeing the side effects of maximizing these objectives. In reinforcement learning this problem is known as specification gaming – arguably a misnomer, as the AI simply tries to optimize the objective it was given. A more general term is value alignment: how can we make sure that the AI's values align with ours? In this project we want to provide a compelling example of specification gaming, to make the public further aware of this key risk of AI. Optionally, we can look into finding solutions, for example by accepting that objective specifications are initially flawed and allowing humans to adapt them along the way.
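A toy illustration of the gap between a specified and an intended objective (the cleaning scenario and all names here are hypothetical, purely to make the failure mode concrete): the designer wants the dirt cleaned, but rewards only what the sensor sees, so the optimal policy under the proxy reward is to cover the dirt instead.

```python
# Each action's outcome: dirt visible to the sensor, dirt actually left,
# and the effort the action costs.
ACTIONS = {
    "clean": {"visible_dirt": 0, "actual_dirt": 0, "effort": 3},
    "cover": {"visible_dirt": 0, "actual_dirt": 5, "effort": 1},
}

def proxy_reward(outcome):
    """What we *specified*: penalize visible dirt and effort."""
    return -outcome["visible_dirt"] - outcome["effort"]

def intended_reward(outcome):
    """What we *meant*: penalize dirt that is actually left."""
    return -outcome["actual_dirt"] - outcome["effort"]

best_proxy = max(ACTIONS, key=lambda a: proxy_reward(ACTIONS[a]))
best_intended = max(ACTIONS, key=lambda a: intended_reward(ACTIONS[a]))
print(best_proxy, best_intended)  # cover clean
```

The optimizer is not malicious: it maximizes exactly the objective it was given, which is precisely why "specification gaming" can be read as a misnomer.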