3 | Franck Djeumou

Safety and performance are often competing objectives in sequential decision-making problems. Our goal is to blend a performant controller and a safe controller into a single controller that is safer than the performant controller and accumulates higher rewards than the safe controller. To this end, we propose a blending algorithm built on the framework of contextual multi-armed multi-objective bandits.