Most AI research is concerned with best-case (single successful demonstration) or average-case performance. However, in many safety-critical tasks such as robotics, average-case performance is often not good enough: we do not want autonomous cars that merely perform well on average; we want cars that act safely with high confidence. Here, we consider high-confidence policy learning as an evaluation criterion for good systems and propose to investigate new algorithms for meeting this criterion.
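
To make the criterion concrete, the sketch below (in Python) shows one common way such a requirement can be instantiated: accept a policy only if a (1 - delta) lower confidence bound on its expected return, here computed via Hoeffding's inequality under the assumption of i.i.d. returns bounded in [r_min, r_max], clears a user-specified performance threshold. This is an illustrative assumption on our part, not the proposed method; the function name, the threshold, and the simulated rollouts are all hypothetical.

```python
import numpy as np

def hoeffding_lower_bound(returns, delta=0.05, r_min=0.0, r_max=1.0):
    """(1 - delta)-confidence lower bound on expected return, via
    Hoeffding's inequality, assuming i.i.d. returns in [r_min, r_max]."""
    returns = np.asarray(returns, dtype=float)
    n = len(returns)
    width = (r_max - r_min) * np.sqrt(np.log(1.0 / delta) / (2.0 * n))
    return returns.mean() - width

# A policy meets the high-confidence criterion if the lower bound on its
# expected return clears a user-specified performance threshold (0.5 here,
# chosen arbitrarily for illustration).
rng = np.random.default_rng(0)
sampled_returns = rng.uniform(0.6, 1.0, size=200)  # stand-in for real rollouts
print(hoeffding_lower_bound(sampled_returns, delta=0.05) >= 0.5)
```

Note that the bound tightens only as the number of evaluation rollouts grows, which is exactly why the small-data regimes discussed below are difficult.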

As AI-based agents proliferate in the world, they will increasingly face the significant challenge of customization: the ability to perform user-specified tasks in many different unstructured environments. In response to this need, learning from demonstration (LfD) has emerged as a paradigm that allows users to quickly and naturally program agents by simply showing them how to perform a task, rather than by writing code. This methodology aims to allow non-expert users to customize agents for their desired purposes, as well as to communicate embodied knowledge that is difficult to translate into formal code. However, the most successful LfD methods to date do not provide any form of performance or safety guarantee.

In recent years, so-called “high-confidence” learning algorithms have enjoyed success in application areas with high-quality models and plentiful data. However, applying these algorithms in domains with little data and no reliable models (e.g., robotics) remains a challenging problem. Our aim is to develop safe, model-free LfD algorithms that can work with small amounts of data. We will draw on our interdisciplinary team to show that these general-purpose techniques can work well in multiple domains, from robots that perform household tasks to semi-automated agents used to address problems in national security.