Most AI research is concerned with best-case (single-demonstration) or average-case performance. However, in many safety-critical domains such as robotics, average-case performance is often not good enough. For example, we do not want autonomous cars that merely perform well on average; we want cars that act safely with high confidence. In this project, the research team considers high-confidence policy learning as an evaluation criterion for good systems and investigates new algorithms for meeting that criterion.
As AI-based agents proliferate in the world, they will increasingly face the significant challenge of customization — the ability to perform user-specified tasks in many different unstructured environments. In response to this need, learning from demonstration (LfD) has emerged as a paradigm that allows users to quickly and naturally program agents by simply showing them how to perform a task, rather than by writing code. This methodology aims to allow non-expert users to customize agents for their desired purposes, as well as to communicate embodied knowledge that is difficult to translate into formal code. However, the most successful LfD methods to date do not provide any form of performance or safety guarantees.
In recent years, so-called “high-confidence” learning algorithms have enjoyed success in application areas with high-quality models and plentiful data. However, applying these algorithms to domains with little data and without reliable models (e.g., robotics) is still a challenging problem. This project develops safe, model-free LfD algorithms that can work with small amounts of data. Good Systems' interdisciplinary team explores these general-purpose techniques in multiple domains, from robots that perform household tasks to semi-automated agents used to address problems in national security.
The research team developed an algorithm called Bayesian REX, which achieves state-of-the-art imitation learning results while, for the first time, enabling reasoning about uncertainty and risk in challenging, high-dimensional imitation learning problems. In continuing work, the team is investigating how to incorporate additional modalities of data, such as human gaze and natural language, to help agents learn to imitate more efficiently and with greater accuracy.
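To give a flavor of how reasoning about uncertainty enables high-confidence evaluation, the sketch below shows one common pattern: given samples from a posterior over linear reward functions (as produced by Bayesian reward inference from preferences), a policy's value can be bounded from below by a low quantile of its return across those samples. This is a minimal illustration under simplifying assumptions, not the paper's implementation; the posterior samples, feature counts, and function names here are hypothetical.

```python
import numpy as np

def policy_return(reward_weights, feature_counts):
    # Expected return of a policy under a linear reward r(s) = w . phi(s),
    # where feature_counts are the policy's expected discounted feature counts.
    return float(np.dot(reward_weights, feature_counts))

def high_confidence_bound(posterior_samples, feature_counts, alpha=0.05):
    # A (1 - alpha)-confidence lower bound on policy value under the reward
    # posterior: the alpha-quantile of return across posterior samples.
    returns = [policy_return(w, feature_counts) for w in posterior_samples]
    return float(np.quantile(returns, alpha))

# Toy example with hypothetical 3-dimensional reward features.
rng = np.random.default_rng(0)
posterior = rng.normal(loc=[1.0, -0.5, 0.2], scale=0.1, size=(1000, 3))
phi_policy = np.array([0.8, 0.1, 0.5])  # hypothetical expected feature counts
bound = high_confidence_bound(posterior, phi_policy, alpha=0.05)
```

A risk-averse agent can then prefer the policy with the higher lower bound rather than the higher mean return, which is the sense in which uncertainty over rewards translates into safer behavior.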
D.S. Brown, R. Coleman, R. Srinivasan, and S. Niekum. Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences. International Conference on Machine Learning (ICML). July 2020.
D.S. Brown, W. Goo, and S. Niekum. Better-than-Demonstrator Imitation Learning via Automatically-Ranked Demonstrations. Conference on Robot Learning (CoRL). October 2019.