Autonomous driving has seen incredible progress of late. Recent workshops at top conferences in robotics, computer vision, and machine learning have primarily showcased the technological advancements in the field. This workshop provides a platform to investigate and discuss the methods by which progress in autonomous driving is evaluated, benchmarked, and verified.
Given that the workshop was held virtually, we adopted a format that was engaging and concise: a sequence of four 1.5-hour moderated round-table discussions (including an introduction), each centered around one of four themes.
This workshop took place on October 25, 2020, in conjunction with IROS 2020.
Theme 1: Assessing progress for the field of autonomous vehicles (AVs)
At present, regulators rely on statistics that self-driving car companies report, based on metrics of the companies' own choosing, to make decisions about how safe the technology is.
- Given an implementation of an AV, how do we score it?
- At present, how are we doing along these metrics?
- Were we too optimistic in the past? Why?
- What are the limitations of these evaluation methods? Are different ones required?
Moderator: Andrea Censi
- Emilio Frazzoli (ETH Zürich / Motional)
- Alex Kendall (Wayve)
- Jane Lappin (Chair, Transportation Research Board Standing Committee on Vehicle-Highway Automation)
- Atul Acharya (Director, AV Strategy at AAA Northern California)
Theme 2: How to evaluate AV risk from the perspective of real world deployment (public acceptance, insurance, liability, …)?
There is a difference between the metrics used by the AV industry for development purposes and the metrics that will be used to evaluate AV risk “externally”, for example to set insurance premiums; the latter are likely to be standardized and black-box in nature.
- What are the biggest societal hurdles for the integration of AVs? How do we evaluate how well we are doing at crossing these hurdles?
- What metrics are needed for the insurance industry?
- What metrics would be included in future safety standards from the liability perspective?
- What metrics can we use to convince society at large that AVs are “safe enough”?
Moderator: Jacopo Tani
- Bryant Walker Smith (University of South Carolina School of Law)
- Luigi Di Lillo (Swiss Reinsurance Company, Ltd)
- John Leonard (MIT)
- Srihari Yamanoor (DesignAbly): Society and Autonomous Driving
Theme 3: Best practices for AV benchmarking
An alternative but related approach to defining metrics is to create benchmarks that must be passed. Ideally, not all benchmarks would require evaluation on real hardware; some could instead rely on data logs and simulation.
- How can we best leverage data logs to evaluate self-driving algorithms?
- How to evaluate the “value” of a specific data log?
- What are the right annotations to these data logs?
- How can we best leverage simulation to evaluate self-driving algorithms?
- How to compose component-level benchmarks into system-level benchmarks?
- How to decompose a system-level benchmark into measurable and tractable benchmarks on the component level?
Moderator: Liam Paull
- Fabio Bonsignorio (Heron Robots)
- Michael Milford (QUT)
- Oscar Beijbom (Motional)
- Marcelo H. Ang Jr (National University of Singapore)
- Yue Linn Chong, H. Leong, Christina Lee Dao Wen, and Marcelo H. Ang, Jr. (NUS): Benchmarking Sensing and Motion Planning Algorithms for Autonomous Driving
- Neil Traft, Skanda Shridhar, Galen Clark Haynes (Uber ATG): Motion Prediction for Self-Driving Needs a Metric Specific to Self-Driving
- Yiluan Guo (Motional), Holger Caesar (Motional), Oscar Beijbom (Motional), Jonah Philion (University of Toronto / NVIDIA), Sanja Fidler (University of Toronto / NVIDIA): The Efficacy of Neural Planning Metrics: A Meta-Analysis of PKL on nuScenes
- Peter Du (UIUC), Katherine Driggs-Campbell (UIUC): Evaluation of Autonomous Vehicle Policies Using Adaptive Search
Theme 4: Do we need new paradigms for AV development?
While our focus here is not primarily on the algorithms developed for self-driving cars, the types of algorithms and paradigms used will affect our ability to benchmark and evaluate them. This aspect of algorithm design is often sacrificed in favor of performance.
- Are some algorithm types more challenging to evaluate?
- What additional challenges are presented by data-driven approaches?
- Are the paradigms that we have currently (model-free, model-based, reinforcement, imitation, etc.) sufficient or do we need fundamentally different approaches?
Moderator: Matt Walter
- Raquel Urtasun (U of Toronto / Uber ATG)
- Edwin Olson (May Mobility)
- Ram Vasudevan (University of Michigan)
- Sertac Karaman (MIT / Optimus Ride)