Progress on the ImageNet dataset seeded much of the excitement around the machine learning revolution of the past decade. In this talk, we analyze this progress to understand the obstacles blocking the path towards safe, dependable, and secure machine learning.
First, we compare the performance of current ImageNet models to that of five trained human labelers. We measure accuracy both on ImageNet and under a small distribution shift in the form of ImageNetV2. Humans still outperform the best machine models available in 2020, for instance by margins of more than 4 and 11 percentage points on the 590 object classes in ImageNet and ImageNetV2, respectively.
Second, we investigate to what extent current robustness interventions help on naturally occurring distribution shifts. We conduct a large benchmark spanning more than 200 models and multiple distribution shifts. While there has been substantial progress on various synthetic distribution shifts, only models trained on data beyond ImageNet provide substantial robustness on ImageNetV2 or ObjectNet. The primary example is the recent CLIP model from OpenAI, which achieves unprecedented robustness on all natural distribution shifts in our testbed.
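The robustness measurements above boil down to evaluating the same model on a matched pair of test sets, the original one and a shifted one, and comparing the two accuracies. A minimal sketch of that comparison, using toy label arrays as stand-ins for real ImageNet and ImageNetV2 predictions:

```python
# Sketch: quantify accuracy under distribution shift by evaluating the
# same model on an in-distribution test set and a shifted test set,
# then reporting the drop in percentage points. The arrays below are
# illustrative toy data, not real ImageNet labels.

def accuracy(predictions, labels):
    """Fraction of predictions matching the ground-truth labels."""
    assert len(predictions) == len(labels)
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def accuracy_drop(preds_id, labels_id, preds_shift, labels_shift):
    """In-distribution accuracy minus accuracy under shift,
    expressed in percentage points."""
    acc_id = accuracy(preds_id, labels_id)
    acc_shift = accuracy(preds_shift, labels_shift)
    return 100.0 * (acc_id - acc_shift)

# Toy example: 90% accuracy in distribution, 80% under shift.
preds_id     = [0, 1, 2, 3, 4, 5, 6, 7, 8, 0]
labels_id    = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
preds_shift  = [0, 1, 2, 3, 4, 5, 6, 7, 0, 0]
labels_shift = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

drop = accuracy_drop(preds_id, labels_id, preds_shift, labels_shift)
# drop is 10.0 percentage points for this toy data
```

A robustness intervention is then judged not by raw accuracy alone, but by whether it shrinks this drop relative to a baseline model of equal in-distribution accuracy.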
Based on joint work with Nicholas Carlini, Achal Dave, Alex Fang, Horia Mania, Ben Recht, Rebecca Roelofs, Vaishaal Shankar, and Rohan Taori.