Michael W. Mahoney is at the University of California at Berkeley in the Department of Statistics and at the International Computer Science Institute (ICSI). He is also a faculty scientist at the Lawrence Berkeley National Laboratory. He works on algorithmic and statistical aspects of modern large-scale data analysis. Much of his recent research has focused on large-scale machine learning, including randomized matrix algorithms and randomized numerical linear algebra, geometric network analysis tools for structure extraction in large informatics graphs, scalable implicit regularization methods, computational methods for neural network analysis, physics-informed machine learning, and applications in genetics, astronomy, medical imaging, social network analysis, and internet data analysis. He received his PhD from Yale University with a dissertation in computational statistical mechanics, and he has worked and taught in the mathematics departments at Yale University and Stanford University and at Yahoo Research. Among other things, he is on the national advisory committee of the Statistical and Applied Mathematical Sciences Institute (SAMSI), he was on the National Research Council's Committee on the Analysis of Massive Data, he co-organized the Simons Institute's fall 2013 and 2018 programs on the foundations of data science, he ran the Park City Mathematics Institute's 2016 PCMI Summer Session on The Mathematics of Data, and he runs the biennial MMDS Workshops on Algorithms for Modern Massive Data Sets. He is the Director of the NSF/TRIPODS-funded FODA (Foundations of Data Analysis) Institute at UC Berkeley. More information is available at https://www.stat.berkeley.edu/~mmahoney/.
Working with state-of-the-art (SOTA) neural network (NN) models is a practical business, and it demands a practical theory. It also provides us with an opportunity to ask questions at the foundations of the field, including what the role of theory is and how theory can be formulated for models that depend so strongly on the data. We'll review empirical results on nearly every publicly available pre-trained SOTA model, and we'll identify ubiquitous heavy-tailed structure in the correlations of weight matrices. Based on this, we'll describe the theory of Heavy-Tailed Self-Regularization (HT-SR), which draws heavily on statistical mechanics and heavy-tailed Random Matrix Theory. HT-SR is a theory that can be applied directly to SOTA NN models. For example, we can use it to identify multiple qualitatively different phases of learning. We can use it to predict trends in the quality of SOTA NNs without access to training or testing data. Finally, by examining the complementary roles of norm-based size metrics and heavy-tailed shape metrics, we can use it to identify Simpson's paradoxes in public contests aimed at finding metrics that are causally informative of generalization. We'll conclude by describing several parallel theoretical results on more idealized models: using surrogate random design to provide exact expressions for double descent and implicit regularization; identifying a precise phase transition and the corresponding double descent for a random Fourier features model in the thermodynamic limit; showing how multiplicative noise leads to heavy tails in stochastic optimization; and presenting a novel methodology to compute precisely the full distribution of test errors.
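As a toy illustration of the kind of analysis HT-SR builds on, the sketch below computes the eigenvalues of a layer's correlation matrix X = WᵀW and estimates a power-law tail exponent for the empirical spectral density. This is a hypothetical, minimal sketch using a synthetic heavy-tailed weight matrix and a simple Hill estimator, not the speaker's actual methodology (the HT-SR work uses more careful power-law fitting procedures).

```python
import numpy as np

# Hypothetical sketch (not the speaker's code): inspect the empirical
# spectral density (ESD) of a layer correlation matrix X = W^T W / N
# and estimate a power-law tail exponent alpha via a Hill estimator.

rng = np.random.default_rng(0)

# Stand-in for a trained layer's weight matrix; heavy-tailed synthetic
# entries (Student-t, df=3) so the ESD has a heavy tail by construction.
W = rng.standard_t(df=3, size=(500, 300))

# Eigenvalues of the (symmetric, PSD) correlation matrix.
N = W.shape[0]
eigs = np.linalg.eigvalsh(W.T @ W / N)


def hill_alpha(eigs, k=50):
    """Hill estimator of the tail exponent from the k largest eigenvalues."""
    tail = np.sort(eigs)[-k:]  # k largest eigenvalues, ascending
    return 1.0 + k / np.sum(np.log(tail / tail.min()))


alpha = hill_alpha(eigs)
print(f"estimated tail exponent alpha ~ {alpha:.2f}")
```

In the HT-SR picture, exponents fitted this way (layer by layer) serve as shape metrics that track training quality without requiring access to the training or testing data.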