Kyunghyun Cho is an associate professor of computer science and data science at New York University and a CIFAR Fellow of Learning in Machines & Brains. He is also a senior director of frontier research at the Prescient Design team within Genentech Research & Early Development (gRED). He was a research scientist at Facebook AI Research from June 2017 to May 2020 and a postdoctoral fellow at the University of Montreal until summer 2015 under the supervision of Prof. Yoshua Bengio, after receiving MSc and PhD degrees from Aalto University in April 2011 and April 2014, respectively, under the supervision of Prof. Juha Karhunen, Dr. Tapani Raiko and Dr. Alexander Ilin. He tries his best to find a balance among machine learning, natural language processing, and life, but almost always fails to do so.
Data augmentation has been found to be a key aspect of modern machine learning. Especially in domains and problems in which important invariances and equivariances are known, a data augmentation procedure can be designed to encourage a machine learning model to encode those invariances and equivariances. In the case of natural language processing, it is unfortunately difficult to come up with such a data augmentation procedure, as the knowledge of invariances and equivariances is limited. Instead, one needs to rely on a vast amount of unlabelled data to “learn to augment” data. In his presentation, Kyunghyun will talk about two different approaches. In the first approach, a standard masked language model is used to produce a set of samples from a given training sequence, augmenting the data along the data (text) manifold learned by the masked language model. In the second approach, an algorithm is designed that learns to interpolate between two text snippets. This makes it possible to use mixup, a successful data augmentation method that requires a mechanism for mixing the contents of two different examples. At the end, Professor Kyunghyun Cho will talk briefly about how this learned data augmentation can also be used to predict generalization.
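
For readers unfamiliar with mixup, the following is a minimal sketch of the standard formulation on fixed-length feature vectors with one-hot labels. It is not the learned text interpolation discussed in the talk. The function name `mixup`, its parameters, and the default `alpha` value are illustrative assumptions.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two (feature vector, one-hot label) pairs into one example.

    Assumption: inputs are equal-length lists of floats. The mixing
    weight lam is drawn from Beta(alpha, alpha), as in standard mixup.
    """
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    # Convex combination of the features and of the labels.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# Usage: mix a class-0 example with a class-1 example.
x, y, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
```

A convex combination like this is well defined for vectors, but text snippets are discrete and vary in length. That is why the second approach above has to learn an interpolation mechanism before mixup can be applied to text.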