I'm interested in finding all the downstream evaluations/measurements that matter and finding whatever scales best according to all of those downstream evaluations ~simultaneously, primarily in the largest-scale regime (largest compute, largest dataset, ~largest model). These interests encompass all aspects of Artificial Neural Networks (unsupervised learning, reinforcement learning, capabilities, alignment, all modalities, science of deep learning, etc.).
I'm currently a PhD student at Mila working mostly with David Krueger, Irina Rish, and Blake Richards. Before I started focusing on the scaling perspective, I mostly worked on out-of-distribution generalization and generalization theory with Yoshua's and Aaron's students.
x = ethan ;
y = victor ;
z = caballero