Ethan Caballero

I'm interested in finding what scales best according to all downstream evaluations/measurements that matter (~simultaneously), primarily in the largest-scale regime (largest compute, largest dataset, ~largest model). These interests encompass all aspects of artificial neural networks (unsupervised learning, reinforcement learning, capabilities, alignment, all modalities, science of deep learning, etc.).

I'm currently at Mila working mostly with David Krueger and Irina Rish. Before I started focusing on the scaling perspective, I mostly worked on out-of-distribution generalization and generalization theory with Yoshua's and Aaron's students.

x = ethan ; y = victor ; z = caballero

email addresses: x.y.z@gmail.com ; x.z@mila.quebec

email / cv / linkedin / twitter / google_scholar / github

(fork of this website)