Ethan Caballero

I'm interested in finding that which scales best according to all downstream evaluations/measurements that matter (~simultaneously), primarily in the largest scale regime (largest compute, ~largest dataset, ~largest model). These interests encompass all aspects of Artificial Neural Networks (unsupervised learning, reinforcement learning, capabilities, alignment, all modalities, science of deep learning, etc.).

I'm currently a PhD student at Mila. Before I started focusing on the scaling perspective, I mostly worked on out-of-distribution generalization and generalization theory with Yoshua's and Aaron's students.

x = ethan ; y = victor ; z = caballero

email addresses: x.y.z@gmail.com ; x.z@mila.quebec

email / cv / linkedin / twitter / google_scholar / github

(fork of this website)