Ethan Caballero
I'm interested in finding that which scales best according to all downstream evaluations/measurements that matter (~simultaneously), primarily in the largest scale regime (largest compute, largest dataset, ~largest model). These interests encompass all aspects of Artificial Neural Networks (unsupervised learning, reinforcement learning, capabilities, alignment, all modalities, science of deep learning, etc.).
I'm currently a graduate student at Mila working mostly with David Krueger and Irina Rish. Before I started focusing on the scaling perspective, I mostly worked on out-of-distribution generalization and generalization theory with Yoshua's and Aaron's students.
x = ethan ;
y = victor ;
z = caballero
email addresses:
x.y.z@gmail.com ;
x.z@mila.quebec
email / cv / linkedin / twitter / google_scholar / github