I'm a math undergrad at Stanford and a researcher with the NLP group at SAIL. Previously, I built software at AWS and caught frogs while working in outdoor swimming pool management and operations.
We introduce a bilevel optimization framework to train language models to self-adapt at inference time using reinforcement learning and meta-gradients.