I am an IVADO postdoctoral researcher at Mila, working with Sarath Chandar. My recent research interests span vision-language-action (VLA) world models, continual/temporal learning on streaming data, and real-world verification of generative model outputs.
I completed my PhD at UNSW Sydney in August 2025, where I was advised by Lina Yao and Dong Gong. During the latter half of my PhD, I worked as an applied research scientist at openstream.ai and as a research intern at Sony (hosted by Shiqi Yang and Shusuke Takahashi) and Tencent (hosted by Shengju Qian).
Prior to my PhD, I worked on continual learning with Joost van de Weijer and completed my Erasmus Mundus Joint Master's Degree (EMJMD) in Advanced Systems Dependability at the University of St Andrews, UK, and l'Université de Lorraine, France. During my master's, I interned in Emmanuel Vincent's group at Inria Nancy. I once wrote a Medium blog post documenting my EMJMD experience to help guide future aspirants.
In my free time, I enjoy learning to travel, snorkelling, hiking, and binge-watching. I grew up in eastern Nepal, and every couple of years, I like to plan week-long treks in the Nepalese Himalayas.
I am always keen on hearing from students and collaborators who are interested in my research. Please feel free to reach out with any ideas/questions.
Research aligned with advancing Canada's R3AI initiative.
Infra-focus: Developed production-grade conversational LLM agents for enterprise clients.
ML-focus: Implemented & shipped a POC for neuro-symbolic verification of multi-agent systems.
Worked on controllable image generation and preference optimization for multi-modal LLMs.
Worked on continual personalization of pre-trained text-to-image diffusion models.
Worked on rehearsal-free continual learning for Vision Transformers (ViTs).
Worked on learning domain-specific language models for speech recognition.
Worked on improving FactSet's named entity recognition service with acronym disambiguation and neural topic modeling.
We propose Verification through Spatial Assertions (ViSA), a proposer-solver method that enables faithful test-time verification of world model views to enhance spatial reasoning in existing VLMs.
ICLR 2025
We propose using diffusion classifier scores to regularize the parameter space and function space of text-to-image diffusion models, to achieve continual personalization.
Our work proposes Continual LeArning with Probabilistic finetuning (CLAP), a probabilistic modeling framework over visual-guided text features per task, thus providing more calibrated CL finetuning.
We propose a neural process-based continual learning approach with task-specific modules arranged in a hierarchical latent variable model. We tailor regularizers on the learned latent distributions to alleviate forgetting.
We investigate the continual learning of Vision Transformers (ViTs) in the challenging exemplar-free scenario, with a special focus on how to efficiently distill the knowledge of their crucial self-attention mechanism.