Research Highlights
Selected recent projects and publications that capture my interests in reinforcement learning,
strategic decision‑making, and LLM post‑training. Additional research from before my PhD can be found on
ResearchGate and
Google Scholar.
Episodic Zero‑Sum Game Learning
IEEE TAC 2026
Develops sample‑efficient algorithms for learning in episodic zero‑sum games under unknown rewards and
dynamics, with provable regret bounds and empirical performance on simulated autonomous driving scenarios.
Fine‑Tuning LLMs for Strategic Multi‑Agent Reasoning
AISTATS 2026
Introduces a framework for aligning large language models with game‑theoretic reasoning tasks,
combining preference optimisation with reinforcement learning in Markov games.
Generalized Quantal Response Equilibrium: Existence and Efficient Learning
NeurIPS 2025
Establishes existence proofs and proposes a scalable learning algorithm for a broad class of quantal
response equilibria, enabling robust modelling of bounded‑rational agents in multi‑agent systems.
End‑to‑End Learning for Non‑Markovian Control
ICML 2024
Presents an RL framework for solving non‑Markovian optimal control problems by learning latent
representations and controllers jointly, bridging model‑based control and deep RL.
Experience
• Built & optimised LLM search; improved top‑k relevance by 35% & reduced latency by 40%.
• Designed & deployed LLM recommender; boosted CTR by 20%.
• Developed multi‑agent general‑ and zero‑sum game algorithms for LLM post‑training & autonomous systems.
• Designed multi‑agent & multi‑objective direct preference optimisation for LLM alignment.
• Post‑trained an LLM on 1,000 docs for 100+ users, enabling friendships & dating.
• Built personalised event AI with LangGraph; increased retention by 500%.
• Built models to forecast European real‑estate growth at 500 m resolution.
• Improved test accuracy by 50% with deep nets; added interpretability via boosting & tailored metrics.
• Proposed integer‑programming (IP) approximation algorithms beating the state of the art on NP‑hard problems.
• Developed an ML scheme maximising patient survival across treatments; improved AUC to 80%.
• Built NLP models summarising alternative data & quantifying keywords.
• Built scoring system for 500+ deals in primary‑market investments.
• Awards include the Ming Hsieh Fellowship, Viterbi Graduate Fellowship, Annenberg School Award, and Early Career Innovator Award.
• Second place in the MIT Analytics Lab Contest (40 teams); Top 10 in the MIT Hackathon (1,000+ attendees).
• First place in the California Actuarial League competition; finalist in the Mathematical Contest in Modeling.
• Honors Thesis: Data Science Honors – time‑series modelling in finance; EECS Honors – PCA & facial recognition.