Andy Zhao

I am a PhD candidate in Electrical Engineering & Computer Science at the University of Southern California’s Center of Autonomy & Artificial Intelligence and AutoDrive Lab, advised by Professor Rahul Jain. My research combines reinforcement learning, multi‑agent game theory and the post‑training of large foundation models, with an emphasis on strategic reasoning and efficient policy optimisation.

Previously, I was a graduate student in the Operations Research Center at the Massachusetts Institute of Technology (MIT), where I worked on approximation algorithms for NP‑hard optimisation problems under Professor Dimitris Bertsimas. Before that, I earned four degrees, in Computer Science (EECS), Data Science, Pure Mathematics and Statistics, from the University of California, Berkeley. At Berkeley I was fortunate to work with Professor James Demmel on eigenfaces and federated learning, Nobel laureate Saul Perlmutter on the Public Editor Project, and Dr. Jiahao Yao on machine learning applications in finance.

Research Highlights

Selected recent projects and publications that capture my interests in reinforcement learning, strategic decision making and LLM post‑training. Additional research I did before my PhD can be found on ResearchGate and Google Scholar.

Episodic Zero‑Sum Game Learning
IEEE TAC 2026

Develops sample‑efficient algorithms for learning in episodic zero‑sum games under unknown rewards and dynamics, with provable regret bounds and empirical evaluation on simulated autonomous driving scenarios.

Fine‑Tuning LLMs for Strategic Multi‑Agent Reasoning
AISTATS 2026

Introduces a framework for aligning large language models with game‑theoretic reasoning tasks, combining preference optimisation with reinforcement learning in Markov games.

Generalized Quantal Response Equilibrium: Existence and Efficient Learning
NeurIPS 2025

Establishes existence results and proposes a scalable learning algorithm for a broad class of quantal response equilibria, enabling robust modelling of bounded‑rational agents in multi‑agent systems.

End‑to‑End Learning for Non‑Markovian Control
ICML 2024

Presents an RL framework for solving non‑Markovian optimal control problems by learning latent representations and controllers jointly, bridging model‑based control and deep RL.

Experience

Samsung Research America
Mountain View, CA
Generative AI Research Intern, Advanced ML Lab (AWS Bedrock, Kendra, LangChain/LangGraph)
May 2025 – Present

• Built & optimised LLM search; improved top‑k relevance by 35% & reduced latency by 40%.

• Designed & deployed LLM recommender; boosted CTR by 20%.

University of Southern California
Los Angeles, CA
Research Assistant
Aug 2024 – Present

• Developed multi‑agent general‑ and zero‑sum game algorithms for LLM post‑training & autonomous systems.

• Designed multi‑agent & multi‑objective direct preference optimisation for LLM alignment.

LLM Stealth Startup (MIT Momentum Accelerator)
Boston, MA & San Francisco, CA
Founding Engineer
Aug 2023 – Aug 2024

• Post‑trained an LLM on 1,000 docs for 100+ users, enabling friendships & dating.

• Built personalised event AI with LangGraph; retention +500%.

Ameriprise Financial
Boston, MA
Data Scientist
Feb 2023 – Aug 2023

• Built models to forecast European real‑estate growth at 500 m resolution.

• Improved test accuracy by 50% with deep nets; added interpretability via boosting & tailored metrics.

MIT Operations Research Center
Cambridge, MA
Research Assistant
Sep 2022 – Feb 2023

• Proposed IP approximation algorithms beating state‑of‑the‑art on NP‑hard problems.

• Developed an ML scheme maximising patient survival across treatments; improved AUC to 80%.

Alpha Square Group
New York, NY
Quantitative Research Intern
Jun 2022 – Aug 2022

• Built NLP models summarising alternative data & quantifying keywords.

• Built scoring system for 500+ deals in primary‑market investments.

Education

University of Southern California
Los Angeles, CA
PhD Candidate in Electrical Engineering & Computer Science, Center of Autonomy & Artificial Intelligence, GPA: 4.0/4.0
2023 – Present

• Awards include the Ming Hsieh Fellowship, Viterbi Graduate Fellowship, Annenberg School Award and Early Career Innovator Award.

Massachusetts Institute of Technology
Cambridge, MA
Master of Science in Business Analytics & Operations Research, GPA: 5.0/5.0
2022 – 2023

• Second place in the MIT Analytics Lab Contest (40 teams); Top 10 in the MIT Hackathon (1,000+ attendees).

University of California, Berkeley
Berkeley, CA
Bachelor of Science in Computer Science, Statistics, Mathematics & Data Science, GPA: 3.9/4.0
2018 – 2022

• Awards: first place in the California Actuarial League competition; finalist in the Mathematical Contest in Modeling.

• Honors Theses: Data Science Honors – Time Series Modelling in Finance; EECS Honors – PCA & Facial Recognition.