Itamar Rocha Filho

I’m a second-year Computational Science and Engineering (CSE) master’s student at Harvard University. I do research on reliable LLM post-training—especially Reinforcement Learning with Verifiable Rewards (RLVR)—advised by Sham Kakade, and I’m a Machine Learning Researcher at the Kempner Institute. I also teach AC215 (MLOps & LLMOps) as a Teaching Fellow.

Before Harvard, I studied Computer Engineering at the Federal University of Paraíba (UFPB) and interned at Google (YouTube, Search) and Meta (AI for AR, Ads).

I like running, playing football, surfing, and reading.

You can reach me at itamardprf@gmail.com, visit my GitHub, or connect with me on LinkedIn. You can also access my resume (PDF).

Research

Evolutionary Alignment

LLM post-training · Exploration · Alignment

A study of Evolution Strategies as an exploration-based alternative to gradient RL for fine-tuning LLMs—analyzing when ES matches RL baselines and when it avoids reward-hacking failure modes.

News

Aug 2025: Teaching Fellow for AC215 (MLOps & LLMOps) at Harvard SEAS.
Apr 2025: Machine Learning Researcher at the Kempner Institute, Harvard, working on RL with Verifiable Rewards (RLVR).
Jun–Sep 2025: Fellow at AGI House, San Francisco Bay Area.
Sep 03, 2024: I started my master's in Computational Science and Engineering at Harvard University.

Fellowships and Awards

Project Highlights

When Agents Prefer Hacking To Failure: Evaluating Misalignment Under Pressure

AI Safety · LLM Agents · Evaluation

An AI safety experiment analyzing evaluation hacking in language-model agents, combining controlled task design, behavioral analysis under pressure, and systematic testing across models.

Capy Running Coach

RAG · LLMs · MLOps

An AI-powered running coach on WhatsApp, combining RAG, fine-tuned LLMs, vector search, and a GKE deployment.

Learn more →