Itamar Rocha Filho

I recently completed my master’s in Computational Science and Engineering (CSE) at Harvard University. I’m now in San Francisco, building real-time voice agents and browser-use systems for enterprise.

At Harvard, I researched reliable LLM post-training—especially Reinforcement Learning with Verifiable Rewards (RLVR)—advised by Sham Kakade at the Kempner Institute, and I was a Teaching Fellow for AC215 (MLOps & LLMOps). Before that, I studied Computer Engineering at the Federal University of Paraíba (UFPB), and interned at Google (YouTube, Search) and Meta (AI for AR, Ads).

I like running, playing football, surfing, and reading.

You can reach me at itamardprf@gmail.com, visit my GitHub, or connect with me on LinkedIn. You can also access my resume (PDF).

Research

Parameter-Efficient Reinforcement Learning using Prefix Optimization

ICLR 2026 · RLVR · Efficiency

Optimizing only the first k tokens of each solution—via a lightweight RL-tuned adapter (Prefix-RL) or prefix clustering—steers a frozen LLM’s reasoning strategy, recovering much of full RL’s math gains at a fraction of the compute.

Evolutionary Alignment

LLM post-training · Exploration · Alignment

A study of Evolution Strategies as an exploration-based alternative to gradient RL for fine-tuning LLMs—analyzing when ES matches RL baselines and when it avoids reward-hacking failure modes.

News

May 2026: Graduated with an M.S. in Computational Science and Engineering from Harvard; now in San Francisco building real-time voice agents and browser-use systems for enterprise.
Aug 2025: Teaching Fellow for AC215 (MLOps & LLMOps) at Harvard SEAS.
Apr 2025: Machine Learning Researcher at the Kempner Institute, Harvard, working on RL with Verifiable Rewards (RLVR).
Jun–Sep 2025: Fellow at AGI House, San Francisco Bay Area.
Sep 3, 2024: Started my master's in Computational Science and Engineering at Harvard University.

Fellowships and Awards

Project Highlights

When Agents Prefer Hacking To Failure: Evaluating Misalignment Under Pressure

AI Safety · LLM Agents · Evaluation

An AI safety experiment analyzing evaluation hacking in language-model agents, combining controlled task design, behavioral analysis under pressure, and systematic testing across models.

Capy Running Coach

RAG · LLMs · MLOps

An AI-powered running coach on WhatsApp, combining RAG, fine-tuned LLMs, vector search, and a GKE deployment.
