
I recently completed my master’s in Computational Science and Engineering (CSE) at Harvard University. I’m now in San Francisco, building real-time voice agents and browser-use systems for enterprises.
At Harvard, I researched reliable LLM post-training, especially Reinforcement Learning with Verifiable Rewards (RLVR), advised by Sham Kakade at the Kempner Institute. I was also a Teaching Fellow for AC215 (MLOps & LLMOps). Before that, I studied Computer Engineering at the Federal University of Paraíba (UFPB) and interned at Google (YouTube, Search) and Meta (AI for AR, Ads).
I like running, playing football, surfing, and reading.
You can reach me at itamardprf@gmail.com, visit my GitHub, or connect with me on LinkedIn. You can also access my resume (PDF).
Optimizing only the first k tokens of each solution—via a lightweight RL-tuned adapter (Prefix-RL) or prefix clustering—steers a frozen LLM’s reasoning strategy, recovering much of full RL’s math gains at a fraction of the compute.
A study of Evolution Strategies as an exploration-based alternative to gradient RL for fine-tuning LLMs—analyzing when ES matches RL baselines and when it avoids reward-hacking failure modes.
An AI safety experiment analyzing evaluation hacking in language-model agents, combining controlled task design, behavioral analysis under pressure, and systematic testing across models.
An AI-powered running coach on WhatsApp, combining RAG, fine-tuned LLMs, vector search, and a GKE deployment.