This study investigates Evolution Strategies (ES) as an alternative to gradient-based Reinforcement Learning methods for fine-tuning large language models. The results show that, when properly tuned, ES can achieve competitive performance while avoiding common reward-hacking behaviors observed in GRPO baselines. Through extensive hyperparameter sweeps and geometric analysis of weight space, the work highlights fundamental differences in how ES and GRPO explore and converge to solutions. Finally, the findings demonstrate that ES offers a sample-efficient and robust approach to safety alignment, outperforming established RL-based baselines on helpful–harmless tasks.
As a Machine Learning Researcher at the Kempner Institute at Harvard University, I focus on RLVR, exploring reward specification and verification to build more robust and efficient learning systems and to understand how they work in practice. We currently have one paper under review at ICLR'26 on parameter-efficient Reinforcement Learning, along with ongoing research on the ceiling effect of coverage on RL for LLMs.
I also collaborate on applied LLM projects and mentoring activities alongside my research. I am an advisor at the Technology and Artificial Intelligence League of UFPB and at momento.sh. I was also selected to attend the Summit of AI in LatAm (SALA) 2026, so you can find me in Quito, Ecuador, from March 9 to 12.
The research presented here was conducted in the academic environment of the Federal University of Paraíba (UFPB). It is important to point out that three of my internships involved working closely with industry research scientists on research that remains confidential.
At Meta, I had the pleasure of working with the Behavioural Computing team in London. There, my project consisted of productionizing state-of-the-art speech models. At Google, while working on YouTube, I was responsible for conducting experiments and coming up with new ideas to improve the model used in one portion of the video compression pipeline. Industry internships in Machine Learning while still an undergraduate are challenging due to competition and complexity. I had this experience at both companies and received great feedback from both teams.
Video description is a vital accessibility resource in
the lives of blind and visually impaired people.
Automating this task is not easy and involves many different problems.
This paper presents an approach to automatically describe characters.
The paper shows the results and comparisons of different
computer vision networks and algorithms on our newly created Iris flower image dataset.
The main goal is to propose a new toy/benchmark dataset that is in fact more challenging.