Projects

Some neat demos for my machine learning projects.

GRPO on Qwen2.5-1.5B

Teaching a 1.5B model math reasoning with Group Relative Policy Optimization (GRPO) on GSM8K: +47 points of strict accuracy, trained with LoRA on 8 A4000 GPUs.
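For readers new to GRPO, the core idea is that each sampled answer is scored relative to the other answers drawn for the same prompt, so no separate value network is needed. The snippet below is a minimal sketch of that group-relative advantage step only, not the project's training code. The function name, the `eps` value, the use of the population standard deviation, and the example rewards are all assumptions made for illustration.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own group.

    Assumption: population std is used here; implementations differ on
    population vs. sample std and on the exact eps.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one GSM8K question
# (1.0 = final answer correct, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Correct answers get a positive advantage, incorrect ones a negative one.
for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f} advantage={a:+.3f}")
```

When every answer in a group receives the same reward, all advantages come out as zero, so that prompt contributes no learning signal for that step.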

DocRAG

Generate code from live documentation using Retrieval-Augmented Generation (RAG), with a CLI, a REST API, and a thin-client / fat-server install split.

DDPM from Scratch

A denoising diffusion model for MNIST: watch it generate a digit from pure noise, running live in your browser.
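"Pure noise" is where the forward process of a DDPM ends up: an image is blended with Gaussian noise step by step until almost none of the original signal remains, and the model learns to run that process in reverse. Below is a rough sketch of the closed-form forward noising step, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. It uses only the standard library, and the linear schedule, the 1000 steps, the beta range, and all function names are assumptions for illustration rather than details of this project.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (assumed schedule and range)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t: running product of (1 - beta_s) for s <= t."""
    alpha_bars, product = [], 1.0
    for beta in betas:
        product *= 1.0 - beta
        alpha_bars.append(product)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t directly from x_0 using the closed-form forward process."""
    a = alpha_bars[t]
    return [
        math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
        for x in x0
    ]


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))

# A stand-in "image": 784 pixel values, as for a flattened 28x28 MNIST digit.
x0 = [1.0] * 784
x_early = add_noise(x0, 0, alpha_bars, rng)    # still almost the clean image
x_late = add_noise(x0, 999, alpha_bars, rng)   # essentially Gaussian noise
```

Generation runs this in the opposite direction: start from a sample like `x_late` and repeatedly apply the learned denoising step until a digit emerges.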

Connect4 AlphaZero

From-scratch AlphaZero implementation: play against my trained model running live in your browser.