GRPO on Qwen2.5-1.5B
Teaching a 1.5B model math reasoning with Group Relative Policy Optimization on GSM8K. +47 pts strict accuracy with LoRA on 8 A4000s.
Some neat demos for my machine learning projects.
Generate code from live documentation using Retrieval-Augmented Generation (RAG), with a CLI, a REST API, and a thin-client / fat-server install split.
A denoising diffusion model for MNIST: watch it generate a digit from pure noise, running live in your browser.
From-scratch AlphaZero implementation: play against my trained model running live in your browser.