Research
Research notes, experiments, and technical evidence from the lab.
Active work, open questions, and figures from phi9. The log tracks the systems, experiments, and results that shape what we build next.
Policy-based Deep RL: Lunar Landing
Pure Monte Carlo returns in an actor-critic setting: training a lunar lander with REINFORCE and analyzing the learned policy's decision-making.
The First Project: Hierarchical RL for Robotic Manipulation
From sparse rewards to Gaussian-gated hierarchical policies — how we trained a 7-DOF arm to pour coffee in under 2,000 iterations.