Research

Research notes, experiments, and technical evidence from the lab.

Active work, open questions, and figures from phi9. The log tracks the systems, experiments, and results that shape what we build next.

Pure Monte Carlo in an actor-critic setting: training a lunar lander with REINFORCE and analyzing the learned policy's decision-making.

From sparse rewards to Gaussian-gated hierarchical policies: how we trained a 7-DOF arm to pour coffee in under 2,000 iterations.