Problem
Video alone loses the body.
Egocentric datasets show what the camera saw but not how the full kinematic chain moved. Arms-only capture works for narrow pick-and-place tasks, but humanoids need walking, reaching, bending, and whole-body coordination, all in the same example.