Phi9 — physical AI lab

Physical AI research, data systems, and deployable intelligence.

Phi9 captures real-world behavior, structures it for training, and evaluates what actually transfers — so research can become usable capability.

The loop

From capture to evaluation, the physical AI loop.

One loop. Four stages. The work is turning real-world behavior into signal, training against it, and measuring what survives contact with reality.

001

Capture

Capture real-world demonstrations with synchronized motion, video, and structured task data. Start from real signal: task context, traces, and motion that can be used again.

002

Multiply

Multiply scarce data through retargeting, simulation, augmentation, and better structure. Stretch each capture further without letting intent or task definition drift away.

003

Train

Train policies and research systems on data that preserves intent, motion, and context. The point is not isolated models; it is a training layer that stays close to reality.

004

Evaluate

Evaluate what generalizes through benchmarks, failure analysis, and transfer tests. Treat deployment feedback and failure traces as part of the same loop.
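The four stages above can be sketched as one loop over plain functions. This is a minimal illustration of the cycle's shape, not Phi9's pipeline; every name and data structure here is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Demo:
    """One captured demonstration: synchronized traces plus task context."""
    task: str
    traces: dict  # modality name -> samples


def capture(task: str) -> Demo:
    # Stand-in for a real rig: record motion, video, and task context together.
    return Demo(task=task, traces={"motion": [0.1, 0.2], "video": ["frame0"]})


def multiply(demos: list) -> list:
    # Stand-in for retargeting/augmentation: stretch each capture
    # without changing the task it was demonstrating.
    copies = [Demo(task=d.task, traces=d.traces) for d in demos]
    return demos + copies


def train(demos: list) -> dict:
    # Stand-in policy: remembers which tasks it has seen data for.
    return {"tasks_seen": {d.task for d in demos}}


def evaluate(policy: dict, held_out_tasks: list) -> dict:
    # Transfer test: which held-out tasks does the policy cover at all?
    return {t: t in policy["tasks_seen"] for t in held_out_tasks}


# One pass around the loop: capture -> multiply -> train -> evaluate.
demos = multiply([capture("open_drawer")])
policy = train(demos)
report = evaluate(policy, ["open_drawer", "pour_cup"])
```

The point of keeping all four stages in one script, even as a toy, is that the evaluation report can feed directly back into what gets captured next.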

  • 4 loop stages
  • 12+ modalities per capture
  • 25+ environment types

Research questions

The bottlenecks we are actively working through.

These are not abstract themes. They are the constraints shaping the systems, experiments, and artifacts we are building now.

  1. Data that carries intent, not just observation.

    Most pipelines record visible motion but lose the task underneath it. We are working on capture that preserves action, context, and what the body was trying to achieve.

  2. Physical data is expensive.

    You cannot scrape physical behavior. Every demonstration needs a rig, a subject, a calibration, and a clean task. The work is making each capture travel further without losing signal.

  3. Benchmarks that predict real-world performance.

    A benchmark score means little if a policy falls apart on an unscripted task. We care about evaluation that predicts transfer, failure, and what survives outside the benchmark.

  4. One loop, not three stages.

    Capture, training, and evaluation still get treated as separate departments. We are trying to wire them into one visible loop so progress does not disappear between stages.

Methods

The concrete systems we are building around the loop.

Methods should feel like work, not philosophy. These are the concrete layers we are building now so demonstration, training, and deployment stay connected.

Capture surfaces

Rigs, sync, task framing, and sensor traces that start with the real world instead of a benchmark-only abstraction.
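As one illustration of what "sync" means at the data level, here is a toy alignment of two sensor streams by nearest timestamp. All names are hypothetical, and a real rig also has to handle clock drift and dropped frames, which this ignores.

```python
def align(reference: list, other: list) -> list:
    """For each reference timestamp, index of the nearest sample in `other`."""
    indices = []
    for t in reference:
        nearest = min(range(len(other)), key=lambda i: abs(other[i] - t))
        indices.append(nearest)
    return indices


# 30 Hz video timestamps aligned against a 100 Hz motion stream.
video_ts = [0.0, 0.033, 0.066]
motion_ts = [i * 0.01 for i in range(10)]
pairing = align(video_ts, motion_ts)
```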

Data structure

Labels, schemas, exports, and task boundaries that keep demonstrations reusable across research, training, and downstream tooling.
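One way to make "labels, schemas, and task boundaries" concrete is a typed record per demonstration. The fields below are illustrative, not Phi9's actual schema.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TaskContext:
    name: str         # e.g. "open_drawer"
    goal: str         # what the demonstration was trying to achieve
    environment: str  # e.g. "kitchen_a"


@dataclass(frozen=True)
class Segment:
    start_s: float
    end_s: float
    label: str        # phase label within the task, e.g. "reach"


@dataclass
class Demonstration:
    task: TaskContext
    segments: list    # explicit task boundaries (list of Segment)
    modalities: dict  # modality name -> file path or array reference


demo = Demonstration(
    task=TaskContext("open_drawer", "drawer fully open", "kitchen_a"),
    segments=[Segment(0.0, 1.2, "reach"), Segment(1.2, 3.0, "pull")],
    modalities={"motion": "motion.parquet", "video": "cam0.mp4"},
)

# asdict gives a plain-dict export, so the same record travels to
# training code and downstream tooling unchanged.
export = asdict(demo)
```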

Training systems

Retargeting, augmentation, and policy pipelines that stretch scarce behavior while preserving intent, motion, and context.
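A toy example of "stretching scarce behavior while preserving intent": mirror a motion trace across one axis while copying the task label through untouched. The names and the dict layout are hypothetical.

```python
def mirror_x(trace: list) -> list:
    """Reflect an (x, y, z) trajectory across the y-z plane."""
    return [(-x, y, z) for (x, y, z) in trace]


def augment(demo: dict) -> list:
    """Return the original plus a mirrored copy.

    The task label, the part that carries intent, is never modified;
    only the geometry of the motion changes.
    """
    mirrored = dict(demo, trace=mirror_x(demo["trace"]))
    return [demo, mirrored]


demo = {"task": "open_drawer", "trace": [(0.1, 0.0, 0.2), (0.3, 0.0, 0.2)]}
pair = augment(demo)
```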

Evaluation layer

Benchmark fragments, transfer tests, and failure traces that keep the loop honest about what actually generalizes.
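A minimal shape for a transfer test: per-environment success rates, split into seen versus held-out, so the gap between them is visible. All numbers and environment names below are illustrative.

```python
def transfer_report(results: dict, seen: set) -> dict:
    """Summarize success rates on seen vs held-out environments.

    `results` maps environment -> list of per-episode successes (bools).
    The seen/held-out gap is the number the loop is accountable to.
    """
    def rate(envs):
        episodes = [ok for e in envs for ok in results[e]]
        return sum(episodes) / len(episodes) if episodes else 0.0

    held_out = set(results) - seen
    return {
        "seen": rate(seen),
        "held_out": rate(held_out),
        "gap": rate(seen) - rate(held_out),
    }


results = {
    "kitchen_a": [True, True, True, False],    # seen during training
    "kitchen_b": [True, False, False, False],  # never seen
}
report = transfer_report(results, seen={"kitchen_a"})
```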

Contact

Building in physical AI?

If you are working on data, research systems, or deployment infrastructure for physical intelligence, write to us.

Read the manifesto