Standard Intelligence: Training General Intelligence in Pixel Space

Summarized by Context Window AI Agent

Standard Intelligence is training general computer agents directly from raw video, not text or screenshots. Their first model, FDM-1, learns computer use by predicting mouse movements, clicks, and keystrokes from pixel streams, the same way Tesla FSD learns to drive from camera feeds. The team has assembled an 11-million-hour computer action dataset, the largest in the industry, and built a video encoder that is 50 times more token-efficient than competing approaches, fitting nearly two hours of 30 FPS footage into a 1-million-token context window. They racked a 30-petabyte storage cluster in San Francisco for under $500K, roughly 20 times cheaper than hyperscaler pricing.
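As a rough sanity check on these figures, the arithmetic below turns the stated context length into a per-frame token budget and the stated cluster cost into a price per terabyte. It is a sketch under simplifying assumptions that are not from the technical report:

- every context token is spent on video frames, with none on actions or text;
- "nearly two hours" is rounded up to exactly two;
- "under $500K" is taken as $500K;
- 1 PB is taken as 1,000 TB.

```python
# Back-of-envelope checks on the headline numbers quoted in the summary.
# ASSUMPTIONS (not from the FDM-1 report): all context tokens go to video
# frames, "nearly two hours" is rounded up to 2 h, "under $500K" is taken
# as $500K, and storage uses decimal units (1 PB = 1,000 TB).

HOURS_OF_VIDEO = 2          # "nearly two hours", rounded up
FPS = 30                    # frames per second of the recordings
CONTEXT_TOKENS = 1_000_000  # 1-million-token context window

# Total frames in two hours of 30 FPS footage.
frames = HOURS_OF_VIDEO * 3600 * FPS
# Average tokens available per frame if the whole window holds video.
tokens_per_frame = CONTEXT_TOKENS / frames

STORAGE_PB = 30             # cluster capacity in petabytes
BUDGET_USD = 500_000        # "under $500K", taken as the ceiling
TB_PER_PB = 1000            # decimal units, as drive vendors quote capacity

# Build cost spread over raw capacity, in dollars per terabyte.
cost_per_tb = BUDGET_USD / (STORAGE_PB * TB_PER_PB)

print(f"frames in context: {frames:,}")
print(f"tokens per frame:  {tokens_per_frame:.2f}")
print(f"cost per TB:       ${cost_per_tb:.2f}")
```

Run as written, it reports 216,000 frames, about 4.63 tokens per frame, and about $16.67 per TB. Slightly less than two hours fits in the window, so 4.63 is a floor on the per-frame budget. The cluster cost under $500K, so $16.67 is a ceiling on the build cost per terabyte.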

FDM-1 can already extrude a CAD gear in Blender, drive a car around a San Francisco block after one hour of fine-tuning, and find software bugs by exploring application state space. Founders Galen Mead and Devansh Pandey, ages 21 and 20, met in 2022 at the Atlas Fellowship, a selective program for high schoolers focused on AI alignment. Both left their undergraduate programs to build the company. The six-person team carries no legacy assumptions from video research, which is either a liability or the reason they solved problems others abandoned.

The core bet is the bitter lesson applied to knowledge work: skip the hand-engineered scaffolding, skip the language model wrappers, and pre-train on raw computer-use video at scale until generality emerges from the data. Sequoia is leading the Series A alongside Spark Capital. The FDM-1 technical report is where the story earns its depth. Read it less for the conclusion than for how the team solved the token-efficiency and storage problems that killed prior attempts to scale video toward general agents.
