
NVIDIA introduces a new approach to robot control systems with its Cosmos Policy.

Cosmos Policy is a new robot control system that builds directly on NVIDIA's existing Cosmos world foundation models (WFMs) and takes a different approach to the problem of teaching robots to act in the physical world.

Most current systems for robotics rely on vision-language models (VLMs) — systems trained to understand images and text that are then adapted to suggest robot actions. These models don’t inherently understand how the physical world evolves over time. They can recognise a cup and suggest “pick it up,” but the nuance of exactly how to reach, grip, and lift — accounting for gravity, object dynamics, and changing states — is not what they were originally trained for.

Cosmos Policy takes a fundamentally different route. It starts from Cosmos Predict-2, a world foundation model trained to predict how scenes change over time, so it has already learned a good deal about physical dynamics from video. Rather than attaching separate components for perception and control, Cosmos Policy encodes robot actions, physical states, and success scores as additional elements within the same framework the model already uses to represent video. This means perception and control are learned together within one unified system.

The practical upshot is a single model that can handle three things at once: guiding robot movement through visuomotor control (hand-eye coordination), predicting future states of the environment for planning purposes, and estimating the likely success of different action sequences.
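To make the three roles concrete, here is a toy sketch in Python of how one model that returns actions, a predicted future state, and a success score could be used for planning. Everything in it is an assumption made for illustration: the `ToyUnifiedModel` class, the 2-D point "robot", the trivial dynamics, and the sample-several-candidates-and-keep-the-best loop are hypothetical stand-ins, not NVIDIA's code or the Cosmos Policy API.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

Action = Tuple[float, float]  # hypothetical 2-D move (dx, dy)
State = Tuple[float, float]   # hypothetical 2-D position (x, y)


@dataclass
class Prediction:
    actions: List[Action]  # proposed action sequence (control output)
    future_state: State    # predicted state after those actions (prediction output)
    success: float         # estimated success score in (0, 1] (scoring output)


class ToyUnifiedModel:
    """Toy stand-in for one model with three outputs; not the real system."""

    def __init__(self, goal: State, seed: int = 0) -> None:
        self.goal = goal
        self.rng = random.Random(seed)

    def predict(self, state: State, horizon: int = 3) -> Prediction:
        # Propose a random action sequence (a real policy would be learned).
        actions = [
            (self.rng.uniform(-1.0, 1.0), self.rng.uniform(-1.0, 1.0))
            for _ in range(horizon)
        ]
        # Predict the future state with trivial dynamics: position += action.
        x, y = state
        for dx, dy in actions:
            x, y = x + dx, y + dy
        # Score the outcome: the closer to the goal, the higher the score.
        dist = ((x - self.goal[0]) ** 2 + (y - self.goal[1]) ** 2) ** 0.5
        return Prediction(actions, (x, y), 1.0 / (1.0 + dist))


def plan_best_of_n(model: ToyUnifiedModel, state: State, n: int = 16) -> Prediction:
    """Sample n candidate predictions and keep the one with the highest score."""
    candidates = [model.predict(state) for _ in range(n)]
    return max(candidates, key=lambda p: p.success)


if __name__ == "__main__":
    model = ToyUnifiedModel(goal=(1.0, 1.0))
    best = plan_best_of_n(model, state=(0.0, 0.0), n=32)
    print("chosen actions:", best.actions)
    print("predicted state:", best.future_state)
    print("estimated success:", round(best.success, 3))
```

The point of the sketch is only the shape of the interface: one call yields the action, the expected outcome, and a score for it, so a planner can compare candidates without a separate perception or evaluation module.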

For the robotics industry broadly, Cosmos Policy represents a meaningful step toward robots that learn from the same understanding of time, cause, and physical consequence that makes the real world coherent.
