Nvidia Releases Cosmos-Transfer1 AI Model That Can Be Used for Simulation-Based Training for Robots

Nvidia released a new artificial intelligence (AI) model last week that can be used to train robots in simulation. Dubbed Cosmos-Transfer1, the new world generation model is aimed at AI-powered robotics hardware, also known as physical AI. The company has released the model as open source under a permissive licence, and interested individuals can download it from popular online repositories. The Santa Clara-based tech giant highlighted that the main advantage of the latest AI model is that users get granular control over the generated simulations.

Nvidia Releases AI Model to Train Robots

Simulation-based robotics training has gained momentum in recent times due to advances in generative AI technology. This branch of robotics deals with hardware that uses an AI model as its brain. Essentially, the training method exposes the machine's brain to a wide variety of simulated real-world scenarios so that it can handle a broader range of tasks. This is a big improvement over current factory robots, which are typically designed to complete a single task.

Nvidia’s Cosmos-Transfer1 is part of the company’s Cosmos Transfer family of world foundation models (WFMs), which ingest structured video input such as segmentation maps, depth maps, and lidar scans to generate photorealistic video outputs. These outputs can then be used as simulated training data for physical AI.

In a paper posted to the preprint server arXiv, the company stated that this model offers greater customisation than its predecessors. It allows the weight of different conditional inputs to be varied based on spatial location, giving developers highly controllable world generation. Another advantage is real-time world generation, which enables faster and more diverse training sessions.
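To illustrate the idea of spatially varying control weights, here is a minimal NumPy sketch (not the actual Cosmos-Transfer1 API). It assumes two hypothetical single-channel control signals, a depth map and a segmentation map, and blends them with per-pixel weight masks so that each region of the frame is dominated by a different conditional input.

```python
import numpy as np

H, W = 4, 4  # tiny frame for demonstration

# Two hypothetical control signals, each a single-channel float array.
depth_ctrl = np.random.rand(H, W)
seg_ctrl = np.random.rand(H, W)

# Per-pixel weight masks: emphasise depth in the left half of the frame
# and segmentation in the right half.
w_depth = np.zeros((H, W))
w_depth[:, : W // 2] = 1.0
w_seg = np.zeros((H, W))
w_seg[:, W // 2 :] = 1.0

def blend_controls(controls, weights, eps=1e-8):
    """Per-pixel weighted sum of control maps, normalised by total weight."""
    num = sum(c * w for c, w in zip(controls, weights))
    den = sum(weights) + eps
    return num / den

cond = blend_controls([depth_ctrl, seg_ctrl], [w_depth, w_seg])
```

In this toy setup, the blended conditioning signal equals the depth map on the left half of the frame and the segmentation map on the right half; smoother weight masks would give gradual transitions between inputs.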

As for the model’s specifics, Cosmos-Transfer1 is a diffusion-based model with seven billion parameters. It is designed for video denoising in the latent space and can be modulated by a control branch. The model accepts text and video as input and uses both to generate a photorealistic output video. It supports four types of control input video: canny edge, blurred RGB, segmentation mask, and depth map.
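As a rough illustration of two of those control-input types, the sketch below derives a blurred image and an edge map from a toy grayscale frame. These are generic image operations standing in for the real preprocessing pipeline, which the source does not describe; the gradient-magnitude threshold here is only a crude stand-in for a true canny edge detector.

```python
import numpy as np

def box_blur(img, k=3):
    """Naive k-by-k box blur with edge padding; stand-in for 'blurred RGB'."""
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy : dy + img.shape[0], dx : dx + img.shape[1]]
    return out / (k * k)

def edge_map(img, thresh=0.2):
    """Binary gradient-magnitude edges; a crude stand-in for canny edge input."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    return (mag > thresh).astype(float)

# Toy grayscale frame: dark left half, bright right half.
frame = np.zeros((8, 8))
frame[:, 4:] = 1.0

blurred = box_blur(frame)
edges = edge_map(frame)
```

With this frame, the edge map lights up along the vertical boundary between the dark and bright halves, while the blur softens that same boundary.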

The AI model has been tested on Nvidia’s Blackwell and Hopper series GPUs, with inference run on the Linux operating system. The tech giant has made the model available under the Nvidia Open Model License Agreement, which allows both academic and commercial usage.

Nvidia’s Cosmos-Transfer1 AI model can be downloaded from the company’s GitHub and Hugging Face listings. A larger model with 14 billion parameters is expected to be released soon.

Source: https://www.gadgets360.com/