NVIDIA has released a massive, open-source dataset designed to accelerate the development of physical AI in robotics and autonomous vehicles (AVs). Unveiled at GTC in San Jose, the dataset aims to provide researchers and developers with a significant head start in building the next generation of AI models.
The initial dataset, available on Hugging Face, includes 15 terabytes of data featuring more than 320,000 trajectories for robotics training and up to 1,000 Universal Scene Description (OpenUSD) assets.
Future updates will add extensive data for AV development, including 20-second clips of diverse traffic scenarios from more than 1,000 cities across the US and Europe.
The dataset is intended to help develop AI models that enable robots to navigate complex environments, such as warehouses and hospitals, safely, and autonomous vehicles that can handle challenging traffic situations like construction zones.
It will also support the development of digital twins to simulate rare and challenging conditions, enhancing safety research and model performance.
Early adopters include prominent research centres such as the Berkeley DeepDrive Center, the Carnegie Mellon Safe AI Lab and the Contextual Robotics Institute at UC San Diego. These institutions plan to use the dataset to advance AI models that predict road user movements, to evaluate self-driving car safety and to develop semantic AI models that help robots understand various environments.
