NVIDIA has released Nemotron-Personas-Singapore, a synthetic dataset co-designed with AI Singapore to support locally grounded AI models.
This first-of-its-kind resource features 888,000 Singaporean personas across 148,000 records, an average of six per record, spanning professional, sports, arts, travel, and culinary categories. It was generated with NVIDIA’s NeMo Data Designer for privacy-safe training.
The dataset contains about 118 million tokens, of which 48 million are persona tokens, and is aligned to Singapore’s demographic statistics, including names, sex, age, and ethnicity.
Because it is fully synthetic, the dataset contains no real personal data and poses no re-identification risk, ensuring compliance with Singapore’s Personal Data Protection Act (PDPA). It also supports bias evaluation in sectors such as finance and healthcare.
It integrates with Nemotron models and other open-source LLMs, enabling developers to fine-tune AI agents and systems for Singapore-specific use cases.
Nemotron-Personas-Singapore is the latest addition to NVIDIA’s open synthetic personas collection, which also includes datasets for the United States, Japan, India, and Brazil.
Alignment with Singapore’s AI drive
Nemotron-Personas-Singapore aligns with Singapore’s National AI Strategy 2.0 and its AI for the Public Good vision, which emphasises data sovereignty, cultural relevance, and excellence in AI research.
It also supports the recent National AI Research and Development Plan, backed by more than S$1 billion through 2030, to build resource-efficient AI and applied capabilities amid investments such as the NVIDIA-powered Aspire 2A+ supercomputer.
By providing culturally contextualised data for models, it advances Singapore’s goal of trusted, homegrown AI systems tailored to local norms and Southeast Asian contexts.
