AI sound model gives users unrivalled audio control

A team of researchers has developed a generative AI model called Fugatto that is set to transform the audio landscape, offering users unparalleled control over sound creation and manipulation.

Fugatto is a versatile tool that allows users to generate or modify any combination of music, voices and sounds using simple text prompts and audio files.

Its capabilities extend far beyond those of traditional AI sound models. Users can create music snippets from text descriptions, add or remove instruments from existing songs, alter voice accents and emotions, and even produce entirely new, never-before-heard sounds.

The model’s flexibility opens up a world of possibilities for various industries. Music producers can quickly prototype ideas, experimenting with different styles, voices and instruments. Advertising agencies can easily adapt campaigns for multiple regions by applying different accents and emotions to voiceovers. Language learning tools can be personalised with familiar voices, while video game developers can create dynamic audio assets that respond to in-game actions.

Perhaps one of Fugatto’s most innovative features is its ability to generate unique sounds, akin to the “avocado chair” concept in visual AI. Users can describe and create unconventional audio combinations, such as a trumpet barking or a saxophone meowing. The model also demonstrates emergent properties, showcasing capabilities that arise from the interaction of its various trained abilities.

Fugatto’s user-friendly interface allows for fine-tuned control over audio attributes. The ComposableART technique lets users combine instructions in novel ways, while temporal interpolation allows for the creation of evolving soundscapes. These features provide users with an unprecedented level of artistic control over their audio creations.
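The idea of combining instructions with adjustable weights, and interpolating those weights over time, can be sketched in a few lines. The snippet below is a simplified illustration only, not Fugatto's actual implementation: the function names (`composable_guidance`, `interpolate_weights`) are hypothetical, and it assumes a classifier-free-guidance-style setup where each instruction yields a conditioned model prediction that is blended against an unconditioned one.

```python
import numpy as np

def composable_guidance(uncond, conds, weights):
    """Blend several instruction-conditioned predictions.

    Illustrative sketch of weighted guidance composition:
    start from the unconditioned prediction and push the output
    toward each instruction's prediction by its weight.

    uncond:  unconditioned model output, shape (d,)
    conds:   list of instruction-conditioned outputs, each shape (d,)
    weights: one guidance weight per instruction
    """
    uncond = np.asarray(uncond, dtype=float)
    out = uncond.copy()
    for cond, w in zip(conds, weights):
        out += w * (np.asarray(cond, dtype=float) - uncond)
    return out

def interpolate_weights(w_start, w_end, t):
    """Linearly crossfade guidance weights (t in [0, 1]),
    e.g. to let one instruction fade out while another fades in
    across an evolving soundscape."""
    return [(1 - t) * a + t * b for a, b in zip(w_start, w_end)]

# Toy example: two instructions, one emphasised four times as strongly.
uncond = np.zeros(2)
conds = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
blended = composable_guidance(uncond, conds, weights=[0.5, 2.0])
```

Scaling a weight above 1.0 exaggerates that instruction's influence, while a weight near 0.0 mutes it; sweeping `t` from 0 to 1 over the duration of a clip is one simple way to realise a gradual transition between two sets of instructions.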

“We wanted to create a model that understands and generates sound like humans do. Fugatto is our first step toward a future where unsupervised multi-task learning in audio synthesis and transformation emerges from data and model scale,” said Rafael Valle, Manager of Applied Audio Research at NVIDIA and one of the dozen-plus people, including an orchestral conductor and composer, behind Fugatto.