NVIDIA AI software halves inference time for language queries

NVIDIA has launched the eighth generation of its TensorRT AI software, which cuts inference time for language queries in half and helps developers build search engines, ad recommendation systems and chatbots.

TensorRT 8 delivers the speed needed for language applications, running the transformer-based BERT-Large model in just 1.2 milliseconds. Companies no longer need to shrink their models, a compromise that produces significantly less accurate results. Instead, they can double or triple their model size to achieve significant improvements in accuracy.

“The latest version of TensorRT introduces new capabilities that enable companies to deliver conversational AI applications to their customers with a level of quality and responsiveness that was never before possible,” said Greg Estes, Vice President of Developer Programs at NVIDIA.

More than 350,000 developers from 27,500 companies have downloaded TensorRT nearly 2.5 million times over the past five years. TensorRT applications can be deployed in hyperscale data centres or on embedded or automotive product platforms.

TensorRT 8 is available free to members of the NVIDIA Developer program. The latest versions of plug-ins, parsers and samples are also available as open source from the TensorRT GitHub repository.
