IBM and Groq to turbocharge enterprise AI with faster, scalable inference

IBM and Groq are partnering to accelerate enterprise AI deployment by integrating Groq's GroqCloud high-speed inference technology into IBM watsonx Orchestrate.

The collaboration aims to reduce latency, improve cost efficiency and deliver enterprise-grade AI performance for organisations moving from pilot projects to full production.

By combining Groq’s language processing unit (LPU) architecture with IBM’s agentic AI orchestration, enterprises will gain access to inference that is more than five times faster than on traditional GPU-based systems.

The performance leap is designed to help businesses in regulated industries such as healthcare, finance and government meet growing demands for real-time intelligence at scale.

For instance, IBM’s healthcare clients can now deploy AI agents that handle thousands of patient queries simultaneously, delivering rapid and accurate responses while maintaining compliance and reliability.

In the retail and consumer sectors, enterprises are harnessing these capabilities to automate HR functions and improve employee productivity.

“Many large organisations have options when experimenting with AI, but to move to production, they must ensure complex workflows operate successfully at scale. Our partnership with Groq ensures that,” said Rob Thomas, Senior Vice President of Software and Chief Commercial Officer at IBM.

According to Jonathan Ross, CEO and Founder of Groq, the alliance would help make agentic AI real for business.

“Beyond speed and resilience, this partnership is about transforming how enterprises work with AI, moving from experimentation to enterprise-wide adoption with confidence, and opening the door to new patterns where AI can act instantly and learn continuously,” he said.

Both companies also plan to integrate Groq’s LPU with Red Hat’s open-source vLLM technology to improve the developer experience, enabling seamless inference orchestration, load balancing and hardware acceleration. Users of watsonx will then be able to use Groq’s acceleration without leaving their existing environments.