Meet Groq, a Lightning Fast AI Accelerator that Beats ChatGPT and Gemini

COMBOFRE February 22, 2024

While using ChatGPT, especially with the GPT-4 model, you must have noticed how slowly the model responds to queries. Not to mention, voice assistants based on large language models, like ChatGPT’s Voice Chat feature or the recently released Gemini AI, which replaced Google Assistant on Android phones, are even slower due to the high latency of LLMs. But all of that is likely to change soon, thanks to Groq’s powerful new LPU (Language Processing Unit) inference engine.

Groq has taken the world by surprise. Mind you, this is not Elon Musk’s Grok, which is an AI model available on X (formerly Twitter). Groq’s LPU inference engine can generate a massive 500 tokens per second when running a 7B model. That drops to around 250 tokens per second when running a 70B model. This is a far cry from OpenAI’s ChatGPT, which runs on Nvidia GPUs that offer around 30 to 60 tokens per second.
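To put those throughput figures in perspective, here is a minimal sketch that converts tokens per second into the time a user would wait for a full reply. The 300-token response length is an assumption chosen purely for illustration, and the function name is hypothetical; only the tokens-per-second rates come from the figures quoted above.

```python
def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Return the time needed to emit num_tokens at a steady generation rate."""
    return num_tokens / tokens_per_second


# Assumed response length, for illustration only (not from the article).
RESPONSE_TOKENS = 300

# Throughput figures quoted in the article, in tokens per second.
RATES = {
    "Groq LPU, 7B model": 500,
    "Groq LPU, 70B model": 250,
    "ChatGPT on Nvidia GPUs (low end)": 30,
    "ChatGPT on Nvidia GPUs (high end)": 60,
}

for name, rate in RATES.items():
    wait = seconds_to_generate(RESPONSE_TOKENS, rate)
    print(f"{name}: {wait:.1f} s for {RESPONSE_TOKENS} tokens")
```

Under that assumption, the same reply takes roughly a second on the LPU and between five and ten seconds at the GPU rates, which is the gap that matters most for voice assistants.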

Groq is Built by Ex-Google TPU Engineers

Groq is not an AI chatbot but an AI inference chip, and it’s competing against industry giants like Nvidia in the AI hardware space. It was co-founded in 2016 by Jonathan Ross, who, while working at Google, co-founded the team that built Google’s first TPU (Tensor Processing Unit) chip for machine learning.

Later, many employees left Google’s TPU team and created Groq to build hardware for next-generation computing.

What’s Groq’s LPU?

The reason Groq’s LPU engine is so fast compared to established players like Nvidia is that it’s built on an entirely different kind of approach.

According to CEO Jonathan Ross, Groq first created the software stack and compiler and then designed the silicon. It went with the software-first mindset to make the performance “deterministic”, a key concept for getting fast, accurate, and predictable results in AI inferencing.