SambaNova Cloud serves Llama 3.1 405B with 100+ tokens/s

Not to be outdone by rival AI system upstarts, SambaNova has launched its own inference cloud that it says is ready to serve Meta's biggest models faster than the rest.

The cloud offering is one of several that have emerged during the AI boom to provide API access to popular open-weight models. Most of these are GPU-based, but for the more boutique vendors dealing in specialized hardware, such as Cerebras, Groq, and now SambaNova, it seems whoever can get the biggest model to spit out tokens the fastest has a leg up.

If you're not familiar, tokens are the units in which large language models encode words, word fragments, punctuation, and figures. So the faster your infrastructure can generate tokens, the less time you have to wait for a response.
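For illustration, here's a minimal Python sketch of what tokenization looks like and how a generation rate translates into wait time. It uses OpenAI's tiktoken library purely as an example tokenizer (Llama 3.1 ships its own, so exact token counts will differ), and the 500-token response length is a hypothetical figure, not anything SambaNova quoted.

```python
# Illustration only: tiktoken is an example tokenizer, not Llama 3.1's.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "SambaNova Cloud serves Llama 3.1 405B with 100+ tokens/s"
tokens = enc.encode(text)

print(f"{len(tokens)} tokens: {tokens}")
print([enc.decode([t]) for t in tokens])  # the individual fragments

# Wait time scales inversely with generation speed.
tokens_per_second = 132   # SambaNova's claimed rate
response_length = 500     # hypothetical response size in tokens
print(f"~{response_length / tokens_per_second:.1f} s to generate the response")
```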

According to CEO Rodrigo Liang, SambaNova has managed to get Meta's 405-billion-parameter Llama 3.1 model (more than twice the size of OpenAI's GPT-3.5) to churn out tokens at a rate of 132 per second, and at the full 16-bit precision it was trained at, no less.
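To put that in perspective, here's a back-of-envelope sketch of what serving 405 billion parameters at 16-bit precision implies for memory. It assumes two bytes per weight and ignores the KV cache and activations; the 80 GB figure is a hypothetical accelerator capacity for comparison, not anything SambaNova has disclosed.

```python
# Rough arithmetic: memory footprint of 405B weights at 16-bit precision.
import math

params = 405e9        # parameter count
bytes_per_param = 2   # 16-bit (bf16/fp16) weights

weights_gb = params * bytes_per_param / 1e9
print(f"Weights alone: ~{weights_gb:.0f} GB")  # ~810 GB

# A hypothetical 80 GB accelerator would need this many devices
# just to hold the weights, before any KV cache or activations.
print(f"80 GB accelerators for weights: {math.ceil(weights_gb / 80)}")  # 11
```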
