Honey, I shrunk the LLM! A Beginner's Guide to Quantization


Hands on If you hop on Hugging Face and start scrolling through large language models, you'll quickly notice a trend: Most have been trained at 16-bit floating-point (FP16) or Brain-float (BF16) precision.


FP16 and BF16 have become quite popular in machine learning, not only because they strike a good balance between accuracy, throughput, and model size, but also because both data types are widely supported across the vast majority of hardware, be it CPUs, GPUs, or dedicated AI accelerators.

The problem comes when you try to run models, especially larger ones, with 16-bit tensors on a single chip. With two bytes per parameter, a model like Llama-3-70B requires at least 140 GB of very fast memory, and that doesn't include other overhead, such as the key-value cache.
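To see where that 140 GB figure comes from, here's a quick back-of-the-envelope sketch in Python. The parameter count and bytes-per-parameter values are the only inputs; activations, the key-value cache, and framework overhead are deliberately ignored.

```python
# Rough memory math for model weights at different precisions.
PARAMS = 70e9  # Llama-3-70B parameter count

BYTES_PER_PARAM = {
    "FP32": 4,
    "FP16/BF16": 2,
    "INT8": 1,
    "INT4": 0.5,
}

for dtype, nbytes in BYTES_PER_PARAM.items():
    gigabytes = PARAMS * nbytes / 1e9
    print(f"{dtype:>10}: {gigabytes:,.0f} GB just for the weights")

# FP16/BF16 works out to roughly 140 GB, which is why a single accelerator
# typically can't hold a 70B model at 16-bit precision.
```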

To get around this, you can either split the model across multiple chips – or even servers – or you can compress the model weights to a lower precision in a process called quantization.
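To get a feel for what quantization actually does to the numbers, here's a minimal sketch of symmetric round-to-nearest INT8 quantization in NumPy. Real quantization schemes are more sophisticated, typically using per-channel or group-wise scales and more careful rounding, and the helper names below are purely illustrative.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric round-to-nearest INT8 quantization of a weight tensor."""
    # One scale for the whole tensor: map the largest-magnitude weight to 127.
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover an approximation of the original values at compute time.
    return q.astype(np.float32) * scale

# Toy example: a 4x4 "weight matrix" in FP32.
w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)

print("original bytes: ", w.nbytes)   # 64 bytes at FP32
print("quantized bytes:", q.nbytes)   # 16 bytes at INT8
print("max round-trip error:", np.abs(w - dequantize(q, scale)).max())
```

The storage saving is exactly the ratio of the bit widths: going from 16-bit weights to 8-bit halves the memory footprint, at the cost of a small rounding error on every weight.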
