mysticbeing committed
Commit 8088938
1 Parent(s): aa4f24d

Update README.md

Files changed (1)
  1. README.md +11 -0
README.md CHANGED
@@ -51,6 +51,17 @@ By accessing this model, you are agreeing to the LLama 3.1 terms and conditions
  Quantized version of [Llama-3.1-Nemotron-70B-Instruct-HF](https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF) with the updated 8 KV-heads.
  It achieves an average score of [TBD] on the [OpenLLM](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) benchmark (version 1), whereas the unquantized model achieves 86.79.
 
+ ### Quantized models are eco-friendly and cost-effective
+ FP8 quantized models require significantly less storage than traditional 32-bit (FP32) or even 16-bit (FP16) models.
+ This reduction is visible in the total file size comparison below, where the FP8 model set is nearly half the size of the higher-precision set.
+ This efficiency makes powerful AI models easier to distribute, store, and access, even on devices with limited storage capacity.
+
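+ As a rough back-of-envelope sketch (weights only, assuming roughly 70B parameters; actual repository size also includes tokenizer and config files), the savings follow directly from the bytes stored per parameter:
+
+ ```python
+ # Approximate weight storage for a ~70B-parameter model at different precisions.
+ # Rough estimates only, not measurements of the files in this repository.
+ num_params = 70e9
+ bytes_per_param = {"FP32": 4, "FP16": 2, "FP8": 1}
+ for precision, nbytes in bytes_per_param.items():
+     print(f"{precision}: ~{num_params * nbytes / 1e9:.0f} GB")
+ ```
+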
+ Lower hardware requirements mean reduced costs for businesses and public institutions adopting AI solutions. Small businesses, startups, and government entities, which may lack extensive AI budgets, can leverage high-performance FP8 quantized models to solve problems at roughly half the infrastructure cost.
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6590c65952dc1046ca0f13fe/YfP2hvWReX8T6hPr_7Enl.png)
+
  [Base model description - Llama-3.1-Nemotron-70B-Instruct-HF](https://huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF):
 
  Llama-3.1-Nemotron-70B-Instruct-HF is a large language model customized by NVIDIA to improve the helpfulness of LLM generated responses to user queries.