Downtown-Case committed b1156fe (1 parent: 2be1c81)
Update README.md
README.md (CHANGED):
license_link: https://huggingface.co/Qwen/Qwen2.5-32B/blob/main/LICENSE
language:
- en
pipeline_tag: text-generation
base_model:
- Qwen/Qwen2.5-32B
library_name: transformers
---

# Quantization

A 4.1bpw quantization using default settings, small enough to leave room for a good amount of context on a 24GB GPU.

This is the base model, not the instruct model! Base models tend to be better for raw completion (such as novel continuation), especially at long context; see the loading sketch below.
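
Since "bpw" sizing is the ExLlamaV2 (exl2) convention, here is a minimal loading sketch using the `exllamav2` Python API, offered as an illustration only: the local model path, cache length, and prompt are placeholders, and the exact API can differ between `exllamav2` versions.

```python
# Illustrative sketch: load an exl2 quant and run raw completion.
# The model path, max_seq_len, and prompt are placeholders.
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

config = ExLlamaV2Config("/models/Qwen2.5-32B-exl2-4.1bpw")  # local download dir
model = ExLlamaV2(config)

# A lazy cache lets load_autosplit() spread weights across available VRAM;
# lower max_seq_len if the KV cache pushes a 24GB card over its limit.
cache = ExLlamaV2Cache(model, max_seq_len=32768, lazy=True)
model.load_autosplit(cache)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

# Base model: feed raw text to continue, not a chat template.
prompt = "The rain had not stopped for three days, and the river"
print(generator.generate(prompt=prompt, max_new_tokens=200))
```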

# Qwen2.5-32B

## Introduction

Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. Qwen2.5 brings the following improvements upon Qwen2:

- Significantly **more knowledge** and greatly improved capabilities in **coding** and **mathematics**, thanks to our specialized expert models in these domains.
- Significant improvements in **instruction following**, **generating long texts** (over 8K tokens), **understanding structured data** (e.g., tables), and **generating structured outputs**, especially JSON. **More resilient to the diversity of system prompts**, enhancing role-play implementation and condition-setting for chatbots.
- **Long-context support** up to 128K tokens, with the ability to generate up to 8K tokens.
- **Multilingual support** for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.

**This repo contains the base 32B Qwen2.5 model**, which has the following features (a config-check sketch follows the list):

- Type: Causal Language Models
- Training Stage: Pretraining
- Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
- Number of Parameters: 32.5B
- Number of Parameters (Non-Embedding): 31.0B
- Number of Layers: 64
- Number of Attention Heads (GQA): 40 for Q and 8 for KV
- Context Length: 131,072 tokens
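
The layer, head, and context figures above can be cross-checked against the published config. A small illustrative sketch with `transformers` (the field names are the standard Qwen2 config keys):

```python
# Sketch: read the spec list straight off the upstream config.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Qwen/Qwen2.5-32B")

print(cfg.num_hidden_layers)        # 64 layers
print(cfg.num_attention_heads)      # 40 query heads
print(cfg.num_key_value_heads)      # 8 KV heads (GQA)
print(cfg.max_position_embeddings)  # 131,072-token context
```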

**We do not recommend using base language models for conversations.** Instead, apply post-training, e.g., SFT, RLHF, or continued pretraining, to this model; a minimal SFT sketch follows below.
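
As one illustration of such post-training, here is a hypothetical minimal SFT sketch using the `trl` library; the dataset is a stand-in, the hyperparameters are untuned, and the `trl` API shifts between versions, so treat this as a starting point rather than a recipe.

```python
# Hypothetical sketch: supervised fine-tuning of the base model with trl.
# Dataset and hyperparameters are placeholders, not a recommended recipe.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

train_dataset = load_dataset("trl-lib/Capybara", split="train")  # example chat data

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-32B",  # the base checkpoint
    train_dataset=train_dataset,
    args=SFTConfig(
        output_dir="qwen2.5-32b-sft",
        per_device_train_batch_size=1,  # a 32B model needs heavy parallelism or PEFT
        gradient_accumulation_steps=8,
    ),
)
trainer.train()
```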

For more details, please refer to our [blog](https://qwenlm.github.io/blog/qwen2.5/), [GitHub](https://github.com/QwenLM/Qwen2.5), and [Documentation](https://qwen.readthedocs.io/en/latest/).

## Requirements

The code for Qwen2.5 is included in the latest Hugging Face `transformers`, and we advise you to use the latest version of `transformers`.

With `transformers<4.37.0`, you will encounter the following error:
```
KeyError: 'qwen2'
```
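
A minimal usage sketch with `transformers` (not from the upstream model card; the dtype, device placement, and prompt are illustrative, and the unquantized bf16 weights need roughly 65GB of VRAM, far more than a single 24GB card):

```python
# Sketch: raw completion with the unquantized checkpoint via transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-32B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # spread layers across available devices
)

# Base model: plain continuation, no chat template.
inputs = tokenizer("The history of cartography begins", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```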

## Evaluation & Performance

Detailed evaluation results are reported in this [📑 blog](https://qwenlm.github.io/blog/qwen2.5/).

For requirements on GPU memory and the respective throughput, see the results [here](https://qwen.readthedocs.io/en/latest/benchmark/speed_benchmark.html).

## Citation

If you find our work helpful, feel free to cite us.

```
@misc{qwen2.5,
    title = {Qwen2.5: A Party of Foundation Models},