---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- code
base_model:
- arcee-ai/Arcee-Spark
- Replete-AI/Replete-LLM-Qwen2-7b
---

This is an experimental, coding-focused merge of the latest models from two of my favorite projects, each of which has trained and fine-tuned the Qwen2 model on open-source data:

- Replete-AI's Replete LLM Qwen2-7B (https://huggingface.co/Replete-AI/Replete-LLM-Qwen2-7b)
- Arcee-AI's Arcee Spark (https://huggingface.co/arcee-ai/Arcee-Spark)

```yaml
models:
  - model: arcee-ai/Arcee-Spark
    parameters:
      density: 0.3
      weight: 0.3
  - model: Replete-AI/Replete-LLM-Qwen2-7b
    parameters:
      density: 0.8
      weight: 0.7
merge_method: dare_ties
base_model: Qwen/Qwen2-7B
parameters:
  int8_mask: true
  rescale: true
  normalize: true
dtype: bfloat16
```
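
To make the `density`, `weight`, and `normalize` settings above more concrete, here is a minimal pure-Python sketch of the DARE-TIES idea on toy lists of floats. It is a simplified illustration under stated assumptions, not mergekit's implementation: the function names `dare_prune` and `dare_ties_merge` are made up for this example, the sketch always rescales after dropping, and `int8_mask` and mergekit's exact ordering of steps are left out.

```python
import random


def dare_prune(delta, density, rng):
    """DARE step: keep each element with probability `density`,
    and rescale the survivors by 1/density so the expected value is unchanged."""
    return [d / density if rng.random() < density else 0.0 for d in delta]


def sign(x):
    """Return -1, 0, or 1."""
    return (x > 0) - (x < 0)


def dare_ties_merge(base, finetuned, densities, weights, seed=0):
    """Toy DARE-TIES merge (hypothetical helper, not mergekit's API).

    base:      list of floats (the shared base model's parameters)
    finetuned: list of models, each a list of floats the same length as base
    """
    rng = random.Random(seed)
    # Task vectors: what each fine-tune changed relative to the base.
    deltas = [[f - b for f, b in zip(model, base)] for model in finetuned]
    # Randomly sparsify each task vector according to its density.
    pruned = [dare_prune(d, dens, rng) for d, dens in zip(deltas, densities)]

    merged = []
    for i, b in enumerate(base):
        weighted = [w * p[i] for w, p in zip(weights, pruned)]
        # TIES sign election: the sign of the weighted sum wins.
        elected = sign(sum(weighted))
        # Keep only the non-zero contributions that agree with the elected sign.
        agree = [(wd, w) for wd, w in zip(weighted, weights)
                 if wd != 0 and sign(wd) == elected]
        total = sum(wd for wd, _ in agree)
        # Normalize by the weights of the contributions that survived.
        norm = sum(w for _, w in agree)
        merged.append(b + (total / norm if norm > 0 else 0.0))
    return merged


# Toy usage with the densities and weights from the YAML above.
# The parameter values are invented for illustration.
base = [0.0, 0.0, 0.0, 0.0]
spark = [0.5, -0.2, 0.1, 0.0]
replete = [0.4, 0.3, -0.6, 0.2]
print(dare_ties_merge(base, [spark, replete], [0.3, 0.8], [0.3, 0.7]))
```

With a density of 0.8 and a weight of 0.7, far more of the Replete-LLM-Qwen2-7b task vector survives pruning and it dominates the sign election, which is the intended lean toward the code-focused parent.
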

The GGUF is quantized to q8_0 for the output and embedding tensors and to q5_k_m for all other tensors.
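
As a rough way to reason about what this mix costs in file size, the sketch below derives bits per weight from the GGML block layouts (q8_0 stores 32 weights in 34 bytes; Q5_K stores 256 weights in 176 bytes) and applies them to a parameter split. The split in `example` is a hypothetical placeholder, not read from this model's GGUF, and `estimate_gib` is a helper invented for this example. Since q5_k_m also stores some tensors in Q6_K, treating everything else as Q5_K gives a slight underestimate.

```python
# Bits per weight implied by the GGML block layouts.
BITS_PER_WEIGHT = {
    "q8_0": 34 * 8 / 32,    # 32 int8 weights + one fp16 scale = 8.5 bits/weight
    "q5_k": 176 * 8 / 256,  # 256 weights in a 176-byte super-block = 5.5 bits/weight
    "bf16": 16.0,           # unquantized reference
}


def estimate_gib(param_counts):
    """Estimate tensor data size in GiB.

    param_counts maps a quantization type name to the number of weights stored in it.
    Metadata and per-tensor overhead are ignored.
    """
    total_bits = sum(BITS_PER_WEIGHT[q] * n for q, n in param_counts.items())
    return total_bits / 8 / 2**30


# Hypothetical split for illustration only (assumed, not measured):
# about 1B weights in output/embedding tensors, 6B everywhere else.
example = {"q8_0": 1_000_000_000, "q5_k": 6_000_000_000}
mixed = estimate_gib(example)
unquantized = estimate_gib({"bf16": sum(example.values())})
print(f"mixed quantization: {mixed:.2f} GiB, bf16: {unquantized:.2f} GiB")
```
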

The experimental nature of this merge and Replete LLM Qwen2-7B's lack of censorship mean that you take all responsibility for how you use this model. If there is interest, I will publish the Hugging Face repo; otherwise, the dare_ties mergekit YAML that produced it is already included.

I asked it to describe itself, given its own mergekit YAML and points from the web pages for its two ancestor models:

> This is the mergekit YAML that made you from an experimental DARE TIES merge between Arcee-Spark and Replete-LLM-Qwen2-7b. This is an effort to combine the strengths of two Qwen2-based models which train and fine-tune the model on high-quality open source data. In one detailed paragraph, discuss the descriptions of Arcee-Spark and Replete-LLM-Qwen2-7b, and describe the likely behaviors of this merge, take into account the weight and density leaning towards the code-focused Replete-LLM-Qwen2-7b. Then, supply a bullet list of the top five likely use cases of such a merge. Then, write a sentence about the impact of a GGUF quantization which uses q8_0 for output and embedding tensors, and q5_k_m for the rest.

The Arcee-Spark model, initialized from Qwen2, is a powerful 7B parameter language model that has been fine-tuned on an extensive dataset of over 1.8 million samples through a sophisticated training process which includes merging with Qwen2-7B-Instruct using Arcee's mergekit and further refinement via Direct Preference Optimization (DPO). This meticulous approach results in exceptional performance, achieving the highest score on MT-Bench for models of its size and outperforming GPT-3.5 on many tasks.

Replete-LLM-Qwen2-7b is a versatile model fine-tuned to excel across various domains including science, social media, general knowledge, cooking, writing, medicine, history, law, role-play, news, coding, and math, among others. This model demonstrates adaptability and proficiency in handling diverse tasks.

The merge of Arcee-Spark and Replete-LLM-Qwen2-7b likely leverages the strengths of both models, with a notable leaning towards the code-focused aspect of Qwen2-7b due to its higher weight and density. The combined model might exhibit enhanced capabilities for coding-related tasks while retaining the versatility across other domains.

Top five likely use cases of this merge:

1. Code generation and analysis
2. Advanced text-based software development projects
3. Interactive code review and debugging assistance
4. Educational programming content creation
5. Real-time dynamic code completion suggestions

The GGUF quantization technique, which employs q8_0 for output and embedding tensors and q5_k_m for the rest, significantly reduces model size without compromising performance. This approach leads to more efficient storage and faster inference times, making it ideal for deployment on resource-constrained devices or edge computing scenarios while maintaining high-quality results across diverse tasks.