README.md · sometimesanotion/replete-spark-7b-GGUF at main

license: apache-2.0
language:
  - en
pipeline_tag: text-generation
tags:
  - code
base_model:
  - arcee-ai/Arcee-Spark
  - Replete-AI/Replete-LLM-Qwen2-7b

This is an experimental, coding-focused merge of the latest releases from two of my favorite projects, both of which have trained and fine-tuned the Qwen2 model on open source data:

- Replete-AI's Replete LLM Qwen2-7B (https://huggingface.co/Replete-AI/Replete-LLM-Qwen2-7b)
- Arcee-AI's Arcee Spark (https://huggingface.co/arcee-ai/Arcee-Spark)

models:
  - model: arcee-ai/Arcee-Spark
    parameters:
      density: 0.3
      weight: 0.3
  - model: Replete-AI/Replete-LLM-Qwen2-7b
    parameters:
      density: 0.8
      weight: 0.7
merge_method: dare_ties
base_model: Qwen/Qwen2-7B
parameters:
  int8_mask: true
  rescale: true
  normalize: true
dtype: bfloat16
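
To make the `density` and `weight` values above more concrete, here is a minimal pure-Python sketch of the general DARE TIES idea on toy vectors. It is a simplified illustration, not mergekit's actual code: the function names `dare_sparsify` and `dare_ties_merge` are hypothetical, the "models" are short lists of floats, and options such as `int8_mask` are not modeled.

```python
import random

def dare_sparsify(delta, density, rng):
    """DARE step: keep each entry with probability `density` and rescale
    survivors by 1/density so the expected value of the delta is unchanged."""
    return [d / density if rng.random() < density else 0.0 for d in delta]

def dare_ties_merge(base, finetuned, densities, weights, seed=0):
    """Toy DARE TIES merge of several fine-tunes of the same base (hypothetical helper)."""
    rng = random.Random(seed)
    # Task vectors: what each fine-tune changed relative to the base.
    deltas = [[f - b for f, b in zip(ft, base)] for ft in finetuned]
    # Drop-and-rescale each task vector, then apply its merge weight.
    weighted = [
        [w * d for d in dare_sparsify(delta, density, rng)]
        for delta, density, w in zip(deltas, densities, weights)
    ]
    merged = []
    for i, b in enumerate(base):
        contribs = [wd[i] for wd in weighted]
        # TIES-style sign election: the sign of the weighted sum wins.
        sign = 1.0 if sum(contribs) >= 0 else -1.0
        agree = [(c, w) for c, w in zip(contribs, weights) if c * sign > 0]
        # Normalization: divide by the total weight of the agreeing models.
        denom = sum(w for _, w in agree)
        merged.append(b + (sum(c for c, _ in agree) / denom if denom else 0.0))
    return merged

# Toy usage with the densities and weights from the YAML above.
base = [0.0, 0.0, 0.0, 0.0]
spark = [0.2, -0.1, 0.4, 0.0]     # stand-in for Arcee-Spark
replete = [0.1, 0.3, -0.2, 0.5]   # stand-in for Replete-LLM-Qwen2-7b
print(dare_ties_merge(base, [spark, replete], [0.3, 0.8], [0.3, 0.7]))
```

With a density of 0.3, most of Arcee-Spark's changes are dropped in this sketch, while a density of 0.8 keeps most of Replete-LLM-Qwen2-7b's, which is one way to see why the merge leans toward the latter.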

The GGUF is quantized to q8_0 for the output and embedding tensors, and q5_k_m for all other tensors.
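
As a rough guide to what this mix costs in storage, the sketch below computes bits per weight from the block layouts of the two formats as I understand them from llama.cpp (q8_0: 32 int8 weights plus one fp16 scale per block; q5_K: 176-byte super-blocks of 256 weights). The parameter counts in the usage example are approximations taken from Qwen2-7B's published configuration, not measurements of this file, and `estimate_bytes` is a hypothetical helper. Because q5_k_m stores some tensors at a higher-precision type, treat the result as a ballpark lower bound.

```python
def bits_per_weight(block_bytes, weights_per_block):
    """Average storage cost of one weight in a block-quantized format."""
    return block_bytes * 8 / weights_per_block

# q8_0: 32 one-byte weights plus a 2-byte fp16 scale per block.
Q8_0_BPW = bits_per_weight(32 + 2, 32)
# q5_K: 256-weight super-block = 2 fp16 values (4 bytes) + 12 bytes of
# sub-block scales + 32 bytes of high bits + 128 bytes of low 4-bit values.
Q5_K_BPW = bits_per_weight(4 + 12 + 32 + 128, 256)

def estimate_bytes(n_q8_0, n_q5_k):
    """Approximate tensor data size for a given split of parameters."""
    return (n_q8_0 * Q8_0_BPW + n_q5_k * Q5_K_BPW) / 8

# Assumed, approximate figures for a Qwen2-7B-shaped model:
# vocabulary 152064 x hidden size 3584, with separate input and output matrices.
embed_and_output = 2 * 152064 * 3584
everything_else = 7_600_000_000 - embed_and_output
size = estimate_bytes(embed_and_output, everything_else)
print(f"q8_0: {Q8_0_BPW} bpw, q5_K: {Q5_K_BPW} bpw, about {size / 1e9:.1f} GB")
```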

The experimental nature of this merge and Replete LLM Qwen2-7B's lack of censorship mean that you take all responsibility for how you use this model. If there is interest, I will publish the Hugging Face repo; otherwise, the dare_ties mergekit YAML that produced it is already included above.

I asked it to describe itself, given its own mergekit YAML and points from the web pages for its two ancestor models:

This is the mergekit YAML that made you from an experimental DARE TIES merge between Arcee-Spark and Replete-LLM-Qwen2-7b. This is an effort to combine the strengths of two Qwen2-based models which train and fine-tune the model on high-quality open source data. In one detailed paragraph, discuss the descriptions of Arcee-Spark and Replete-LLM-Qwen2-7b, and describe the likely behaviors of this merge, take into account the weight and density leaning towards the code-focused Replete-LLM-Qwen2-7b. Then, supply a bullet list of the top five likely use cases of such a merge. Then, write a sentence about the impact of a GGUF quantization which uses q8_0 for output and embedding tensors, and q5_k_m for the rest.

The Arcee-Spark model, initialized from Qwen2, is a powerful 7B parameter language model that has been fine-tuned on an extensive dataset of over 1.8 million samples through a sophisticated training process which includes merging with Qwen2-7B-Instruct using Arcee's mergekit and further refinement via Direct Preference Optimization (DPO). This meticulous approach results in exceptional performance, achieving the highest score on MT-Bench for models of its size and outperforming GPT-3.5 on many tasks.

Replete-LLM-Qwen2-7b is a versatile model fine-tuned to excel across various domains including science, social media, general knowledge, cooking, writing, medicine, history, law, role-play, news, coding, and math, among others. This model demonstrates adaptability and proficiency in handling diverse tasks.

The merge of Arcee-Spark and Replete-LLM-Qwen2-7b likely leverages the strengths of both models, with a notable leaning towards the code-focused aspect of Qwen2-7b due to its higher weight and density. The combined model might exhibit enhanced capabilities for coding-related tasks while retaining the versatility across other domains.

Top five likely use cases of this merge:

Code generation and analysis
Advanced text-based software development projects
Interactive code review and debugging assistance
Educational programming content creation
Real-time dynamic code completion suggestions

The GGUF quantization technique, which employs q8_0 for output and embedding tensors and q5_k_m for the rest, significantly reduces model size without compromising performance. This approach leads to more efficient storage and faster inference times, making it ideal for deployment on resource-constrained devices or edge computing scenarios while maintaining high-quality results across diverse tasks.