Arcee-Nova / README.md
leaderboard-pr-bot's picture
Adding Evaluation Results
acfe4f8 verified
|
raw
history blame
5.98 kB
metadata
license: other
model-index:
  - name: Arcee-Nova
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: IFEval (0-Shot)
          type: HuggingFaceH4/ifeval
          args:
            num_few_shot: 0
        metrics:
          - type: inst_level_strict_acc and prompt_level_strict_acc
            value: 79.07
            name: strict accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Arcee-Nova
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: BBH (3-Shot)
          type: BBH
          args:
            num_few_shot: 3
        metrics:
          - type: acc_norm
            value: 56.74
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Arcee-Nova
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MATH Lvl 5 (4-Shot)
          type: hendrycks/competition_math
          args:
            num_few_shot: 4
        metrics:
          - type: exact_match
            value: 40.48
            name: exact match
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Arcee-Nova
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GPQA (0-shot)
          type: Idavidrein/gpqa
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 18.01
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Arcee-Nova
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MuSR (0-shot)
          type: TAUR-Lab/MuSR
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 17.22
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Arcee-Nova
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU-PRO (5-shot)
          type: TIGER-Lab/MMLU-Pro
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 49.47
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=arcee-ai/Arcee-Nova
          name: Open LLM Leaderboard
Arcee Nova

Arcee-Nova is our highest performing open source model. Evaluated on the same stack as the OpenLLM Leaderboard 2.0, making it the top-performing open source model tested on that stack. Its performance approaches that of GPT-4 from May 2023, marking a significant milestone.

Nova is a merge of Qwen2-72B-Instruct with a custom model tuned on a generalist dataset mixture.

GGUFs available here

Chat with Arcee-Nova here

Capabilities and Use Cases

Arcee-Nova excels across a wide range of language tasks, demonstrating particular strength in:

  1. Reasoning: Solving complex problems and drawing logical conclusions.

  2. Creative Writing: Generating engaging and original content across various genres.

  3. Coding: Assisting with programming tasks, from code generation to debugging.

  4. General Language Understanding: Comprehending and generating human-like text in diverse contexts.

Business Applications

Arcee-Nova can be applied to various business tasks:

  • Customer Service: Implement sophisticated chatbots and virtual assistants.
  • Content Creation: Generate high-quality written content for marketing and documentation.
  • Software Development: Accelerate coding processes and improve code quality.
  • Data Analysis: Enhance data interpretation and generate insightful reports.
  • Research and Development: Assist in literature reviews and hypothesis generation.
  • Legal and Compliance: Automate contract analysis and regulatory compliance checks.
  • Education and Training: Create adaptive learning systems and intelligent tutoring programs.

Evaluations

Arcee Nova OpenLLM-2.0

Acknowledgments

We extend our gratitude to the open source AI community, whose collective efforts have paved the way for Arcee-Nova. Their commitment to transparency and collaboration continues to drive innovation. We also would like to extend our thanks to the Qwen team - without Qwen2-72B this would not be possible.

Future Directions

As we release Arcee-Nova to the public, we look forward to seeing how researchers, developers, and businesses will leverage its capabilities. We remain committed to advancing open source AI technology and invite the community to explore, contribute, and build upon Arcee-Nova.


Note: This README was written with assistance from Arcee-Nova.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric Value
Avg. 43.50
IFEval (0-Shot) 79.07
BBH (3-Shot) 56.74
MATH Lvl 5 (4-Shot) 40.48
GPQA (0-shot) 18.01
MuSR (0-shot) 17.22
MMLU-PRO (5-shot) 49.47