language:
  - en
license: mit
library_name: transformers
tags:
  - pretrained
  - 7B
  - English
  - text-generation
  - base-model
  - bittensor
  - decentralized AI
datasets:
  - tiiuae/falcon-refinedweb

Sumo-T9-7B-v0.1


Tensorplex Labs Unveils Sumo-T9-7B: Beating Notable 7B Pretrained Models

Tensorplex Labs is proud to announce that its latest top-performing model on Bittensor Subnet 9, Sumo-T9-7B, has outperformed notable models such as TII Falcon 7B and Meta's Llama-2-7b-hf. This achievement highlights the potential of decentralized networks like Bittensor and underscores Tensorplex Labs' commitment to advancing open-source AI technologies.

"Sumo" represents the family of models developed by Tensorplex, and "T9" designates the top-performing model specifically trained for Bittensor Subnet 9.

Bittensor Subnet 9 serves a unique role within the Bittensor ecosystem by rewarding miners who produce pretrained foundational models on the Falcon Refined Web dataset. This subnet functions as a continuous benchmark, where miners are incentivized to achieve the best performance metrics using a model under the parameter limit. The competitive nature of Subnet 9 drives rapid advancements and refinements in large language model training.

After the parameter limit was raised to 7 billion on April 19, 2024, Tensorplex Labs published the top-performing model in less than a month, surpassing notable models such as Falcon 7B and Llama 2 7B.

Model Details

Model Description

  • Developed by: Tensorplex Labs
  • Model type: Pretrained Foundational Language Model
  • Language(s) (NLP): Primarily English
  • License: MIT
  • Architecture: Llama-style architecture with 6.9 billion parameters
  • Training Data: Trained on the tiiuae/falcon-refinedweb dataset
  • Training Objective: Causal Language Modeling (next token prediction)
  • Original Model Repo: tensorplex-labs/pretraining-sn9-7B-1

Sumo-T9-7B-v0.1 features a larger vocabulary (100k tokens) that is compatible with the GPT-4 tokenizer, which helps it handle a wide range of natural language processing tasks.
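One way to check the vocabulary claim is to load the tokenizer and inspect it. The sketch below is not part of the original model card: `describe_tokenizer` and `load_and_describe` are hypothetical helper names, and it assumes the repository ships a tokenizer that `AutoTokenizer` can load.

```python
def describe_tokenizer(tokenizer, sample_text="What is Yokozuna?"):
    """Summarise a Hugging Face-style tokenizer (hypothetical helper).

    Works with any object that supports len() and encode(); len() on a
    transformers tokenizer counts the base vocabulary plus added tokens.
    """
    token_ids = tokenizer.encode(sample_text)
    return {
        "vocab_size": len(tokenizer),
        "sample_token_count": len(token_ids),
    }


def load_and_describe(model_id="tensorplex-labs/Sumo-T9-7B-v0.1"):
    """Download the tokenizer from the Hub and summarise it (needs network)."""
    from transformers import AutoTokenizer  # imported lazily: only needed here

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return describe_tokenizer(tokenizer)
```

If the description above holds, `load_and_describe()` should report a vocabulary on the order of 100k entries; the exact figure is not stated in this card.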

This is a pretrained base model that has not yet been aligned. Use it with caution, or fine-tune it on downstream tasks before deployment.

Model Sources

How to Get Started with the Model

Use the code below to get started with the model.

import torch
import transformers
from transformers import AutoTokenizer

# Hugging Face Hub repository ID for this model
model_id = "tensorplex-labs/Sumo-T9-7B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build a text-generation pipeline; bfloat16 halves memory use versus float32
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
)

# Sample one completion; max_length counts the prompt tokens as well
sequences = pipeline(
    "What is Yokozuna?",
    max_length=256,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    bos_token_id=tokenizer.bos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
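To give a feel for what the `temperature=0.6` and `top_p=0.9` arguments in the snippet above do, here is a minimal pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering. It is an illustration only, not the `transformers` implementation; `sample_top_p` is a hypothetical helper and the logits are made-up values.

```python
import math
import random


def sample_top_p(logits, temperature=0.6, top_p=0.9, rng=None):
    """Sample one token index using temperature scaling and nucleus filtering."""
    rng = rng or random.Random()

    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # subtract max for stability
    total = sum(weights)
    probs = [w / total for w in weights]

    # Keep the smallest set of most-likely tokens whose mass reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Draw from the kept tokens in proportion to their probabilities.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -3.0]  # made-up scores for a 4-token vocabulary
print(sample_top_p(toy_logits, rng=random.Random(0)))
```

With these toy logits, the two most likely tokens already cover more than 90% of the probability mass, so the two least likely tokens are never sampled. Lowering `top_p` or `temperature` makes the output more deterministic.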

Training Details

Training Data

This model was trained on the tiiuae/falcon-refinedweb dataset, and training is ongoing.

Evaluation

Sumo-T9-7B-v0.1 has outperformed notable models such as TII Falcon 7B, Meta's Llama-2-7b and Llama-1-7b in zero-shot performance, establishing itself as the leading model in aggregate among the 7B models compared. The evaluation tasks are ARC Challenge, GSM8K, HellaSwag, MMLU, TruthfulQA, and Winogrande.

| Model | avg | arc_challenge | gsm8k | hellaswag | mmlu | truthfulqa_mc2 | winogrande |
| --- | --- | --- | --- | --- | --- | --- | --- |
| meta-llama/Meta-Llama-3-8B | 0.6009 | 0.5333 | 0.4913 | 0.7906 | 0.621 | 0.4392 | 0.7301 |
| tensorplex-labs/Sumo-Qyuu-7B-v0.1 | 0.4769 | 0.4753 | 0.1031 | 0.7666 | 0.4426 | 0.3723 | 0.7017 |
| meta-llama/Llama-2-7b-hf | 0.473 | 0.4625 | 0.1213 | 0.7597 | 0.4123 | 0.3896 | 0.693 |
| huggyllama/llama-7b | 0.4386 | 0.4471 | 0.0849 | 0.7621 | 0.2973 | 0.3408 | 0.6993 |
| tiiuae/falcon-7b | 0.4189 | 0.4343 | 0.0432 | 0.7636 | 0.2582 | 0.3428 | 0.6717 |
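The card does not say how the avg column is computed. The sketch below assumes it is the unweighted mean of the six task scores and recomputes it from the figures in the table above; under that assumption the recomputed values agree with the reported ones to within rounding.

```python
# Per-task scores copied from the table above, in column order:
# arc_challenge, gsm8k, hellaswag, mmlu, truthfulqa_mc2, winogrande
scores = {
    "meta-llama/Meta-Llama-3-8B": [0.5333, 0.4913, 0.7906, 0.621, 0.4392, 0.7301],
    "tensorplex-labs/Sumo-Qyuu-7B-v0.1": [0.4753, 0.1031, 0.7666, 0.4426, 0.3723, 0.7017],
    "meta-llama/Llama-2-7b-hf": [0.4625, 0.1213, 0.7597, 0.4123, 0.3896, 0.693],
    "huggyllama/llama-7b": [0.4471, 0.0849, 0.7621, 0.2973, 0.3408, 0.6993],
    "tiiuae/falcon-7b": [0.4343, 0.0432, 0.7636, 0.2582, 0.3428, 0.6717],
}

# The avg column as reported in the table.
reported_avg = {
    "meta-llama/Meta-Llama-3-8B": 0.6009,
    "tensorplex-labs/Sumo-Qyuu-7B-v0.1": 0.4769,
    "meta-llama/Llama-2-7b-hf": 0.473,
    "huggyllama/llama-7b": 0.4386,
    "tiiuae/falcon-7b": 0.4189,
}


def mean(values):
    """Unweighted arithmetic mean (the assumed aggregation)."""
    return sum(values) / len(values)


for name, task_scores in scores.items():
    print(f"{name}: recomputed {mean(task_scores):.4f}, reported {reported_avg[name]}")
```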

Future Plans

Tensorplex Labs will continue pushing the limits of what is possible on Subnet 9, and will also work on fine-tuning state-of-the-art models for Web3 domain-specific use-cases.

One of the most ambitious projects is the development of a new data collection subnet. This will enable open and incentivized contributions of intelligence from a diverse pool of participants. The subnet will function as a collaborative platform where individuals can provide human preference or training data, which will be used to train, fine-tune, and evaluate AI models and miners across various subnets on Bittensor.

About Tensorplex Labs

Tensorplex Labs is an AI and Web3 startup that is building the decentralized AI of the future. The company’s mission is to decentralize AI, democratize access to data and intelligence, and build a more open, transparent, and equitable future for AI. Tensorplex Labs develops open-source capital and intelligence infrastructure and applications designed to grow decentralized AI, Web3, and crypto ecosystems by making them more capital efficient, intelligent, and trustworthy. The company is currently developing a novel way to better incentivize human input to train AI models, opening up more access to new pools of human contributors with new income opportunities. Founded in 2023 with headquarters in Singapore, Tensorplex Labs’ investors include Canonical Crypto, Collab+Currency, and Digital Currency Group among several others. For more information, visit Tensorplex.

Model Card Authors

Model Card Contact