
Quantization made by Richard Erkhov.

Github | Discord | Request more models

Hare-1.1B-base - GGUF

Original model description:

license: apache-2.0
language:
  - en
library_name: transformers
pipeline_tag: text-generation
tags:
  - Hare
datasets:
  - cerebras/SlimPajama-627B
  - HuggingFaceTB/cosmopedia
arxiv: 2406.11410

Lite-AI

Hare-1.1B-base

GitHub | 🤖 ModelScope | 📑 ArXiv

Hare-1.1B-base is a pre-trained model developed by the LiteAI Team at China Telecom Guizhou Branch. We use a mix of high-quality open-source data and strategy-generated synthetic data as pre-training data. The model is only 1.1B parameters in size and performs well on the Open LLM Leaderboard.

  • We chose Mistral as the foundational architecture and reused its tokenizer, reducing the number of parameters by adjusting the hyperparameters of its model architecture. Consequently, our model can be directly applied to numerous open-source projects that support Mistral, such as vLLM.

  • Our model has a parameter count of only 1.1 billion, allowing us to deploy it on consumer-grade GPUs, mobile devices, and other cost-effective platforms.

  • We have explored efficient training at FP8 precision and compiled a set of best practices, hoping to contribute what we can to LLM training in the open-source community. For the best practices, please see our GitHub homepage (an illustrative FP8 sketch also follows this list).

  • We are currently developing and adapting the model for Chinese language support.
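The card defers the FP8 details to the GitHub page. Purely as an illustration of what FP8 mixed-precision training typically looks like, the sketch below uses NVIDIA Transformer Engine on a single layer; the recipe values and the choice of Transformer Engine are assumptions, not necessarily the team's setup, and an FP8-capable GPU (Hopper/Ada) is required.

import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

# Hypothetical recipe values -- not taken from the Hare training setup.
fp8_recipe = DelayedScaling(fp8_format=Format.HYBRID, amax_history_len=16, amax_compute_algo="max")

# A single Transformer Engine linear layer standing in for a full model.
layer = te.Linear(2048, 2048, params_dtype=torch.bfloat16).cuda()
inp = torch.randn(16, 2048, device="cuda", dtype=torch.bfloat16)

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = layer(inp)          # forward GEMM runs in FP8
out.float().sum().backward()  # backward is taken outside the autocast context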


Model Details

| Model | Training Tokens | Hidden Layers | Hidden Size | Attention Heads | Context Length |
|---|---|---|---|---|---|
| Hare-1.1B-base | ~600B | 22 | 2048 | 32 | 2048 |
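Because the model reuses the Mistral architecture, these hyperparameters can be read straight from the Hugging Face config. The field names below are the standard Mistral-style config attributes, which is an assumption about how the checkpoint is configured rather than something stated on this card.

from transformers import AutoConfig

config = AutoConfig.from_pretrained("LiteAI-Team/Hare-1.1B-base")
# Standard Mistral-style config fields; they should line up with the table above.
print(config.num_hidden_layers)        # hidden layers, 22
print(config.hidden_size)              # hidden size, 2048
print(config.num_attention_heads)      # attention heads, 32
print(config.max_position_embeddings)  # context length, 2048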

Model Description

  • Developed by: LiteAI Team

  • Institution: China Telecom Guizhou Branch

  • Model size: 1.1B

  • License: Apache 2.0


Uses

Inference

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

device = "cuda" if torch.cuda.is_available() else "cpu"
model_path = "LiteAI-Team/Hare-1.1B-base"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
model.to(device)

prompt = "Write a poem based on the landscape of Guizhou:"
tokens = tokenizer(prompt, add_special_tokens=True, return_tensors="pt").to(device)
output = model.generate(**tokens, max_new_tokens=128)

# Strip the prompt tokens and decode only the newly generated continuation.
output_tokens = output[0][tokens.input_ids.size(1):]
output_string = tokenizer.decode(output_tokens, skip_special_tokens=True)
print(output_string)
>> """The Guizhou landscape is a sight to behold,
A place where nature's beauty is unmatched,
A land of towering mountains and vast plains,
A paradise for those who seek to explore.

The mountains rise high above the sky,
A sight to beholder, a sight to see,
The valleys stretch out as far as the eye can see,
A landscape of endless beauty and grace."""

Install and run with vLLM:

pip install vllm

from vllm import LLM, SamplingParams

model_path = "LiteAI-Team/Hare-1.1B-base"
# Set tensor_parallel_size to the number of GPUs available (1 for a single GPU).
llm = LLM(model=model_path, trust_remote_code=True, tensor_parallel_size=4)

query = "Write a poem based on the landscape of Guizhou:"
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
outputs = llm.generate(query, sampling_params)
print(outputs[0].outputs[0].text)

Edge Deployment Demo

Our model has only 1.1 billion parameters; after Int4 quantization it occupies just 0.6 GB, allowing for easy deployment on mobile devices (a rough size check follows the list below). The Hare-1.1B-Chat model weights have been open-sourced.

  • Android: We chose MLC-LLM as the deployment framework and tested deployment of the Chat model on a Redmi K40 (demo videos, "First demo" and "Second demo", are linked on the original model card).
  • iOS & HarmonyOS: We will test deployment on these platforms in the future.
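As a sanity check on the ~0.6 GB figure above, the arithmetic below packs the 1.1B weights at 4 bits each and adds a small overhead for quantization scales and higher-precision layers; the 10% overhead is an assumption, not a published number.

params = 1.1e9                 # parameter count
bits_per_param = 4             # Int4 weight quantization
overhead = 1.10                # assumed ~10% for scales/zero-points, embeddings, etc.
size_gb = params * bits_per_param / 8 / 1e9 * overhead
print(f"~{size_gb:.2f} GB")    # about 0.6 GB, consistent with the figure quoted above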


Tool Call

  • To fully leverage the advantages of deploying small models on edge devices, we followed the approach of Octopus-v2 and replaced Gemma-2B with Hare-1.1B-Tool, successfully enabling Android system API invocation and tool orchestration in composite scenarios on mobile devices (a generic, hypothetical sketch of such a call-parsing loop follows this list).

  • A demo video is linked via the image on the original model card.
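The actual prompt and output format of Hare-1.1B-Tool is not documented on this card, so the sketch below is only a generic, hypothetical illustration of the edge-side pattern: expose a tool signature in the prompt, let the model emit a call, and parse it before dispatching to an Android API. Every name here (the set_alarm tool, the prompt layout, the generate_text callable) is an assumption.

import re

# Hypothetical tool signature shown to the model -- NOT the real Hare-1.1B-Tool schema.
TOOL_SPEC = "set_alarm(hour: int, minute: int) -> sets a system alarm"
prompt = f"Available API:\n{TOOL_SPEC}\n\nUser: wake me up at 7:30\nCall:"

def parse_tool_call(completion: str):
    # Expect something like "set_alarm(hour=7, minute=30)" in the model output.
    match = re.search(r"(\w+)\(([^)]*)\)", completion)
    if match is None:
        return None
    name = match.group(1)
    args = dict(kv.split("=") for kv in match.group(2).replace(" ", "").split(",") if kv)
    return name, args

# `generate_text` stands in for whatever on-device runtime hosts the model (e.g. MLC-LLM).
# print(parse_tool_call(generate_text(prompt)))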

Evaluation Results

  • Additionally, we conducted explorations and experiments addressing the issue of benchmark data leakage. For a detailed analysis, please refer to our paper.
| Model (base) | Size | Avg | MMLU | ARC-C | TruthfulQA | Winogrande | Hellaswag | GSM8K |
|---|---|---|---|---|---|---|---|---|
| phi-1_5 | 1.3B | 47.69 | 43.89 | 52.9 | 40.89 | 72.22 | 63.79 | 12.43 |
| Qwen-1.5 | 1.8B | 46.55 | 46.71 | 37.88 | 39.43 | 60.3 | 61.42 | 33.59 |
| stablelm-2 | 1.6B | 45.25 | 38.95 | 43.34 | 36.78 | 64.56 | 70.45 | 17.44 |
| Hare | 1.1B | 40.17 | 35.74 | 38.4 | 42.08 | 59.27 | 57.46 | 8.04 |
| H2o-danube | 1.8B | 39.12 | 25.94 | 39.42 | 33.86 | 64.48 | 69.58 | 1.44 |
| OpenELM | 1.1B | 38.47 | 27.05 | 36.69 | 33.86 | 63.22 | 65.71 | 1.21 |
| csg-wukong | 1B | 37.78 | 25.33 | 37.71 | 42.79 | 56.67 | 58.93 | 5.23 |
| TinyLlama-3T | 1.1B | 36.42 | 26.04 | 33.87 | 37.32 | 59.51 | 60.31 | 1.44 |

License

  • This repository is open-sourced under the Apache-2.0 license.

  • The Hare series model weights are currently fully open only for academic research.


Statement

  • Hare is a language model trained on a mix of open-source pre-training data and strategy-generated pre-training data. It lacks the ability to make value judgments and cannot understand or express personal opinions. The outputs of the model do not represent the views or positions of the LiteAI development team.

  • Therefore, the content generated using Hare may contain biased viewpoints and inaccuracies. Please use it at your discretion.

  • Similarly, we will not assume any responsibility for risks and issues arising from users deliberately using Hare to generate harmful content.

  • For modifications related to this repository, please contact: zhangly41 At(@) chinatelecom.cn.

  • Team contact information: chensq27 At(@) chinatelecom.cn. The LiteAI Team looks forward to collaborating with you.


Citation

  • If you find Hare helpful for your work, please consider citing our paper.
@misc{zhang2024harehumanpriorskey,
      title={HARE: HumAn pRiors, a key to small language model Efficiency}, 
      author={Lingyun Zhang and Bin Jin and Gaojian Ge and Lunhui Liu and Xuewen Shen and Mingyong Wu and Houqian Zhang and Yongneng Jiang and Shiqi Chen and Shi Pu},
      year={2024},
      eprint={2406.11410},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2406.11410}, 
}
GGUF quantizations of Hare-1.1B-base (1.12B parameters, llama architecture) are provided in this repository in 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit variants.
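A minimal sketch of running one of these GGUF files locally with llama-cpp-python; the filename below is a placeholder, so substitute whichever quantization you download from this repository.

from llama_cpp import Llama

# Hypothetical filename -- use the actual GGUF file downloaded from this repo.
llm = Llama(model_path="Hare-1.1B-base.Q4_K_M.gguf", n_ctx=2048)  # n_ctx matches the model's context length

output = llm("Write a poem based on the landscape of Guizhou:", max_tokens=128)
print(output["choices"][0]["text"])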
