A reproduction of OpenLLaMA, trained on 128 H100 GPUs in bfloat16.

The pretraining data consists of Falcon, StarCoder, and the Wikipedia, arXiv, books, and StackExchange subsets of RedPajama. In total, it amounts to nearly 1 trillion tokens.

The model was trained for a single epoch with 2000 warm-up steps and a cosine learning rate schedule, starting at 3e-5, with a batch size of 4M.
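The schedule described above can be sketched in a few lines of Python. This is only an illustration, not the training code actually used: it assumes that 3e-5 is the peak rate reached at the end of a linear warm-up, that "4M" counts tokens per batch, that the dataset is rounded to exactly 1 trillion tokens, and that the cosine decays to zero. None of these details are stated in the card, and the names `learning_rate`, `TOTAL_STEPS`, and `MIN_LR` are hypothetical.

```python
import math

PEAK_LR = 3e-5        # from the model card
WARMUP_STEPS = 2000   # from the model card

# Assumptions, not stated in the card:
TOTAL_TOKENS = 1_000_000_000_000  # "nearly 1 trillion", rounded up
BATCH_TOKENS = 4_000_000          # reads "4M batch size" as tokens per step
MIN_LR = 0.0                      # final learning rate of the cosine decay

# One epoch over the data at this batch size.
TOTAL_STEPS = TOTAL_TOKENS // BATCH_TOKENS


def learning_rate(step: int) -> float:
    """Linear warm-up to PEAK_LR, then cosine decay to MIN_LR."""
    if step < WARMUP_STEPS:
        # Ramp linearly so the last warm-up step reaches PEAK_LR.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # Fraction of the post-warm-up phase completed, clamped to [0, 1].
    progress = min(1.0, (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine


if __name__ == "__main__":
    for step in (0, 1000, 2000, 125_000, TOTAL_STEPS):
        print(f"step {step:>7}: lr = {learning_rate(step):.3e}")
```

Under these assumptions a single epoch corresponds to about 250,000 optimizer steps, so the warm-up covers less than 1% of training.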

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	47.09
AI2 Reasoning Challenge (25-Shot)	46.16
HellaSwag (10-Shot)	76.40
MMLU (5-Shot)	42.82
TruthfulQA (0-shot)	36.65
Winogrande (5-shot)	70.88
GSM8k (5-shot)	9.63
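As a quick check, the Avg. row appears to be the unweighted mean of the six benchmark scores. The sketch below recomputes it from the values in the table; the dictionary name `scores` is just an illustrative choice.

```python
# Scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 46.16,
    "HellaSwag (10-Shot)": 76.40,
    "MMLU (5-Shot)": 42.82,
    "TruthfulQA (0-shot)": 36.65,
    "Winogrande (5-shot)": 70.88,
    "GSM8k (5-shot)": 9.63,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Avg. = {average:.2f}")
```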


Datasets used to train itsliupeng/openllama-7b-base

Evaluation results

All results are from the Open LLM Leaderboard.

Benchmark	Metric	Split	Value
AI2 Reasoning Challenge (25-Shot)	normalized accuracy	test	46.160
HellaSwag (10-Shot)	normalized accuracy	validation	76.400
MMLU (5-Shot)	accuracy	test	42.820
TruthfulQA (0-shot)	mc2	validation	36.650
Winogrande (5-shot)	accuracy	validation	70.880
GSM8k (5-shot)	accuracy	test	9.630
