	---
	license: apache-2.0
	model-index:
	- name: tigerbot-7b-sft
	  results:
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: AI2 Reasoning Challenge (25-Shot)
	      type: ai2_arc
	      config: ARC-Challenge
	      split: test
	      args:
	        num_few_shot: 25
	    metrics:
	    - type: acc_norm
	      value: 41.64
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TigerResearch/tigerbot-7b-sft
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: HellaSwag (10-Shot)
	      type: hellaswag
	      split: validation
	      args:
	        num_few_shot: 10
	    metrics:
	    - type: acc_norm
	      value: 60.56
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TigerResearch/tigerbot-7b-sft
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: MMLU (5-Shot)
	      type: cais/mmlu
	      config: all
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 29.89
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TigerResearch/tigerbot-7b-sft
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: TruthfulQA (0-shot)
	      type: truthful_qa
	      config: multiple_choice
	      split: validation
	      args:
	        num_few_shot: 0
	    metrics:
	    - type: mc2
	      value: 58.18
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TigerResearch/tigerbot-7b-sft
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Winogrande (5-shot)
	      type: winogrande
	      config: winogrande_xl
	      split: validation
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 63.54
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TigerResearch/tigerbot-7b-sft
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: GSM8k (5-shot)
	      type: gsm8k
	      config: main
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 6.29
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TigerResearch/tigerbot-7b-sft
	      name: Open LLM Leaderboard
	---
	<div style="width: 100%;">
	<img src="http://x-pai.algolet.com/bot/img/logo_core.png" alt="TigerBot" style="width: 20%; display: block; margin: auto;">
	</div>
	<p align="center">
	<font face="黑体" size="5"> A cutting-edge foundation for your very own LLM. </font>
	</p>
	<p align="center">
	🌐 <a href="https://tigerbot.com/" target="_blank">TigerBot</a> • 🤗 <a href="https://huggingface.co/TigerResearch" target="_blank">Hugging Face</a>
	</p>

	## GitHub

	https://github.com/TigerResearch/TigerBot

	## Usage

	```python
	import torch  # needed below for torch.cuda.current_device()
	from transformers import AutoTokenizer, AutoModelForCausalLM
	from accelerate import infer_auto_device_map, dispatch_model
	from accelerate.utils import get_balanced_memory

	tokenizer = AutoTokenizer.from_pretrained("TigerResearch/tigerbot-7b-sft-v1")

	model = AutoModelForCausalLM.from_pretrained("TigerResearch/tigerbot-7b-sft-v1")

	max_memory = get_balanced_memory(model)
	device_map = infer_auto_device_map(model, max_memory=max_memory, no_split_module_classes=["BloomBlock"])
	model = dispatch_model(model, device_map=device_map, offload_buffers=True)

	device = torch.cuda.current_device()


	tok_ins = "\n\n### Instruction:\n"
	tok_res = "\n\n### Response:\n"
	prompt_input = tok_ins + "{instruction}" + tok_res

	input_text = "What is the next number after this list: [1, 2, 3, 5, 8, 13, 21]"
	input_text = prompt_input.format_map({'instruction': input_text})

	max_input_length = 512
	max_generate_length = 1024
	# Note: top_p and temperature only take effect if do_sample=True is also passed.
	generation_kwargs = {
	    "top_p": 0.95,
	    "temperature": 0.8,
	    "max_length": max_generate_length,
	    "eos_token_id": tokenizer.eos_token_id,
	    "pad_token_id": tokenizer.pad_token_id,
	    "early_stopping": True,
	    "no_repeat_ngram_size": 4,
	}

	inputs = tokenizer(input_text, return_tensors='pt', truncation=True, max_length=max_input_length)
	inputs = {k: v.to(device) for k, v in inputs.items()}
	# Unpack both dicts: generate() takes the input tensors and the options as keyword arguments
	output = model.generate(**inputs, **generation_kwargs)
	answer = ''
	# Decode only the newly generated tokens (skip the prompt) and drop EOS
	for tok_id in output[0][inputs['input_ids'].shape[1]:]:
	    if tok_id != tokenizer.eos_token_id:
	        answer += tokenizer.decode(tok_id)
	print(answer)
	```
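
The prompt template in the example above can be pulled out into a small helper so it is easy to reuse and check without loading the model. This is only a sketch: `build_prompt`, `TOK_INS`, and `TOK_RES` are hypothetical names that are not part of the TigerBot repository, and the marker strings are copied from the usage example.

```python
# Hypothetical helper (not from the TigerBot repository): wraps a raw
# instruction in the same markers the usage example assembles by hand.
TOK_INS = "\n\n### Instruction:\n"
TOK_RES = "\n\n### Response:\n"


def build_prompt(instruction: str) -> str:
    """Return the instruction wrapped in the Instruction/Response markers."""
    return f"{TOK_INS}{instruction}{TOK_RES}"


prompt = build_prompt("What is the next number after this list: [1, 2, 3, 5, 8, 13, 21]")
print(repr(prompt))  # repr() makes the newline markers visible
```

The returned string can be passed to the tokenizer in place of `input_text`.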

	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_TigerResearch__tigerbot-7b-sft)

	| Metric                            | Value |
	|-----------------------------------|------:|
	| Avg.                              | 43.35 |
	| AI2 Reasoning Challenge (25-Shot) | 41.64 |
	| HellaSwag (10-Shot)               | 60.56 |
	| MMLU (5-Shot)                     | 29.89 |
	| TruthfulQA (0-shot)               | 58.18 |
	| Winogrande (5-shot)               | 63.54 |
	| GSM8k (5-shot)                    |  6.29 |
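
The `Avg.` row matches the unweighted mean of the six benchmark scores in the table. The short check below recomputes it. The `scores` dictionary is only an illustration built from the table values, and equal weighting is an assumption inferred from the numbers rather than something this card states.

```python
# Scores copied from the table above; equal weighting is an assumption.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 41.64,
    "HellaSwag (10-Shot)": 60.56,
    "MMLU (5-Shot)": 29.89,
    "TruthfulQA (0-shot)": 58.18,
    "Winogrande (5-shot)": 63.54,
    "GSM8k (5-shot)": 6.29,
}

# Unweighted mean across the six benchmarks
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")
```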