metadata
language:
- zh
- en
pipeline_tag: text-generation
inference: false
使用方式(int8)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig
tokenizer = AutoTokenizer.from_pretrained("trillionmonster/Baichuan-13B-Chat-8bit", use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("trillionmonster/Baichuan-13B-Chat-8bit", device_map="auto", trust_remote_code=True)
model.generation_config = GenerationConfig.from_pretrained("trillionmonster/Baichuan-13B-Chat-8bit")
messages = []
messages.append({"role": "user", "content": "世界上第二高的山峰是哪座"})
response = model.chat(tokenizer, messages)
print(response)
如需使用 int4 量化 (Similarly, to use int4 quantization):
model = AutoModelForCausalLM.from_pretrained("trillionmonster/Baichuan-13B-Chat-8bit", device_map="auto",load_in_4bit=True,trust_remote_code=True)