It runs on a Colab T4.
!pip install bitsandbytes
!pip install git+https://github.com/huggingface/transformers.git
!pip install git+https://github.com/huggingface/accelerate.git
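(Optional) A quick check, before loading anything, that the T4 is actually attached:

!nvidia-smi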
# Use a pipeline as a high-level helper
from transformers import pipeline
import torch
messages = [
{"role": "user", "content": "Who are you?"},
]
pipe = pipeline("text-generation", model="CohereForAI/aya-expanse-8b", device_map="auto", trust_remote_code=True)
pipe(messages)
config.json, generation_config.json, and tokenizer files downloaded; four safetensors shards (~16 GB total) downloaded in ~6 min at ~42 MB/s
Loading checkpoint shards: 100% 4/4 [01:02<00:00, 18.26s/it]
WARNING:accelerate.big_modeling:Some parameters are on the meta device because they were offloaded to the disk and cpu.
Device set to use cuda:0
[{'generated_text': [{'role': 'user', 'content': 'Who are you?'},
{'role': 'assistant',
'content': "I am Coral, an AI-assistant chatbot built by Cohere. I'm designed to engage in"}]}]
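The reply is cut off because the pipeline uses a short default generation length. Standard generate() kwargs can be passed through the pipeline call to get a complete answer (the values below are illustrative, not from the original run):

# Request a longer, sampled reply; max_new_tokens, do_sample and temperature
# are standard transformers generate() kwargs forwarded by the pipeline
pipe(messages, max_new_tokens=256, do_sample=True, temperature=0.3)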
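The accelerate warning above means the weights did not all fit in the T4's 16 GB of VRAM, so part of the model was offloaded to CPU and disk, which makes generation very slow. This is what the bitsandbytes install is for: loading the model in 4-bit keeps everything on the GPU. A minimal sketch, assuming NF4 quantization with fp16 compute (the T4 has no bfloat16 support); the quantization settings are my choices, not from the original post:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization: the 8B weights shrink to roughly 5-6 GB,
# leaving room on the 16 GB T4 for activations and the KV cache
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # T4 has no bf16 support
)

model_id = "CohereForAI/aya-expanse-8b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

messages = [{"role": "user", "content": "Who are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))

With everything resident on the GPU there should be no offload warning, and generation runs much faster than the CPU/disk-offloaded run above.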