|
--- |
|
base_model: EpistemeAI/Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K-code |
|
language: |
|
- en |
|
license: apache-2.0 |
|
tags: |
|
- text-generation-inference |
|
- transformers |
|
- unsloth |
|
- llama |
|
- trl |
|
--- |
|
|
|
# Agent Llama
|
|
|
An experimental fine-tune of Llama 3.1 8B, trained with a DPO dataset to act as an agentic coder. It was additionally fine-tuned on a code dataset for the Coder Agent role.
|
It has several built-in agent features:
|
- search |
|
- calculator |
|
- ReAct: [Synergizing Reasoning and Acting in Language Models](https://arxiv.org/abs/2210.03629)
|
- fine-tuned ReAct prompting for better responses
|
|
|
Other notable features:
|
- Self-learning using Unsloth (in progress)
|
- Can be used in RAG applications
|
- Memory: see the [LangChain chatbot tutorial, "Message persistence" section](https://python.langchain.com/docs/tutorials/chatbot/); a minimal sketch follows below
|
|
|
It works well with LangChain or LlamaIndex.
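A minimal sketch of message persistence with LangChain, assuming the model is already wrapped as a LangChain chat model (`chat_model` below is a placeholder, e.g. `ChatHuggingFace` over a local pipeline):

```python
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.messages import HumanMessage
from langchain_core.runnables.history import RunnableWithMessageHistory

store = {}  # maps session_id -> ChatMessageHistory

def get_session_history(session_id: str) -> ChatMessageHistory:
    # Create a fresh history for unseen sessions, reuse it on later turns
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

# chat_model is a placeholder for a LangChain chat model wrapping this LLM
chat_with_memory = RunnableWithMessageHistory(chat_model, get_session_history)

config = {"configurable": {"session_id": "demo"}}
chat_with_memory.invoke([HumanMessage(content="My project targets Python 3.11.")], config=config)
# A later turn in the same session can reference earlier messages
chat_with_memory.invoke([HumanMessage(content="Which Python version did I mention?")], config=config)
```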
|
|
|
Context Window: 128K |
|
|
|
### Installation |
|
```bash |
|
pip install --upgrade "transformers>=4.43.2" torch==2.3.1 accelerate vllm==0.5.3.post1 |
|
``` |
|
|
|
Developers can easily integrate EpistemeAI/Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K-code into their projects using popular libraries like Transformers and vLLM. The following sections illustrate usage with simple hands-on examples:
|
|
|
Optional: to use the built-in tools, add the following to the system prompt: "Environment: ipython. Tools: brave_search, wolfram_alpha. Cutting Knowledge Date: December 2023. Today Date: 4 October 2024\n"
|
|
|
#### ToT - Tree of Thought |
|
- Use system prompt: |
|
```python |
|
"Imagine three different experts are answering this question. |
|
All experts will write down 1 step of their thinking, |
|
then share it with the group. |
|
Then all experts will go on to the next step, etc. |
|
If any expert realises they're wrong at any point then they leave. |
|
The question is..." |
|
``` |
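For instance, the ToT prompt can be passed as the system message. This sketch reuses the `pipeline` object built in the Transformers example further below; the question is illustrative:

```python
tot_prompt = (
    "Imagine three different experts are answering this question. "
    "All experts will write down 1 step of their thinking, "
    "then share it with the group. "
    "Then all experts will go on to the next step, etc. "
    "If any expert realises they're wrong at any point then they leave. "
    "The question is..."
)
messages = [
    {"role": "system", "content": tot_prompt},
    {"role": "user", "content": "Is 3307 a prime number?"},
]
outputs = pipeline(messages, max_new_tokens=512)
print(outputs[0]["generated_text"][-1]["content"])
```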
|
#### ReAct |
|
Example from the LangChain ReAct agent: [langchain ReAct agent](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/agents/react/agent.py)
|
- Use system prompt: |
|
```python |
|
""" |
|
Answer the following questions as best you can. You have access to the following tools: |
|
|
|
{tools} |
|
|
|
Use the following format: |
|
|
|
Question: the input question you must answer |
|
Thought: you should always think about what to do |
|
Action: the action to take, should be one of [{tool_names}] |
|
Action Input: the input to the action |
|
Observation: the result of the action |
|
... (this Thought/Action/Action Input/Observation can repeat N times) |
|
Thought: I now know the final answer |
|
Final Answer: the final answer to the original input question |
|
|
|
Begin! |
|
|
|
Question: {input} |
|
Thought:{agent_scratchpad} |
|
""" |
|
``` |
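A sketch of wiring this prompt into a LangChain ReAct agent. `llm` and `tools` are assumptions here: `llm` would be this model wrapped for LangChain (e.g. via `HuggingFacePipeline`), and `tools` a list such as `[repl_tool]` from the REPL section later in this card:

```python
from langchain.agents import AgentExecutor, create_react_agent
from langchain_core.prompts import PromptTemplate

# react_template is the prompt string shown above, with its {tools},
# {tool_names}, {input} and {agent_scratchpad} placeholders intact
react_prompt = PromptTemplate.from_template(react_template)

# llm and tools are assumed to be defined as described in the lead-in
agent = create_react_agent(llm, tools, react_prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, handle_parsing_errors=True)
agent_executor.invoke({"input": "What is 2 to the power of 10?"})
```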
|
|
|
### Conversational Use-case |
|
#### Use with [Transformers](https://github.com/huggingface/transformers) |
|
##### Using the `transformers.pipeline()` API (4-bit quantization recommended for fast responses)
|
```python |
|
import transformers
import torch

from transformers import BitsAndBytesConfig
|
|
|
quantization_config = BitsAndBytesConfig( |
|
load_in_4bit=True, |
|
bnb_4bit_quant_type="nf4", |
|
bnb_4bit_compute_dtype="float16", |
|
bnb_4bit_use_double_quant=True, |
|
) |
|
|
|
model_id = "EpistemeAI/Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K-code" |
|
pipeline = transformers.pipeline( |
|
"text-generation", |
|
model=model_id, |
|
model_kwargs={"quantization_config": quantization_config}, #for fast response. For full 16bit inference, remove this code. |
|
device_map="auto", |
|
) |
|
messages = [ |
|
{"role": "system", "content": """ |
|
Environment: ipython. Tools: brave_search, wolfram_alpha. Cutting Knowledge Date: December 2023. Today Date: 4 October 2024\n |
|
You are an expert coding assistant.\n
Ensure any code you provide can be executed \n
with all required imports and variables defined. List the imports. Structure your answer with a description of the code solution. \n
Write only the code; do not print anything else.\n
Debug the code if an error occurs. \n
|
Here is the user question: {question} |
|
"""}, |
|
{"role": "user", "content": "Create a bar plot showing the market capitalization of the top 7 publicly listed companies using matplotlib"} |
|
] |
|
outputs = pipeline(messages, max_new_tokens=1024, do_sample=True, temperature=0.01, top_k=100, top_p=0.95)
|
print(outputs[0]["generated_text"][-1]) |
|
``` |
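With chat-format input, `outputs[0]["generated_text"]` holds the full conversation; the last element is the assistant's message as a `{"role": ..., "content": ...}` dict, so the generated code itself is its `"content"` field:

```python
# Extract just the model's generated code from the pipeline output
generated_code = outputs[0]["generated_text"][-1]["content"]
print(generated_code)
```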
|
|
|
# Example

See this Colab notebook for a sample of the code using LangChain: [Colab](https://colab.research.google.com/drive/129SEHVRxlr24r73yf34BKnIHOlD3as09?authuser=1)
|
|
|
# Unsloth Fast Inference
|
|
|
```python |
|
%%capture |
|
# Installs Unsloth, Xformers (Flash Attention) and all other packages! |
|
!pip install unsloth |
|
# Get latest Unsloth |
|
!pip install --upgrade --no-deps "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git" |
|
!pip install langchain_experimental |
|
|
|
from unsloth import FastLanguageModel
from transformers import TextStreamer
from google.colab import userdata
|
|
|
|
|
# 4bit pre quantized models we support for 4x faster downloading + no OOMs. |
|
fourbit_models = [ |
|
"unsloth/mistral-7b-instruct-v0.2-bnb-4bit", |
|
"unsloth/gemma-7b-it-bnb-4bit", |
|
] # More models at https://huggingface.co/unsloth |
|
|
|
model, tokenizer = FastLanguageModel.from_pretrained( |
|
model_name = "EpistemeAI/Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K-code", |
|
max_seq_length = 128000, |
|
load_in_4bit = True, |
|
token =userdata.get('HF_TOKEN') |
|
) |
|
def chatbot(query): |
|
messages = [ |
|
{"from": "system", "value": |
|
""" |
|
Environment: ipython. Tools: brave_search, wolfram_alpha. Cutting Knowledge Date: December 2023. Today Date: 4 October 2024\n |
|
You are an expert coding assistant.\n
Ensure any code you provide can be executed \n
with all required imports and variables defined. List the imports. Structure your answer with a description of the code solution. \n
Write only the code; do not print anything else.\n
Use ipython for the search tool. \n
Debug the code if an error occurs. \n
|
Here is the user question: {question} |
|
""" |
|
}, |
|
{"from": "human", "value": query}, |
|
] |
|
inputs = tokenizer.apply_chat_template(messages, tokenize = True, add_generation_prompt = True, return_tensors = "pt").to("cuda") |
|
|
|
text_streamer = TextStreamer(tokenizer) |
|
_ = model.generate(input_ids = inputs, streamer = text_streamer, max_new_tokens = 2048, use_cache = True) |
|
``` |
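Example call; the response streams to stdout via `TextStreamer`:

```python
chatbot("Write a Python function that returns the n-th Fibonacci number.")
```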
|
|
|
|
|
|
|
# Execute code (make sure to use a virtual environment)
|
```bash |
|
python3 -m venv env |
|
source env/bin/activate |
|
``` |
|
|
|
## Executing code responses from Llama

#### Use the `execute_Python_code` function below for local execution; with LangChain, use `PythonREPL()` to execute code

Execute code locally in Python:
|
```python |
|
import io
import contextlib

def execute_Python_code(code):
|
# A string stream to capture the outputs of exec |
|
output = io.StringIO() |
|
try: |
|
# Redirect stdout to the StringIO object |
|
with contextlib.redirect_stdout(output): |
|
# Allow imports |
|
exec(code, globals()) |
|
except Exception as e: |
|
# If an error occurs, capture it as part of the output |
|
print(f"Error: {e}", file=output) |
|
return output.getvalue() |
|
``` |
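Example usage, running a small snippet and capturing its stdout:

```python
# Run a code string (e.g. one returned by the model) and capture its output
code = "import math\nprint(math.sqrt(16))"
print(execute_Python_code(code))  # prints 4.0
```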
|
|
|
LangChain Python REPL
|
- Install |
|
|
|
```bash |
|
!pip install langchain_experimental |
|
``` |
|
|
|
Code: |
|
```python |
|
from langchain_core.tools import Tool |
|
from langchain_experimental.utilities import PythonREPL |
|
|
|
python_repl = PythonREPL() |
|
|
|
# You can create the tool to pass to an agent |
|
repl_tool = Tool( |
|
name="python_repl", |
|
description="A Python shell. Use this to execute python commands. Input should be a valid python command. If you want to see the output of a value, you should print it out with `print(...)`.", |
|
func=python_repl.run, |
|
) |
|
# Run the assistant's generated code (the "content" of the last chat message)
repl_tool.run(outputs[0]["generated_text"][-1]["content"])
|
``` |
|
|
|
# Safety input/output procedures

For all inputs and outputs, please use Llama Guard (meta-llama/Llama-Guard-3-8B) for safety classification.

See the model card: [Llama-Guard](https://huggingface.co/meta-llama/Llama-Guard-3-8B)
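A sketch of input classification along the lines of the Llama Guard 3 model card (the model is gated and requires access approval; exact usage may differ, so treat this as an assumption):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

guard_id = "meta-llama/Llama-Guard-3-8B"
guard_tokenizer = AutoTokenizer.from_pretrained(guard_id)
guard_model = AutoModelForCausalLM.from_pretrained(
    guard_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def moderate(chat):
    # Llama Guard's chat template formats the conversation for classification
    input_ids = guard_tokenizer.apply_chat_template(chat, return_tensors="pt").to(guard_model.device)
    output = guard_model.generate(input_ids=input_ids, max_new_tokens=100, pad_token_id=0)
    # The verdict ("safe", or "unsafe" plus a hazard category) follows the prompt
    return guard_tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)

print(moderate([{"role": "user", "content": "Create a bar plot of market caps."}]))
```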
|
|
|
|
|
|
|
# Uploaded model |
|
|
|
- **Developed by:** EpistemeAI |
|
- **License:** apache-2.0 |
|
- **Finetuned from model:** EpistemeAI/Fireball-Meta-Llama-3.1-8B-Instruct-Agent-0.003-128K-code
|
|
|
This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
|
|
|
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth) |
|
|