thivav committed
Commit e88c82c
1 Parent(s): 143983f

init commit
.github/workflows/sync_to_huggingface_space.yml ADDED
@@ -0,0 +1,20 @@
+ name: Sync to HuggingFace Space
+ on:
+   push:
+     branches: [main]
+
+   # to run this workflow manually from the Actions tab
+   workflow_dispatch:
+
+ jobs:
+   sync-to-hub:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v4
+         with:
+           fetch-depth: 0
+           lfs: true
+       - name: Push to hub
+         env:
+           HF_TOKEN: ${{ secrets.HF_TOKEN }}
+         run: git push --force https://thivav:$HF_TOKEN@huggingface.co/spaces/thivav/chat_with_pdf_using_zephyr-7b-beta main
.gitignore ADDED
@@ -0,0 +1,160 @@
+ # Byte-compiled / optimized / DLL files
+ __pycache__/
+ *.py[cod]
+ *$py.class
+
+ # C extensions
+ *.so
+
+ # Distribution / packaging
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ share/python-wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+ MANIFEST
+
+ # PyInstaller
+ # Usually these files are written by a python script from a template
+ # before PyInstaller builds the exe, so as to inject date/other infos into it.
+ *.manifest
+ *.spec
+
+ # Installer logs
+ pip-log.txt
+ pip-delete-this-directory.txt
+
+ # Unit test / coverage reports
+ htmlcov/
+ .tox/
+ .nox/
+ .coverage
+ .coverage.*
+ .cache
+ nosetests.xml
+ coverage.xml
+ *.cover
+ *.py,cover
+ .hypothesis/
+ .pytest_cache/
+ cover/
+
+ # Translations
+ *.mo
+ *.pot
+
+ # Django stuff:
+ *.log
+ local_settings.py
+ db.sqlite3
+ db.sqlite3-journal
+
+ # Flask stuff:
+ instance/
+ .webassets-cache
+
+ # Scrapy stuff:
+ .scrapy
+
+ # Sphinx documentation
+ docs/_build/
+
+ # PyBuilder
+ .pybuilder/
+ target/
+
+ # Jupyter Notebook
+ .ipynb_checkpoints
+
+ # IPython
+ profile_default/
+ ipython_config.py
+
+ # pyenv
+ # For a library or package, you might want to ignore these files since the code is
+ # intended to run in multiple environments; otherwise, check them in:
+ # .python-version
+
+ # pipenv
+ # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
+ # However, in case of collaboration, if having platform-specific dependencies or dependencies
+ # having no cross-platform support, pipenv may install dependencies that don't work, or not
+ # install all needed dependencies.
+ #Pipfile.lock
+
+ # poetry
+ # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
+ # This is especially recommended for binary packages to ensure reproducibility, and is more
+ # commonly ignored for libraries.
+ # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
+ #poetry.lock
+
+ # pdm
+ # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
+ #pdm.lock
+ # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
+ # in version control.
+ # https://pdm.fming.dev/#use-with-ide
+ .pdm.toml
+
+ # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
+ __pypackages__/
+
+ # Celery stuff
+ celerybeat-schedule
+ celerybeat.pid
+
+ # SageMath parsed files
+ *.sage.py
+
+ # Environments
+ .env
+ .venv
+ env/
+ venv/
+ ENV/
+ env.bak/
+ venv.bak/
+
+ # Spyder project settings
+ .spyderproject
+ .spyproject
+
+ # Rope project settings
+ .ropeproject
+
+ # mkdocs documentation
+ /site
+
+ # mypy
+ .mypy_cache/
+ .dmypy.json
+ dmypy.json
+
+ # Pyre type checker
+ .pyre/
+
+ # pytype static type analyzer
+ .pytype/
+
+ # Cython debug symbols
+ cython_debug/
+
+ # PyCharm
+ # JetBrains specific template is maintained in a separate JetBrains.gitignore that can
+ # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
+ # and can be added to the global gitignore or merged into this file. For a more nuclear
+ # option (not recommended) you can uncomment the following to ignore the entire idea folder.
+ #.idea/
README.md CHANGED
@@ -1,12 +1,24 @@
  ---
- title: Chat With Pdf Using Zephyr-7b-beta
- emoji: 🔥
- colorFrom: yellow
- colorTo: pink
+ title: Chat With Pdf Using Zephyr-7b-Beta
+ emoji: 🗣📢
+ colorFrom: red
+ colorTo: green
  sdk: streamlit
  sdk_version: 1.31.1
  app_file: app.py
- pinned: false
+ pinned: true
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ ![Zephyr-7b-beta](/img/Zephyr-7b.png)
+
+ # Chat with PDF using Zephyr-7b 🗣📢
+
+ #RAG | #Semantic | #Embedding | #HybridSearch | #EnsembleRetriever | #BAAI-Embeddings
+
+ Chat with PDF using the [Zephyr-7b LLM](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)
+
+ - [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)
+ - Zephyr-7b is fine-tuned from [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
+ - [Embeddings](https://huggingface.co/BAAI/bge-base-en-v1.5)
+
+ [Chat with PDF using Zephyr-7b Beta - Playground](https://huggingface.co/spaces/thivav/chat_with_pdf_using_zephyr-7b-beta)
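To make the "#HybridSearch | #EnsembleRetriever" tags in the README concrete before reading `app.py`, here is a minimal sketch of the idea: equal-weight fusion of a BM25 keyword retriever and a dense BAAI-embedding retriever over the same PDF chunks. It reuses only APIs that appear in this commit; `my.pdf` and the `hf_...` token are placeholders.

```python
# Hybrid-search sketch: BM25 (keyword) + Chroma (dense) fused by EnsembleRetriever.
from langchain.retrievers import EnsembleRetriever
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import Chroma

docs = PyPDFLoader("my.pdf").load()  # placeholder path
chunks = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100).split_documents(docs)

embeddings = HuggingFaceInferenceAPIEmbeddings(api_key="hf_...", model_name="BAAI/bge-base-en-v1.5")
dense = Chroma.from_documents(chunks, embeddings).as_retriever(search_kwargs={"k": 5})

keyword = BM25Retriever.from_documents(chunks)
keyword.k = 5

# 50/50 weighting, as in app.py; the fused ranking is what feeds the LLM chain.
hybrid = EnsembleRetriever(retrievers=[dense, keyword], weights=[0.5, 0.5])
results = hybrid.get_relevant_documents("What is this document about?")
```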
app.py ADDED
@@ -0,0 +1,182 @@
+ import os
+ import tempfile
+
+ import streamlit as st
+ from langchain.chains import ConversationalRetrievalChain
+ from langchain.memory import ConversationBufferMemory
+ from langchain.retrievers import EnsembleRetriever
+ from langchain.text_splitter import RecursiveCharacterTextSplitter
+ from langchain_community.chat_message_histories import StreamlitChatMessageHistory
+ from langchain_community.document_loaders import PyPDFLoader
+ from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings
+ from langchain_community.llms import CTransformers
+ from langchain_community.retrievers import BM25Retriever
+ from langchain_community.vectorstores import Chroma
+ from streamlit_extras.add_vertical_space import add_vertical_space
+
+
+ @st.cache_resource(ttl="1h")
+ def get_retriever(pdf_files):
+     """get retriever"""
+
+     docs = []
+     temp_dir = tempfile.TemporaryDirectory()
+     for pdf_file in pdf_files:
+         temp_pdf_file_path = os.path.join(temp_dir.name, pdf_file.name)
+
+         with open(temp_pdf_file_path, "wb") as f:
+             f.write(pdf_file.getvalue())
+
+         loader = PyPDFLoader(temp_pdf_file_path)
+         docs.extend(loader.load())
+
+     text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
+         chunk_size=1500, chunk_overlap=200
+     )
+     chunks = text_splitter.split_documents(docs)
+
+     # get huggingface token from env secret
+     HF_TOKEN = os.environ.get("HF_TOKEN")
+
+     # embeddings
+     embeddings = HuggingFaceInferenceAPIEmbeddings(
+         api_key=HF_TOKEN,
+         model_name="BAAI/bge-base-en-v1.5",
+     )
+
+     # retrieve k
+     k = 5
+
+     # vector retriever
+     vector_store = Chroma.from_documents(chunks, embeddings)
+     vector_retriever = vector_store.as_retriever(search_kwargs={"k": k})
+
+     # keyword (BM25) retriever
+     semantic_retriever = BM25Retriever.from_documents(chunks)
+     semantic_retriever.k = k
+
+     # ensemble retriever
+     ensemble_retriever = EnsembleRetriever(
+         retrievers=[vector_retriever, semantic_retriever], weights=[0.5, 0.5]
+     )
+
+     return ensemble_retriever
+
+
+ @st.cache_resource(ttl="1h")
+ def initialize_llm(_retriever):
+     """initialize llm"""
+
+     # load llm model
+     model_type = "mistral"
+     model_id = "TheBloke/zephyr-7B-beta-GGUF"
+     model_file = "zephyr-7b-beta.Q4_K_S.gguf"
+
+     config = {
+         "max_new_tokens": 2048,
+         "repetition_penalty": 1.1,
+         "temperature": 1,
+         "top_k": 50,
+         "top_p": 0.9,
+         "stream": True,
+         "context_length": 4096,
+         "gpu_layers": 0,
+         "threads": int(os.cpu_count()),
+     }
+
+     llm = CTransformers(
+         model=model_id,
+         model_file=model_file,
+         model_type=model_type,
+         config=config,
+         lib="avx2",
+     )
+
+     chat_history = StreamlitChatMessageHistory()
+
+     # init chat history memory
+     memory = ConversationBufferMemory(
+         memory_key="chat_history", chat_memory=chat_history, return_messages=True
+     )
+
+     chain = ConversationalRetrievalChain.from_llm(
+         llm, retriever=_retriever, memory=memory, verbose=False
+     )
+
+     return chain, chat_history
+
+
+ def main():
+     """main func"""
+
+     st.set_page_config(
+         page_title="Talk to PDF using Zephyr-7B-Beta",
+         page_icon="📰",
+         layout="centered",
+         initial_sidebar_state="expanded",
+     )
+
+     st.header("Talk to PDF files 📰", divider="rainbow")
+     st.subheader(
+         "Enjoy :red[talking] with :green[PDF] files using :sunglasses: Zephyr-7B-Beta"
+     )
+     st.markdown(
+         """
+         * Used the [zephyr-7b-beta.Q4_K_S.gguf](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/blob/main/zephyr-7b-beta.Q4_K_S.gguf) quantised
+         version of the [Zephyr-7B Beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) model
+         from the [TheBloke/zephyr-7B-beta-GGUF](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF) repository.
+         ___
+         """
+     )
+
+     st.sidebar.title("Talk to PDF 📰")
+     st.sidebar.markdown(
+         "[Checkout the repository](https://github.com/ThivaV/chat_with_pdf_using_zephyr-7b)"
+     )
+     st.sidebar.markdown(
+         """
+         ### This is an LLM-powered chatbot, built using:
+
+         * [Streamlit](https://streamlit.io)
+         * [LangChain](https://python.langchain.com/)
+         * [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta)
+         * [TheBloke/zephyr-7B-beta-GGUF](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF)
+         * [CTransformers](https://github.com/marella/ctransformers)
+         * [Embeddings](https://huggingface.co/BAAI/bge-base-en-v1.5)
+         * [Chroma](https://docs.trychroma.com/?lang=py)
+         ___
+         """
+     )
+
+     add_vertical_space(2)
+
+     upload_pdf_files = st.sidebar.file_uploader(
+         "Upload PDF files 📤", type="pdf", accept_multiple_files=True
+     )
+
+     if not upload_pdf_files:
+         st.info("👈 :red[Please upload PDF files] ⛔")
+         st.stop()
+
+     retriever = get_retriever(upload_pdf_files)
+
+     chain, chat_history = initialize_llm(retriever)
+
+     # load previous chat history
+     # re-draw the chat history in the chat window
+     for message in chat_history.messages:
+         st.chat_message(message.type).write(message.content)
+
+     if prompt := st.chat_input("Ask questions"):
+         with st.chat_message("human"):
+             st.markdown(prompt)
+
+         response = chain.invoke(prompt)
+
+         with st.chat_message("ai"):
+             st.write(response["answer"])
+
+
+ if __name__ == "__main__":
+     # init main func
+     main()
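For a quick smoke test of the quantised model alone (no Streamlit, no retrieval), the CTransformers wrapper from `app.py` can be driven directly. A minimal sketch with a trimmed-down config; the prompt string follows the Zephyr chat format used in the notebooks:

```python
# Load only the GGUF model app.py uses (downloaded from the Hub on first run)
# and generate one completion on CPU.
from langchain_community.llms import CTransformers

llm = CTransformers(
    model="TheBloke/zephyr-7B-beta-GGUF",
    model_file="zephyr-7b-beta.Q4_K_S.gguf",
    model_type="mistral",
    config={"max_new_tokens": 128, "temperature": 0.7, "context_length": 4096},
)

prompt = "<|system|>\nYou are a helpful assistant.</s>\n<|user|>\nSay hello.</s>\n<|assistant|>\n"
print(llm.invoke(prompt))
```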
data/.gitkeep ADDED
File without changes
doc/zephyr-7b.tar.gz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:29b840eef6b47880bff422f5c6cea2fbe19fcd6c3831e78a0e56ec669a8654b0
+ size 3402315
img/Zephyr-7b.png ADDED
models/.gitkeep ADDED
File without changes
notebooks/chat_with_pdf_using_zephyr-7b_v1.ipynb ADDED
@@ -0,0 +1,329 @@
+ {
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# HuggingFaceHub API method"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from langchain_community.document_loaders import PyPDFLoader\n",
+ "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
+ "from langchain.vectorstores import Chroma\n",
+ "from langchain.retrievers import BM25Retriever, EnsembleRetriever\n",
+ "\n",
+ "from langchain_core.prompts import ChatPromptTemplate\n",
+ "from langchain_core.output_parsers import StrOutputParser\n",
+ "from langchain_core.runnables import RunnablePassthrough\n",
+ "\n",
+ "from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings\n",
+ "from langchain_community.llms import HuggingFaceHub"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Load Data"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "file_path = \"../data/Orca Progressive Learning from Complex.pdf\"\n",
+ "data_file = PyPDFLoader(file_path)\n",
+ "docs = data_file.load()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Split & Chunk Docs"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# create chunks\n",
+ "splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)\n",
+ "chunks = splitter.split_documents(docs)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Load Embedder"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "HF_TOKEN = input(\"Enter your HuggingFace Token\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# https://huggingface.co/BAAI/bge-base-en-v1.5\n",
+ "embeddings = HuggingFaceInferenceAPIEmbeddings(\n",
+ "    api_key=HF_TOKEN, model_name=\"BAAI/bge-base-en-v1.5\"\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# retrieve k\n",
+ "k = 5"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Vector Retriever"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "vector_store = Chroma.from_documents(chunks, embeddings)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "vector_retriever = vector_store.as_retriever(search_kwargs={\"k\": k})"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Semantic Retriever"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "semantic_retriever = BM25Retriever.from_documents(chunks)\n",
+ "semantic_retriever.k = k"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Ensemble Retriever"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "ensemble_retriever = EnsembleRetriever(\n",
+ "    retrievers=[vector_retriever, semantic_retriever],\n",
+ "    weights=[0.5, 0.5]\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### LLM"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/mnt/d/repo/experiments/chat_with_pdf_using_zephyr-7b/venv/lib/python3.9/site-packages/langchain_core/_api/deprecation.py:117: LangChainDeprecationWarning: The class `langchain_community.llms.huggingface_hub.HuggingFaceHub` was deprecated in langchain-community 0.0.21 and will be removed in 0.2.0. Use HuggingFaceEndpoint instead.\n",
+ "  warn_deprecated(\n"
+ ]
+ }
+ ],
+ "source": [
+ "# HuggingFaceH4/zephyr-7b-beta\n",
+ "llm = HuggingFaceHub(\n",
+ "    repo_id=\"HuggingFaceH4/zephyr-7b-beta\",\n",
+ "    model_kwargs={\"temperature\": 0.1, \"max_new_tokens\": 1024},\n",
+ "    huggingfacehub_api_token=HF_TOKEN\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Prompting"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "template = \"\"\"\n",
+ "<|system|>\n",
+ "You are a helpful AI Assistant that follows instructions extremely well.\n",
+ "Use the following context to answer the user question.\n",
+ "\n",
+ "Think step by step before answering the question.\n",
+ "You will get a $100 tip if you provide a correct answer.\n",
+ "\n",
+ "CONTEXT: {context}\n",
+ "</s>\n",
+ "<|user|>\n",
+ "{query}\n",
+ "</s>\n",
+ "<|assistant|>\n",
+ "\"\"\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "prompt = ChatPromptTemplate.from_template(template)\n",
+ "output_parser = StrOutputParser()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "chain = (\n",
+ "    {\"context\": ensemble_retriever, \"query\": RunnablePassthrough()}\n",
+ "    | prompt\n",
+ "    | llm\n",
+ "    | output_parser\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [
+ {
+ "ename": "HfHubHTTPError",
+ "evalue": "429 Client Error: Too Many Requests for url: https://api-inference.huggingface.co/models/HuggingFaceH4/zephyr-7b-beta (Request ID: MB9PO09bJU8rFNr1BbEry)\n\nRate limit reached. You reached free usage limit (reset hourly). Please subscribe to a plan at https://huggingface.co/pricing to use the API at this rate",
+ "output_type": "error",
+ "traceback": [
+ "HfHubHTTPError: 429 Client Error: Too Many Requests for url: https://api-inference.huggingface.co/models/HuggingFaceH4/zephyr-7b-beta (rate limit reached on the free Inference API tier; ANSI-escaped traceback trimmed)"
+ ]
+ }
+ ],
+ "source": [
+ "print(chain.invoke(\"What is instruction tuning?\"))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "print(chain.invoke(\"How does Orca compare to ChatGPT?\"))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "_______________________________________________________"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.0"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+ }
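The v1 run surfaces two actionable signals: the deprecation warning on `HuggingFaceHub` (cell 11) and the 429 rate limit on the free Inference API (cell 15). As the warning itself suggests, `HuggingFaceEndpoint` is the replacement; a hedged drop-in sketch for the `llm = HuggingFaceHub(...)` cell, under the same assumptions (`HF_TOKEN` already collected by the `input()` cell):

```python
# Replacement for the deprecated HuggingFaceHub wrapper used in v1.
from langchain_community.llms import HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    repo_id="HuggingFaceH4/zephyr-7b-beta",
    temperature=0.1,
    max_new_tokens=1024,
    huggingfacehub_api_token=HF_TOKEN,  # token from the input() cell above
)
```

The 429 itself is a quota issue rather than a code bug; the quantised local model used in v4 and `app.py` sidesteps it entirely.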
notebooks/chat_with_pdf_using_zephyr-7b_v2.ipynb ADDED
@@ -0,0 +1,222 @@
+ {
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Loading the model directly with HuggingFace\n",
+ "\n",
+ "* AutoModelForCausalLM"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from langchain_community.document_loaders import PyPDFLoader\n",
+ "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
+ "from langchain.vectorstores import Chroma\n",
+ "from langchain.retrievers import BM25Retriever, EnsembleRetriever\n",
+ "\n",
+ "from langchain_core.prompts import ChatPromptTemplate\n",
+ "from langchain_core.output_parsers import StrOutputParser\n",
+ "from langchain_core.runnables import RunnablePassthrough\n",
+ "\n",
+ "from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import torch\n",
+ "from transformers import AutoModelForCausalLM"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "file_path = \"../data/Orca Progressive Learning from Complex.pdf\"\n",
+ "data_file = PyPDFLoader(file_path)\n",
+ "docs = data_file.load()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# create chunks\n",
+ "splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)\n",
+ "chunks = splitter.split_documents(docs)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "HF_TOKEN = input(\"Enter your HuggingFace Token\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# https://huggingface.co/BAAI/bge-base-en-v1.5\n",
+ "embeddings = HuggingFaceInferenceAPIEmbeddings(\n",
+ "    api_key=HF_TOKEN, model_name=\"BAAI/bge-base-en-v1.5\"\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# retrieve k\n",
+ "k = 5"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "vector_store = Chroma.from_documents(chunks, embeddings)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "vector_retriever = vector_store.as_retriever(search_kwargs={\"k\": k})"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "semantic_retriever = BM25Retriever.from_documents(chunks)\n",
+ "semantic_retriever.k = k"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "ensemble_retriever = EnsembleRetriever(\n",
+ "    retrievers=[vector_retriever, semantic_retriever], weights=[0.5, 0.5]\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# tokenizer = AutoTokenizer.from_pretrained(\"HuggingFaceH4/zephyr-7b-beta\")\n",
+ "llm = AutoModelForCausalLM.from_pretrained(\n",
+ "    \"HuggingFaceH4/zephyr-7b-beta\", torch_dtype=torch.bfloat16, low_cpu_mem_usage=True\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "template = \"\"\"\n",
+ "<|system|>\n",
+ "You are a helpful AI Assistant that follows instructions extremely well.\n",
+ "Use the following context to answer the user question.\n",
+ "\n",
+ "Think step by step before answering the question.\n",
+ "You will get a $100 tip if you provide a correct answer.\n",
+ "\n",
+ "CONTEXT: {context}\n",
+ "</s>\n",
+ "<|user|>\n",
+ "{query}\n",
+ "</s>\n",
+ "<|assistant|>\n",
+ "\"\"\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "prompt = ChatPromptTemplate.from_template(template)\n",
+ "output_parser = StrOutputParser()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "chain = (\n",
+ "    {\"context\": ensemble_retriever, \"query\": RunnablePassthrough()}\n",
+ "    | prompt\n",
+ "    | llm\n",
+ "    | output_parser\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "print(chain.invoke(\"What is instruction tuning?\"))"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.0"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+ }
notebooks/chat_with_pdf_using_zephyr-7b_v3.ipynb ADDED
@@ -0,0 +1,314 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "# Using HuggingFace use a pipeline as a high-level helper method\n",
8
+ "\n",
9
+ "* from transformers import pipeline"
10
+ ]
11
+ },
12
+ {
13
+ "cell_type": "code",
14
+ "execution_count": 2,
15
+ "metadata": {},
16
+ "outputs": [],
17
+ "source": [
18
+ "from langchain_community.document_loaders import PyPDFLoader\n",
19
+ "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
20
+ "from langchain.vectorstores import Chroma\n",
21
+ "from langchain.retrievers import BM25Retriever, EnsembleRetriever\n",
22
+ "\n",
23
+ "from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings"
24
+ ]
25
+ },
26
+ {
27
+ "cell_type": "code",
28
+ "execution_count": 3,
29
+ "metadata": {},
30
+ "outputs": [],
31
+ "source": [
32
+ "import torch\n",
33
+ "from transformers import pipeline"
34
+ ]
35
+ },
36
+ {
37
+ "cell_type": "code",
38
+ "execution_count": 4,
39
+ "metadata": {},
40
+ "outputs": [],
41
+ "source": [
42
+ "file_path = \"../data/Orca Progressive Learning from Complex.pdf\"\n",
43
+ "data_file = PyPDFLoader(file_path)\n",
44
+ "docs = data_file.load()"
45
+ ]
46
+ },
47
+ {
48
+ "cell_type": "code",
49
+ "execution_count": 5,
50
+ "metadata": {},
51
+ "outputs": [],
52
+ "source": [
53
+ "# create chunks\n",
54
+ "splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)\n",
55
+ "chunks = splitter.split_documents(docs)"
56
+ ]
57
+ },
58
+ {
59
+ "cell_type": "code",
60
+ "execution_count": 6,
61
+ "metadata": {},
62
+ "outputs": [],
63
+ "source": [
64
+ "HF_TOKEN = input(\"Enter your HuggingFace Token\")"
65
+ ]
66
+ },
67
+ {
68
+ "cell_type": "code",
69
+ "execution_count": 7,
70
+ "metadata": {},
71
+ "outputs": [],
72
+ "source": [
73
+ "# https://huggingface.co/BAAI/bge-base-en-v1.5\n",
74
+ "embeddings = HuggingFaceInferenceAPIEmbeddings(\n",
75
+ " api_key=HF_TOKEN, model_name=\"BAAI/bge-base-en-v1.5\"\n",
76
+ ")"
77
+ ]
78
+ },
79
+ {
80
+ "cell_type": "code",
81
+ "execution_count": 8,
82
+ "metadata": {},
83
+ "outputs": [],
84
+ "source": [
85
+ "# retrieve k\n",
86
+ "k = 5"
87
+ ]
88
+ },
89
+ {
90
+ "cell_type": "code",
91
+ "execution_count": 9,
92
+ "metadata": {},
93
+ "outputs": [],
94
+ "source": [
95
+ "vector_store = Chroma.from_documents(chunks, embeddings)\n",
96
+ "vector_retriever = vector_store.as_retriever(search_kwargs={\"k\": k})"
97
+ ]
98
+ },
99
+ {
100
+ "cell_type": "code",
101
+ "execution_count": 10,
102
+ "metadata": {},
103
+ "outputs": [],
104
+ "source": [
105
+ "semantic_retriever = BM25Retriever.from_documents(chunks)\n",
106
+ "semantic_retriever.k = k"
107
+ ]
108
+ },
109
+ {
110
+ "cell_type": "code",
111
+ "execution_count": 11,
112
+ "metadata": {},
113
+ "outputs": [],
114
+ "source": [
115
+ "ensemble_retriever = EnsembleRetriever(\n",
116
+ " retrievers=[vector_retriever, semantic_retriever], weights=[0.5, 0.5]\n",
117
+ ")"
118
+ ]
119
+ },
120
+ {
121
+ "cell_type": "code",
122
+ "execution_count": 13,
123
+ "metadata": {},
124
+ "outputs": [
125
+ {
126
+ "data": {
127
+ "application/vnd.jupyter.widget-view+json": {
128
+ "model_id": "e7be2fbc6d0b4866b0ec4605ab2919eb",
129
+ "version_major": 2,
130
+ "version_minor": 0
131
+ },
132
+ "text/plain": [
133
+ "Loading checkpoint shards: 0%| | 0/8 [00:00<?, ?it/s]"
134
+ ]
135
+ },
136
+ "metadata": {},
137
+ "output_type": "display_data"
138
+ },
139
+ {
140
+ "name": "stderr",
141
+ "output_type": "stream",
142
+ "text": [
143
+ "WARNING:root:Some parameters are on the meta device device because they were offloaded to the disk and cpu.\n"
144
+ ]
145
+ }
146
+ ],
147
+ "source": [
148
+ "pipe = pipeline(\n",
149
+ " \"text-generation\",\n",
150
+ " model=\"HuggingFaceH4/zephyr-7b-beta\",\n",
151
+ " torch_dtype=torch.bfloat16,\n",
152
+ " device_map=\"auto\",\n",
153
+ ")"
154
+ ]
155
+ },
156
+ {
157
+ "cell_type": "code",
158
+ "execution_count": 14,
159
+ "metadata": {},
160
+ "outputs": [
161
+ {
162
+ "name": "stderr",
163
+ "output_type": "stream",
164
+ "text": [
165
+ "Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.\n"
166
+ ]
167
+ },
168
+ {
169
+ "name": "stdout",
170
+ "output_type": "stream",
171
+ "text": [
172
+ "<|system|>\n",
173
+ "You are a friendly chatbot who always responds in the style of a pirate</s>\n",
174
+ "<|user|>\n",
175
+ "How many helicopters can a human eat in one sitting?</s>\n",
176
+ "<|assistant|>\n",
177
+ "Matey, I'm afraid no human can eat a helicopter, as it's not food. Helicopters are machines used for transportation and other purposes, not a source of nourishment. I'd suggest you stick to eating hearty meals of grog, seafood, and maybe some plundered booty if ya fancy it! Arrrr!\n"
178
+ ]
179
+ }
180
+ ],
181
+ "source": [
182
+ "# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating\n",
183
+ "messages = [\n",
184
+ " {\n",
185
+ " \"role\": \"system\",\n",
186
+ " \"content\": \"You are a friendly chatbot who always responds in the style of a pirate\",\n",
187
+ " },\n",
188
+ " {\"role\": \"user\", \"content\": \"How many helicopters can a human eat in one sitting?\"},\n",
189
+ "]\n",
190
+ "\n",
191
+ "\n",
192
+ "prompt = pipe.tokenizer.apply_chat_template(\n",
193
+ " messages, tokenize=False, add_generation_prompt=True\n",
194
+ ")\n",
195
+ "\n",
196
+ "\n",
197
+ "outputs = pipe(\n",
198
+ " prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95\n",
199
+ ")\n",
200
+ "\n",
201
+ "print(outputs[0][\"generated_text\"])"
202
+ ]
203
+ },
204
+ {
205
+ "cell_type": "code",
206
+ "execution_count": null,
207
+ "metadata": {},
208
+ "outputs": [],
209
+ "source": []
210
+ },
211
+ {
212
+ "cell_type": "markdown",
213
+ "metadata": {},
214
+ "source": [
215
+ "_____________________"
216
+ ]
217
+ },
218
+ {
219
+ "cell_type": "code",
220
+ "execution_count": 15,
221
+ "metadata": {},
222
+ "outputs": [],
223
+ "source": [
224
+ "import textwrap\n",
225
+ "\n",
226
+ "def wrap_text(text, width=90): # preserve_newlines\n",
227
+ " # Split the input text into lines based on newline characters\n",
228
+ " lines = text.split(\"\\n\")\n",
229
+ "\n",
230
+ " # Wrap each line individually\n",
231
+ " wrapped_lines = [textwrap.fill(line, width=width) for line in lines]\n",
232
+ "\n",
233
+ " # Join the wrapped lines back together using newline characters\n",
234
+ " wrapped_text = \"\\n\".join(wrapped_lines)\n",
235
+ "\n",
236
+ " return wrapped_text"
237
+ ]
238
+ },
239
+ {
240
+ "cell_type": "code",
241
+ "execution_count": 16,
242
+ "metadata": {},
243
+ "outputs": [],
244
+ "source": [
245
+ "def generate(input_text, system_prompt=\"\", max_length=512):\n",
246
+ " if system_prompt != \"\":\n",
247
+ " system_prompt = system_prompt\n",
248
+ " else:\n",
249
+ " system_prompt = (\n",
250
+ " \"You are a friendly chatbot who always responds in the style of a pirate\"\n",
251
+ " )\n",
252
+ " messages = [\n",
253
+ " {\n",
254
+ " \"role\": \"system\",\n",
255
+ " \"content\": system_prompt,\n",
256
+ " },\n",
257
+ " {\"role\": \"user\", \"content\": input_text},\n",
258
+ " ]\n",
259
+ "\n",
260
+ " prompt = pipe.tokenizer.apply_chat_template(\n",
261
+ " messages, tokenize=False, add_generation_prompt=True\n",
262
+ " )\n",
263
+ "\n",
264
+ " outputs = pipe(\n",
265
+ " prompt,\n",
266
+ " max_new_tokens=max_length,\n",
267
+ " do_sample=True,\n",
268
+ " temperature=0.7,\n",
269
+ " top_k=50,\n",
270
+ " top_p=0.95,\n",
271
+ " )\n",
272
+ " text = outputs[0][\"generated_text\"]\n",
273
+ " text = text.replace(prompt, \"\", 1)\n",
274
+ " wrapped_text = wrap_text(text)\n",
275
+ " \n",
276
+ " print(wrapped_text)"
277
+ ]
278
+ },
279
+ {
280
+ "cell_type": "code",
281
+ "execution_count": null,
282
+ "metadata": {},
283
+ "outputs": [],
284
+ "source": [
285
+ "generate(\n",
286
+ " \"\"\"Alice: I don't know why, I'm struggling to maintain focus while studying. Any suggestion? \\n\\n Bob:\"\"\",\n",
287
+ " system_prompt=\"You are Zephyr, a LLM that generates great conversations. continue as Bob here\",\n",
288
+ " max_length=512,\n",
289
+ ")"
290
+ ]
291
+ }
292
+ ],
293
+ "metadata": {
294
+ "kernelspec": {
295
+ "display_name": "Python 3",
296
+ "language": "python",
297
+ "name": "python3"
298
+ },
299
+ "language_info": {
300
+ "codemirror_mode": {
301
+ "name": "ipython",
302
+ "version": 3
303
+ },
304
+ "file_extension": ".py",
305
+ "mimetype": "text/x-python",
306
+ "name": "python",
307
+ "nbconvert_exporter": "python",
308
+ "pygments_lexer": "ipython3",
309
+ "version": "3.9.0"
310
+ }
311
+ },
312
+ "nbformat": 4,
313
+ "nbformat_minor": 2
314
+ }
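
The cells above call pipe.tokenizer.apply_chat_template on a pipe object constructed earlier in the notebook. For reference, here is a minimal sketch of how such a text-generation pipeline is typically built with transformers; the checkpoint id and dtype are assumptions rather than values taken from this commit:

    import torch
    from transformers import pipeline

    # Assumed checkpoint: the full-precision Zephyr 7B Beta model on the Hub.
    pipe = pipeline(
        "text-generation",
        model="HuggingFaceH4/zephyr-7b-beta",
        torch_dtype=torch.bfloat16,  # assumed; float16 also works on most GPUs
        device_map="auto",           # requires the accelerate package
    )

Any pipeline whose tokenizer defines a chat template will work with the apply_chat_template call used above.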
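
With wrap_text and generate defined as in the cells above, typical calls look like the following; both prompts are illustrative only:

    # Uses the default pirate system prompt hard-coded in generate().
    generate("Give me three tips for staying focused while studying.")

    # Override the system prompt and shorten the generation budget
    # (max_length is forwarded to max_new_tokens inside generate()).
    generate(
        "Summarise the plot of Treasure Island in two sentences.",
        system_prompt="You are a concise assistant.",
        max_length=128,
    )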
notebooks/chat_with_pdf_using_zephyr-7b_v4.ipynb ADDED
@@ -0,0 +1,662 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "# Using Zephyr 7B Beta Quantised Model\n",
8
+ "\n",
9
+ "* [TheBloke/zephyr-7B-beta-GGUF](https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF)\n",
10
+ "* Used CTransformers wrapper"
11
+ ]
12
+ },
13
+ {
14
+ "cell_type": "code",
15
+ "execution_count": null,
16
+ "metadata": {},
17
+ "outputs": [],
18
+ "source": [
19
+ "%pip install torch==2.2.1\n",
20
+ "%pip install langchain==0.1.9\n",
21
+ "%pip install langchain-community==0.0.24\n",
22
+ "%pip install ctransformers==0.2.27\n",
23
+ "%pip install streamlit==1.31.1\n",
24
+ "%pip install streamlit-extras==0.4.0\n",
26
+ "%pip install rank_bm25==0.2.2\n",
27
+ "%pip install pypdf==4.0.2\n",
28
+ "%pip install chromadb==0.4.24\n",
29
+ "%pip install tiktoken==0.6.0"
30
+ ]
31
+ },
32
+ {
33
+ "cell_type": "code",
34
+ "execution_count": 1,
35
+ "metadata": {},
36
+ "outputs": [],
37
+ "source": [
38
+ "import os\n",
39
+ "from langchain_community.llms import CTransformers\n",
40
+ "from langchain import PromptTemplate, LLMChain"
41
+ ]
42
+ },
43
+ {
44
+ "cell_type": "code",
45
+ "execution_count": 2,
46
+ "metadata": {},
47
+ "outputs": [],
48
+ "source": [
49
+ "model_type = \"mistral\"\n",
50
+ "model_id = \"TheBloke/zephyr-7B-beta-GGUF\"\n",
51
+ "model_file = \"zephyr-7b-beta.Q4_K_S.gguf\""
52
+ ]
53
+ },
54
+ {
55
+ "cell_type": "code",
56
+ "execution_count": 3,
57
+ "metadata": {},
58
+ "outputs": [],
59
+ "source": [
60
+ "config = {\n",
61
+ " \"max_new_tokens\": 1024,\n",
62
+ " \"repetition_penalty\": 1.1,\n",
63
+ " \"temperature\": 1,\n",
64
+ " \"top_k\": 50,\n",
65
+ " \"top_p\": 0.9,\n",
66
+ " \"stream\": True,\n",
67
+ " \"threads\": int(os.cpu_count() / 2),\n",
68
+ "}"
69
+ ]
70
+ },
71
+ {
72
+ "cell_type": "code",
73
+ "execution_count": 4,
74
+ "metadata": {},
75
+ "outputs": [
76
+ {
77
+ "data": {
78
+ "application/vnd.jupyter.widget-view+json": {
79
+ "model_id": "6469959ce27843a6b808f7c92e6b6a74",
80
+ "version_major": 2,
81
+ "version_minor": 0
82
+ },
83
+ "text/plain": [
84
+ "Fetching 1 files: 0%| | 0/1 [00:00<?, ?it/s]"
85
+ ]
86
+ },
87
+ "metadata": {},
88
+ "output_type": "display_data"
89
+ },
90
+ {
91
+ "data": {
92
+ "application/vnd.jupyter.widget-view+json": {
93
+ "model_id": "4dd3533c42c94adcb46b36bfbe2748a3",
94
+ "version_major": 2,
95
+ "version_minor": 0
96
+ },
97
+ "text/plain": [
98
+ "Fetching 1 files: 0%| | 0/1 [00:00<?, ?it/s]"
99
+ ]
100
+ },
101
+ "metadata": {},
102
+ "output_type": "display_data"
103
+ },
104
+ {
105
+ "data": {
106
+ "application/vnd.jupyter.widget-view+json": {
107
+ "model_id": "76fda961588e489e8d5c749ccb426596",
108
+ "version_major": 2,
109
+ "version_minor": 0
110
+ },
111
+ "text/plain": [
112
+ "zephyr-7b-beta.Q4_K_S.gguf: 0%| | 0.00/4.14G [00:00<?, ?B/s]"
113
+ ]
114
+ },
115
+ "metadata": {},
116
+ "output_type": "display_data"
117
+ },
118
+ {
119
+ "name": "stderr",
120
+ "output_type": "stream",
121
+ "text": [
122
+ "Error while downloading from https://cdn-lfs-us-1.huggingface.co/repos/fe/17/fe17596731f84a0d03bece77489780bc7e068323c0aeca88b6393d3e9e65dd49/cafa0b85b2efc15ca33023f3b87f8d0c44ddcace16b3fb608280e0eb8f425cb1?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27zephyr-7b-beta.Q4_K_S.gguf%3B+filename%3D%22zephyr-7b-beta.Q4_K_S.gguf%22%3B&Expires=1709696299&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcwOTY5NjI5OX19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy11cy0xLmh1Z2dpbmdmYWNlLmNvL3JlcG9zL2ZlLzE3L2ZlMTc1OTY3MzFmODRhMGQwM2JlY2U3NzQ4OTc4MGJjN2UwNjgzMjNjMGFlY2E4OGI2MzkzZDNlOWU2NWRkNDkvY2FmYTBiODViMmVmYzE1Y2EzMzAyM2YzYjg3ZjhkMGM0NGRkY2FjZTE2YjNmYjYwODI4MGUwZWI4ZjQyNWNiMT9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSoifV19&Signature=JJymveuF19P%7EYsnDVvRKvHbsUjRrNso4dCIEhQ591C6Ponli%7EvQZXE3jKIWNH0ZG%7El1ERzgSns5Qdhx9ImLRLyCtq0szMjeb7eycm%7E8BBpBH3%7EUle4RQoGm1056cJbbOqbiCyTQpFsoRe6N3ivAxTn11BjMY1b-dAmZnWbL%7E%7EyyY3Og7h9YVXX3g%7E-3I5FaWIwv-GTwPPtGiYJGAP23wYFY%7Eax59dAkwC38V9qOwYGTwm1knXNIQhWVxrcykflJos57vJESMntXRc9PFn0BNu0ZXu%7EYd7nBcyk3%7ELOJjsTKHwP76D3guyIuXduUbpBRVGi1kTnjVfdyEvtDRwSIr3Q__&Key-Pair-Id=KCD77M1F0VK2B: HTTPSConnectionPool(host='cdn-lfs-us-1.huggingface.co', port=443): Read timed out.\n",
123
+ "Trying to resume download...\n"
124
+ ]
125
+ },
126
+ {
127
+ "data": {
128
+ "application/vnd.jupyter.widget-view+json": {
129
+ "model_id": "c44f1b49b4434d71a9630f5f451be6d5",
130
+ "version_major": 2,
131
+ "version_minor": 0
132
+ },
133
+ "text/plain": [
134
+ "zephyr-7b-beta.Q4_K_S.gguf: 0%| | 0.00/4.14G [00:00<?, ?B/s]"
135
+ ]
136
+ },
137
+ "metadata": {},
138
+ "output_type": "display_data"
139
+ },
140
+ {
141
+ "name": "stderr",
142
+ "output_type": "stream",
143
+ "text": [
144
+ "Error while downloading from https://cdn-lfs-us-1.huggingface.co/repos/fe/17/fe17596731f84a0d03bece77489780bc7e068323c0aeca88b6393d3e9e65dd49/cafa0b85b2efc15ca33023f3b87f8d0c44ddcace16b3fb608280e0eb8f425cb1?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27zephyr-7b-beta.Q4_K_S.gguf%3B+filename%3D%22zephyr-7b-beta.Q4_K_S.gguf%22%3B&Expires=1709696299&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcwOTY5NjI5OX19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2RuLWxmcy11cy0xLmh1Z2dpbmdmYWNlLmNvL3JlcG9zL2ZlLzE3L2ZlMTc1OTY3MzFmODRhMGQwM2JlY2U3NzQ4OTc4MGJjN2UwNjgzMjNjMGFlY2E4OGI2MzkzZDNlOWU2NWRkNDkvY2FmYTBiODViMmVmYzE1Y2EzMzAyM2YzYjg3ZjhkMGM0NGRkY2FjZTE2YjNmYjYwODI4MGUwZWI4ZjQyNWNiMT9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSoifV19&Signature=JJymveuF19P%7EYsnDVvRKvHbsUjRrNso4dCIEhQ591C6Ponli%7EvQZXE3jKIWNH0ZG%7El1ERzgSns5Qdhx9ImLRLyCtq0szMjeb7eycm%7E8BBpBH3%7EUle4RQoGm1056cJbbOqbiCyTQpFsoRe6N3ivAxTn11BjMY1b-dAmZnWbL%7E%7EyyY3Og7h9YVXX3g%7E-3I5FaWIwv-GTwPPtGiYJGAP23wYFY%7Eax59dAkwC38V9qOwYGTwm1knXNIQhWVxrcykflJos57vJESMntXRc9PFn0BNu0ZXu%7EYd7nBcyk3%7ELOJjsTKHwP76D3guyIuXduUbpBRVGi1kTnjVfdyEvtDRwSIr3Q__&Key-Pair-Id=KCD77M1F0VK2B: HTTPSConnectionPool(host='cdn-lfs-us-1.huggingface.co', port=443): Read timed out.\n",
145
+ "Trying to resume download...\n"
146
+ ]
147
+ },
148
+ {
149
+ "data": {
150
+ "application/vnd.jupyter.widget-view+json": {
151
+ "model_id": "6de1cf2f043e4af48b96f6efbbdc7eae",
152
+ "version_major": 2,
153
+ "version_minor": 0
154
+ },
155
+ "text/plain": [
156
+ "zephyr-7b-beta.Q4_K_S.gguf: 0%| | 0.00/4.14G [00:00<?, ?B/s]"
157
+ ]
158
+ },
159
+ "metadata": {},
160
+ "output_type": "display_data"
161
+ }
162
+ ],
163
+ "source": [
164
+ "init_model = CTransformers(model=model_id, model_file=model_file, model_type=model_type, **config, lib=\"avx2\")"
165
+ ]
166
+ },
167
+ {
168
+ "cell_type": "markdown",
169
+ "metadata": {},
170
+ "source": [
171
+ "## Without Prompt Template"
172
+ ]
173
+ },
174
+ {
175
+ "cell_type": "code",
176
+ "execution_count": 5,
177
+ "metadata": {},
178
+ "outputs": [],
179
+ "source": [
180
+ "query = \"what is the meaning of the life ?\""
181
+ ]
182
+ },
183
+ {
184
+ "cell_type": "code",
185
+ "execution_count": 6,
186
+ "metadata": {},
187
+ "outputs": [
188
+ {
189
+ "name": "stderr",
190
+ "output_type": "stream",
191
+ "text": [
192
+ "/mnt/d/repo/experiments/chat_with_pdf_using_zephyr-7b/venv/lib/python3.9/site-packages/langchain_core/_api/deprecation.py:117: LangChainDeprecationWarning: The function `__call__` was deprecated in LangChain 0.1.7 and will be removed in 0.2.0. Use invoke instead.\n",
193
+ " warn_deprecated(\n"
194
+ ]
195
+ },
196
+ {
197
+ "name": "stdout",
198
+ "output_type": "stream",
199
+ "text": [
200
+ "\n",
201
+ "\n",
202
+ "what happens after we die ?\n",
203
+ "\n",
204
+ "is there any god or creator ?\n",
205
+ "\n",
206
+ "who am I really ?\n",
207
+ "\n",
208
+ "these are the questions that have always fascinated human mind and kept us thinking for ages. These questions are so profound, yet simple and so personal. We all have our own answers to these questions, whether in form of religion, spirituality or philosophy, which become a part of our life philosophy as we grow up.\n",
209
+ "\n",
210
+ "But there is another dimension where people look beyond the boundaries of these religions and philosophies. They go into a quest for truth that goes deeper than what they have been taught by their religion or philosophy. They start looking within themselves to find the answers. This quest takes them on a journey of self-discovery, which is often referred to as Spirituality.\n",
211
+ "\n",
212
+ "Spirituality, at its core, is an intense thirst to know the truth about life and ourselves. It is a longing for connection with something greater than oneself – God or the Universe. The spiritual quest takes us on a journey of self-reflection and discovery where we learn to observe ourselves in our daily lives and situations as they arise. This brings deep insights into our own nature and enables us to let go\n"
213
+ ]
214
+ }
215
+ ],
216
+ "source": [
217
+ "result = init_model(query)\n",
218
+ "print(result)"
219
+ ]
220
+ },
221
+ {
222
+ "cell_type": "markdown",
223
+ "metadata": {},
224
+ "source": [
225
+ "## With Prompt Template"
226
+ ]
227
+ },
228
+ {
229
+ "cell_type": "code",
230
+ "execution_count": 7,
231
+ "metadata": {},
232
+ "outputs": [],
233
+ "source": [
234
+ "template = \"\"\"You are a helpful AI Assistant that follows instructions extremely well.\n",
235
+ "Question: {question}\n",
236
+ "\n",
237
+ "Answer: Let's think step by step and answer it faithfully.\n",
238
+ "\"\"\""
239
+ ]
240
+ },
241
+ {
242
+ "cell_type": "code",
243
+ "execution_count": 8,
244
+ "metadata": {},
245
+ "outputs": [],
246
+ "source": [
247
+ "prompt = PromptTemplate(template=template, input_variables=[\"question\"])"
248
+ ]
249
+ },
250
+ {
251
+ "cell_type": "code",
252
+ "execution_count": 9,
253
+ "metadata": {},
254
+ "outputs": [],
255
+ "source": [
256
+ "chain = LLMChain(prompt=prompt, llm=init_model, verbose=True)"
257
+ ]
258
+ },
259
+ {
260
+ "cell_type": "code",
261
+ "execution_count": 10,
262
+ "metadata": {},
263
+ "outputs": [],
264
+ "source": [
265
+ "query = \"What is LLM ?\""
266
+ ]
267
+ },
268
+ {
269
+ "cell_type": "code",
270
+ "execution_count": 11,
271
+ "metadata": {},
272
+ "outputs": [
273
+ {
274
+ "name": "stderr",
275
+ "output_type": "stream",
276
+ "text": [
277
+ "/mnt/d/repo/experiments/chat_with_pdf_using_zephyr-7b/venv/lib/python3.9/site-packages/langchain_core/_api/deprecation.py:117: LangChainDeprecationWarning: The function `run` was deprecated in LangChain 0.1.0 and will be removed in 0.2.0. Use invoke instead.\n",
278
+ " warn_deprecated(\n"
279
+ ]
280
+ },
281
+ {
282
+ "name": "stdout",
283
+ "output_type": "stream",
284
+ "text": [
285
+ "\n",
286
+ "\n",
287
+ "\u001b[1m> Entering new LLMChain chain...\u001b[0m\n",
288
+ "Prompt after formatting:\n",
289
+ "\u001b[32;1m\u001b[1;3mYou are a helpful AI Assistant that follows instructions extremely well.\n",
290
+ "Question: What is LLM ?\n",
291
+ "\n",
292
+ "Answer: Let's think step by step and answer it faithfully.\n",
293
+ "\u001b[0m\n",
294
+ "\n",
295
+ "\u001b[1m> Finished chain.\u001b[0m\n"
296
+ ]
297
+ }
298
+ ],
299
+ "source": [
300
+ "result = chain.run(query)"
301
+ ]
302
+ },
303
+ {
304
+ "cell_type": "code",
305
+ "execution_count": 13,
306
+ "metadata": {},
307
+ "outputs": [
308
+ {
309
+ "name": "stdout",
310
+ "output_type": "stream",
311
+ "text": [
312
+ "\n",
313
+ "LLM stands for Large Language Model. It refers to a type of machine learning algorithm specifically designed to process and generate human-like language, typically in the form of text or speech. These models are called \"large\" because they require vast amounts of training data to learn the complex patterns and relationships within language. The ultimate goal of LLMs is to enable more natural and intuitive interactions between humans and machines through enhanced communication capabilities.\n"
314
+ ]
315
+ }
316
+ ],
317
+ "source": [
318
+ "print(result)"
319
+ ]
320
+ },
321
+ {
322
+ "cell_type": "code",
323
+ "execution_count": null,
324
+ "metadata": {},
325
+ "outputs": [],
326
+ "source": []
327
+ },
328
+ {
329
+ "cell_type": "markdown",
330
+ "metadata": {},
331
+ "source": [
332
+ "## RAG - Talk to PDF"
333
+ ]
334
+ },
335
+ {
336
+ "cell_type": "code",
337
+ "execution_count": 14,
338
+ "metadata": {},
339
+ "outputs": [],
340
+ "source": [
341
+ "import os\n",
342
+ "from langchain_community.llms import CTransformers\n",
343
+ "from langchain_community.document_loaders import PyPDFLoader\n",
344
+ "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
345
+ "from langchain.vectorstores import Chroma\n",
346
+ "from langchain.retrievers import BM25Retriever, EnsembleRetriever\n",
347
+ "\n",
348
+ "from langchain_core.prompts import ChatPromptTemplate\n",
349
+ "from langchain_core.output_parsers import StrOutputParser\n",
350
+ "from langchain_core.runnables import RunnablePassthrough\n",
351
+ "\n",
352
+ "from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings"
353
+ ]
354
+ },
355
+ {
356
+ "cell_type": "markdown",
357
+ "metadata": {},
358
+ "source": [
359
+ "### Load Data"
360
+ ]
361
+ },
362
+ {
363
+ "cell_type": "code",
364
+ "execution_count": 21,
365
+ "metadata": {},
366
+ "outputs": [],
367
+ "source": [
368
+ "file_path = \"../data/Orca Progressive Learning from Complex.pdf\"\n",
369
+ "data_file = PyPDFLoader(file_path)\n",
370
+ "docs = data_file.load()"
371
+ ]
372
+ },
373
+ {
374
+ "cell_type": "markdown",
375
+ "metadata": {},
376
+ "source": [
377
+ "### Split & Chunk Docs"
378
+ ]
379
+ },
380
+ {
381
+ "cell_type": "code",
382
+ "execution_count": 22,
383
+ "metadata": {},
384
+ "outputs": [],
385
+ "source": [
386
+ "# create chunks\n",
387
+ "splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=100)\n",
388
+ "chunks = splitter.split_documents(docs)"
389
+ ]
390
+ },
391
+ {
392
+ "cell_type": "markdown",
393
+ "metadata": {},
394
+ "source": [
395
+ "### Load Embedder"
396
+ ]
397
+ },
398
+ {
399
+ "cell_type": "code",
400
+ "execution_count": 23,
401
+ "metadata": {},
402
+ "outputs": [],
403
+ "source": [
404
+ "HF_TOKEN = input(\"Enter your HuggingFace Token\")"
405
+ ]
406
+ },
407
+ {
408
+ "cell_type": "code",
409
+ "execution_count": 24,
410
+ "metadata": {},
411
+ "outputs": [],
412
+ "source": [
413
+ "# https://huggingface.co/BAAI/bge-base-en-v1.5\n",
414
+ "embeddings = HuggingFaceInferenceAPIEmbeddings(\n",
415
+ " api_key=HF_TOKEN, model_name=\"BAAI/bge-base-en-v1.5\"\n",
416
+ ")"
417
+ ]
418
+ },
419
+ {
420
+ "cell_type": "markdown",
421
+ "metadata": {},
422
+ "source": [
423
+ "### Retrievers"
424
+ ]
425
+ },
426
+ {
427
+ "cell_type": "code",
428
+ "execution_count": 25,
429
+ "metadata": {},
430
+ "outputs": [],
431
+ "source": [
432
+ "# retrieve k\n",
433
+ "k = 5"
434
+ ]
435
+ },
436
+ {
437
+ "cell_type": "markdown",
438
+ "metadata": {},
439
+ "source": [
440
+ "#### Vector Retriever"
441
+ ]
442
+ },
443
+ {
444
+ "cell_type": "code",
445
+ "execution_count": 26,
446
+ "metadata": {},
447
+ "outputs": [],
448
+ "source": [
449
+ "vector_store = Chroma.from_documents(chunks, embeddings)\n",
450
+ "vector_retriever = vector_store.as_retriever(search_kwargs={\"k\": k})"
451
+ ]
452
+ },
453
+ {
454
+ "cell_type": "markdown",
455
+ "metadata": {},
456
+ "source": [
457
+ "#### Semantic Retriever"
458
+ ]
459
+ },
460
+ {
461
+ "cell_type": "code",
462
+ "execution_count": 27,
463
+ "metadata": {},
464
+ "outputs": [],
465
+ "source": [
466
+ "semantic_retriever = BM25Retriever.from_documents(chunks)\n",
467
+ "semantic_retriever.k = k"
468
+ ]
469
+ },
470
+ {
471
+ "cell_type": "markdown",
472
+ "metadata": {},
473
+ "source": [
474
+ "#### Ensemble Retriever"
475
+ ]
476
+ },
477
+ {
478
+ "cell_type": "code",
479
+ "execution_count": 28,
480
+ "metadata": {},
481
+ "outputs": [],
482
+ "source": [
483
+ "ensemble_retriever = EnsembleRetriever(\n",
484
+ " retrievers=[vector_retriever, semantic_retriever], weights=[0.5, 0.5]\n",
485
+ ")"
486
+ ]
487
+ },
488
+ {
489
+ "cell_type": "markdown",
490
+ "metadata": {},
491
+ "source": [
492
+ "### Init LLM Model"
493
+ ]
494
+ },
495
+ {
496
+ "cell_type": "code",
497
+ "execution_count": 29,
498
+ "metadata": {},
499
+ "outputs": [],
500
+ "source": [
501
+ "model_type = \"mistral\"\n",
502
+ "model_id = \"TheBloke/zephyr-7B-beta-GGUF\"\n",
503
+ "model_file = \"zephyr-7b-beta.Q4_K_S.gguf\""
504
+ ]
505
+ },
506
+ {
507
+ "cell_type": "code",
508
+ "execution_count": 49,
509
+ "metadata": {},
510
+ "outputs": [],
511
+ "source": [
512
+ "config = {\n",
513
+ " \"max_new_tokens\": 2048,\n",
514
+ " \"repetition_penalty\": 1.1,\n",
515
+ " \"temperature\": 1,\n",
516
+ " \"top_k\": 50,\n",
517
+ " \"top_p\": 0.9,\n",
518
+ " \"stream\": True,\n",
519
+ " \"context_length\": 4096,\n",
520
+ " \"gpu_layers\": 0,\n",
521
+ " \"threads\": int(os.cpu_count() / 2),\n",
522
+ "}"
523
+ ]
524
+ },
525
+ {
526
+ "cell_type": "code",
527
+ "execution_count": 50,
528
+ "metadata": {},
529
+ "outputs": [
530
+ {
531
+ "data": {
532
+ "application/vnd.jupyter.widget-view+json": {
533
+ "model_id": "c0281307720f46be8386fb08c0d655ad",
534
+ "version_major": 2,
535
+ "version_minor": 0
536
+ },
537
+ "text/plain": [
538
+ "Fetching 1 files: 0%| | 0/1 [00:00<?, ?it/s]"
539
+ ]
540
+ },
541
+ "metadata": {},
542
+ "output_type": "display_data"
543
+ },
544
+ {
545
+ "data": {
546
+ "application/vnd.jupyter.widget-view+json": {
547
+ "model_id": "2974900a3e474614b538b006079881fd",
548
+ "version_major": 2,
549
+ "version_minor": 0
550
+ },
551
+ "text/plain": [
552
+ "Fetching 1 files: 0%| | 0/1 [00:00<?, ?it/s]"
553
+ ]
554
+ },
555
+ "metadata": {},
556
+ "output_type": "display_data"
557
+ }
558
+ ],
559
+ "source": [
560
+ "llm = CTransformers(\n",
561
+ " model=model_id, model_file=model_file, model_type=model_type, config=config, lib=\"avx2\"\n",
562
+ ")"
563
+ ]
564
+ },
565
+ {
566
+ "cell_type": "markdown",
567
+ "metadata": {},
568
+ "source": [
569
+ "### Prompting"
570
+ ]
571
+ },
572
+ {
573
+ "cell_type": "code",
574
+ "execution_count": 51,
575
+ "metadata": {},
576
+ "outputs": [],
577
+ "source": [
578
+ "template = \"\"\"You are a helpful AI Assistant that follows instructions extremely well.\n",
579
+ "Use the following context to answer user question.\n",
580
+ "\n",
581
+ "Think step by step before answering the question. \n",
582
+ "You will get a $100 tip if you provide correct answer.\n",
583
+ "\n",
584
+ "Context: {context}\n",
585
+ "\n",
586
+ "Question: {question}\n",
587
+ "\n",
588
+ "Answer: Let's think step by step and answer it faithfully.\n",
589
+ "\"\"\""
590
+ ]
591
+ },
592
+ {
593
+ "cell_type": "code",
594
+ "execution_count": 52,
595
+ "metadata": {},
596
+ "outputs": [],
597
+ "source": [
598
+ "prompt = ChatPromptTemplate.from_template(template)\n",
599
+ "output_parser = StrOutputParser()"
600
+ ]
601
+ },
602
+ {
603
+ "cell_type": "code",
604
+ "execution_count": 53,
605
+ "metadata": {},
606
+ "outputs": [],
607
+ "source": [
608
+ "chain = (\n",
609
+ " {\"context\": ensemble_retriever, \"question\": RunnablePassthrough()}\n",
610
+ " | prompt\n",
611
+ " | llm\n",
612
+ " | output_parser\n",
613
+ ")"
614
+ ]
615
+ },
616
+ {
617
+ "cell_type": "code",
618
+ "execution_count": 54,
619
+ "metadata": {},
620
+ "outputs": [
621
+ {
622
+ "name": "stdout",
623
+ "output_type": "stream",
624
+ "text": [
625
+ "\n",
626
+ "Instruction tuning is a technique that allows pre-trained language models to learn from input (natural language descriptions of the task) and response pairs, for example, \"{\\\"instruction\\\": \\\"Arrange the words in the given sentence to form a grammatically\\ncorrect sentence.\\\", \\\"input\\\": \\\"the quickly brown fox jumped\\\", \\\"output\\\": \\\"the brown\\nfox jumped quickly\\\"} .\". It is commonly used for both language-only and multimodal tasks, such as image captioning and visual question answering. In recent times, many works have adopted instruction tuning to train smaller language models with outputs generated from large foundation models like GPT family. However, these approaches face several challenges, including limited task diversity, query complexity, and small-scale training data that understate the benefits of such methods. The Orca model presented in this thesis addresses these limitations by combining self-supervised learning, reinforcement learning, and instruction tuning to achieve competitive performance on multiple zero-shot benchmarks, reducing the gap with proprietary LLMs like ChatGPT and GPT-4.\n"
627
+ ]
628
+ }
629
+ ],
630
+ "source": [
631
+ "print(chain.invoke(\"What is instruction tuning?\"))"
632
+ ]
633
+ },
634
+ {
635
+ "cell_type": "code",
636
+ "execution_count": null,
637
+ "metadata": {},
638
+ "outputs": [],
639
+ "source": []
640
+ }
641
+ ],
642
+ "metadata": {
643
+ "kernelspec": {
644
+ "display_name": "Python 3",
645
+ "language": "python",
646
+ "name": "python3"
647
+ },
648
+ "language_info": {
649
+ "codemirror_mode": {
650
+ "name": "ipython",
651
+ "version": 3
652
+ },
653
+ "file_extension": ".py",
654
+ "mimetype": "text/x-python",
655
+ "name": "python",
656
+ "nbconvert_exporter": "python",
657
+ "pygments_lexer": "ipython3",
658
+ "version": "3.9.0"
659
+ }
660
+ },
661
+ "nbformat": 4,
662
+ "nbformat_minor": 2
663
+ }
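
The stderr output above shows LangChain 0.1.x warning that __call__ and run are deprecated and will be removed in 0.2.0 in favour of invoke. A minimal sketch of the equivalent calls for the objects built in this notebook; note that LLMChain.invoke takes and returns a dict:

    # Plain LLM: string in, string out (replaces init_model(query)).
    result = init_model.invoke(query)
    print(result)

    # LLMChain: dict in, dict out (replaces chain.run(query));
    # the completion is stored under the "text" key.
    out = chain.invoke({"question": query})
    print(out["text"])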
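
Because the ensemble retriever does the heavy lifting in the RAG chain, it is worth querying it in isolation before wiring it into the prompt. A small sketch against the ensemble_retriever built above; the query string is just an example:

    # Inspect what the hybrid (Chroma + BM25) retriever returns.
    docs = ensemble_retriever.get_relevant_documents("What is instruction tuning?")
    for i, doc in enumerate(docs):
        # PyPDFLoader records the source page number in metadata["page"].
        print(f"--- chunk {i} (page {doc.metadata.get('page')}) ---")
        print(doc.page_content[:200])

The equal weights=[0.5, 0.5] passed to EnsembleRetriever mean the dense (Chroma) and keyword (BM25) rankings contribute equally when the two result lists are fused; shifting weight toward BM25 tends to help queries dominated by exact terms from the document.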
notebooks/reference/YT_Mistral_7B_Zephyr_ɒ_Testing.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
notebooks/reference/zephyr_7b_beta.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
requirements.txt ADDED
@@ -0,0 +1,10 @@
1
+ torch==2.2.1
2
+ langchain==0.1.9
3
+ langchain-community==0.0.24
4
+ ctransformers==0.2.27
5
+ streamlit==1.31.1
6
+ streamlit-extras==0.4.0
7
+ tiktoken==0.6.0
8
+ rank_bm25==0.2.2
9
+ pypdf==4.0.2
10
+ chromadb==0.4.24
requirements_local.txt ADDED
@@ -0,0 +1,12 @@
1
+ ipykernel
2
+ ipywidgets
3
+ torch==2.2.1
4
+ langchain==0.1.9
5
+ langchain-community==0.0.24
6
+ ctransformers==0.2.27
7
+ streamlit==1.31.1
8
+ streamlit-extras==0.4.0
9
+ tiktoken==0.6.0
10
+ rank_bm25==0.2.2
11
+ pypdf==4.0.2
12
+ chromadb==0.4.24
runtime.txt ADDED
@@ -0,0 +1 @@
1
+ python-3.9.0
src/.gitkeep ADDED
File without changes