metadata

language:
  - en
license: apache-2.0
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - mistral
  - trl
  - sft
base_model: unsloth/mistral-7b-v0.2-bnb-4bit
datasets:
  - visheratin/realworldqa

Mistral-RealworldQA-v0.2-7b SFT

GGUFs can be found here

An experiment aimed at reducing hallucinations in visual question answering (VQA)

First in a series of experiments centering around fine-tuning for image captioning.

Release Notes

v0.1 - Initial Release
v0.2 (Current) - Updated base model to the official Mistral-7b fp16 release; refined the dataset and instruction formatting

Background & Methodology

The Mistral-7B-v0.2 base model was fine-tuned on the RealWorldQA dataset, originally released by the xAI team here: https://x.ai/blog/grok-1.5v

Vision Results

Example 1 Example 2

The experiment yielded a model that gives shorter, less verbose answers to questions about pictures.
The likelihood of hallucinations in the output has decreased; however, the user can still easily steer the model into inaccurate answers.
Best suited for captioning use cases that require concise descriptions and low token counts.
This model lacks the conversational prose of Excalibur-7b-DPO and is much "drier" in tone.

Vision functionality requires an additional mmproj file. You have two options (both available inside this repo):

Select the GGUF file of your choice in KoboldCpp as usual, then make sure to choose the mmproj file above in the LLaVA mmproj field of the model submenu.

Prompt Format

Use Alpaca for best results.
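For reference, here is a minimal sketch of how a prompt could be wrapped in the Alpaca format before being sent to the model. It assumes the widely used no-input Alpaca template; this card does not state the exact template used during fine-tuning, so treat the wording of the preamble as an assumption. The helper name `build_alpaca_prompt` is hypothetical.

```python
# Standard Alpaca "no input" template (assumed; the card only says "Use Alpaca").
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_alpaca_prompt(instruction: str) -> str:
    """Wrap a user instruction in the Alpaca template (hypothetical helper)."""
    # Strip surrounding whitespace so the section headers stay cleanly separated.
    return ALPACA_TEMPLATE.format(instruction=instruction.strip())


if __name__ == "__main__":
    # Example: a short captioning request, matching the model's intended use.
    print(build_alpaca_prompt("Describe the image in one sentence."))
```

The resulting string ends right after the `### Response:` header, so the model's completion becomes the answer. Most frontends, KoboldCpp included, ship an Alpaca instruct preset that should produce an equivalent layout.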

Other info

Developed by: InferenceIllusionist
License: apache-2.0
Finetuned from model: mistral-community/Mistral-7B-v0.2

This Mistral model was trained 2x faster with Unsloth and Hugging Face's TRL library.