metadata
license: cc-by-nc-4.0
language:
- en
Jellyfish-7B
Model Details
Jellyfish-7B is a large language model equipped with 7 billion parameters.
We fine-tuned the mistralai/Mistral-7B-Instruct-v0.2 model using
a subset of the Jellyfish-Instruct dataset.
In a head-to-head comparison judged by GPT-4, Jellyfish-7B achieves a winning rate of 56.36% against GPT-3.5-turbo.
More details about the model can be found in the Jellyfish paper.
- Developed by: Haochen Zhang, Yuyang Dong, Chuan Xiao, Masafumi Oyamada
- Contact: [email protected]
- Funded by: NEC Corporation, Osaka University
- Language(s) (NLP): English
- License: Non-Commercial Creative Commons license (CC BY-NC-4.0)
- Finetuned from model: mistralai/Mistral-7B-Instruct-v0.2
Citation
If you find our work useful, please give us credit by citing:
@article{zhang2023jellyfish,
title={Jellyfish: A Large Language Model for Data Preprocessing},
author={Zhang, Haochen and Dong, Yuyang and Xiao, Chuan and Oyamada, Masafumi},
journal={arXiv preprint arXiv:2312.01678},
year={2023}
}
Performance on seen tasks
Task | Type | Dataset | Non-LLM SoTA¹ | GPT-3.5² | GPT-4² | GPT-4o | Table-GPT | Jellyfish-7B | Jellyfish-8B | Jellyfish-13B |
---|---|---|---|---|---|---|---|---|---|---|
Error Detection | Seen | Adult | 99.10 | 99.10 | 92.01 | 83.58 | -- | 77.40 | 73.74 | 99.33 |
Error Detection | Seen | Hospital | 94.40 | 97.80 | 90.74 | 44.76 | -- | 94.51 | 93.40 | 95.59 |
Error Detection | Unseen | Flights | 81.00 | -- | 83.48 | 66.01 | -- | 69.15 | 66.21 | 82.52 |
Error Detection | Unseen | Rayyan | 79.00 | -- | 81.95 | 68.53 | -- | 75.07 | 81.06 | 90.65 |
Data Imputation | Seen | Buy | 96.50 | 98.50 | 100 | 100 | -- | 98.46 | 98.46 | 100 |
Data Imputation | Seen | Restaurant | 77.20 | 88.40 | 97.67 | 90.70 | -- | 89.53 | 87.21 | 89.53 |
Data Imputation | Unseen | Flipkart | 68.00 | -- | 89.94 | 83.20 | -- | 87.14 | 87.48 | 81.68 |
Data Imputation | Unseen | Phone | 86.70 | -- | 90.79 | 86.78 | -- | 86.52 | 85.68 | 87.21 |
Schema Matching | Seen | MIMIC-III | 20.00 | -- | 40.00 | 29.41 | -- | 53.33 | 45.45 | 40.00 |
Schema Matching | Seen | Synthea | 38.50 | 45.20 | 66.67 | 6.56 | -- | 55.56 | 47.06 | 56.00 |
Schema Matching | Unseen | CMS | 50.00 | -- | 19.35 | 22.22 | -- | 42.86 | 38.10 | 59.29 |
Entity Matching | Seen | Amazon-Google | 75.58 | 63.50 | 74.21 | 70.91 | 70.10 | 81.69 | 81.42 | 81.34 |
Entity Matching | Seen | Beer | 94.37 | 100 | 100 | 90.32 | 96.30 | 100.00 | 100.00 | 96.77 |
Entity Matching | Seen | DBLP-ACM | 98.99 | 96.60 | 97.44 | 95.87 | 93.80 | 98.65 | 98.77 | 98.98 |
Entity Matching | Seen | DBLP-GoogleScholar | 95.70 | 83.80 | 91.87 | 90.45 | 92.40 | 94.88 | 95.03 | 98.51 |
Entity Matching | Seen | Fodors-Zagats | 100 | 100 | 100 | 93.62 | 100 | 100 | 100 | 100 |
Entity Matching | Seen | iTunes-Amazon | 97.06 | 98.20 | 100 | 98.18 | 94.30 | 96.30 | 96.30 | 98.11 |
Entity Matching | Unseen | Abt-Buy | 89.33 | -- | 92.77 | 78.73 | -- | 86.06 | 88.84 | 89.58 |
Entity Matching | Unseen | Walmart-Amazon | 86.89 | 87.00 | 90.27 | 79.19 | 82.40 | 84.91 | 85.24 | 89.42 |
Average | | | 80.44 | -- | 84.17 | 72.58 | -- | 82.74 | 81.55 | 86.02 |
For GPT-3.5 and GPT-4, we used the few-shot approach on all datasets. However, for Jellyfish models, the few-shot approach is disabled on seen datasets and enabled on unseen datasets.
We use accuracy as the metric for data imputation and the F1 score for all other tasks.
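As a reminder of how the two metrics differ, here is a minimal sketch in plain Python. It assumes binary labels and an F1 score computed on the positive class; the card does not state which F1 variant or averaging mode was used, so treat that choice, and the function names, as illustrative assumptions.

```python
def accuracy(y_true, y_pred):
    """Percentage of predictions that exactly match the ground truth."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 (in percent) on the `positive` class.

    Assumption: binary F1 on the positive class; the model card does not
    specify the exact variant.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100.0 * 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration only.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 1, 0]
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred))
```

Unlike accuracy, F1 ignores true negatives, which matters for tasks such as entity matching where non-matching pairs typically dominate.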
- ¹ Non-LLM SoTA methods: Ditto for Entity Matching; SMAT for Schema Matching; HoloDetect for Error Detection on seen datasets; RAHA for Error Detection on unseen datasets; IPM for Data Imputation.
- ² Large Language Models as Data Preprocessors
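The average row appears to be an unweighted mean over the 19 datasets in the table. The sketch below checks this for the Jellyfish-7B column using the values listed above; the variable names are illustrative, and the unweighted-mean interpretation is an assumption that happens to reproduce the reported figure.

```python
# Jellyfish-7B scores copied from the table above, in row order.
jellyfish_7b_scores = [
    77.40, 94.51, 69.15, 75.07,        # Error Detection
    98.46, 89.53, 87.14, 86.52,        # Data Imputation
    53.33, 55.56, 42.86,               # Schema Matching
    81.69, 100.00, 98.65, 94.88,       # Entity Matching
    100.00, 96.30, 86.06, 84.91,       # Entity Matching (continued)
]

# Unweighted mean: every dataset counts equally, regardless of its size.
average = round(sum(jellyfish_7b_scores) / len(jellyfish_7b_scores), 2)
print(average)  # 82.74
```

Because each dataset carries equal weight, the three low-scoring Schema Matching datasets pull the average down noticeably.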
Performance on unseen tasks
Column Type Annotation
Dataset | RoBERTa (159 shots)¹ | GPT-3.5¹ | GPT-4 | GPT-4o | Jellyfish-7B | Jellyfish-8B | Jellyfish-13B |
---|---|---|---|---|---|---|---|
SOTAB | 79.20 | 89.47 | 91.55 | 65.05 | 83 | 76.33 | 82 |
Few-shot is disabled for Jellyfish models.
- ¹ Results from "Column Type Annotation using ChatGPT".
Attribute Value Extraction
Dataset | Stable Beluga 2 70B¹ | SOLAR 70B¹ | GPT-3.5¹ | GPT-4¹ | GPT-4o | Jellyfish-7B | Jellyfish-8B | Jellyfish-13B |
---|---|---|---|---|---|---|---|---|
AE-110k | 52.10 | 49.20 | 61.30 | 55.50 | 55.77 | 56.09 | 59.55 | 58.12 |
OA-Mine | 50.80 | 55.20 | 62.70 | 68.90 | 60.20 | 51.98 | 59.22 | 55.96 |
Few-shot is disabled for Jellyfish models.
Prompt Template
[INST]:
<prompt> (without the <>)
[\INST]]
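A minimal sketch of wrapping an instruction in this template. The opening and closing markers are copied verbatim from the template above, including its unusual closing marker, rather than normalized to any other format. The helper name `build_prompt` and the sample instruction are hypothetical, and the single newlines around the prompt are an assumption about whitespace.

```python
# Markers copied verbatim from the template above; not normalized.
PROMPT_PREFIX = "[INST]:\n"
PROMPT_SUFFIX = "\n[\\INST]]"


def build_prompt(instruction: str) -> str:
    """Wrap a task instruction in the template (hypothetical helper)."""
    return f"{PROMPT_PREFIX}{instruction.strip()}{PROMPT_SUFFIX}"


# Hypothetical instruction, for illustration only.
prompt = build_prompt(
    "Do the two product records refer to the same entity? Answer Yes or No."
)
print(prompt)
```

The resulting string can be passed to whichever tokenizer and generation API is used to run the model.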