	---
	license: cc-by-nc-4.0
	language:
	- en
	---
	# Jellyfish-7B
	<!-- Provide a quick summary of what the model is/does. -->
	<!--
	<img src="https://i.imgur.com/d8Bl04i.png" alt="PicToModel" width="330"/>
	-->
	<img src="https://i.imgur.com/E1vqCIw.png" alt="PicToModel" width="330"/>


	## Model Details
	Jellyfish-7B is a large language model with 7 billion parameters.
	We fine-tuned the [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) model on datasets for data preprocessing tasks.
	The training data consists of two parts:
	* Jellyfish-13B training data
	* GPT-4-generated reasoning data for data preprocessing tasks

	In a GPT-4 evaluation, Jellyfish-7B achieves a 56.36% winning rate against GPT-3.5-turbo.

	More details about the model can be found in the [Jellyfish paper](https://arxiv.org/abs/2312.01678).

	- Developed by: Haochen Zhang, Yuyang Dong, Chuan Xiao, Masafumi Oyamada
	- Contact: [email protected]
	- Funded by: NEC Corporation, Osaka University
	- Language(s) (NLP): English
	- License: Non-Commercial Creative Commons license (CC BY-NC-4.0)
	- Finetuned from model: [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)

	## Citation

	If you find our work useful, please give us credit by citing:

	```
	@article{zhang2023jellyfish,
	title={Jellyfish: A Large Language Model for Data Preprocessing},
	author={Zhang, Haochen and Dong, Yuyang and Xiao, Chuan and Oyamada, Masafumi},
	journal={arXiv preprint arXiv:2312.01678},
	year={2023}
	}
	```

	## Performance on seen tasks

	| Task | Type | Dataset | Non-LLM SoTA<sup>1</sup> | GPT-3.5<sup>2</sup> | GPT-4<sup>2</sup> | GPT-4o | Table-GPT | Jellyfish-7B | Jellyfish-8B | Jellyfish-13B |
	|------|------|---------|--------------------------|---------------------|-------------------|--------|-----------|--------------|--------------|---------------|
	| Error Detection | Seen | Adult | 99.10 | 99.10 | 92.01 | 83.58 | -- | 77.40 | 73.74 | 99.33 |
	| Error Detection | Seen | Hospital | 94.40 | 97.80 | 90.74 | 44.76 | -- | 94.51 | 93.40 | 95.59 |
	| Error Detection | Unseen | Flights | 81.00 | -- | 83.48 | 66.01 | -- | 69.15 | 66.21 | 82.52 |
	| Error Detection | Unseen | Rayyan | 79.00 | -- | 81.95 | 68.53 | -- | 75.07 | 81.06 | 90.65 |
	| Data Imputation | Seen | Buy | 96.50 | 98.50 | 100 | 100 | -- | 98.46 | 98.46 | 100 |
	| Data Imputation | Seen | Restaurant | 77.20 | 88.40 | 97.67 | 90.70 | -- | 89.53 | 87.21 | 89.53 |
	| Data Imputation | Unseen | Flipkart | 68.00 | -- | 89.94 | 83.20 | -- | 87.14 | 87.48 | 81.68 |
	| Data Imputation | Unseen | Phone | 86.70 | -- | 90.79 | 86.78 | -- | 86.52 | 85.68 | 87.21 |
	| Schema Matching | Seen | MIMIC-III | 20.00 | -- | 40.00 | 29.41 | -- | 53.33 | 45.45 | 40.00 |
	| Schema Matching | Seen | Synthea | 38.50 | 45.20 | 66.67 | 6.56 | -- | 55.56 | 47.06 | 56.00 |
	| Schema Matching | Unseen | CMS | 50.00 | -- | 19.35 | 22.22 | -- | 42.86 | 38.10 | 59.29 |
	| Entity Matching | Seen | Amazon-Google | 75.58 | 63.50 | 74.21 | 70.91 | 70.10 | 81.69 | 81.42 | 81.34 |
	| Entity Matching | Seen | Beer | 94.37 | 100 | 100 | 90.32 | 96.30 | 100.00 | 100.00 | 96.77 |
	| Entity Matching | Seen | DBLP-ACM | 98.99 | 96.60 | 97.44 | 95.87 | 93.80 | 98.65 | 98.77 | 98.98 |
	| Entity Matching | Seen | DBLP-GoogleScholar | 95.70 | 83.80 | 91.87 | 90.45 | 92.40 | 94.88 | 95.03 | 98.51 |
	| Entity Matching | Seen | Fodors-Zagats | 100 | 100 | 100 | 93.62 | 100 | 100 | 100 | 100 |
	| Entity Matching | Seen | iTunes-Amazon | 97.06 | 98.20 | 100 | 98.18 | 94.30 | 96.30 | 96.30 | 98.11 |
	| Entity Matching | Unseen | Abt-Buy | 89.33 | -- | 92.77 | 78.73 | -- | 86.06 | 88.84 | 89.58 |
	| Entity Matching | Unseen | Walmart-Amazon | 86.89 | 87.00 | 90.27 | 79.19 | 82.40 | 84.91 | 85.24 | 89.42 |
	| Avg | | | 80.44 | -- | 84.17 | 72.58 | -- | 82.74 | 81.55 | 86.02 |

	_For GPT-3.5 and GPT-4, we used the few-shot approach on all datasets. For the Jellyfish models, the few-shot approach is disabled on seen datasets and enabled on unseen datasets._
	_We use accuracy as the metric for data imputation and the F1 score for all other tasks._
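As a rough illustration of the two metrics named in the note above, the following minimal sketch computes accuracy and F1 as percentages. It rests on assumptions that this README does not confirm: accuracy is taken as exact string match, and F1 is taken as binary F1 with `True` as the positive class. The function names and the toy inputs are hypothetical and are not the evaluation code behind the table.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Percentage of predictions that exactly match their label (assumed exact match)."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


def f1_score(predictions: Sequence[bool], labels: Sequence[bool]) -> float:
    """Binary F1 as a percentage, with True as the positive class (assumed)."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    tp = sum(p and y for p, y in zip(predictions, labels))
    fp = sum(p and not y for p, y in zip(predictions, labels))
    fn = sum((not p) and y for p, y in zip(predictions, labels))
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100.0 * 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
imputation_acc = accuracy(["sony", "lg", "dell", "hp"], ["sony", "lg", "asus", "hp"])
matching_f1 = f1_score([True, True, False, False], [True, False, True, False])
print(imputation_acc, matching_f1)
```

Both functions return values on the same 0 to 100 scale used in the table.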

	1. Non-LLM SoTA methods:
	   * [Ditto](https://arxiv.org/abs/2004.00584) for Entity Matching
	   * [SMAT](https://www.researchgate.net/publication/353920530_SMAT_An_Attention-Based_Deep_Learning_Solution_to_the_Automation_of_Schema_Matching) for Schema Matching
	   * [HoloDetect](https://arxiv.org/abs/1904.02285) for Error Detection seen datasets
	   * [RAHA](https://dl.acm.org/doi/10.1145/3299869.3324956) for Error Detection unseen datasets
	   * [IPM](https://ieeexplore.ieee.org/document/9458712) for Data Imputation
	2. [Large Language Models as Data Preprocessors](https://arxiv.org/abs/2308.16361)
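The Avg row of the table above appears to be the unweighted mean over the 19 datasets. This README does not state that explicitly, so treat it as an inference from the numbers. A short check, with the Jellyfish-7B column copied from the table:

```python
from statistics import mean

# Jellyfish-7B scores copied from the table above, one per dataset, in row order.
jellyfish_7b_scores = [
    77.40, 94.51, 69.15, 75.07,                 # Error Detection
    98.46, 89.53, 87.14, 86.52,                 # Data Imputation
    53.33, 55.56, 42.86,                        # Schema Matching
    81.69, 100.00, 98.65, 94.88, 100.00,        # Entity Matching
    96.30, 86.06, 84.91,
]

# Unweighted mean across datasets, rounded to two decimals like the table.
avg = round(mean(jellyfish_7b_scores), 2)
print(len(jellyfish_7b_scores), avg)  # 19 82.74
```

Note that this average mixes accuracy (data imputation) and F1 (all other tasks), so it is best read as a summary figure and not as a single metric.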

	## Performance on unseen tasks

	### Column Type Annotation

	| Dataset | RoBERTa (159 shots)<sup>1</sup> | GPT-3.5<sup>1</sup> | GPT-4 | GPT-4o | Jellyfish-7B | Jellyfish-8B | Jellyfish-13B |
	|---------|---------------------------------|---------------------|-------|--------|--------------|--------------|---------------|
	| SOTAB | 79.20 | 89.47 | 91.55 | 65.05 | 83 | 76.33 | 82 |

	_Few-shot is disabled for Jellyfish models._

	1. Results from [Column Type Annotation using ChatGPT](https://arxiv.org/abs/2306.00745)

	### Attribute Value Extraction

	| Dataset | Stable Beluga 2 70B<sup>1</sup> | SOLAR 70B<sup>1</sup> | GPT-3.5<sup>1</sup> | GPT-4<sup>1</sup> | GPT-4o | Jellyfish-7B | Jellyfish-8B | Jellyfish-13B |
	|---------|---------------------------------|-----------------------|---------------------|-------------------|--------|--------------|--------------|---------------|
	| AE-110k | 52.10 | 49.20 | 61.30 | 55.50 | 55.77 | 56.09 | 59.55 | 58.12 |
	| OA-Mine | 50.80 | 55.20 | 62.70 | 68.90 | 60.20 | 51.98 | 59.22 | 55.96 |

	_Few-shot is disabled for Jellyfish models._

	1. Results from [Product Attribute Value Extraction using Large Language Models](https://arxiv.org/abs/2310.12537)


	## Prompt Template
	```
	[INST]:

	<prompt> (without the <>)

	[\INST]]
	```
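
A minimal sketch of filling the template above in code. The helper name `build_prompt` and the example task text are hypothetical. The opening and closing markers are copied verbatim from the template, and the sketch assumes the lines carry no leading whitespace and are separated by single blank lines as shown.

```python
def build_prompt(task_prompt: str) -> str:
    """Wrap a task prompt in the template shown above.

    The markers are reproduced exactly as they appear in this README;
    the surrounding whitespace is an assumption.
    """
    return "[INST]:\n\n" + task_prompt.strip() + "\n\n[\\INST]]"


# Hypothetical task text, for illustration only.
example = build_prompt(
    "Do the two product records refer to the same entity? Answer Yes or No."
)
print(example)
```

The resulting string can then be passed to whatever inference stack you use. Tokenization and generation settings are outside the scope of this sketch.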