	---
	license: cc-by-nc-4.0
	language:
	- en
	---
	# Jellyfish-7B
	<!-- Provide a quick summary of what the model is/does. -->
	<!--
	<img src="https://i.imgur.com/d8Bl04i.png" alt="PicToModel" width="330"/>
	-->
	<img src="https://i.imgur.com/E1vqCIw.png" alt="PicToModel" width="330"/>

	Other versions of Jellyfish:

	- [Jellyfish-8B](https://huggingface.co/NECOUDBFM/Jellyfish-8B)
	- [Jellyfish-13B](https://huggingface.co/NECOUDBFM/Jellyfish-13B)

	## Model Details
	Jellyfish-7B is a large language model with 7 billion parameters.
	We fine-tuned the [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) model using
	a subset of the [Jellyfish-Instruct](https://huggingface.co/datasets/NECOUDBFM/Jellyfish-Instruct) dataset.


	In a head-to-head comparison judged by GPT-4, Jellyfish-7B achieves a 56.36% winning rate against GPT-3.5-turbo.

	More details about the model can be found in the [Jellyfish paper](https://arxiv.org/abs/2312.01678).

	- Developed by: Haochen Zhang, Yuyang Dong, Chuan Xiao, Masafumi Oyamada
	- Contact: [email protected]
	- Funded by: NEC Corporation, Osaka University
	- Language(s) (NLP): English
	- License: Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0)
	- Fine-tuned from model: [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)

	## Citation

	If you find our work useful, please give us credit by citing:

	```
	@article{zhang2023jellyfish,
	title={Jellyfish: A Large Language Model for Data Preprocessing},
	author={Zhang, Haochen and Dong, Yuyang and Xiao, Chuan and Oyamada, Masafumi},
	journal={arXiv preprint arXiv:2312.01678},
	year={2023}
	}
	```

	## Performance on seen tasks

	| Task | Dataset Type | Dataset | Non-LLM SoTA<sup>1</sup> | GPT-3.5<sup>2</sup> | GPT-4<sup>2</sup> | GPT-4o | Table-GPT | Jellyfish-7B | Jellyfish-8B | Jellyfish-13B |
	|-----------------|--------|-------------------|-----------------|--------|--------|--------|-----------|--------------|--------------|---------------|
	| Error Detection | Seen | Adult | 99.10 | 99.10 | 92.01 | 83.58 | -- | 77.40 | 73.74 | 99.33 |
	| Error Detection | Seen | Hospital | 94.40 | 97.80 | 90.74 | 44.76 | -- | 94.51 | 93.40 | 95.59 |
	| Error Detection | Unseen | Flights | 81.00 | -- | 83.48 | 66.01 | -- | 69.15 | 66.21 | 82.52 |
	| Error Detection | Unseen | Rayyan | 79.00 | -- | 81.95 | 68.53 | -- | 75.07 | 81.06 | 90.65 |
	| Data Imputation | Seen | Buy | 96.50 | 98.50 | 100 | 100 | -- | 98.46 | 98.46 | 100 |
	| Data Imputation | Seen | Restaurant | 77.20 | 88.40 | 97.67 | 90.70 | -- | 89.53 | 87.21 | 89.53 |
	| Data Imputation | Unseen | Flipkart | 68.00 | -- | 89.94 | 83.20 | -- | 87.14 | 87.48 | 81.68 |
	| Data Imputation | Unseen | Phone | 86.70 | -- | 90.79 | 86.78 | -- | 86.52 | 85.68 | 87.21 |
	| Schema Matching | Seen | MIMIC-III | 20.00 | -- | 40.00 | 29.41 | -- | 53.33 | 45.45 | 40.00 |
	| Schema Matching | Seen | Synthea | 38.50 | 45.20 | 66.67 | 6.56 | -- | 55.56 | 47.06 | 56.00 |
	| Schema Matching | Unseen | CMS | 50.00 | -- | 19.35 | 22.22 | -- | 42.86 | 38.10 | 59.29 |
	| Entity Matching | Seen | Amazon-Google | 75.58 | 63.50 | 74.21 | 70.91 | 70.10 | 81.69 | 81.42 | 81.34 |
	| Entity Matching | Seen | Beer | 94.37 | 100 | 100 | 90.32 | 96.30 | 100.00 | 100.00 | 96.77 |
	| Entity Matching | Seen | DBLP-ACM | 98.99 | 96.60 | 97.44 | 95.87 | 93.80 | 98.65 | 98.77 | 98.98 |
	| Entity Matching | Seen | DBLP-GoogleScholar | 95.70 | 83.80 | 91.87 | 90.45 | 92.40 | 94.88 | 95.03 | 98.51 |
	| Entity Matching | Seen | Fodors-Zagats | 100 | 100 | 100 | 93.62 | 100 | 100 | 100 | 100 |
	| Entity Matching | Seen | iTunes-Amazon | 97.06 | 98.20 | 100 | 98.18 | 94.30 | 96.30 | 96.30 | 98.11 |
	| Entity Matching | Unseen | Abt-Buy | 89.33 | -- | 92.77 | 78.73 | -- | 86.06 | 88.84 | 89.58 |
	| Entity Matching | Unseen | Walmart-Amazon | 86.89 | 87.00 | 90.27 | 79.19 | 82.40 | 84.91 | 85.24 | 89.42 |
	| Avg | | | 80.44 | -- | 84.17 | 72.58 | -- | 82.74 | 81.55 | 86.02 |

	_For GPT-3.5 and GPT-4, we used the few-shot approach on all datasets. However, for Jellyfish models, the few-shot approach is disabled on seen datasets and enabled on unseen datasets._
	_We use accuracy as the metric for data imputation and the F1 score for all other tasks._
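To make the two metrics concrete, here is a minimal sketch that computes accuracy over imputed values and F1 over Yes/No decisions, treating "Yes" as the positive class. The function names are hypothetical, and the normalization (trimming and lowercasing before an exact match) is an assumption for illustration; this is not the authors' evaluation code.

```python
# Sketch of the two reported metrics (assumed exact-match normalization; not the authors' evaluation code).


def accuracy(predictions, references):
    """Data imputation: share of predictions that exactly match the reference after trimming and lowercasing."""
    if not references:
        return 0.0
    hits = sum(
        p.strip().lower() == r.strip().lower() for p, r in zip(predictions, references)
    )
    return hits / len(references)


def f1_yes(predictions, references):
    """Yes/No tasks: F1 score with "Yes" treated as the positive class."""
    pairs = list(zip(predictions, references))
    tp = sum(p == "Yes" and r == "Yes" for p, r in pairs)
    fp = sum(p == "Yes" and r != "Yes" for p, r in pairs)
    fn = sum(p != "Yes" and r == "Yes" for p, r in pairs)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# The tables report percentages, i.e. 100 times these values.
print(100 * accuracy(["Seattle", "boston "], ["seattle", "Chicago"]))
print(100 * f1_yes(["Yes", "No", "Yes", "Yes"], ["Yes", "No", "No", "Yes"]))
```

A library implementation such as scikit-learn's `f1_score` can replace `f1_yes` once answers are mapped to labels.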

	1. Non-LLM SoTA baselines:
	   - [Ditto](https://arxiv.org/abs/2004.00584) for Entity Matching
	   - [SMAT](https://www.researchgate.net/publication/353920530_SMAT_An_Attention-Based_Deep_Learning_Solution_to_the_Automation_of_Schema_Matching) for Schema Matching
	   - [HoloDetect](https://arxiv.org/abs/1904.02285) for Error Detection (seen datasets)
	   - [RAHA](https://dl.acm.org/doi/10.1145/3299869.3324956) for Error Detection (unseen datasets)
	   - [IPM](https://ieeexplore.ieee.org/document/9458712) for Data Imputation
	2. [Large Language Models as Data Preprocessors](https://arxiv.org/abs/2308.16361)

	## Performance on unseen tasks

	### Column Type Annotation

	| Dataset | RoBERTa (159 shots)<sup>1</sup> | GPT-3.5<sup>1</sup> | GPT-4 | GPT-4o | Jellyfish-7B | Jellyfish-8B | Jellyfish-13B |
	|--------|-----------------|--------|--------|--------|--------------|--------------|---------------|
	| SOTAB | 79.20 | 89.47 | 91.55 | 65.05 | 83 | 76.33 | 82 |

	_Few-shot is disabled for Jellyfish models._

	1. Results from [Column Type Annotation using ChatGPT](https://arxiv.org/abs/2306.00745)

	### Attribute Value Extraction

	| Dataset | Stable Beluga 2 70B<sup>1</sup> | SOLAR 70B<sup>1</sup> | GPT-3.5<sup>1</sup> | GPT-4<sup>1</sup> | GPT-4o | Jellyfish-7B | Jellyfish-8B | Jellyfish-13B |
	| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
	| AE-110k | 52.10 | 49.20 | 61.30 | 55.50 | 55.77 | 56.09 | 59.55 | 58.12 |
	| OA-Mine | 50.80 | 55.20 | 62.70 | 68.90 | 60.20 | 51.98 | 59.22 | 55.96 |

	_Few-shot is disabled for Jellyfish models._

	1. Results from [Product Attribute Value Extraction using Large Language Models](https://arxiv.org/abs/2310.12537)


	## Prompt Template
	```
	{system message}

	[INST]:

	{prompt}

	[/INST]
	```

	Replace `{system message}` and `{prompt}` with your own text, without the braces.
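A minimal sketch of filling the template in Python. `build_prompt` and `SYSTEM_MESSAGE` are hypothetical names. The blank-line spacing follows the template as shown, the closing tag is assumed to be Mistral-Instruct's `[/INST]`, and the two-line system message from the System Message section is assumed to be joined with a newline; please check these details against the model's tokenizer before relying on them.

```python
# Minimal sketch: fill the Jellyfish-7B prompt template with a system message and a task prompt.
# Assumptions: blank lines between parts as in the template, and a Mistral-style "[/INST]" closing tag.

SYSTEM_MESSAGE = (
    "You are an AI assistant that follows instruction extremely well.\n"
    "User will give you a question. Your task is to answer as faithfully as you can."
)


def build_prompt(prompt: str, system_message: str = SYSTEM_MESSAGE) -> str:
    """Return the full model input: system message, then the task prompt wrapped in [INST] tags."""
    return f"{system_message}\n\n[INST]:\n\n{prompt}\n\n[/INST]"


if __name__ == "__main__":
    print(build_prompt("Are record A and record B the same entity? Choose your answer from: [Yes, No]."))
```

The resulting string is plain text, so it can be passed to any generation backend (for example a Hugging Face `transformers` tokenizer and model) as the raw input.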

	## Prompts

	We provide the prompts used for both the model's fine-tuning and inference.
	You can structure your data according to these prompts.

	### System Message
	```
	You are an AI assistant that follows instruction extremely well.
	User will give you a question. Your task is to answer as faithfully as you can.
	```

	### For Entity Matching
	```
	You are tasked with determining whether two records listed below are the same based on the information provided.
	Carefully compare the {attribute 1}, {attribute 2}... for each record before making your decision.
	Note that missing values (N/A or "nan") should not be used as a basis for your decision.
	Record A: [{attribute 1}: {attribute 1 value}, {attribute 2}: {attribute 2 value}, ...]
	Record B: [{attribute 1}: {attribute 1 value}, {attribute 2}: {attribute 2 value}, ...]
	Are record A and record B the same entity? Choose your answer from: [Yes, No].
	```
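As an illustration, the sketch below renders two records into the entity-matching prompt and maps the model's reply back to a label. The helper names are hypothetical; it assumes the template lines are joined with newlines, that the "{attribute 1}, {attribute 2}..." placeholder becomes a comma-separated list of attribute names, that the escaped quotes around nan are meant as plain quotes, and that replies begin with "Yes" or "No".

```python
import re


def format_record(record: dict) -> str:
    """Render a record as "[attr: value, attr: value, ...]" following the prompt's layout."""
    return "[" + ", ".join(f"{k}: {v}" for k, v in record.items()) + "]"


def entity_matching_prompt(record_a: dict, record_b: dict) -> str:
    """Fill the entity-matching prompt; both records are assumed to share the same attributes."""
    attributes = ", ".join(record_a.keys())
    return (
        "You are tasked with determining whether two records listed below are the same based on the information provided.\n"
        f"Carefully compare the {attributes} for each record before making your decision.\n"
        'Note that missing values (N/A or "nan") should not be used as a basis for your decision.\n'
        f"Record A: {format_record(record_a)}\n"
        f"Record B: {format_record(record_b)}\n"
        "Are record A and record B the same entity? Choose your answer from: [Yes, No]."
    )


def parse_yes_no(answer: str):
    """Return "Yes" or "No" if the reply starts with that word (case-insensitive), else None."""
    match = re.match(r"\s*(yes|no)\b", answer, flags=re.IGNORECASE)
    return match.group(1).capitalize() if match else None


a = {"title": "iPhone 13", "price": "799"}
b = {"title": "Apple iPhone 13", "price": "nan"}
print(entity_matching_prompt(a, b))
```

The word-boundary check in `parse_yes_no` keeps a reply such as "Not sure" from being read as "No".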

	### For Data Imputation
	```
	You are presented with a {keyword} record that is missing a specific attribute: {attribute X}.
	Your task is to deduce or infer the value of {attribute X} using the available information in the record.
	You may be provided with fields like {attribute 1}, {attribute 2}, ... to help you in the inference.
	Record: [{attribute 1}: {attribute 1 value}, {attribute 2}: {attribute 2 value}, ...]
	Based on the provided record, what would you infer is the value for the missing attribute {attribute X}?
	Answer only the value of {attribute X}.
	```
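A minimal sketch of filling the data-imputation prompt. `data_imputation_prompt` is a hypothetical helper; it assumes the missing attribute is left out of the rendered record and out of the "fields like ..." list, and that the template lines are joined with newlines.

```python
def data_imputation_prompt(keyword: str, record: dict, missing: str) -> str:
    """Fill the data-imputation prompt; `missing` names the attribute to infer and is dropped from the record."""
    known = {k: v for k, v in record.items() if k != missing}
    fields = ", ".join(known)
    record_text = "[" + ", ".join(f"{k}: {v}" for k, v in known.items()) + "]"
    return (
        f"You are presented with a {keyword} record that is missing a specific attribute: {missing}.\n"
        f"Your task is to deduce or infer the value of {missing} using the available information in the record.\n"
        f"You may be provided with fields like {fields} to help you in the inference.\n"
        f"Record: {record_text}\n"
        f"Based on the provided record, what would you infer is the value for the missing attribute {missing}?\n"
        f"Answer only the value of {missing}."
    )


# Hypothetical example record; "city" is the attribute to impute.
print(data_imputation_prompt("restaurant", {"name": "Example Diner", "phone": "555-0100", "city": None}, "city"))
```

Because the prompt ends with "Answer only the value of ...", the model's reply can be compared directly with the ground-truth value when computing accuracy.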

	### For Error Detection
	_There are two forms of the error detection task.
	In the first form, a complete record row is provided, and the task is to determine if a specific value is erroneous.
	In the second form, only the value of a specific attribute is given, and the decision about its correctness is based solely on the attribute's name and value.
	The subsequent prompt examples pertain to these two forms, respectively._
	```
	Your task is to determine if there is an error in the value of a specific attribute within the whole record provided.
	The attributes may include {attribute 1}, {attribute 2}, ...
	Errors may include, but are not limited to, spelling errors, inconsistencies, or values that don't make sense given the context of the whole record.
	Record [{attribute 1}: {attribute 1 value}, {attribute 2}: {attribute 2 value}, ...]
	Attribute for Verification: [{attribute X}: {attribute X value}]
	Question: Is there an error in the value of {attribute X}? Choose your answer from: [Yes, No].
	```
	```
	Your task is to determine if there is an error in the value of a specific attribute.
	The attributes may belong to a {keyword} record and could be one of the following: {attribute 1}, {attribute 2}, ...
	Errors can include, but are not limited to, spelling errors, inconsistencies, or values that don't make sense for that attribute.
	Note: Missing values (N/A or "nan") are not considered errors.
	Attribute for Verification: [{attribute X}: {attribute X value}]
	Question: Is there an error in the value of {attribute X}? Choose your answer from: [Yes, No].
	```
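The two error-detection forms differ only in how much context the model sees. The sketch below builds both; the function names are hypothetical, the lines are assumed to be joined with newlines, and the escaped quotes around nan are assumed to be plain quotes. The first form keeps the template's "Record [" wording (no colon) as written.

```python
def format_pairs(pairs: dict) -> str:
    """Render attributes as "[attr: value, ...]"."""
    return "[" + ", ".join(f"{k}: {v}" for k, v in pairs.items()) + "]"


def error_detection_with_record(record: dict, attribute: str) -> str:
    """First form: the whole record is shown and one attribute's value is checked."""
    names = ", ".join(record)
    return (
        "Your task is to determine if there is an error in the value of a specific attribute within the whole record provided.\n"
        f"The attributes may include {names}\n"
        "Errors may include, but are not limited to, spelling errors, inconsistencies, or values that don't make sense given the context of the whole record.\n"
        f"Record {format_pairs(record)}\n"
        f"Attribute for Verification: {format_pairs({attribute: record[attribute]})}\n"
        f"Question: Is there an error in the value of {attribute}? Choose your answer from: [Yes, No]."
    )


def error_detection_attribute_only(keyword: str, attributes: list, attribute: str, value: str) -> str:
    """Second form: only the attribute's name and value are shown."""
    names = ", ".join(attributes)
    return (
        "Your task is to determine if there is an error in the value of a specific attribute.\n"
        f"The attributes may belong to a {keyword} record and could be one of the following: {names}\n"
        "Errors can include, but are not limited to, spelling errors, inconsistencies, or values that don't make sense for that attribute.\n"
        'Note: Missing values (N/A or "nan") are not considered errors.\n'
        f"Attribute for Verification: {format_pairs({attribute: value})}\n"
        f"Question: Is there an error in the value of {attribute}? Choose your answer from: [Yes, No]."
    )


print(error_detection_with_record({"age": "39", "sex": "Mael"}, "sex"))
print(error_detection_attribute_only("hospital", ["city", "state"], "city", "birminghxm"))
```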

	### For Schema Matching
	```
	Your task is to determine if the two attributes (columns) are semantically equivalent in the context of merging two tables.
	Each attribute will be provided by its name and a brief description.
	Your goal is to assess if they refer to the same information based on these names and descriptions provided.
	Attribute A is [name: {value of name}, description: {value of description}].
	Attribute B is [name: {value of name}, description: {value of description}].
	Are Attribute A and Attribute B semantically equivalent? Choose your answer from: [Yes, No].
	```
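A minimal sketch of the schema-matching prompt, assuming each attribute is supplied as a (name, description) pair and the template lines are joined with newlines. `schema_matching_prompt` and the example columns are hypothetical.

```python
def schema_matching_prompt(attr_a: tuple, attr_b: tuple) -> str:
    """Fill the schema-matching prompt from two (name, description) pairs."""
    (name_a, desc_a), (name_b, desc_b) = attr_a, attr_b
    return (
        "Your task is to determine if the two attributes (columns) are semantically equivalent in the context of merging two tables.\n"
        "Each attribute will be provided by its name and a brief description.\n"
        "Your goal is to assess if they refer to the same information based on these names and descriptions provided.\n"
        f"Attribute A is [name: {name_a}, description: {desc_a}].\n"
        f"Attribute B is [name: {name_b}, description: {desc_b}].\n"
        "Are Attribute A and Attribute B semantically equivalent? Choose your answer from: [Yes, No]."
    )


print(schema_matching_prompt(("dob", "date of birth of the patient"), ("birthdate", "the patient's birth date")))
```

To match two whole schemas, one would call this for each candidate column pair and keep the pairs the model labels "Yes".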

	### For Column Type Annotation

	We follow the prompt in [Column Type Annotation using ChatGPT](https://arxiv.org/abs/2306.00745) (text+inst+2-step).

	### For Attribute Value Extraction

	We follow the prompt in [Product Attribute Value Extraction using Large Language Models](https://arxiv.org/abs/2310.12537) (textual, w/o examples).