---
license: cc-by-nc-4.0
language:
- en
---

# Jellyfish-7B

## Model Details

Jellyfish-7B is a large language model with 7 billion parameters. We fine-tuned the [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) model using a subset of the [Jellyfish-Instruct](https://huggingface.co/datasets/NECOUDBFM/Jellyfish-Instruct) dataset.

In a GPT-4-judged comparison, Jellyfish-7B's winning rate against GPT-3.5-turbo is 56.36%.

More details about the model can be found in the [Jellyfish paper](https://arxiv.org/abs/2312.01678).

- **Developed by:** Haochen Zhang, Yuyang Dong, Chuan Xiao, Masafumi Oyamada
- **Contact:** dongyuyang@nec.com
- **Funded by:** NEC Corporation, Osaka University
- **Language(s) (NLP):** English
- **License:** Non-Commercial Creative Commons license (CC BY-NC-4.0)
- **Finetuned from model:** [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)

## Citation

If you find our work useful, please give us credit by citing:

```
@article{zhang2023jellyfish,
  title={Jellyfish: A Large Language Model for Data Preprocessing},
  author={Zhang, Haochen and Dong, Yuyang and Xiao, Chuan and Oyamada, Masafumi},
  journal={arXiv preprint arXiv:2312.01678},
  year={2023}
}
```

## Performance on seen tasks

| Task | Type | Dataset | Non-LLM SoTA<sup>1</sup> | GPT-3.5<sup>2</sup> | GPT-4<sup>2</sup> | GPT-4o | Table-GPT | Jellyfish-7B | Jellyfish-8B | Jellyfish-13B |
|-----------------|--------|-------------------|-----------------|--------|--------|--------|-----------|--------------|--------------|---------------|
| Error Detection | Seen | Adult | *99.10* | 99.10 | 92.01 | 83.58 | -- | 77.40 | 73.74 | **99.33** |
| Error Detection | Seen | Hospital | 94.40 | **97.80** | 90.74 | 44.76 | -- | 94.51 | 93.40 | *95.59* |
| Error Detection | Unseen | Flights | 81.00 | -- | **83.48** | 66.01 | -- | 69.15 | 66.21 | *82.52* |
| Error Detection | Unseen | Rayyan | 79.00 | -- | *81.95* | 68.53 | -- | 75.07 | 81.06 | **90.65** |
| Data Imputation | Seen | Buy | 96.50 | 98.50 | **100** | **100** | -- | 98.46 | 98.46 | **100** |
| Data Imputation | Seen | Restaurant | 77.20 | 88.40 | **97.67** | 90.70 | -- | 89.53 | 87.21 | 89.53 |
| Data Imputation | Unseen | Flipkart | 68.00 | -- | **89.94** | 83.20 | -- | 87.14 | *87.48* | 81.68 |
| Data Imputation | Unseen | Phone | 86.70 | -- | **90.79** | 86.78 | -- | 86.52 | 85.68 | *87.21* |
| Schema Matching | Seen | MIMIC-III | 20.00 | -- | 40.00 | 29.41 | -- | **53.33** | *45.45* | 40.00 |
| Schema Matching | Seen | Synthea | 38.50 | 45.20 | **66.67** | 6.56 | -- | 55.56 | 47.06 | 56.00 |
| Schema Matching | Unseen | CMS | *50.00* | -- | 19.35 | 22.22 | -- | 42.86 | 38.10 | **59.29** |
| Entity Matching | Seen | Amazon-Google | 75.58 | 63.50 | 74.21 | 70.91 | 70.10 | **81.69** | *81.42* | 81.34 |
| Entity Matching | Seen | Beer | 94.37 | **100** | **100** | 90.32 | 96.30 | **100.00** | **100.00** | 96.77 |
| Entity Matching | Seen | DBLP-ACM | **98.99** | 96.60 | 97.44 | 95.87 | 93.80 | 98.65 | 98.77 | *98.98* |
| Entity Matching | Seen | DBLP-GoogleScholar | *95.70* | 83.80 | 91.87 | 90.45 | 92.40 | 94.88 | 95.03 | **98.51** |
| Entity Matching | Seen | Fodors-Zagats | **100** | **100** | **100** | 93.62 | **100** | **100** | **100** | **100** |
| Entity Matching | Seen | iTunes-Amazon | 97.06 | *98.20* | **100** | 98.18 | 94.30 | 96.30 | 96.30 | 98.11 |
| Entity Matching | Unseen | Abt-Buy | 89.33 | -- | **92.77** | 78.73 | -- | 86.06 | 88.84 | *89.58* |
| Entity Matching | Unseen | Walmart-Amazon | 86.89 | 87.00 | **90.27** | 79.19 | 82.40 | 84.91 | 85.24 | *89.42* |
| Avg | | | 80.44 | -- | *84.17* | 72.58 | -- | 82.74 | 81.55 | **86.02** |

_For GPT-3.5 and GPT-4, we used the few-shot approach on all datasets. For Jellyfish models, the few-shot approach is disabled on seen datasets and enabled on unseen datasets._
_Accuracy is the metric for data imputation; the F1 score is used for all other tasks._

1. Non-LLM SoTA methods:
   - [Ditto](https://arxiv.org/abs/2004.00584) for Entity Matching
   - [SMAT](https://www.researchgate.net/publication/353920530_SMAT_An_Attention-Based_Deep_Learning_Solution_to_the_Automation_of_Schema_Matching) for Schema Matching
   - [HoloDetect](https://arxiv.org/abs/1904.02285) for Error Detection seen datasets
   - [RAHA](https://dl.acm.org/doi/10.1145/3299869.3324956) for Error Detection unseen datasets
   - [IPM](https://ieeexplore.ieee.org/document/9458712) for Data Imputation
2. [Large Language Models as Data Preprocessors](https://arxiv.org/abs/2308.16361)

## Performance on unseen tasks

### Column Type Annotation

| Dataset | RoBERTa (159 shots)<sup>1</sup> | GPT-3.5<sup>1</sup> | GPT-4 | GPT-4o | Jellyfish-7B | Jellyfish-8B | Jellyfish-13B |
|--------|-----------------|--------|--------|--------|--------------|--------------|---------------|
| SOTAB | 79.20 | 89.47 | 91.55 | 65.05 | 83 | 76.33 | 82 |

_Few-shot is disabled for Jellyfish models._

1. Results from [Column Type Annotation using ChatGPT](https://arxiv.org/abs/2306.00745)

### Attribute Value Extraction

| Dataset | Stable Beluga 2 70B<sup>1</sup> | SOLAR 70B<sup>1</sup> | GPT-3.5<sup>1</sup> | GPT-4<sup>1</sup> | GPT-4o | Jellyfish-7B | Jellyfish-8B | Jellyfish-13B |
| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
| AE-110k | 52.10 | 49.20 | 61.30 | 55.50 | 55.77 | 56.09 | 59.55 | 58.12 |
| OA-Mine | 50.80 | 55.20 | 62.70 | 68.90 | 60.20 | 51.98 | 59.22 | 55.96 |

_Few-shot is disabled for Jellyfish models._

1. Results from [Product Attribute Value Extraction using Large Language Models](https://arxiv.org/abs/2310.12537)

## Prompt Template

```
[INST]:

<prompt> (without the <>)

[\INST]]
```
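
The template above can be applied with a few lines of string handling. The following is a minimal sketch, not the authors' reference code: the helper name `build_prompt`, the blank-line separator, and the sample task text are all assumptions made for illustration. Only the `[INST]:` opener and the `[\INST]]` closer are taken from the template itself, copied verbatim.

```python
# Delimiters copied verbatim from the Prompt Template section, including the
# backslash and the doubled closing bracket.
INST_OPEN = "[INST]:"
INST_CLOSE = "[\\INST]]"


def build_prompt(task_text: str, sep: str = "\n\n") -> str:
    """Wrap task_text in the template delimiters.

    Hypothetical helper. The separator between the delimiters and the task
    text is an assumption; adjust `sep` if the model expects different
    whitespace.
    """
    return f"{INST_OPEN}{sep}{task_text.strip()}{sep}{INST_CLOSE}"


if __name__ == "__main__":
    # Hypothetical task text, for illustration only.
    example = "Do the two product records below refer to the same entity?"
    print(build_prompt(example))
```

The resulting string can be passed to whichever inference stack loads the model. Keeping the delimiters as named constants makes it easy to check that they match the template character for character.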