SpursgoZmy committed
Commit ca19a28
1 Parent(s): 7101ea4

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -23,7 +23,7 @@ See the ACL 2024 paper for more details: [Multimodal Table Understanding](https:
 
  <!-- Provide a longer summary of what this model is. -->
 
- **Model Type:** Table LLaVA strictly follows the [LLaVA-v1.5](https://arxiv.org/abs/2310.03744) model architecture and training pipeline,
+ **Model Type:** Table LLaVA 7B strictly follows the [LLaVA-v1.5](https://arxiv.org/abs/2310.03744) model architecture and training pipeline,
  with [CLIP-ViT-L-336px](https://huggingface.co/openai/clip-vit-large-patch14-336) as visual encoder (336*336 image resolution),
  [Vicuna-v1.5-7B](https://huggingface.co/lmsys/vicuna-7b-v1.5) as base LLM and a two-layer MLP as vision-language connector.
 
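The hunk above only adds "7B" to the model name; the described architecture is unchanged. For readers of the model card, a minimal PyTorch sketch of what a LLaVA-v1.5-style two-layer MLP vision-language connector looks like; the class name and the 1024/4096 hidden sizes are assumptions based on CLIP-ViT-L and Vicuna-7B, not code from this repository:

```python
import torch
import torch.nn as nn

class VisionLanguageConnector(nn.Module):
    """Two-layer MLP projecting visual patch features into the LLM embedding space
    (LLaVA-v1.5 "mlp2x_gelu"-style projector; dimensions are assumptions)."""

    def __init__(self, vision_hidden_size: int = 1024, llm_hidden_size: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_hidden_size, llm_hidden_size),  # visual feature dim -> LLM dim
            nn.GELU(),
            nn.Linear(llm_hidden_size, llm_hidden_size),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, vision_hidden_size)
        return self.proj(image_features)

# Illustrative shapes: a 336*336 image with 14*14 patches gives a 24*24 = 576-patch grid.
patch_features = torch.randn(1, 576, 1024)
llm_tokens = VisionLanguageConnector()(patch_features)
print(llm_tokens.shape)  # torch.Size([1, 576, 4096])
```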
@@ -53,7 +53,7 @@ which is a large-scale dataset covering a wide range of table images and table-r
  | | 232K multimodal instruction tuning data of 14 tabular tasks | 232K | [MMTab-instruct_sft_data_llava_format_232K.json](https://huggingface.co/datasets/SpursgoZmy/MMTab) |
 
  We also provide the merged pre-training and instruction fine-tuning data in the MMTab dataset,
- i.e., enhanced_llava_pretrain_data_708K.json and enhanced_llava_sft_data_898K.json.
+ i.e., enhanced_llava_pretrain_data_708K.json and enhanced_llava_sft_data_898K.json, which were used to train Table LLaVA.
 
  ## Evaluation dataset
 
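For the merged training files referenced in this hunk, a hedged sketch of fetching one of the LLaVA-format JSON files with huggingface_hub; the filename is taken from the README text, and its location at the root of the SpursgoZmy/MMTab dataset repo is an assumption:

```python
import json
from huggingface_hub import hf_hub_download

# Download one of the merged training files from the MMTab dataset repo.
# The filename comes from the README; its path inside the repo is assumed.
path = hf_hub_download(
    repo_id="SpursgoZmy/MMTab",
    filename="enhanced_llava_pretrain_data_708K.json",
    repo_type="dataset",
)

with open(path, "r", encoding="utf-8") as f:
    records = json.load(f)

print(len(records))       # ~708K pre-training samples
print(records[0].keys())  # e.g. id / image / conversations (LLaVA-format fields)
```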