SpursgoZmy committed
Commit ca19a28
1 Parent(s): 7101ea4

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -23,7 +23,7 @@ See the ACL 2024 paper for more details: [Multimodal Table Understanding](https:
 
  <!-- Provide a longer summary of what this model is. -->
 
- **Model Type:** Table LLaVA strictly follows the [LLaVA-v1.5](https://arxiv.org/abs/2310.03744) model architecture and training pipeline,
+ **Model Type:** Table LLaVA 7B strictly follows the [LLaVA-v1.5](https://arxiv.org/abs/2310.03744) model architecture and training pipeline,
  with [CLIP-ViT-L-336px](https://huggingface.co/openai/clip-vit-large-patch14-336) as visual encoder (336*336 image resolution),
  [Vicuna-v1.5-7B](https://huggingface.co/lmsys/vicuna-7b-v1.5) as base LLM and a two-layer MLP as vision-language connector.
 
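The hunk above only adds "7B" to the model name; the described architecture is unchanged. For readers of the model card, a minimal PyTorch sketch of what a LLaVA-v1.5-style two-layer MLP vision-language connector looks like; the class name and the 1024/4096 hidden sizes are assumptions based on CLIP-ViT-L and Vicuna-7B, not code from this repository:

```python
import torch
import torch.nn as nn

class VisionLanguageConnector(nn.Module):
    """Two-layer MLP projecting visual patch features into the LLM embedding space
    (LLaVA-v1.5 "mlp2x_gelu"-style projector; dimensions are assumptions)."""

    def __init__(self, vision_hidden_size: int = 1024, llm_hidden_size: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_hidden_size, llm_hidden_size),  # visual feature dim -> LLM dim
            nn.GELU(),
            nn.Linear(llm_hidden_size, llm_hidden_size),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, vision_hidden_size)
        return self.proj(image_features)

# Illustrative shapes: a 336*336 image with 14*14 patches gives a 24*24 = 576-patch grid.
patch_features = torch.randn(1, 576, 1024)
llm_tokens = VisionLanguageConnector()(patch_features)
print(llm_tokens.shape)  # torch.Size([1, 576, 4096])
```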
@@ -53,7 +53,7 @@ which is a large-scale dataset covering a wide range of table images and table-r
  | | 232K multimodal instruction tuning data of 14 tabular tasks | 232K | [MMTab-instruct_sft_data_llava_format_232K.json](https://huggingface.co/datasets/SpursgoZmy/MMTab) |
 
  We also provide the merged pre-training and instruction fine-tuning data in the MMTab dataset,
- i.e., enhanced_llava_pretrain_data_708K.json and enhanced_llava_sft_data_898K.json.
+ i.e., enhanced_llava_pretrain_data_708K.json and enhanced_llava_sft_data_898K.json, which were used to train Table LLaVA.
 
  ## Evaluation dataset
 
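For the merged training files referenced in this hunk, a hedged sketch of fetching one of the LLaVA-format JSON files with huggingface_hub; the filename is taken from the README text, and its location at the root of the SpursgoZmy/MMTab dataset repo is an assumption:

```python
import json
from huggingface_hub import hf_hub_download

# Download one of the merged training files from the MMTab dataset repo.
# The filename comes from the README; its path inside the repo is assumed.
path = hf_hub_download(
    repo_id="SpursgoZmy/MMTab",
    filename="enhanced_llava_pretrain_data_708K.json",
    repo_type="dataset",
)

with open(path, "r", encoding="utf-8") as f:
    records = json.load(f)

print(len(records))       # ~708K pre-training samples
print(records[0].keys())  # e.g. id / image / conversations (LLaVA-format fields)
```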