---
license: apache-2.0
language:
- zh
pipeline_tag: image-to-text
widget:
- src: >-
    https://huggingface.co/snzhang/FilmTitle-Beit-GPT2/resolve/main/SpiderMan.jpg
  example_title: SpiderMan
- src: >-
    https://huggingface.co/snzhang/FilmTitle-Beit-GPT2/resolve/main/BorntoFly.jpg
  example_title: Born to Fly
---

# Image Caption Model
## Model description

The model generates a Chinese title for a given movie poster. It is a vision encoder-decoder model that uses [BEiT](https://huggingface.co/microsoft/beit-base-patch16-224-pt22k-ft22k) as the image encoder and [GPT2](https://huggingface.co/IDEA-CCNL/Wenzhong-GPT2-110M) (Wenzhong-GPT2-110M, a Chinese GPT2) as the text decoder.
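The released checkpoint already bundles both backbones, so there is no need to assemble them yourself; as a rough sketch of how such a pairing can be created before fine-tuning (not the exact training script used here), the two public checkpoints could be combined with `VisionEncoderDecoderModel.from_encoder_decoder_pretrained`:

```Python
from transformers import VisionEncoderDecoderModel

# Illustrative only: pair the two public backbones into one encoder-decoder
# model. The released checkpoint already contains the fine-tuned, merged weights.
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "microsoft/beit-base-patch16-224-pt22k-ft22k",
    "IDEA-CCNL/Wenzhong-GPT2-110M",
)
```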
## Training Data

The training data contains 5043 movie posters and their corresponding Chinese titles, collected in the [Movie-Title-Post](https://huggingface.co/datasets/snzhang/Movie-Title-Post) dataset.
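If you want to browse the data, the snippet below is a small sketch that assumes the dataset can be read with the generic `datasets` loader and exposes a default `train` split:

```Python
from datasets import load_dataset

# Assumption: the dataset is readable by the generic Hub loader and has a
# default "train" split; print the first record to see a poster/title pair.
dataset = load_dataset("snzhang/Movie-Title-Post")
print(dataset)
print(dataset["train"][0])
```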
## How to use
```Python
from transformers import VisionEncoderDecoderModel, ViTFeatureExtractor, AutoTokenizer
from PIL import Image

# Load the fine-tuned encoder-decoder model, the feature extractor and the tokenizer
pretrained = "snzhang/FilmTitle-Beit-GPT2"
model = VisionEncoderDecoderModel.from_pretrained(pretrained)
feature_extractor = ViTFeatureExtractor.from_pretrained(pretrained)
tokenizer = AutoTokenizer.from_pretrained(pretrained)

# Open the poster image and make sure it is in RGB mode
image_path = "your image path"
image = Image.open(image_path)
if image.mode != "RGB":
    image = image.convert("RGB")
pixel_values = feature_extractor(images=image, return_tensors="pt").pixel_values

# Generation settings (example values, not from the original card; tune as needed)
gen_kwargs = {"max_length": 20, "num_beams": 4}
output_ids = model.generate(pixel_values, **gen_kwargs)
preds = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
preds = [pred.strip() for pred in preds]
print(preds)
```
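Optionally, and continuing from the snippet above (this is an addition to the original example, with placeholder generation settings), inference can be run on a GPU when one is available:

```Python
import torch

# Hypothetical extra step: move the model and inputs to a GPU if available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
output_ids = model.generate(pixel_values.to(device), max_length=20, num_beams=4)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
```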
## More Details

You can get more training details in the [FilmTitle-Beit-GPT2](https://github.com/h7nian/FilmTitle-Beit-GPT2) repository.