PerRing
/

llava-v1.6-34b-hf

Image-Text-to-Text

Inference Endpoints

Model card Files Files and versions Community

llava-v1.6-34b-hf / README.md

PerRing's picture

Update README.md

9c9d796 verified 8 months ago

|

3.33 kB

	---
	{}
	---
	this repo is huggingface version of liuhaotian/llava-v1.6-34b
	# Issue
	~Despite the completion of generation, '\n' is repeatedly generated, so be mindful of adjusting the 'max_length'.~
	<br>
	error fixed!

	```python
	import requests
	from PIL import Image
	import torch
	from transformers import AutoProcessor, LlavaForConditionalGeneration

	model_id = "PerRing/llava-v1.6-34b-hf"
	model = LlavaForConditionalGeneration.from_pretrained(
	model_id,
	torch_dtype=torch.float16,
	low_cpu_mem_usage=True,
	).to(0)
	processor = AutoProcessor.from_pretrained(model_id)

	Q='explain about this image'
	prompt = f"""<\|im_start\|>system
	Answer the questions.<\|im_end\|><\|im_start\|>user
	<image>
	{Q}<\|im_end\|><\|im_start\|>assistant
	"""
	image_file = "https://images.pexels.com/photos/757889/pexels-photo-757889.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=2"

	raw_image = Image.open(requests.get(image_file, stream=True).raw)
	inputs = processor(prompt, raw_image, return_tensors='pt').to(0, torch.float16)

	output = model.generate(**inputs, max_length=256, temperature=0.4, do_sample=True)
	print(processor.decode(output[0], skip_special_tokens=True))
	```
	## result
	```output
	<\|im_start\|> system
	Answer the questions.<\|im_start\|> user

	explain about this image<\|im_start\|> assistant
	The image shows a clear glass vase filled with water and containing several purple flowers with green leaves. The flowers appear to be tulips or a similar type of lily, given their shape and color. The vase is placed on what looks like a balcony or outdoor terrace, as suggested by the presence of other plants and a railing in the background. The lighting and the shadows cast by the flowers and the railing indicate that the photo was taken during the day with natural light. The overall scene conveys a sense of tranquility and appreciation for nature.
	```



	# Original(liuhaotian/llava-v1.6-34b) README.md


	<br>
	<br>

	# LLaVA Model Card

	## Model details

	Model type:
	LLaVA is an open-source chatbot trained by fine-tuning LLM on multimodal instruction-following data.
	It is an auto-regressive language model, based on the transformer architecture.
	Base LLM: [NousResearch/Nous-Hermes-2-Yi-34B](https://huggingface.co/NousResearch/Nous-Hermes-2-Yi-34B)

	Model date:
	LLaVA-v1.6-34B was trained in December 2023.

	Paper or resources for more information:
	https://llava-vl.github.io/

	## License
	[NousResearch/Nous-Hermes-2-Yi-34B](https://huggingface.co/NousResearch/Nous-Hermes-2-Yi-34B) license.

	Where to send questions or comments about the model:
	https://github.com/haotian-liu/LLaVA/issues

	## Intended use
	Primary intended uses:
	The primary use of LLaVA is research on large multimodal models and chatbots.

	Primary intended users:
	The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.

	## Training dataset
	- 558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP.
	- 158K GPT-generated multimodal instruction-following data.
	- 500K academic-task-oriented VQA data mixture.
	- 50K GPT-4V data mixture.
	- 40K ShareGPT data.

	## Evaluation dataset
	A collection of 12 benchmarks, including 5 academic VQA benchmarks and 7 recent benchmarks specifically proposed for instruction-following LMMs.