Files changed (1)
  1. README.md +17 -1
README.md CHANGED
@@ -18,13 +18,25 @@ datasets:
18
  This is a Turkish visual language model designed for multi-modal visual instruction-following tasks. It utilizes the LLaVA (Large Language and Vision Assistant) architecture, integrating the `ytucosmos/Turkish-Llama-8b-Instruct-v0.1` language model. The model is capable of processing both visual (image) and textual inputs, allowing it to understand and execute instructions provided in Turkish.
19
 
20
  # Model Details
21
- The model was pretrained with a translated version of the **[LLaVA-CC3M-Pretrain-595K](https://huggingface.co/datasets/liuhaotian/LLaVA-CC3M-Pretrain-595K)** dataset.<br>
22
  It was further fine-tuned using subsets of the following datasets to enhance its visual reasoning and understanding capabilities:
23
  - **[Stanford GQA](https://cs.stanford.edu/people/dorarad/gqa/about.html)**
24
  - **[VisualGenome](https://homes.cs.washington.edu/~ranjay/visualgenome/index.html)**
25
  - **[COCO](https://cocodataset.org/#home)**
 
26
 
27
  ## Example Usage
28
  ```python
29
  from lmdeploy import pipeline, ChatTemplateConfig
30
  from lmdeploy.vl import load_image
@@ -38,6 +50,7 @@ image = load_image(url)
38
  response = pipe(('Bu resimde öne çıkan ögeler nelerdir?', image))
39
 
40
  print(response)
 
41
  """
42
  Resimde, çiçeklerle dolu bir bahçede yavru bir köpek ve arka planda bir ağaç yer alıyor.
43
  Köpek, çiçeklerin arasında otururken ve etrafını saran çiçeklerin arasından bakarken görülebiliyor.
@@ -45,6 +58,9 @@ Bu sahne, köpeğin bahçede geçirdiği zamanın tadını çıkardığı ve çe
45
  """
46
  ```
47
 
48
  # Acknowledgments
49
  - Computing resources used in this work were provided by the National Center for High Performance Computing of Turkey (UHeM).
50
  - Thanks to the generous support from the Hugging Face team, it is possible to download models from their S3 storage 🤗
 
18
  This is a Turkish visual language model designed for multi-modal visual instruction-following tasks. It utilizes the LLaVA (Large Language and Vision Assistant) architecture, integrating the `ytucosmos/Turkish-Llama-8b-Instruct-v0.1` language model. The model is capable of processing both visual (image) and textual inputs, allowing it to understand and execute instructions provided in Turkish.
19
 
20
  # Model Details
21
+ The model was pretrained on the **[LLaVA-CC3M-Pretrain-595K](https://huggingface.co/datasets/liuhaotian/LLaVA-CC3M-Pretrain-595K)** dataset, which was translated into Turkish using DeepL Translate.<br>
22
  It was further fine-tuned using subsets of the following datasets to enhance its visual reasoning and understanding capabilities:
23
  - **[Stanford GQA](https://cs.stanford.edu/people/dorarad/gqa/about.html)**
24
  - **[VisualGenome](https://homes.cs.washington.edu/~ranjay/visualgenome/index.html)**
25
  - **[COCO](https://cocodataset.org/#home)**
26
+ - **110K multi-turn instruction-following samples** consisting of **book covers**, to enhance the model's capabilities on OCR-related tasks.
27
 
28
  ## Example Usage
29
+
30
+ #### Using lmdeploy
31
+
32
+ 1. Install requirements:
33
+ ```
34
+ pip install 'lmdeploy>=0.4.0'
35
+ pip install git+https://github.com/haotian-liu/LLaVA.git --no-deps
36
+ ```
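
If the example fails right after installation, one thing worth ruling out is an lmdeploy version older than the `0.4.0` minimum given in the install step. The following is a minimal sketch of such a check, not part of the model card's instructions: the helper names `parse_version`, `meets_minimum`, and `check_lmdeploy` are hypothetical, and the comparison looks only at the leading numeric components, so a pre-release such as `0.4.0rc1` is treated like `0.4.0`.

```python
from importlib import metadata


def parse_version(text):
    """Turn '0.4.2' into (0, 4, 2), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum="0.4.0"):
    """Return True if `installed` is at least `minimum` (numeric parts only)."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that '0.4' compares equal to '0.4.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_lmdeploy(minimum="0.4.0"):
    """Describe whether lmdeploy is installed and satisfies the minimum version."""
    try:
        installed = metadata.version("lmdeploy")
    except metadata.PackageNotFoundError:
        return "lmdeploy is not installed"
    if meets_minimum(installed, minimum):
        return f"lmdeploy {installed} satisfies >={minimum}"
    return f"lmdeploy {installed} is older than {minimum}"


if __name__ == "__main__":
    print(check_lmdeploy())
```

The check uses only the standard library. For strict handling of pre-release and post-release tags, a full PEP 440 parser would be the more robust choice.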
37
+
38
+ 2. Run the following code:
39
+
40
  ```python
41
  from lmdeploy import pipeline, ChatTemplateConfig
42
  from lmdeploy.vl import load_image
 
50
  response = pipe(('Bu resimde öne çıkan ögeler nelerdir?', image))
51
 
52
  print(response)
53
+
54
  """
55
  Resimde, çiçeklerle dolu bir bahçede yavru bir köpek ve arka planda bir ağaç yer alıyor.
56
  Köpek, çiçeklerin arasında otururken ve etrafını saran çiçeklerin arasından bakarken görülebiliyor.
 
58
  """
59
  ```
60
 
61
+ Image used in this example:
62
+ <img src="./example.png"/>
63
+
64
  # Acknowledgments
65
  - Computing resources used in this work were provided by the National Center for High Performance Computing of Turkey (UHeM).
66
  - Thanks to the generous support from the Hugging Face team, it is possible to download models from their S3 storage 🤗