|
--- |
|
library_name: transformers |
|
license: apache-2.0 |
|
datasets: |
|
- isek-ai/danbooru-tags-2023 |
|
base_model: p1atdev/dart-v1-base |
|
tags: |
|
- trl |
|
- sft |
|
- danbooru |
|
inference: false |
|
--- |
|
|
|
# Dart (Danbooru Tags Transformer) v1 |
|
|
|
This model is a fine-tuned Dart (**Da**nboo**r**u **T**ags Transformer) model that generates danbooru tags. |
|
|
|
Demo: [🤗 Space](https://huggingface.co/spaces/p1atdev/danbooru-tags-transformer) |
|
|
|
If you are a developer and want to fine-tune the model, it is recommended to start from the base version, [p1atdev/dart-v1-base](https://huggingface.co/p1atdev/dart-v1-base), instead.
|
|
|
## Usage |
|
|
|
### Using AutoModel |
|
|
|
🤗 Transformers library is required. |
|
|
|
```bash |
|
pip install -U transformers |
|
``` |
|
|
|
```py |
|
import torch |
|
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig |
|
|
|
MODEL_NAME = "p1atdev/dart-v1-sft" |
|
|
|
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True) # trust_remote_code is required for tokenizer |
|
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)

generation_config = GenerationConfig.from_pretrained(MODEL_NAME) # generation settings shipped with the model
|
|
|
prompt = "<|bos|><rating>rating:sfw, rating:general</rating><copyright>original</copyright><character></character><general>1girl, " |
|
inputs = tokenizer(prompt, return_tensors="pt").input_ids |
|
|
|
with torch.no_grad(): |
|
outputs = model.generate(inputs, generation_config=generation_config) |
|
|
|
print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
|
# rating:sfw, rating:general, original, 1girl, ahoge, black hair, blue eyes, blush, closed mouth, ear piercing, earrings, jewelry, looking at viewer, mole, mole under eye, piercing, portrait, shirt, short hair, solo, white shirt |
|
``` |
|
|
|
#### Flash attention (optional) |
|
|
|
Using flash attention can speed up computation, but it is currently only available on Linux.
|
|
|
```bash |
|
pip install flash_attn |
|
``` |
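
After installing it, flash attention can be enabled when loading the model. This is a minimal sketch, assuming a recent 🤗 Transformers release that accepts the `attn_implementation` argument and a CUDA GPU:

```py
import torch
from transformers import AutoModelForCausalLM

MODEL_NAME = "p1atdev/dart-v1-sft"

# Load the model with flash attention 2 (requires a CUDA GPU and a half-precision dtype).
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
).to("cuda")

# Remember to move the tokenized inputs to the same device before calling model.generate().
```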
|
|
|
### Accelerate with ORTModel |
|
|
|
The 🤗 Optimum library is also supported, enabling high-performance inference with ONNX Runtime.
|
|
|
```bash |
|
pip install "optimum[onnxruntime]" |
|
``` |
|
|
|
Two ONNX models are provided: |
|
|
|
- [Normal](./model.onnx) |
|
- [Quantized](./model_quantized.onnx) |
|
|
|
Both can be used with the following code:
|
|
|
```py |
|
import torch |
|
from transformers import AutoTokenizer, GenerationConfig |
|
from optimum.onnxruntime import ORTModelForCausalLM |
|
|
|
MODEL_NAME = "p1atdev/dart-v1-sft" |
|
|
|
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True) |
|
|
|
# normal version |
|
ort_model = ORTModelForCausalLM.from_pretrained(MODEL_NAME) |
|
|
|
# quantized version
|
# ort_model = ORTModelForCausalLM.from_pretrained(MODEL_NAME, file_name="model_quantized.onnx")

generation_config = GenerationConfig.from_pretrained(MODEL_NAME)
|
|
|
prompt = "<|bos|><rating>rating:sfw, rating:general</rating><copyright>original</copyright><character></character><general>1girl, " |
|
inputs = tokenizer(prompt, return_tensors="pt").input_ids |
|
|
|
with torch.no_grad(): |
|
    outputs = ort_model.generate(inputs, generation_config=generation_config)
|
|
|
print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
|
``` |
|
|
|
### Prompt guide
|
|
|
Due to training with a specialized prompt format, **natural language is not supported**. |
|
|
|
The trained sentences are essentially composed of the following elements, arranged in the strict order shown below: |
|
|
|
- `<|bos|>`: The BOS (beginning of sentence) token

- `<rating>[RATING_PARENT], [RATING_CHILD]</rating>`: The block of rating tags

  - [RATING_PARENT]: `rating:sfw`, `rating:nsfw`

  - [RATING_CHILD]:

    - if `[RATING_PARENT]` is `rating:sfw`: `rating:general`, `rating:sensitive`

    - else: `rating:questionable`, `rating:explicit`

- `<copyright>[COPYRIGHT, ...]</copyright>`: The block of copyright tags.

  - [COPYRIGHT, ...]: All supported copyright tags can be seen in [TODO]()

- `<character>[CHARACTER, ...]</character>`: The block of character tags.

  - [CHARACTER, ...]: All supported character tags can be seen in [TODO]()

- `<general>[LENGTH_TOKEN][GENERAL, ...]<|input_end|>[COMPLETION]</general>`: The block of general tags.

  - [LENGTH_TOKEN]: A token specifying the **total** number of general tags.

    - Available:

      - `<|very_short|>`: fewer than 10 tags

      - `<|short|>`: fewer than 20 tags

      - `<|long|>`: fewer than 40 tags (recommended)

      - `<|very_long|>`: more than 40 tags

  - [GENERAL, ...]: All supported general tags can be seen in [TODO]()

  - `<|input_end|>`: A token marking the end of the input. Place it at the end of the prompt.

  - [COMPLETION]: The tags the model completes, in alphabetical order.

- `<|eos|>`: The EOS (end of sentence) token


- Tags other than special tokens are separated by commas.

- You can place tags in any order you like within each block.
|
|
|
Example sentence: |
|
|
|
``` |
|
<|bos|><rating>rating:sfw, rating:general</rating><copyright>vocaloid</copyright><character>hatsune miku</character><general><|long|>solo, 1girl, very long hair<|input_end|>blue hair, cowboy shot, ...</general><|eos|> |
|
``` |
|
|
|
Therefore, to complete the tags, the input prompt should be as follows: |
|
|
|
1. Without any copyright or character tags:
|
|
|
``` |
|
<|bos|><rating>rating:sfw, rating:general</rating><copyright></copyright><character></character><general><|very_long|>1girl, solo, cat ears<|input_end|> |
|
``` |
|
|
|
2. Specifying copyright and character tags:
|
|
|
``` |
|
<|bos|><rating>rating:sfw, rating:general</rating><copyright>sousou no frieren</copyright><character>frieren</character><general><|long|>1girl, solo, from side<|input_end|> |
|
``` |
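
To avoid assembling these strings by hand, a small helper can build the prompt from its blocks. The sketch below is illustrative; `build_prompt` is a hypothetical helper, not part of the model or tokenizer:

```py
def build_prompt(
    rating: str = "rating:sfw, rating:general",
    copyright: str = "",
    character: str = "",
    length: str = "<|long|>",
    general: str = "1girl, solo",
) -> str:
    # Assemble the blocks in the strict order the model was trained on,
    # ending with <|input_end|> so the model completes the general tags.
    return (
        "<|bos|>"
        f"<rating>{rating}</rating>"
        f"<copyright>{copyright}</copyright>"
        f"<character>{character}</character>"
        f"<general>{length}{general}<|input_end|>"
    )

prompt = build_prompt(
    copyright="sousou no frieren",
    character="frieren",
    general="1girl, solo, from side",
)
```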
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
- **Developed by:** Plat |
|
- **Model type:** Causal language model |
|
- **Language(s) (NLP):** Danbooru tags |
|
- **License:** Apache-2.0 |
|
|
|
- **Demo:** Available on [🤗Space](https://huggingface.co/spaces/p1atdev/danbooru-tags-transformer)
|
|
|
## Bias, Risks, and Limitations |
|
|
|
Since this model was trained only on the structured tag format described above, it cannot accommodate flexible, free-form specifications.
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
This model was trained with: |
|
|
|
- [isek-ai/danbooru-tags-2023](https://huggingface.co/datasets/isek-ai/danbooru-tags-2023): a dataset of about 6M Danbooru posts' tags, covering 2005 to 2023
|
|
|
Only data from 2020 onwards was used for SFT. |
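
For reference, the dataset can be inspected with 🤗 Datasets. This is a minimal sketch; depending on the dataset's available configurations, a config name may also need to be passed to `load_dataset`:

```py
from datasets import load_dataset

# Load the tag dataset used for training (a config name may be required depending on the dataset).
dataset = load_dataset("isek-ai/danbooru-tags-2023", split="train")
print(dataset[0])
```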
|
|
|
### Training Procedure |
|
|
|
Trained using the 🤗 Transformers `Trainer`.
|
|
|
#### Preprocessing |
|
|
|
Preprocessing was conducted through the following process: |
|
|
|
1. Remove posts whose `general` tags are null.

2. Remove `general` tags that appear fewer than 100 times.

3. Remove undesirable tags such as `watermark` and `bad anatomy`.

4. Remove posts based on the number of tags attached to a single post, using the following rules:

   - Remove if there are more than 100 `general` tags.

   - Remove if there are more than 5 `copyright` tags.

   - Remove if there are more than 10 `character` tags.

5. Remove posts created before 2020.

6. Set the length token according to the number of `general` tags.

7. Shuffle some tags according to the following rules (see the sketch after this list):

   - Include people tags (e.g. `1girl`, `no humans`) in the shuffle group with 95% probability, and exclude them with 5% probability.

   - Select a random fraction of the tags, between 0% and 75%, to form the shuffle group.

   - Shuffle the tags in the shuffle group, then concatenate them with the `<|input_end|>` token followed by the remaining tags in alphabetical order.

8. Concatenate all categories.
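
The shuffling in step 7 can be pictured with the rough sketch below. It is illustrative only and not the actual preprocessing code; `PEOPLE_TAGS` is a hypothetical subset, and the probabilities follow the description above:

```py
import random

# Illustrative subset of people tags; the real preprocessing uses a fuller list.
PEOPLE_TAGS = {"1girl", "1boy", "2girls", "no humans"}

def shuffle_general_tags(tags: list[str]) -> str:
    """Rough sketch of step 7: a shuffled prefix, <|input_end|>, then the rest in alphabetical order."""
    tags = sorted(set(tags))
    people = [t for t in tags if t in PEOPLE_TAGS]
    others = [t for t in tags if t not in PEOPLE_TAGS]

    # With 95% probability, people tags are eligible for the shuffle group.
    pool = people + others if random.random() < 0.95 else others

    # Take a random 0-75% of the pool as the shuffle group, in random order.
    k = int(len(pool) * random.uniform(0.0, 0.75))
    shuffle_group = random.sample(pool, k)

    # Everything not in the shuffle group stays in alphabetical order after <|input_end|>.
    remainder = [t for t in tags if t not in shuffle_group]
    return ", ".join(shuffle_group) + "<|input_end|>" + ", ".join(remainder)
```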
|
|
|
#### Training Hyperparameters |
|
|
|
The following hyperparameters were used during training (an illustrative `TrainingArguments` sketch follows the list):
|
- learning_rate: 0.0001 |
|
- train_batch_size: 32 |
|
- eval_batch_size: 8 |
|
- seed: 42 |
|
- gradient_accumulation_steps: 2 |
|
- total_train_batch_size: 64 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: linear |
|
- lr_scheduler_warmup_steps: 500 |
|
- num_epochs: 1 |
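
These settings map onto 🤗 Transformers `TrainingArguments` roughly as sketched below. This is an illustrative reconstruction, not the actual training script; `output_dir` is a placeholder:

```py
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="dart-v1-sft",       # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # effective batch size 64
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=1,
    seed=42,
)
```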
|
|
|
|
|
## Evaluation |
|
|
|
Evaluation has not been performed yet and still needs to be done.
|
|
|
## Technical Specifications |
|
|
|
### Model Architecture and Objective |
|
|
|
The architecture of this model is [OPT (Open Pretrained Transformer)](https://huggingface.co/docs/transformers/model_doc/opt), but the position embeddings were not trained.
|
|
|
### Compute Infrastructure |
|
|
|
In-house
|
|
|
#### Hardware |
|
|
|
1x RTX 3070 Ti |
|
|
|
#### Software |
|
|
|
- Dataset processing: [🤗 Datasets](https://github.com/huggingface/datasets) |
|
- Training: [🤗 Transformers](https://github.com/huggingface/transformers) |
|
- Optimizing: [🤗 Optimum](https://github.com/huggingface/optimum) |
|
- SFT: [🤗 TRL](https://github.com/huggingface/trl) |
|
|
|