k050506koch/gpt2-story-tinetuned

	# Model Card for V2 Models

	## Model Description
	This repository contains several models based on the GPT-2 architecture that generate creative stories, superhero names, and superhero abilities. They are intended to help produce narrative content from user prompts.

	## Model Variants
	- Story Model: Generates stories based on prompts.
	- Name Model: Generates superhero names based on story context.
	- Abilities Model: Generates superhero abilities based on story context.
	- Midjourney Model: Generates Midjourney prompts for storytelling.

	## Training Data
	The models were trained on a custom dataset stored in `batch_ds_v2.txt`, which includes various story prompts, superhero names, and abilities. The dataset was preprocessed to extract relevant parts for training.

	## Training Procedure
	- Framework: PyTorch with Hugging Face Transformers
	- Model: GPT-2
	- Training Arguments:
	  - Learning Rate: 1e-4
	  - Number of Epochs: 15
	  - Max Steps: 5000
	  - Batch Size: Auto-detected
	  - Gradient Clipping: 1.0
	  - Logging Steps: 1
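
The hyperparameters above can be written down as keyword arguments for the Hugging Face `TrainingArguments` class. This is a minimal sketch, not the author's training script: it assumes the models were trained with the `Trainer` API, that "Batch Size: Auto-detected" refers to `auto_find_batch_size`, and that "Gradient Clipping" refers to `max_grad_norm`. The names `TRAINING_KWARGS` and `build_training_args` are hypothetical.

```python
# Hyperparameters from the model card, keyed by the TrainingArguments
# parameter each one most plausibly maps to (the mapping is an assumption).
TRAINING_KWARGS = {
    "learning_rate": 1e-4,
    "num_train_epochs": 15,
    "max_steps": 5000,             # a positive value overrides num_train_epochs
    "auto_find_batch_size": True,  # assumed meaning of "Batch Size: Auto-detected"
    "max_grad_norm": 1.0,          # gradient clipping threshold
    "logging_steps": 1,
}


def build_training_args(output_dir: str):
    """Create TrainingArguments from the values above.

    Needs transformers, torch, and accelerate installed; the import is
    done lazily so the dictionary can be inspected without them.
    """
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

Note that in `TrainingArguments` a positive `max_steps` takes precedence over `num_train_epochs`, so under this assumed mapping a run would stop after 5000 optimizer steps regardless of the epoch count.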

	## Evaluation
	The models were evaluated qualitatively during development on their ability to generate coherent, contextually relevant text. No quantitative metrics are reported.

	## Inference
	To use the models for inference, you can send a POST request to the `/generate/<model_path>` endpoint of the Flask application. The input should be a JSON object containing the `input_text` key.

	### Example Request
	```json
	{
	  "input_text": "[Ivan Ivanov, Lead Software Engineer, Superhero for Justice, Writing code, fixing issues, solving problems, Masculine, Long Hair, Adult]<endoftext>"
	}
	```
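
The example prompt appears to be a bracketed, comma-separated list of character details followed by an `<endoftext>` marker. The sketch below builds such a payload from a list of strings. The helper name `build_input_text` is hypothetical, and the meaning of each field is not documented in this card, so the function treats the fields as opaque text.

```python
import json


def build_input_text(fields: list[str]) -> str:
    """Join the fields with ", ", wrap them in brackets, and append the marker.

    The layout is inferred from the single example request in this card.
    """
    return "[" + ", ".join(fields) + "]<endoftext>"


# Fields taken from the example request; their individual meanings are assumed.
example_fields = [
    "Ivan Ivanov",
    "Lead Software Engineer",
    "Superhero for Justice",
    "Writing code, fixing issues, solving problems",
    "Masculine",
    "Long Hair",
    "Adult",
]

# JSON body in the shape the endpoint expects: a single "input_text" key.
payload = json.dumps({"input_text": build_input_text(example_fields)})
```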

	### Example Response
	The response will contain the generated text based on the input prompt.
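
A request like the one described above could be sent with Python's standard library. This is a minimal sketch under stated assumptions: the base URL `http://localhost:5000` and the model path `story_model` are placeholders, not values from this card, and the response body is returned as raw text because its exact format is not documented here.

```python
import json
import urllib.request


def build_request(base_url: str, model_path: str, input_text: str) -> urllib.request.Request:
    """Build a POST request for the /generate/<model_path> endpoint."""
    url = f"{base_url.rstrip('/')}/generate/{model_path}"
    body = json.dumps({"input_text": input_text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url: str, model_path: str, input_text: str, timeout: float = 60.0) -> str:
    """Send the request and return the response body as text."""
    request = build_request(base_url, model_path, input_text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the Flask application running, a call would look like `generate("http://localhost:5000", "story_model", prompt)`, where `prompt` is a string in the format shown in the example request.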

	## Limitations
	- The models may generate biased or nonsensical outputs based on the training data.
	- They may not always understand complex prompts or context, leading to irrelevant or inaccurate responses.
	- The models are sensitive to input phrasing; slight changes in the prompt can yield different results.

	## License
	This model is released under the MIT License. Please refer to the LICENSE file for more details.

	## Citation
	If you use this model in your research or applications, please cite it as follows: