---
datasets:
- e9t/nsmc
language:
- ko
metrics:
- accuracy
pipeline_tag: text-classification
---

## Model Description

- **Finetuned from model:** [klue/bert-base](https://huggingface.co/klue/bert-base)
- Test accuracy: **0.9041**

## Uses

- Sentiment analysis of Korean text (movie reviews).

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("seongyeon1/klue-base-finetuned-nsmc")
model = AutoModelForSequenceClassification.from_pretrained("seongyeon1/klue-base-finetuned-nsmc")
```

```python
from transformers import pipeline

pipe = pipeline("text-classification", model="seongyeon1/klue-base-finetuned-nsmc")

pipe("진짜 별로더라")  # "It was really bad"
# [{'label': 'LABEL_0', 'score': 0.999700665473938}]

pipe("굿굿")  # "Good, good"
# [{'label': 'LABEL_1', 'score': 0.9875587224960327}]
```

`LABEL_0` is negative and `LABEL_1` is positive (see the note on label names at the end of this card).

## Training Details

### Training Data

- NSMC (Naver Sentiment Movie Corpus): https://huggingface.co/datasets/e9t/nsmc

```python
from datasets import load_dataset

dataset = load_dataset('nsmc')
```

#### Preprocessing

- BERT's default maximum sequence length is 512 tokens, but padding short reviews to that length wastes compute.
- Based on the token-length distribution of the dataset (plot below), `maxlen = 55` covers nearly all reviews.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/634330a304d4ff28aeb8de56/t7axSlo4JI4bPLynUB3OP.png)

```python
maxlen = 55

def tokenize_function_with_max(examples, maxlen=maxlen):
    # Truncate or pad every review to a fixed length of 55 tokens.
    encodings = tokenizer(examples['document'], max_length=maxlen,
                          truncation=True, padding='max_length')
    return encodings
```

#### Training Hyperparameters

- learning rate: 2e-5
- weight decay: 0.01
- batch size: 32
- epochs: 2

A minimal training sketch using these values is given after the Result section below.

#### Metrics

- **Accuracy**, since the label distribution is nearly balanced (plot below).

![image/png](https://cdn-uploads.huggingface.co/production/uploads/634330a304d4ff28aeb8de56/_S5TTyec8I25Kx-yaqeJo.png)

#### Result

- eval_loss: 0.2575262784957886
- eval_accuracy: 0.9041
- eval_runtime: 163.2129 s
- eval_samples_per_second: 306.348
- eval_steps_per_second: 9.576
- epoch: 2.0
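
#### Training Sketch

The exact training script is not included in this card. The following is a minimal sketch of how the model could be fine-tuned with the hyperparameters above using the Hugging Face `Trainer`; the `compute_metrics` helper and `output_dir` are illustrative choices, not taken from the original run.

```python
import numpy as np
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

maxlen = 55  # chosen from the token-length distribution above

dataset = load_dataset('nsmc')
tokenizer = AutoTokenizer.from_pretrained('klue/bert-base')
model = AutoModelForSequenceClassification.from_pretrained('klue/bert-base', num_labels=2)

def tokenize_function_with_max(examples, maxlen=maxlen):
    return tokenizer(examples['document'], max_length=maxlen,
                     truncation=True, padding='max_length')

tokenized = dataset.map(tokenize_function_with_max, batched=True)

def compute_metrics(eval_pred):
    # Accuracy over the evaluation split (illustrative helper).
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {'accuracy': (preds == labels).mean()}

args = TrainingArguments(
    output_dir='klue-base-finetuned-nsmc',  # illustrative
    learning_rate=2e-5,
    weight_decay=0.01,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=2,
    evaluation_strategy='epoch',
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized['train'],
    eval_dataset=tokenized['test'],
    compute_metrics=compute_metrics,
)
trainer.train()
trainer.evaluate()  # should report eval_accuracy in line with the Result above
```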
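
## Note on Label Names

The saved config uses generic `LABEL_0`/`LABEL_1` names, as the pipeline outputs above show. In NSMC, label `0` is negative and label `1` is positive, so you can map the outputs to readable names yourself; the `id2label` dict and `classify` helper below are illustrative and not part of the model's config.

```python
from transformers import pipeline

pipe = pipeline("text-classification", model="seongyeon1/klue-base-finetuned-nsmc")

# NSMC convention: 0 = negative, 1 = positive.
id2label = {'LABEL_0': 'negative', 'LABEL_1': 'positive'}

def classify(text):
    result = pipe(text)[0]
    return id2label[result['label']], result['score']

print(classify("진짜 별로더라"))  # ('negative', 0.9997...)
```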