datasets:
  - e9t/nsmc
language:
  - ko
metrics:
  - accuracy
pipeline_tag: text-classification

Model Description

Uses

  • Sentiment analysis: binary classification of Korean movie reviews as negative or positive.

How to Get Started with the Model

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("seongyeon1/klue-base-finetuned-nsmc")
model = AutoModelForSequenceClassification.from_pretrained("seongyeon1/klue-base-finetuned-nsmc")
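
# Alternatively, use the high-level pipeline API.
from transformers import pipeline

pipe = pipeline("text-classification", model="seongyeon1/klue-base-finetuned-nsmc")
# LABEL_0 = negative, LABEL_1 = positive (NSMC label convention)
pipe("진짜 별로더라")  # "It was really bad" -> [{'label': 'LABEL_0', 'score': 0.999700665473938}]
pipe("굿굿")          # "Good, good" -> [{'label': 'LABEL_1', 'score': 0.9875587224960327}]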
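
The pipeline returns the generic names `LABEL_0` and `LABEL_1`. A minimal sketch of mapping them to readable names, assuming the model keeps the NSMC convention (0 = negative, 1 = positive), which the two example outputs above are consistent with. `LABEL_NAMES` and `to_readable` are hypothetical helper names, not part of the model or of `transformers`:

```python
# Assumed mapping: NSMC labels 0 = negative, 1 = positive.
LABEL_NAMES = {"LABEL_0": "negative", "LABEL_1": "positive"}


def to_readable(predictions):
    """Replace generic pipeline labels with readable names; unknown labels pass through."""
    return [
        {"label": LABEL_NAMES.get(p["label"], p["label"]), "score": p["score"]}
        for p in predictions
    ]


# Example using the output shape shown above (score copied from the first example).
sample = [{"label": "LABEL_0", "score": 0.999700665473938}]
print(to_readable(sample))  # [{'label': 'negative', 'score': 0.999700665473938}]
```

With the pipeline loaded, the same helper could be applied directly, for example `to_readable(pipe("굿굿"))`.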

Training Details

Training Data

from datasets import load_dataset

# NSMC (Naver Sentiment Movie Corpus); the Hub ID matches the metadata above.
dataset = load_dataset('e9t/nsmc')

Preprocessing

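  • BERT's default maximum sequence length is 512 tokens, but padding every review to that length makes training slow, so inputs are truncated or padded to a shorter length instead.
    • maxlen = 55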
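maxlen = 55  # maximum sequence length in tokens

def tokenize_function_with_max(examples, maxlen=maxlen):
    # Truncate or pad every review to exactly `maxlen` tokens.
    encodings = tokenizer(examples['document'], max_length=maxlen, truncation=True, padding='max_length')
    return encodings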
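
The card does not say how the value 55 was chosen. One common approach, sketched here as an assumption rather than as the author's method, is to pick the smallest length that covers a chosen fraction of the tokenized reviews. `coverage_length` and the toy lengths are hypothetical; in practice the lengths would come from something like `len(tokenizer(text)["input_ids"])` for each review.

```python
import math


def coverage_length(lengths, coverage=0.95):
    """Smallest length L such that at least `coverage` of the sequences have length <= L."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    ordered = sorted(lengths)
    # Index of the first element at which the covered fraction reaches `coverage`.
    index = math.ceil(coverage * len(ordered)) - 1
    return ordered[max(index, 0)]


# Toy token counts (made up for illustration, not measured on NSMC).
toy_lengths = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(coverage_length(toy_lengths, coverage=0.5))  # 5
print(coverage_length(toy_lengths, coverage=1.0))  # 10
```

A shorter cutoff speeds up training at the cost of truncating the longest reviews, so the coverage fraction is a trade-off to tune.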

Training Hyperparameters

  • learning rate=2e-5, weight decay=0.01, batch size=32, epochs=2
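
A sketch of how these settings might map onto `transformers.TrainingArguments` keyword names. The output directory is a hypothetical placeholder, and treating 32 as the per-device batch size on a single device with no gradient accumulation is an assumption; the card does not show the actual training script. The step count uses the 150,000-example NSMC training split.

```python
import math

# Keyword names follow transformers.TrainingArguments; values are from the card.
training_kwargs = {
    "output_dir": "klue-base-finetuned-nsmc",  # hypothetical path
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "per_device_train_batch_size": 32,  # assumes one device, no gradient accumulation
    "num_train_epochs": 2,
}


def total_optimizer_steps(num_examples, batch_size, epochs):
    """Number of optimizer steps when the last partial batch of each epoch is kept."""
    return math.ceil(num_examples / batch_size) * epochs


steps = total_optimizer_steps(
    num_examples=150_000,  # size of the NSMC training split
    batch_size=training_kwargs["per_device_train_batch_size"],
    epochs=training_kwargs["num_train_epochs"],
)
print(steps)  # 9376

# With transformers installed, the dictionary could be passed on as:
#   from transformers import TrainingArguments
#   args = TrainingArguments(**training_kwargs)
```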

Metrics

  • accuracy
  • The two labels are roughly balanced, so accuracy is a meaningful metric here.
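
Accuracy is the fraction of reviews whose highest-scoring class matches the gold label. A minimal pure-Python sketch follows; the function name and toy logits are illustrative, and the card does not show the metric code that was actually used.

```python
def accuracy_from_logits(logits, labels):
    """Fraction of rows whose argmax over class scores equals the gold label."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = 0
    for row, gold in zip(logits, labels):
        predicted = max(range(len(row)), key=lambda i: row[i])  # argmax over classes
        correct += int(predicted == gold)
    return correct / len(labels)


# Toy two-class logits (made up): predictions are 0, 1, 0, 1.
toy_logits = [[2.0, 0.1], [0.2, 1.5], [0.9, 0.3], [0.1, 0.4]]
toy_labels = [0, 1, 1, 1]
print(accuracy_from_logits(toy_logits, toy_labels))  # 0.75
```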


Result

{'eval_loss': 0.2575262784957886, 'eval_accuracy': 0.9041, 'eval_runtime': 163.2129, 'eval_samples_per_second': 306.348, 'eval_steps_per_second': 9.576, 'epoch': 2.0}
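
The size of the evaluation set can be recovered from the reported throughput as a quick sanity check. The sketch below only does arithmetic on the numbers above; since the reported rates are themselves rounded, the results are rounded to the nearest integer. The recovered 50,000 samples match the size of the NSMC test split, and the ratio of samples to steps suggests an evaluation batch size of about 32, which is an inference and is not stated in the card.

```python
result = {
    "eval_loss": 0.2575262784957886,
    "eval_accuracy": 0.9041,
    "eval_runtime": 163.2129,
    "eval_samples_per_second": 306.348,
    "eval_steps_per_second": 9.576,
    "epoch": 2.0,
}

# samples = runtime * samples/second; steps = runtime * steps/second
num_samples = round(result["eval_runtime"] * result["eval_samples_per_second"])
num_steps = round(result["eval_runtime"] * result["eval_steps_per_second"])
# Number of correctly classified reviews implied by the accuracy.
num_correct = round(result["eval_accuracy"] * num_samples)

print(num_samples, num_steps, num_correct)  # 50000 1563 45205
```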