# WangchanBERTa Base for Sentiment Analysis

This is a fine-tuned version of the WangchanBERTa model, trained for sentiment analysis in the Thai language using `simpletransformers`.
## Model Details
- Model Name: WangchanBERTa Base Sentiment Analysis
- Pretrained Base Model: `airesearch/wangchanberta-base-att-spm-uncased`
- Architecture: CamemBERT
- Language: Thai
- Task: Sentiment Classification
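The checkpoint is served through the CamemBERT architecture classes in `transformers`. If you want to confirm this yourself, a minimal sketch (assuming network access to the Hugging Face Hub) is to inspect the base model's configuration:

```python
from transformers import AutoConfig

# Fetch the configuration of the pretrained base model.
config = AutoConfig.from_pretrained("airesearch/wangchanberta-base-att-spm-uncased")
print(config.model_type)  # expected: "camembert"
```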
## Training Configuration
- Training Dataset: not specified in this card
- Number of Training Epochs: 6
- Train Batch Size: 16
- Eval Batch Size: 32
- Learning Rate: 2e-5
- Optimizer: AdamW
- Scheduler: Cosine
- Gradient Accumulation Steps: 2
- Seed: 42
- Training Framework: `simpletransformers`
- FP16: Disabled
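The original training script is not included in this card. The sketch below shows one plausible way to express these hyperparameters with `simpletransformers`; the one-row `train_df` DataFrame is a hypothetical stand-in for the unspecified training data:

```python
import pandas as pd
from simpletransformers.classification import ClassificationModel, ClassificationArgs

# Mirror the hyperparameters listed above.
model_args = ClassificationArgs(
    num_train_epochs=6,
    train_batch_size=16,
    eval_batch_size=32,
    learning_rate=2e-5,
    optimizer="AdamW",
    scheduler="cosine_schedule_with_warmup",
    gradient_accumulation_steps=2,
    manual_seed=42,
    fp16=False,
)

# WangchanBERTa uses the CamemBERT architecture, so the model type is "camembert".
model = ClassificationModel(
    "camembert",
    "airesearch/wangchanberta-base-att-spm-uncased",
    num_labels=3,
    args=model_args,
)

# Placeholder training data: columns must be "text" and "labels" (0=pos, 1=neu, 2=neg).
train_df = pd.DataFrame({"text": ["พนักงานบริการดีมาก"], "labels": [0]})
model.train_model(train_df)
```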
## Model Performance
No evaluation metrics (e.g., accuracy or F1-score) are reported in this card; results will depend on the evaluation dataset used.
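If you have a labeled Thai evaluation set, a minimal sketch like the following could compute accuracy and macro F1 (the `eval_texts` and `eval_labels` values here are hypothetical placeholders):

```python
import torch
from sklearn.metrics import accuracy_score, f1_score
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("Pongsathorn/wangchanberta-base-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("Pongsathorn/wangchanberta-base-sentiment")
model.eval()

# Hypothetical evaluation data: space-joined, pre-segmented Thai text and gold label ids.
eval_texts = ["พนักงาน บริการ ดี มาก"]
eval_labels = [0]  # 0=pos, 1=neu, 2=neg

with torch.no_grad():
    enc = tokenizer(eval_texts, return_tensors="pt", padding=True, truncation=True)
    preds = model(**enc).logits.argmax(dim=-1).tolist()

print("Accuracy:", accuracy_score(eval_labels, preds))
print("Macro F1:", f1_score(eval_labels, preds, average="macro"))
```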
## Usage

To use this model, load it with `transformers` as follows:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F
import numpy as np
from pythainlp.tokenize import word_tokenize

# Load the fine-tuned tokenizer and classification model.
tokenizer = AutoTokenizer.from_pretrained("Pongsathorn/wangchanberta-base-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("Pongsathorn/wangchanberta-base-sentiment")

# Mapping from class ids to sentiment labels.
id2label = {
    0: "pos",
    1: "neu",
    2: "neg",
}

input_text = "พนักงานบริการดีมาก สัญญาณก็ดี แต่ร้านอยู่ที่ไหน อยากได้ข้อมูลเพิ่มเติม จะได้ประกาศบนเว็บถูก"

# Pre-segment the Thai text with PyThaiNLP and join the tokens with spaces.
segmented_text = word_tokenize(input_text, engine="longest")
preprocessed_text = " ".join(segmented_text)

inputs = tokenizer(preprocessed_text, return_tensors="pt", padding=True, truncation=True)

# Run inference without tracking gradients.
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Convert logits to class probabilities and pick the most likely class.
probs = F.softmax(logits, dim=-1)
predicted_class = torch.argmax(probs, dim=-1).item()
predicted_label = id2label[predicted_class]

print("Predicted Label (ID):", predicted_class)
print("Predicted Label (Description):", predicted_label)

max_prob = np.max(probs.numpy())
print(f"Maximum Probability: {max_prob:.4f}")
```