---
license: cc-by-nc-4.0
language:
- az
pipeline_tag: text-classification
tags:
- sentiment
- analysis
- azerbaijani
widget:
- text: Bu mənim xoşuma gəlir
datasets:
- LocalDoc/sentiments_dataset_azerbaijani
---
# Sentiment Analysis Model for Azerbaijani Text
This repository hosts a fine-tuned XLM-RoBERTa model for sentiment analysis of Azerbaijani text. The model classifies text into three categories: negative, neutral, and positive.

## Model Description
The model is based on `xlm-roberta-base`, fine-tuned on the `LocalDoc/sentiments_dataset_azerbaijani` dataset of Azerbaijani text samples. It is designed to identify the sentiment expressed in a text and classify it accordingly.

## How to Use
You can use this model through a text-classification pipeline, or load it directly with the `transformers` library for more control, as shown in the example below.

### Quick Start
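First, install the `transformers` library if you haven't already, along with PyTorch (imported by the example below) and SentencePiece (which the slow `XLMRobertaTokenizer` relies on):
```bash
pip install transformers torch sentencepiece
```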

```python
from transformers import AutoModelForSequenceClassification, XLMRobertaTokenizer
import torch

# Load the model and tokenizer from Hugging Face Hub
model_name = "LocalDoc/sentiment_analysis_azerbaijani"
tokenizer = XLMRobertaTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

def predict_sentiment(text):
    # Encode the text using the tokenizer
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)

    # Get predictions from the model
    with torch.no_grad():
        outputs = model(**inputs)

    # Convert logits to probabilities using softmax
    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)

    # Get the highest probability and corresponding label
    top_prob, top_label = torch.max(probs, dim=-1)
    labels = ["negative", "neutral", "positive"]

    # Return the label with the highest probability
    return labels[top_label.item()], top_prob

# Example text
text = "Bu mənim xoşuma gəlir"

# Get the sentiment
predicted_label, probability = predict_sentiment(text)
print(f"Predicted sentiment: {predicted_label} with a probability of {probability.item():.4f}")

```

## Sentiment Label Information

The model outputs an integer label for each prediction. The following table maps each label to its sentiment category:

| Label | Sentiment |
|-------|-----------|
| 0     | Negative  |
| 1     | Neutral   |
| 2     | Positive  |
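
As a minimal sketch of how the table above can be applied, the snippet below turns one row of three raw logits into a sentiment name and a probability using only the Python standard library. The names `ID2LABEL`, `softmax`, and `decode_prediction` are hypothetical helpers introduced for illustration; they are not part of the model or of `transformers`, and the label order is assumed to match the table.

```python
import math

# Assumed mapping, copied from the label table above (hypothetical helper name).
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def softmax(logits):
    """Convert raw scores to probabilities; subtract the max for numerical stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(logits):
    """Return (sentiment name, probability) for one row of three logits."""
    if len(logits) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} logits, got {len(logits)}")
    probs = softmax(logits)
    # Index of the largest probability is the predicted label id.
    label_id = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[label_id], probs[label_id]
```

With a real model, the input would be a single row of the classifier's logits converted to a plain list, for example via `.tolist()` on a tensor.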



## License

The model is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. This license allows you to share and adapt the model with attribution to the source, but prohibits commercial use.



## Contact Information

If you have any questions or suggestions, please contact us at [[email protected]].