File size: 3,676 Bytes
48b9a4a 2017989 48b9a4a f265d17 48b9a4a 75a7038 48b9a4a 3dc9d6f 48b9a4a 7bf40a4 48b9a4a f265d17 48b9a4a 7bf40a4 d0cc15f 48b9a4a 7bf40a4 48b9a4a f265d17 48b9a4a |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 |
---
language: tr
Dataset: interpress_news_category_tr
---
# INTERPRESS NEWS CLASSIFICATION
## Dataset
The dataset downloaded from interpress. This dataset is real world data. Actually there are 273K data but I filtered them and used 108K data for this model. For more information about dataset please visit this [link](https://huggingface.co/datasets/interpress_news_category_tr_lite)
## Model
Model accuracy on train data and validation data is %97.
## Usage for Torch
```sh
pip install transformers or pip install transformers==4.3.3
```
```sh
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("serdarakyol/interpress-turkish-news-classification")
model = AutoModelForSequenceClassification.from_pretrained("serdarakyol/interpress-turkish-news-classification")
```
```sh
import torch
import numpy as np
if torch.cuda.is_available():
device = torch.device("cuda")
model = model.cuda()
print('There are %d GPU(s) available.' % torch.cuda.device_count())
print('GPU name is:', torch.cuda.get_device_name(0))
else:
print('No GPU available, using the CPU instead.')
device = torch.device("cpu")
```
```sh
def prediction(news):
news=[news]
indices=tokenizer.batch_encode_plus(
news,
max_length=512,
add_special_tokens=True,
return_attention_mask=True,
padding='max_length',
truncation=True,
return_tensors='pt')
inputs = indices["input_ids"].clone().detach().to(device)
masks = indices["attention_mask"].clone().detach().to(device)
with torch.no_grad():
output = model(inputs, token_type_ids=None,attention_mask=masks)
logits = output[0]
logits = logits.detach().cpu().numpy()
pred = np.argmax(logits,axis=1)[0]
return pred
```
```sh
news = r"ABD'den Prens Selman'a yaptırım yok Beyaz Saray Sözcüsü Psaki, Muhammed bin Selman'a yaptırım uygulamamanın \"doğru karar\" olduğunu savundu. Psaki, \"Tarihimizde, Demokrat ve Cumhuriyetçi başkanların yönetimlerinde diplomatik ilişki içinde olduğumuz ülkelerin liderlerine yönelik yaptırım getirilmemiştir\" dedi."
```
You can find the news in this [link](https://www.ntv.com.tr/dunya/abdden-prens-selmana-yaptirim-yok,YTeWNv0-oU6Glbhnpjs1JQ) (news date: 02/03/2021)
```sh
labels = {
0 : "Culture-Art",
1 : "Economy",
2 : "Politics",
3 : "Education",
4 : "World",
5 : "Sport",
6 : "Technology",
7 : "Magazine",
8 : "Health",
9 : "Agenda"
}
pred = prediction(news)
print(labels[pred])
# > World
```
## Usage for Tensorflow
```sh
pip install transformers or pip install transformers==4.3.3
import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification
import numpy as np
tokenizer = BertTokenizer.from_pretrained('serdarakyol/interpress-turkish-news-classification')
model = TFBertForSequenceClassification.from_pretrained("serdarakyol/interpress-turkish-news-classification")
inputs = tokenizer(news, return_tensors="tf")
inputs["labels"] = tf.reshape(tf.constant(1), (-1, 1)) # Batch size 1
outputs = model(inputs)
loss = outputs.loss
logits = outputs.logits
pred = np.argmax(logits,axis=1)[0]
labels[pred]
# > World
```
Thanks to [@yavuzkomecoglu](https://huggingface.co/yavuzkomecoglu) for contributes
If you have any question, please, don't hesitate to contact with me
[![linkedin](https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/serdarakyol55/)
[![Github](https://img.shields.io/badge/GitHub-100000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/serdarakyol) |