deprem-ml/deprem-loodos-bert-base-uncased

merve HF staff

Update README.md

cbade61 over 1 year ago

	---
	license: apache-2.0
	language:
	- tr
	tags:
	- deprem-clf-v1
	metrics:
	- accuracy
	- recall
	- f1
	library_name: transformers
	pipeline_tag: text-classification
	model-index:
	- name: deprem_v12
	  results:
	  - task:
	      type: text-classification
	    dataset:
	      type: deprem_private_dataset_v1_2
	      name: deprem_private_dataset_v1_2
	    metrics:
	    - type: recall
	      value: 0.8
	      verified: false
	    - type: f1
	      value: 0.75
	      verified: false

	---
	### Deprem NER Training Results

	```
	              precision    recall  f1-score   support

	           0       0.85      0.91      0.88       734
	           1       0.77      0.84      0.80       207
	           2       0.71      0.88      0.79       130
	           3       0.68      0.76      0.72        94
	           4       0.80      0.85      0.82       362
	           5       0.63      0.59      0.61       112
	           6       0.73      0.82      0.77       108
	           7       0.55      0.77      0.64        78
	           8       0.65      0.71      0.68        31
	           9       0.70      0.85      0.76       117

	   micro avg       0.77      0.85      0.81      1973
	   macro avg       0.71      0.80      0.75      1973
	weighted avg       0.77      0.85      0.81      1973
	 samples avg       0.82      0.87      0.83      1973
	```
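The averaged rows can be recomputed from the per-class rows. The sketch below copies the rounded values from the report and recomputes the macro and support-weighted averages. The helper names `macro_avg` and `weighted_avg` are illustrative and not part of the original training code. Under this reading, the recall of 0.8 and F1 of 0.75 listed in the card metadata appear to correspond to the macro averages.

```python
# Per-class rows copied from the report: (precision, recall, f1, support)
rows = [
    (0.85, 0.91, 0.88, 734),
    (0.77, 0.84, 0.80, 207),
    (0.71, 0.88, 0.79, 130),
    (0.68, 0.76, 0.72, 94),
    (0.80, 0.85, 0.82, 362),
    (0.63, 0.59, 0.61, 112),
    (0.73, 0.82, 0.77, 108),
    (0.55, 0.77, 0.64, 78),
    (0.65, 0.71, 0.68, 31),
    (0.70, 0.85, 0.76, 117),
]


def macro_avg(values):
    """Unweighted mean over classes: every class counts equally."""
    return sum(values) / len(values)


def weighted_avg(values, supports):
    """Mean over classes, weighted by the number of true samples per class."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


precision, recall, f1, support = zip(*rows)

macro_recall = macro_avg(recall)
macro_f1 = macro_avg(f1)
weighted_recall = weighted_avg(recall, support)

print(f"macro recall    {macro_recall:.2f}")
print(f"macro f1        {macro_f1:.2f}")
print(f"weighted recall {weighted_recall:.2f}")
```

Because the inputs are already rounded to two decimals, the recomputed values agree with the report only up to rounding.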

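	### Preprocessing Functions
	```
	import re

	from nltk.corpus import stopwords  # needs nltk.download("stopwords") once

	# NLTK's Turkish stopword list plus a few extra tokens
	tr_stopwords = stopwords.words('turkish')
	tr_stopwords.extend(["hic", "dm", "vs", "ya"])


	def remove_punct(tok):
	    # Drop every character that is neither a word character nor whitespace
	    return re.sub(r'[^\w\s]', '', tok)


	def normalize(tok):
	    # Replace purely numeric tokens with the placeholder "digit"
	    if tok.isdigit():
	        tok = "digit"
	    return tok


	def clean(tok):
	    tok = remove_punct(tok)
	    tok = normalize(tok)
	    return tok


	def exceptions(tok):
	    # Return False for tokens that should be dropped, True otherwise
	    # Single non-digit characters
	    if not tok.isdigit() and len(tok) == 1:
	        return False

	    # Empty tokens (for example, from repeated spaces)
	    if not tok:
	        return False

	    if tok in tr_stopwords:
	        return False

	    # Hashtags and mentions
	    if tok.startswith('#') or tok.startswith("@"):
	        return False

	    return True


	def sm_tok(text):
	    # Split on single spaces, filter the raw tokens, then clean the survivors
	    return [clean(tok) for tok in text.split(" ") if exceptions(tok)]
	```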
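To make the behavior of the tokenizer concrete, here is a small self-contained sketch that runs the same logic on two sample sentences. It assumes a short stand-in stopword list instead of NLTK's Turkish list, so that it runs without downloading any corpus. The sample sentences are made up for illustration.

```python
import re

# Assumption: stand-in stopword list. The card uses NLTK's Turkish list
# plus "hic", "dm", "vs" and "ya".
tr_stopwords = ["ve", "bir", "hic", "dm", "vs", "ya"]


def clean(tok):
    # Strip punctuation, then map purely numeric tokens to "digit"
    tok = re.sub(r"[^\w\s]", "", tok)
    return "digit" if tok.isdigit() else tok


def exceptions(tok):
    # Same filter rules as in the card; applied to the raw token
    if not tok.isdigit() and len(tok) == 1:
        return False
    if not tok:
        return False
    if tok in tr_stopwords:
        return False
    if tok.startswith("#") or tok.startswith("@"):
        return False
    return True


def sm_tok(text):
    return [clean(tok) for tok in text.split(" ") if exceptions(tok)]


tokens = sm_tok("Hatay Antakya 3 kişi enkaz altında, acil yardım! #deprem @afad ve")
print(tokens)
# ['Hatay', 'Antakya', 'digit', 'kişi', 'enkaz', 'altında', 'acil', 'yardım']

# Filtering runs before cleaning, so a punctuation-only token survives
# the filter and is then reduced to an empty string.
print(sm_tok("yardım ... lazım"))
# ['yardım', '', 'lazım']
```

The second call shows a consequence of the order of operations: because `exceptions` sees the raw token, punctuation-only tokens and stopwords with attached punctuation are not filtered out. Anyone reproducing the training input should keep this order rather than "fixing" it, since the model saw text processed this way.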

	### Other Hyperparameters
	```
	from transformers import TrainingArguments

	training_args = TrainingArguments(
	    output_dir="./output",
	    evaluation_strategy="epoch",  # evaluate at the end of every epoch
	    per_device_train_batch_size=32,
	    per_device_eval_batch_size=32,
	    weight_decay=0.01,
	    report_to=None,
	    num_train_epochs=4,
	)
	```

	```
	class_weights[0] = 1.0
	class_weights[1] = 1.5167249178108022
	class_weights[2] = 1.7547338578655642
	class_weights[3] = 1.9610520059358458
	class_weights[4] = 1.269341370129623
	class_weights[5] = 1.8684086209021484
	class_weights[6] = 1.8019018017117145
	class_weights[7] = 2.110648663094536
	class_weights[8] = 3.081208739200435
	class_weights[9] = 1.7994815143101963
	```
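The card does not say how these weights enter the loss. One common option for multi-label training, shown here only as an assumption, is to scale each class's binary cross-entropy term by its weight so that errors on rarer classes cost more. The function name `weighted_bce` and the sample logits are hypothetical.

```python
import math

# Class weights as listed in the card
class_weights = [
    1.0,
    1.5167249178108022,
    1.7547338578655642,
    1.9610520059358458,
    1.269341370129623,
    1.8684086209021484,
    1.8019018017117145,
    2.110648663094536,
    3.081208739200435,
    1.7994815143101963,
]


def weighted_bce(logits, targets, weights):
    """Mean over classes of weight * binary cross-entropy with logits.

    Uses the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)).
    """
    total = 0.0
    for z, y, w in zip(logits, targets, weights):
        loss = max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
        total += w * loss
    return total / len(logits)


# A model that confidently predicts "absent" for every class
logits = [-10.0] * 10

# Missing a positive for class 0 versus missing one for class 8
miss_class_0 = [1] + [0] * 9
miss_class_8 = [0] * 8 + [1, 0]

loss_0 = weighted_bce(logits, miss_class_0, class_weights)
loss_8 = weighted_bce(logits, miss_class_8, class_weights)
print(loss_8 > loss_0)  # the miss on the higher-weighted class costs more
```

In this sketch the same mistake on class 8 (weight about 3.08) is penalized roughly three times as heavily as on class 0 (weight 1.0).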

	Threshold: 0.25
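
A threshold like this is typically applied to per-class sigmoid probabilities in a multi-label setup. The sketch below assumes that reading: every class whose probability reaches 0.25 is predicted. The function name `predict_labels`, the `>=` comparison, and the sample logits are illustrative and not taken from the training code.

```python
import math

THRESHOLD = 0.25  # value stated in the card


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, threshold=THRESHOLD):
    """Return the indices of all classes whose probability reaches the threshold."""
    probs = [sigmoid(z) for z in logits]
    return [i for i, p in enumerate(probs) if p >= threshold]


# Hypothetical logits for the 10 classes of a single text
logits = [2.0, -3.0, -1.0, 0.1, -4.0, -2.0, -0.5, -5.0, -1.2, 1.0]

print(predict_labels(logits))       # [0, 2, 3, 6, 9]
print(predict_labels(logits, 0.5))  # [0, 3, 9]
```

Compared with the usual 0.5 cutoff, a lower threshold predicts more labels per text, which generally raises recall at the cost of precision.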
