	---
	license: mit
	language: en
	tags:
	- Pre-CoFactv3
	- Text-Classification
	datasets:
	- FACTIFY5WQA
	metrics:
	- accuracy
	pipeline_tag: text-classification
	library_name: transformers
	base_model: microsoft/deberta-v3-large
	widget:
	- text: "BREAKING: Another nearly 1.9 million Americans filed for unemployment insurance last week, the Department of Labor said. https://t.co/dVwyI6avmx [SEP] By Anneken Tappe, CNN BusinessUpdated 11:50 AM ET, Thu June 4, 2020 New York (CNN Business)Millions of Americans again filed for unemployment benefits last week, as the coronavirus recession drags on."
	  example_title: "Support"
	- text: "Micah Richards spent an entire season at Aston Vila without playing a single game. [SEP] Despite speculation that Richards would leave Aston Villa before the transfer deadline for the 2018~19 season , he remained at the club , although he is not being considered for first team selection."
	  example_title: "Neutral"
	- text: "Mahatma Gandhi having breakfast with British official inside the jail. [SEP] A photo is being shared on Facebook with a claim that Gandhi was having breakfast with British officials inside the jail while people are fighting for Independence. Let\u2019s try to check the authenticity of the image in the post. Claim: Mahatma Gandhi having breakfast with British official inside the jail. Fact: The photo was not taken inside the jail. It was taken during a breakfast meeting between Gandhi and Mountbatten at Viceroy\u2019s House in April 1947. Hence the claim made in the post is FALSE. When the image in the post is run Google Reverse Image Search, a link to Getty Images website containing the same image can be found in the search results. In that website, the image has a description which reads, \u201cBreakfast meeting between Mahatma Gandhi and Viceroy of India, Lord Mountbatten 1947\u201d. Also, in the book \u2018India Remembered\u2019 written by Pamela Mountbatten (the daughter of Lord Mountbatten), the same image can be found in the \u2018A Huge Task\u2019 chapter. She writes that the photo was taken on 1st April 1947 at the Viceroy\u2019s House. The Viceroy invited Gandhi for breakfast to discuss the transfer of power, declared by England\u2019s PM Clement R. Atlee in February 1947. So, the photo was not taken inside the jail. To sum it up, the photo was taken in April 1947 at the Viceroy\u2019s house, not inside the jail. Did you watch our Facebook live on Fake News (Misinformation)."
	  example_title: "Refute"
	---

	# Pre-CoFactv3-Text-Classification

	## Model description

	This is the text classification model from the AAAI 2024 workshop paper “Team Trifecta at Factify5WQA: Setting the Standard in Fact Verification with Fine-Tuning”.

	The input is a claim together with its evidence, joined into one string with `[SEP]` as in the examples on this card. The output is the predicted label, which is one of three categories: Support, Neutral, or Refute.

	It is fine-tuned on the FACTIFY5WQA dataset, starting from the [microsoft/deberta-v3-large](https://huggingface.co/microsoft/deberta-v3-large) model.

	For more details, see our paper or the [GitHub](https://github.com/AndyChiangSH/Pre-CoFactv3) repository.

	## How to use?

	1. Download the model with Hugging Face Transformers.
	```python
	from transformers import AutoModelForSequenceClassification, AutoTokenizer

	model = AutoModelForSequenceClassification.from_pretrained("AndyChiang/Pre-CoFactv3-Text-Classification")
	tokenizer = AutoTokenizer.from_pretrained("AndyChiang/Pre-CoFactv3-Text-Classification")
	```

	2. Create a pipeline.
	```python
	from transformers import pipeline  # `pipeline` is not imported in step 1
	classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
	```

	3. Use the pipeline to predict the label.
	```python
	label = classifier("Micah Richards spent an entire season at Aston Vila without playing a single game. [SEP] Despite speculation that Richards would leave Aston Villa before the transfer deadline for the 2018~19 season , he remained at the club , although he is not being considered for first team selection.")
	print(label)  # a list with one dict holding the predicted "label" and its "score"
	```
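
The examples on this card join the claim and the evidence with a literal `[SEP]`. The sketch below wraps that convention in small helpers. The names `build_input`, `top_prediction`, and `verify` are hypothetical and not part of the released code, and the exact label strings the model returns depend on its configuration, so inspect the output before relying on specific values.

```python
from typing import Callable, Dict, List, Tuple

Prediction = Dict[str, object]  # shape: {"label": ..., "score": ...}


def build_input(claim: str, evidence: str) -> str:
    """Join claim and evidence with a literal [SEP], as in the examples on this card."""
    return f"{claim.strip()} [SEP] {evidence.strip()}"


def top_prediction(predictions: List[Prediction]) -> Tuple[str, float]:
    """Return (label, score) for the highest-scoring prediction."""
    best = max(predictions, key=lambda p: p["score"])
    return str(best["label"]), float(best["score"])


def verify(
    classifier: Callable[..., List[Prediction]], claim: str, evidence: str
) -> Tuple[str, float]:
    """Classify one claim/evidence pair with a text-classification pipeline."""
    # truncation=True is forwarded to the tokenizer so that long evidence
    # is cut to the model's maximum input length instead of raising an error.
    predictions = classifier(build_input(claim, evidence), truncation=True)
    return top_prediction(predictions)
```

With the `classifier` created in step 2, `verify(classifier, claim, evidence)` returns the top label and its score as a tuple.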

	## Dataset

	We use the FACTIFY5WQA dataset provided by the AAAI-24 workshop Factify 3.0.

	This dataset is designed for fact verification: the task is to determine the veracity of a claim based on the given evidence. Each sample has the following fields:

	- claim: the statement to be verified.
	- evidence: the facts used to verify the claim.
	- question: the questions generated from the claim by the 5W framework (who, what, when, where, and why).
	- claim_answer: the answers derived from the claim.
	- evidence_answer: the answers derived from the evidence.
	- label: the veracity of the claim based on the given evidence, which is one of three categories: Support, Neutral, or Refute.
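
As a sketch, one sample with these fields could be represented as shown below. The class name `Factify5WQASample` and the Python types are assumptions (the field list does not say, for example, whether questions and answers are stored as lists), and the values are made up for illustration rather than taken from the dataset.

```python
from dataclasses import dataclass
from typing import List

LABELS = ("Support", "Neutral", "Refute")


@dataclass
class Factify5WQASample:
    """Hypothetical container for one sample; field types are assumptions."""

    claim: str
    evidence: str
    question: List[str]
    claim_answer: List[str]
    evidence_answer: List[str]
    label: str

    def __post_init__(self) -> None:
        # Reject anything outside the three categories listed above.
        if self.label not in LABELS:
            raise ValueError(f"label must be one of {LABELS}, got {self.label!r}")


# Made-up values, for illustration only.
sample = Factify5WQASample(
    claim="The city library opened in 1990.",
    evidence="Records show the city library first opened its doors in 1990.",
    question=["When did the city library open?"],
    claim_answer=["1990"],
    evidence_answer=["1990"],
    label="Support",
)
```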

	| | Training | Validation | Testing | Total |
	| --- | --- | --- | --- | --- |
	| Support | 3500 | 750 | 750 | 5000 |
	| Neutral | 3500 | 750 | 750 | 5000 |
	| Refute | 3500 | 750 | 750 | 5000 |
	| Total | 10500 | 2250 | 2250 | 15000 |

	## Fine-tuning

	Fine-tuning is done with the Hugging Face Trainer API on the [Text Classification](https://huggingface.co/docs/transformers/tasks/sequence_classification) task.

	### Training hyperparameters

	The following hyperparameters were used during training:

	- Pre-trained language model: [microsoft/deberta-v3-large](https://huggingface.co/microsoft/deberta-v3-large)
	- Optimizer: Adam
	- Learning rate: 0.00001
	- Maximum input length: 650 tokens
	- Batch size: 4
	- Epochs: 12
	- Device: NVIDIA RTX A5000
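
For orientation, the listed values map onto Trainer settings roughly as follows. This is a sketch under stated assumptions, not the authors' training script: the dictionary keys follow `transformers.TrainingArguments` parameter names, `output_dir` is a placeholder, and the step count assumes a single device without gradient accumulation, which this card does not state. Note also that `Trainer` uses AdamW by default; whether that matches the optimizer listed above is an assumption.

```python
import math

# Maximum input length in tokens; it would typically be passed to the
# tokenizer as max_length together with truncation=True.
MAX_INPUT_TOKENS = 650

# Keys follow transformers.TrainingArguments parameter names.
HYPERPARAMETERS = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 4,
    "num_train_epochs": 12,
}


def optimizer_steps(num_train_samples: int, batch_size: int, epochs: int) -> int:
    """Number of optimizer steps, assuming one device and no gradient accumulation."""
    return math.ceil(num_train_samples / batch_size) * epochs


def make_training_arguments(output_dir: str = "outputs"):
    """Build TrainingArguments from the values above (output_dir is a placeholder)."""
    from transformers import TrainingArguments  # imported lazily; needs transformers

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMETERS)


# 10500 training samples, batch size 4, 12 epochs.
TOTAL_STEPS = optimizer_steps(10500, 4, 12)
print(TOTAL_STEPS)  # 31500
```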

	## Testing

	Accuracy is the evaluation metric for the text classification task.

	| Accuracy |
	| ----- |
	| 0.8502 |
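
Accuracy here is the fraction of claims whose predicted label matches the gold label. A minimal sketch of that computation follows; the function name and the labels are made up for illustration and are not taken from the test set.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions that equal the corresponding reference label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Illustrative labels only.
gold = ["Support", "Neutral", "Refute", "Support"]
pred = ["Support", "Neutral", "Support", "Support"]
print(accuracy(pred, gold))  # 0.75
```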

	## Other models

	[AndyChiang/Pre-CoFactv3-Question-Answering](https://huggingface.co/AndyChiang/Pre-CoFactv3-Question-Answering)

	## Citation