## SETUP #####################################################################################################################
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import streamlit as st
import pandas as pd
from PIL import Image
import os
# load model
model = AutoModelForSequenceClassification.from_pretrained("amandakonet/climatebert-fact-checking")
tokenizer = AutoTokenizer.from_pretrained("amandakonet/climatebert-fact-checking")
# example (claim, evidence, label) triples used in the demo below
ex_claims = ['Global warming is driving polar bears toward extinction',
'Global warming is driving polar bears toward extinction',
'Global warming is driving polar bears toward extinction',
'Global warming is driving polar bears toward extinction',
'Global warming is driving polar bears toward extinction',
'Climate skeptics argue temperature records have been adjusted in recent years to make the past appear cooler and the present warmer, although the Carbon Brief showed that NOAA has actually made the past warmer, evening out the difference.',
'Climate skeptics argue temperature records have been adjusted in recent years to make the past appear cooler and the present warmer, although the Carbon Brief showed that NOAA has actually made the past warmer, evening out the difference.',
'Climate skeptics argue temperature records have been adjusted in recent years to make the past appear cooler and the present warmer, although the Carbon Brief showed that NOAA has actually made the past warmer, evening out the difference.',
'Climate skeptics argue temperature records have been adjusted in recent years to make the past appear cooler and the present warmer, although the Carbon Brief showed that NOAA has actually made the past warmer, evening out the difference.',
'Climate skeptics argue temperature records have been adjusted in recent years to make the past appear cooler and the present warmer, although the Carbon Brief showed that NOAA has actually made the past warmer, evening out the difference.',
'We don\'t expect record years every year, but the ongoing long-term warming trend is clear.',
'We don\'t expect record years every year, but the ongoing long-term warming trend is clear.',
'We don\'t expect record years every year, but the ongoing long-term warming trend is clear.',
'We don\'t expect record years every year, but the ongoing long-term warming trend is clear.',
'We don\'t expect record years every year, but the ongoing long-term warming trend is clear.',
'It is increasingly clear that the planet was significantly warmer than today several times during the past 10,000 years.',
'It is increasingly clear that the planet was significantly warmer than today several times during the past 10,000 years.',
'It is increasingly clear that the planet was significantly warmer than today several times during the past 10,000 years.',
'It is increasingly clear that the planet was significantly warmer than today several times during the past 10,000 years.',
'It is increasingly clear that the planet was significantly warmer than today several times during the past 10,000 years.',
'Eleven percent of all global greenhouse gas emissions caused by humans are caused by deforestation, comparable to the emissions from all of the cars and trucks on the planet.',
'Eleven percent of all global greenhouse gas emissions caused by humans are caused by deforestation, comparable to the emissions from all of the cars and trucks on the planet.',
'Eleven percent of all global greenhouse gas emissions caused by humans are caused by deforestation, comparable to the emissions from all of the cars and trucks on the planet.',
'Eleven percent of all global greenhouse gas emissions caused by humans are caused by deforestation, comparable to the emissions from all of the cars and trucks on the planet.',
'Eleven percent of all global greenhouse gas emissions caused by humans are caused by deforestation, comparable to the emissions from all of the cars and trucks on the planet.']
ex_evidence = ['Recent Research Shows Human Activity Driving Earth Towards Global Extinction Event.',
'Environmental impacts include the extinction or relocation of many species as their ecosystems change, most immediately the environments of coral reefs, mountains, and the Arctic.',
'Rising temperatures push bees to their physiological limits, and could cause the extinction of bee populations.',
'Rising global temperatures, caused by the greenhouse effect, contribute to habitat destruction, endangering various species, such as the polar bear.',
'"Bear hunting caught in global warming debate".',
'It is a major aspect of climate change, and has been demonstrated by the instrumental temperature record which shows global warming of around 1\xa0°C since the pre-industrial period, although the bulk of this (0.9°C) has occurred since 1970.',
'Improved measurement and analysis techniques have reconciled this discrepancy: corrected buoy and satellite surface temperatures are slightly cooler and corrected satellite and radiosonde measurements of the tropical troposphere are slightly warmer.',
'Reconstructions have consistently shown that the rise in the instrumental temperature record of the past 150 years is not matched in earlier centuries, and the name "hockey stick graph" was coined for figures showing a long-term decline followed by an abrupt rise in temperatures.',
'It concluded, "The weight of current multi-proxy evidence, therefore, suggests greater 20th-century warmth, in comparison with temperature levels of the previous 400 years, than was shown in the TAR.',
'In at least some areas, the recent period appears to be warmer than has been the case for a thousand or more years".',
'Climate change is a long-term, sustained trend of change in climate.',
'Using the long-term temperature trends for the earth scientists and statisticians conclude that it continues to warm through time.',
'While record-breaking years attract considerable public interest, individual years are less significant than the overall trend.',
'This long-term trend is the main cause for the record warmth of 2015 and 2016, surpassing all previous years—even ones with strong El Niño events."',
'Between 1850 and 1950 a long-term trend of gradual climate warming is observable, and during this same period the Marsham record of oak-leafing dates tended to become earlier.',
'During the Mesozoic, the world, including India, was considerably warmer than today.',
'Consequently, summers are 2.3\xa0°C (4\xa0°F) warmer in the Northern Hemisphere than in the Southern Hemisphere under similar conditions.',
'The result is a picture of relatively cool conditions in the seventeenth and early nineteenth centuries and warmth in the eleventh and early fifteenth centuries, but the warmest conditions are apparent in the twentieth century.',
"The current scientific consensus is that: Earth's climate has warmed significantly since the late 1800s.",
'However, the geological record demonstrates that Earth has remained at a fairly constant temperature throughout its history, and that the young Earth was somewhat warmer than it is today.',
'Tropical deforestation is responsible for approximately 20% of world greenhouse gas emissions.',
'Of these emissions, 65% was carbon dioxide from fossil fuel burning and industry, 11% was carbon dioxide from land use change, which is primarily due to deforestation, 16% was from methane, 6.2% was from nitrous oxide, and 2.0% was from fluorinated gases.',
'Land-use change, such as deforestation, caused about 31% of cumulative emissions over 1870–2017, coal 32%, oil 25%, and gas 10%.',
'The estimate of total CO 2 emissions includes biotic carbon emissions, mainly from deforestation.',
'The vast majority of anthropogenic carbon dioxide emissions come from combustion of fossil fuels, principally coal, oil, and natural gas, with additional contributions coming from deforestation, changes in land use, soil erosion and agriculture (including livestock).']
ex_labels = ['not enough info',
'supports',
'not enough info',
'supports',
'not enough info',
'not enough info',
'not enough info',
'not enough info',
'refutes',
'refutes',
'not enough info',
'supports',
'not enough info',
'supports',
'supports',
'not enough info',
'not enough info',
'not enough info',
'not enough info',
'not enough info',
'refutes',
'not enough info',
'refutes',
'not enough info',
'not enough info']
ex_df = pd.DataFrame({'claim' : ex_claims, 'evidence' : ex_evidence, 'label': ex_labels})
## TEXT ######################################################################################################################
# title
st.title('Combatting Climate Change Misinformation with Transformers')
st.markdown("## The Gist")
st.markdown("**Problem**🤔: Climate change misinformation spreads quickly and is difficult to combat. However, its important to do so, because climate change misinformation has direct impacts on public opinion and public policy surrounding climate change.")
st.markdown("**Solution**💡: Develop a pipeline in which users can input climate change claims... and the pipeline returns whether the claim is refuted or supported by current climate science, along with the corresponding evidence.")
st.markdown("**Approach**🔑:")
st.markdown("* There are many steps to this pipeline. Here, I focus on fine-tuning a transformer model, ClimateBERT, using the textual entailment task.")
st.markdown("* Given a {claim, evidence} pair, determine whether the climate claim is supported or refuted (or neither) by the evidence")
st.markdown("* The dataset used is Climate FEVER, a natural language inference dataset with 1,535 {claim, [evidence], [label]} tuples")
st.markdown("---")
st.markdown("## The Details")
# section 1: the context, problem; how to address
st.markdown("### Problem 🤔")
st.markdown("Misinformation about climate change spreads quickly and has direct impacts on public opinion and public policy surrounding the climate. Further, misinformation is difficult to combat, and people are able to \"verify\" false climate claims on biased sites. Ideally, people would be able to easily verify climate claims. This is where transformers come in.")
# section 2: what is misinformation? how is it combatted now? how successful is this?
st.markdown("### More about Misinformation")
st.markdown("What is misinformation? How does it spread?")
st.markdown("* **Misinformation** can be defined as “false or inaccurate information, especially that which is deliberately intended to deceive.”")
st.markdown("* It can exist in different domains, and each domain has different creators and distributors of misinformation.")
st.markdown("* Misinformation regarding climate change is often funded by conservative foundations or large energy industries such as gas, coal, and oil. (1)")
misinfo_flowchart = Image.open('images/misinfo_chart.jpeg')
st.image(misinfo_flowchart, caption='The misinformation flowchart. (1)')
st.markdown("**Why does this matter?** Through echo chambers, polarization, and feedback loops, misinformation can spread from these large organizes to the public, thus arming the public with pursausive information designed to create scepticism around and/or denial of climate change, its urgency, and climate change scientists. This is especially problematic in democratic societies, where the public, to some extent, influences governmental policy decisions (brookings). Existing research suggests that misinformation directly contributes to public support of political inaction and active stalling or rejection of pro- climate change policies (1).")
st.markdown("How is climate change misinformation combatted now? Below are a few of the ways according to the Brookings Institute:")
st.markdown("1. Asking news sources to call out misinformation")
st.markdown("2. Teaching and encouraging media literacy among the public (how to detect fake news, critical evaluation of information provided, etc.")
st.markdown("3. Governments should encourage independent journalism but avoid censoring news")
st.markdown("4. Social media platform investment in algorithmic detection of fake news")
st.markdown("However, many of the proposed solutions above require adoption of behaviors. This is difficult to acheive, particularly among news organizations and social media platforms which receive monetary benefits from misinformation in the form of ad revenue from cite usage and viewership.")
# section 3: how can transformers help?
st.markdown("### How can Transformers Help?💡")
st.markdown("**FEVER**")
st.markdown("* FEVER, or Fact Extraction and VERification, was introduced in 2018 as the first dataset containing {fact, evdience, entailment_label} information. They extracted altering sentences from Wikipedia and had annotators report the relationship between the setences: entailment, contradition, not enough information.")
st.markdown("* Since then, other researchers have expanded on this area in different domains")
st.markdown("* Here, we use Climate FEVER (3), a similar dataset developed and annotated by ")
st.markdown("**Fact Verification / Fact-Checking**")
st.markdown("* This is simply an extenstion of the textual entailment task")
st.markdown("* Given two sentences, sent1 and sent2, determine the relationship: entail, contradict, neutral")
st.markdown("* With fact verification, we can think of the sentences as claim and evidence and labels as support, refute, or not enough information to refute or support.")
# section 4: The process
# this is the pipeline in my notes (u are here highlight)
st.markdown("### The Process 🔑")
st.markdown("Imagine: A person is curious about whether a claim they heard about climate change is true. How can transformers help validate or refute the claim?")
st.markdown("1. User inputs a climate claim")
st.markdown("2. Retrieve evidence related to input claim \
- For each claim, collect N related documents. These documents are selected by finding the N documents with the highest similarity scores to the claim. A current area of research: How do we keep the set of curated documents up-to-date? Validate their contents?")
st.markdown("3. Send (claim, evidence) pairs to a transformer model. Have the model predict whether each evidence supports, refutes, or is not relevant to the claim. (📍 YOU ARE HERE!)")
st.markdown("4. Report back to the user: The supporting evidence for the claim (if any), the refuting evidence for the claim (if any). If no relevant evidence is found, report that the claim cannot be supported or refuted by current evidence.")
# section 5: my work
st.markdown("### Climate Claim Fact-Checking with Transformers")
st.markdown("My work focuses on step 3 of the process: Training a transformer model to accurately categorize (claim, evidence) as:")
st.markdown("* evidence *supports* (entails) claim")
st.markdown("* evidence *refutes* (contradicts) claim")
st.markdown("* evidence *does not provide enough info to support or refute* (neutral) claim")
st.markdown("For this project, I fine-tune ClimateBERT (4) on the text entailment task")
st.markdown("* ClimateBERT is a domain-adapted DistilRoBERTa model")
st.markdown("* Corpus used has ~1.7M climate-related passages taken from news articles, research abstracts, and corporate climate reports")
## EXAMPLE ###################################################################################################################
st.markdown("## Try it out!")
# select climate claim
option_claim = st.selectbox('Select a climate claim to test', ex_df['claim'].unique())
# filter df to selected claim
filtered_df = ex_df[ex_df['claim'] == option_claim]
# select evidence
option_evidence = st.selectbox('Select evidence to test', filtered_df['evidence'].unique())
st.markdown("Now, we can use your selected (claim, evidence) pair in the fine-tuned transformer!")
# tokenize
features = tokenizer(option_claim, option_evidence,
padding='max_length', truncation=True, return_tensors="pt", max_length=512)
# set model into eval
model.eval()
with torch.no_grad():
    scores = model(**features).logits
    label_mapping = ['supports', 'refutes', 'not enough info']
    labels = [label_mapping[score_max] for score_max in scores.argmax(dim=1)]
st.write("**Claim:**", option_claim)
st.write("**Evidence**", option_evidence)
st.write("**The predicted label is**:", labels[0])
# clean up true label
true_label = list(filtered_df[filtered_df['evidence'] == option_evidence]['label'])[0]
st.write("**The true label is**", true_label)
st.write("Check out my github repository to try out custom claim and evidence pairs, linked under references.")
# section 6: analysis
st.markdown("## Critical Analysis")
st.markdown("What else could we do?")
st.markdown("* Given more data, the performance of the model can be greatly improved. This is just a proof of concept")
st.markdown("* This is only one small part of the puzzle!")
st.markdown("* In the complete pipeline (from user input to final output), we could move from just outputting evidence to training a transformer to reply with persuasive evidence. That is, instead of simply saying, \"This claim is supported by this evidence\", the model could transform the evidence into a persuasive argument, thus combatting climate change misinfo in a more platable and convincing way.")
# References + Resource Links
st.markdown("## Resource Links")
st.markdown("### References")
st.markdown("0. My [huggingface model card](https://huggingface.co/amandakonet/climatebert-fact-checking), [adopted Climate FEVER dataset card](https://huggingface.co/datasets/amandakonet/climate_fever_adopted), and [project code on github](https://github.com/amandakonet/climate-change-misinformation)")
st.markdown("1. https://www.carbonbrief.org/guest-post-how-climate-change-misinformation-spreads-online")
st.markdown("2. https://www.brookings.edu/research/how-to-combat-fake-news-and-disinformation/")
st.markdown("3. Climate FEVER [paper](https://arxiv.org/abs/2012.00614), [huggingface repo](https://huggingface.co/datasets/climate_fever), and [github](https://github.com/huggingface/datasets/tree/master/datasets/climate_fever)")
st.markdown("4. [ClimateBERT](https://climatebert.ai/), [paper](https://arxiv.org/abs/2110.12010)")