
cnn_dailymail_108_3000_1500_test

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

Usage

To use this model, please install BERTopic:

```bash
pip install -U bertopic
```

You can use the model as follows:

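```python
from bertopic import BERTopic

# Download the trained model from the Hugging Face Hub
topic_model = BERTopic.load("KingKazma/cnn_dailymail_108_3000_1500_test")

# One row per topic, including its size and name
topic_model.get_topic_info()
```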
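A loaded model can also assign topics to new articles. The sketch below relies on two BERTopic methods: `transform()`, which returns topic IDs and probabilities, and `get_topic()`, which returns `(word, score)` pairs, or `False` for an unknown ID. `label_documents` is a hypothetical helper, not part of BERTopic. It is duck-typed so that it can be exercised without downloading the model. Note that `transform()` needs the embedding model the topics were trained with, and the card does not say whether that model was saved with this checkpoint.

```python
from typing import List, Sequence, Tuple


def label_documents(model, docs: Sequence[str], n_words: int = 3) -> List[Tuple[int, List[str]]]:
    """Hypothetical helper: pair each document with its predicted topic ID and top keywords.

    `model` is expected to behave like a fitted BERTopic model:
      - model.transform(docs) -> (topic_ids, probabilities)
      - model.get_topic(topic_id) -> list of (word, score) pairs, or False
    """
    topic_ids, _probabilities = model.transform(list(docs))
    results = []
    for topic_id in topic_ids:
        topic_id = int(topic_id)  # convert numpy integers to plain int
        words = model.get_topic(topic_id) or []  # treat False (unknown ID) as no keywords
        results.append((topic_id, [word for word, _score in words[:n_words]]))
    return results
```

With the model loaded as shown above, `label_documents(topic_model, ["Hamilton wins the Chinese Grand Prix"])` returns one `(topic_id, keywords)` pair per document. The topic each article lands in depends on the model.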

Topic overview

  • Number of topics: 29 (including the -1 outlier topic)
  • Number of training documents: 1500
| Topic ID | Topic Keywords | Topic Frequency | Label |
|---------:|----------------|----------------:|-------|
| -1 | said - one - year - time - people | 11 | -1_said_one_year_time |
| 0 | police - said - told - school - court | 384 | 0_police_said_told_school |
| 1 | game - liverpool - league - goal - season | 249 | 1_game_liverpool_league_goal |
| 2 | said - attack - group - police - people | 91 | 2_said_attack_group_police |
| 3 | people - said - planet - mountain - mile | 85 | 3_people_said_planet_mountain |
| 4 | baby - family - mother - said - cancer | 77 | 4_baby_family_mother_said |
| 5 | labour - mr - miliband - tax - leader | 54 | 5_labour_mr_miliband_tax |
| 6 | shark - crocodile - fish - animal - water | 49 | 6_shark_crocodile_fish_animal |
| 7 | chelsea - arsenal - mourinho - hazard - league | 37 | 7_chelsea_arsenal_mourinho_hazard |
| 8 | united - manchester - city - van - league | 36 | 8_united_manchester_city_van |
| 9 | masters - round - woods - group - tournament | 31 | 9_masters_round_woods_group |
| 10 | model - fashion - dress - woman - look | 30 | 10_model_fashion_dress_woman |
| 11 | food - sugar - restaurant - water - vitamin | 29 | 11_food_sugar_restaurant_water |
| 12 | race - hamilton - rosberg - grand - prix | 29 | 12_race_hamilton_rosberg_grand |
| 13 | madrid - ronaldo - real - goal - barcelona | 29 | 13_madrid_ronaldo_real_goal |
| 14 | england - cricket - test - cook - benaud | 28 | 14_england_cricket_test_cook |
| 15 | clinton - president - obama - hillary - said | 27 | 15_clinton_president_obama_hillary |
| 16 | property - house - home - market - apartment | 25 | 16_property_house_home_market |
| 17 | fight - mayweather - pacquiao - manny - bout | 25 | 17_fight_mayweather_pacquiao_manny |
| 18 | apple - watch - price - per - cent | 24 | 18_apple_watch_price_per |
| 19 | celtic - rangers - game - scottish - deila | 24 | 19_celtic_rangers_game_scottish |
| 20 | dog - animal - owner - council - dogs | 21 | 20_dog_animal_owner_council |
| 21 | prince - royal - harry - queen - baby | 21 | 21_prince_royal_harry_queen |
| 22 | film - actor - downey - interview - show | 17 | 22_film_actor_downey_interview |
| 23 | hotel - flight - mile - island - room | 16 | 23_hotel_flight_mile_island |
| 24 | bayern - guardiola - porto - dortmund - munich | 14 | 24_bayern_guardiola_porto_dortmund |
| 25 | wedding - gabriel - noah - roxy - sandra | 13 | 25_wedding_gabriel_noah_roxy |
| 26 | saracens - bosch - penalty - kick - rugby | 12 | 26_saracens_bosch_penalty_kick |
| 27 | deal - summer - club - interest - contract | 12 | 27_deal_summer_club_interest |
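The overview figures can be cross-checked against the table. The 29 topics are IDs -1 to 27, where -1 is BERTopic's outlier topic. The frequencies sum to the 1,500 training documents. Each Label appears to be the topic ID joined with the first four keywords. The sketch below encodes those observations. `default_label` and `outlier_share` are illustrative helpers, not BERTopic functions.

```python
# Topic sizes copied from the "Topic Frequency" column above (topic ID -> documents).
TOPIC_FREQUENCIES = {
    -1: 11, 0: 384, 1: 249, 2: 91, 3: 85, 4: 77, 5: 54, 6: 49, 7: 37,
    8: 36, 9: 31, 10: 30, 11: 29, 12: 29, 13: 29, 14: 28, 15: 27,
    16: 25, 17: 25, 18: 24, 19: 24, 20: 21, 21: 21, 22: 17, 23: 16,
    24: 14, 25: 13, 26: 12, 27: 12,
}


def default_label(topic_id: int, keywords: list[str], n_words: int = 4) -> str:
    """Join the topic ID with its first `n_words` keywords, as in the Label column."""
    return f"{topic_id}_" + "_".join(keywords[:n_words])


def outlier_share(frequencies: dict[int, int]) -> float:
    """Fraction of training documents assigned to the -1 (outlier) topic."""
    return frequencies.get(-1, 0) / sum(frequencies.values())


print(len(TOPIC_FREQUENCIES))           # 29 topics, counting -1
print(sum(TOPIC_FREQUENCIES.values()))  # 1500 training documents
print(default_label(12, ["race", "hamilton", "rosberg", "grand", "prix"]))  # 12_race_hamilton_rosberg_grand
```

Only 11 of the 1,500 documents (about 0.7%) fall in the outlier topic.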

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
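These names match arguments of the `BERTopic` constructor, so the settings can be collected into one dictionary. Two of them fit the table above. With `n_gram_range=(1, 1)`, every keyword is a single word. With `min_topic_size=10`, the smallest topic in the table has 12 documents. The sketch below is a minimal illustration. `build_untrained_model` is a hypothetical helper. The card does not record the embedding, UMAP, or HDBSCAN sub-models or a random seed, so refitting with these settings is not guaranteed to reproduce the topics above.

```python
# Training hyperparameters as listed on this card, keyed by BERTopic constructor argument.
HYPERPARAMETERS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_untrained_model():
    """Hypothetical helper: create a fresh, unfitted BERTopic model with the settings above.

    The import is deferred so this module loads even when bertopic is not installed.
    Sub-models that the card does not record are left at BERTopic's own defaults.
    """
    from bertopic import BERTopic  # pip install -U bertopic
    return BERTopic(**HYPERPARAMETERS)
```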

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.56.4
  • Plotly: 5.13.1
  • Python: 3.10.6
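To see how closely a local environment matches the versions above, the installed packages can be compared using only the standard library. This is a sketch. The mapping from the names on the card to PyPI distribution names (for example UMAP to `umap-learn`) is an assumption, and `compare_versions` is a hypothetical helper.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions recorded on this card, keyed by (assumed) PyPI distribution name.
RECORDED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}
RECORDED_PYTHON = "3.10.6"


def compare_versions(recorded=RECORDED):
    """Return {package: (card_version, installed_version_or_None)} for every mismatch."""
    mismatches = {}
    for dist, expected in recorded.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[dist] = (expected, installed)
    return mismatches


def python_version() -> str:
    """Running interpreter version as 'major.minor.micro'."""
    return ".".join(str(part) for part in sys.version_info[:3])


if __name__ == "__main__":
    print("Python:", python_version(), "(card: " + RECORDED_PYTHON + ")")
    for dist, (card, local) in compare_versions().items():
        print(f"{dist}: card {card}, installed {local or 'not installed'}")
```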