{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "730ba509", "metadata": {}, "outputs": [], "source": [ "from IPython.core.interactiveshell import InteractiveShell\n", "InteractiveShell.ast_node_interactivity = \"all\"" ] }, { "cell_type": "code", "execution_count": 2, "id": "d9acd4b6", "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "import sys\n", "proj_dir = Path.cwd().parent\n", "\n", "sys.path.append(str(proj_dir))\n" ] }, { "cell_type": "code", "execution_count": 4, "id": "62452860", "metadata": {}, "outputs": [], "source": [ "from datasets import load_dataset" ] }, { "cell_type": "code", "execution_count": 10, "id": "9264a232", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Using custom data configuration derek-thomas--dataset-creator-askreddit-806417599346c17a\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Downloading and preparing dataset None/None to /Users/derekthomas/.cache/huggingface/datasets/derek-thomas___parquet/derek-thomas--dataset-creator-askreddit-806417599346c17a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec...\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "b65ec8c7f33a40eeac5d15e6a527f830", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Downloading data files: 0%| | 0/1 [00:00, ?it/s]" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "2d93949f1f0144779349c73c58a68ca9", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Extracting data files: 0%| | 0/1 [00:00, ?it/s]" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Generating all_days split: 0%| | 0/2468888 [00:00, ? examples/s]" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Dataset parquet downloaded and prepared to /Users/derekthomas/.cache/huggingface/datasets/derek-thomas___parquet/derek-thomas--dataset-creator-askreddit-806417599346c17a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec. Subsequent calls will reuse this data.\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "0e62c7e8b3c74aa5af3b87ab17e6cb1f", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/1 [00:00, ?it/s]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "dataset = load_dataset('derek-thomas/dataset-creator-askreddit', download_mode=\"reuse_cache_if_exists\", ignore_verifications=True)" ] }, { "cell_type": "code", "execution_count": 12, "id": "ba84be68", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | score | \n", "num_comments | \n", "title | \n", "permalink | \n", "selftext | \n", "url | \n", "created_utc | \n", "author | \n", "id | \n", "downs | \n", "ups | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "2 | \n", "4 | \n", "Reddit, if someone had to describe you to a st... | \n", "/r/AskReddit/comments/15sn6y/reddit_if_someone... | \n", "They would be talking about you without your p... | \n", "http://www.reddit.com/r/AskReddit/comments/15s... | \n", "2013-01-01 23:59:40 | \n", "[deleted] | \n", "15sn6y | \n", "0 | \n", "2 | \n", "
1 | \n", "5 | \n", "24 | \n", "What kind of car does the average \\nRedditor d... | \n", "/r/AskReddit/comments/15sn6m/what_kind_of_car_... | \n", "I've always wanted to know what kind of car th... | \n", "http://www.reddit.com/r/AskReddit/comments/15s... | \n", "2013-01-01 23:59:31 | \n", "PaytonAdams | \n", "15sn6m | \n", "0 | \n", "5 | \n", "
2 | \n", "1 | \n", "5 | \n", "What movies have made you go back to the theat... | \n", "/r/AskReddit/comments/15sn6b/what_movies_have_... | \n", "\n", " | http://www.reddit.com/r/AskReddit/comments/15s... | \n", "2013-01-01 23:59:20 | \n", "[deleted] | \n", "15sn6b | \n", "0 | \n", "1 | \n", "
3 | \n", "0 | \n", "18 | \n", "Worst fear(s)? | \n", "/r/AskReddit/comments/15sn4u/worst_fears/ | \n", "So what is your worst fear, reddit? | \n", "http://www.reddit.com/r/AskReddit/comments/15s... | \n", "2013-01-01 23:58:37 | \n", "[deleted] | \n", "15sn4u | \n", "0 | \n", "0 | \n", "
4 | \n", "11 | \n", "29 | \n", "If there was a type of ink that lasted only fo... | \n", "/r/AskReddit/comments/15sn44/if_there_was_a_ty... | \n", "\n", " | http://www.reddit.com/r/AskReddit/comments/15s... | \n", "2013-01-01 23:58:15 | \n", "Honeybeard | \n", "15sn44 | \n", "0 | \n", "11 | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
3293628 | \n", "1 | \n", "1 | \n", "Help me get an idea of cost of living | \n", "/r/AskReddit/comments/2cjj63/help_me_get_an_id... | \n", "\n", " | http://www.reddit.com/r/AskReddit/comments/2cj... | \n", "2014-08-04 00:01:20 | \n", "bbent4698 | \n", "2cjj63 | \n", "0 | \n", "1 | \n", "
3293629 | \n", "2 | \n", "0 | \n", "If you used a prism to separate light and then... | \n", "/r/AskReddit/comments/2cjj5v/if_you_used_a_pri... | \n", "\n", " | http://www.reddit.com/r/AskReddit/comments/2cj... | \n", "2014-08-04 00:01:19 | \n", "Ajmb_88 | \n", "2cjj5v | \n", "0 | \n", "2 | \n", "
3293630 | \n", "0 | \n", "11 | \n", "Reddit, what was it like the first time you go... | \n", "/r/AskReddit/comments/2cjj4s/reddit_what_was_i... | \n", "\n", " | http://www.reddit.com/r/AskReddit/comments/2cj... | \n", "2014-08-04 00:01:01 | \n", "da-gonzo | \n", "2cjj4s | \n", "0 | \n", "0 | \n", "
3293631 | \n", "1452 | \n", "3140 | \n", "People who refuse to be organ donors, why do y... | \n", "/r/AskReddit/comments/2cjj31/people_who_refuse... | \n", "R.I.P my inbox | \n", "http://www.reddit.com/r/AskReddit/comments/2cj... | \n", "2014-08-04 00:00:36 | \n", "JohnnySniperr | \n", "2cjj31 | \n", "0 | \n", "1452 | \n", "
3293632 | \n", "2 | \n", "9 | \n", "What always happens when you travel abroad? | \n", "/r/AskReddit/comments/2cjj2a/what_always_happe... | \n", "\n", " | http://www.reddit.com/r/AskReddit/comments/2cj... | \n", "2014-08-04 00:00:23 | \n", "Nicopip | \n", "2cjj2a | \n", "0 | \n", "2 | \n", "
3293633 rows × 11 columns
\n", "