{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Analyzing text about Data Science\n", "\n", "In this example, let's do a simple exercise that covers all the steps of a traditional data science process. You do not have to write any code; you can just click on the cells below to execute them and observe the result. As a challenge, you are encouraged to try this code out with different data.\n", "\n", "## Goal\n", "\n", "In this section, we have been discussing different concepts related to Data Science. Let's try to discover more related concepts by doing some **text mining**. We will start with a text about Data Science, extract keywords from it, and then try to visualize the result.\n", "\n", "As the text, we will use the page on Data Science from Wikipedia:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import ipytest\n", "import unittest\n", "import pytest\n", "\n", "ipytest.autoconfig()\n", "\n", "url = 'https://en.wikipedia.org/wiki/Data_science'" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Step 1: Getting the data\n", "\n", "The first step in every data science process is getting the data. We will use the `requests` library to do that, as shown below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import requests\n", "\n", "def http_get(url):\n", "    return requests.____.____\n", "\n", "text = http_get(url)\n", "print(text[:1000])" ] }, { "cell_type": "markdown", "metadata": { "jp-MarkdownHeadingCollapsed": true, "tags": [] }, "source": [ "
Use the `get` method of the `requests` library to do that.\n",
"\n",
"