{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "3ea82a12-74d3-40b5-919a-8e89d88e8d63", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [], "source": [ "# Install the necessary dependencies\n", "\n", "import os\n", "import sys\n", "!{sys.executable} -m pip install --quiet pandas seaborn scikit-learn numpy matplotlib jupyterlab_myst ipython" ] }, { "cell_type": "markdown", "id": "30074321", "metadata": { "tags": [ "remove-cell" ] }, "source": [ "---\n", "license:\n", " code: MIT\n", " content: CC-BY-4.0\n", "github: https://github.com/ocademy-ai/machine-learning\n", "venue: By Ocademy\n", "open_access: true\n", "bibliography:\n", " - https://raw.githubusercontent.com/ocademy-ai/machine-learning/main/open-machine-learning-jupyter-book/references.bib\n", "---" ] }, { "cell_type": "markdown", "id": "fe9eb643", "metadata": {}, "source": [ "\n", "# Visualizing relationships: all about honey 🍯\n", "\n", "Continuing with the nature focus of our research, let's explore visualizations that show the relationships among different aspects of honey production, using a dataset derived from the [United States Department of Agriculture](https://www.nass.usda.gov/About_NASS/index.php).\n", "\n", "This dataset of about 600 rows describes honey production in many U.S. states. For example, you can look at the number of colonies, yield per colony, total production, stocks, price per pound, and value of the honey produced in a given state from 1998 to 2012, with one row per year for each state.\n", "\n", "It will be interesting to visualize the relationship between a given state's production per year and, for example, the price of honey in that state. Alternatively, you could compare honey yield per colony across states. This span of years covers the devastating 'CCD', or '[Colony Collapse Disorder](http://npic.orst.edu/envir/ccd.html)', first seen in 2006, so it is a poignant dataset to study. 🐝\n", "\n", "In this section, you will use Seaborn, which you have used before, to visualize relationships between variables. Seaborn's `relplot` function is particularly useful here: it draws scatter plots and line plots that quickly reveal '[statistical relationships](https://seaborn.pydata.org/tutorial/relational.html?highlight=relationships)', helping the data scientist better understand how variables relate to each other.\n", "\n", "## Scatterplots\n", "\n", "Use a scatterplot to show how the price of honey has evolved, year over year, in each state. Seaborn's `relplot` conveniently groups the data by state and displays data points for both categorical and numeric data.\n", "\n", "Let's start by importing the data and Seaborn:" ] },
{ "cell_type": "code", "execution_count": 2, "id": "8348f573", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" }, "tags": [ "output-scoll", "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\"><th></th><th>state</th><th>numcol</th><th>yieldpercol</th><th>totalprod</th><th>stocks</th><th>priceperlb</th><th>prodvalue</th><th>year</th></tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>AL</td><td>16000.0</td><td>71</td><td>1136000.0</td><td>159000.0</td><td>0.72</td><td>818000.0</td><td>1998</td></tr>\n",
"    <tr><th>1</th><td>AZ</td><td>55000.0</td><td>60</td><td>3300000.0</td><td>1485000.0</td><td>0.64</td><td>2112000.0</td><td>1998</td></tr>\n",
"    <tr><th>2</th><td>AR</td><td>53000.0</td><td>65</td><td>3445000.0</td><td>1688000.0</td><td>0.59</td><td>2033000.0</td><td>1998</td></tr>\n",
"    <tr><th>3</th><td>CA</td><td>450000.0</td><td>83</td><td>37350000.0</td><td>12326000.0</td><td>0.62</td><td>23157000.0</td><td>1998</td></tr>\n",
"    <tr><th>4</th><td>CO</td><td>27000.0</td><td>72</td><td>1944000.0</td><td>1594000.0</td><td>0.70</td><td>1361000.0</td><td>1998</td></tr>\n",
"  </tbody>\n", "</table>\n", "