{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "994a515b-161c-4c96-88d7-307dc8d05f4d", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "WARNING: Ignoring invalid distribution -rotobuf (d:\\conda\\envs\\py39\\lib\\site-packages)\n", "WARNING: Ignoring invalid distribution -rotobuf (d:\\conda\\envs\\py39\\lib\\site-packages)\n", "\n", "[notice] A new release of pip is available: 23.2.1 -> 23.3.1\n", "[notice] To update, run: python.exe -m pip install --upgrade pip\n" ] } ], "source": [ "# Install the necessary dependencies\n", "\n", "import os\n", "import sys\n", "!{sys.executable} -m pip install --quiet pandas scikit-learn numpy matplotlib jupyterlab_myst ipython birds" ] }, { "cell_type": "markdown", "id": "00a3829e", "metadata": { "tags": [ "remove-cell" ] }, "source": [ "---\n", "license:\n", " code: MIT\n", " content: CC-BY-4.0\n", "github: https://github.com/ocademy-ai/machine-learning\n", "venue: By Ocademy\n", "open_access: true\n", "bibliography:\n", " - https://raw.githubusercontent.com/ocademy-ai/machine-learning/main/open-machine-learning-jupyter-book/references.bib\n", "---" ] }, { "cell_type": "markdown", "id": "11821a59", "metadata": {}, "source": [ "# Visualizing distributions\n", "\n", "In the previous section, you learned some interesting facts about a dataset of Minnesota birds. You found some erroneous data by visualizing outliers and compared bird categories by their maximum length.\n", "\n", "## Explore the birds dataset\n", "\n", "Another way to dig into data is to look at its distribution, or how its values are spread along an axis. For example, you might want to learn how the maximum wingspan or maximum body mass is distributed across the birds of Minnesota.\n", "\n", "Let's discover some facts about the distributions of data in this dataset."
] }, { "cell_type": "code", "execution_count": 2, "id": "54f5b4ef", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" }, "tags": [ "output_scroll", "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "
|   | Name | ScientificName | Category | Order | Family | Genus | ConservationStatus | MinLength | MaxLength | MinBodyMass | MaxBodyMass | MinWingspan | MaxWingspan |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Black-bellied whistling-duck | Dendrocygna autumnalis | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Dendrocygna | LC | 47.0 | 56.0 | 652.0 | 1020.0 | 76.0 | 94.0 |
| 1 | Fulvous whistling-duck | Dendrocygna bicolor | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Dendrocygna | LC | 45.0 | 53.0 | 712.0 | 1050.0 | 85.0 | 93.0 |
| 2 | Snow goose | Anser caerulescens | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Anser | LC | 64.0 | 79.0 | 2050.0 | 4050.0 | 135.0 | 165.0 |
| 3 | Ross's goose | Anser rossii | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Anser | LC | 57.3 | 64.0 | 1066.0 | 1567.0 | 113.0 | 116.0 |
| 4 | Greater white-fronted goose | Anser albifrons | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Anser | LC | 64.0 | 81.0 | 1930.0 | 3310.0 | 130.0 | 165.0 |