{ "cells": [ { "cell_type": "code", "execution_count": null, "id": "f1931205-8c05-40ca-b266-c0f14e26cff3", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [], "source": [ "# Install the necessary dependencies\n", "\n", "import os\n", "import sys\n", "import numpy as np\n", "import pandas as pd\n", "!{sys.executable} -m pip install --quiet jupyterlab_myst ipython" ] }, { "cell_type": "markdown", "id": "c35520d3", "metadata": { "tags": [ "remove-cell" ] }, "source": [ "---\n", "license:\n", " code: MIT\n", " content: CC-BY-4.0\n", "github: https://github.com/ocademy-ai/machine-learning\n", "venue: By Ocademy\n", "open_access: true\n", "bibliography:\n", " - https://raw.githubusercontent.com/ocademy-ai/machine-learning/main/open-machine-learning-jupyter-book/references.bib\n", "---" ] }, { "cell_type": "markdown", "id": "70c2694f-98d3-4846-a4d2-a88ac4da4a56", "metadata": {}, "source": [ "# Data Selection " ] }, { "cell_type": "markdown", "id": "3ca294fb-5274-4c7b-b293-a374877b524b", "metadata": {}, "source": [ "## Overview\n", "\n", "In this section, we'll focus on how to slice, dice, and generally get and set subsets of pandas objects." ] }, { "cell_type": "markdown", "id": "281fa7e2", "metadata": {}, "source": [ "## Selection by label" ] }, { "cell_type": "markdown", "id": "8cfbc1d9-62b9-4f12-a249-0fb7af77d6f3", "metadata": { "attributes": { "classes": [ "warning" ], "id": "" } }, "source": [ "Whether a setting operation returns a copy or a reference may depend on the context. Setting a value through two chained indexing steps is sometimes called `chained assignment` and should be avoided, because the write may land on a temporary copy." ] }, { "cell_type": "markdown", "id": "9dd00162-d4ac-4b84-9da4-9fe7e36cbcb5", "metadata": { "attributes": { "classes": [ "warning" ], "id": "" } }, "source": [ "`.loc` is strict when you present slicers that are not compatible with (or convertible to) the index type, for example integers in a `DatetimeIndex`. These will raise a `TypeError`." 
] }, { "cell_type": "code", "execution_count": 2, "id": "19faf0a0", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "dfl = pd.DataFrame(np.random.randn(5, 4),\n", " columns=list('ABCD'),\n", " index=pd.date_range('20130101', periods=5))" ] }, { "cell_type": "code", "execution_count": null, "id": "5cd6165e", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" }, "tags": [ "raises-exception" ] }, "outputs": [], "source": [ "dfl.loc[2:3]" ] }, { "cell_type": "markdown", "id": "f2b699ce-d01f-4afa-8323-c43b9df24b38", "metadata": { "attributes": { "classes": [ "warning" ], "id": "" } }, "source": [ "String-like values used in slicing can be converted to the type of the index, which leads to natural slicing." ] }, { "cell_type": "code", "execution_count": null, "id": "3f5fb2f0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | A | B | C | D |
|---|---|---|---|---|
| 2013-01-02 | 1.481584 | -1.691289 | -0.086724 | -0.393754 |
| 2013-01-03 | 0.476774 | 0.605450 | -0.091083 | -1.410096 |
| 2013-01-04 | 0.035828 | -0.095133 | 1.377407 | 0.495220 |

Let's visualize it! 🎥
| | A | B | C | D |
|---|---|---|---|---|
| a | -0.829219 | 1.185075 | 0.093787 | -0.442140 |
| b | -0.473605 | -0.317633 | -0.047595 | -1.409355 |
| d | -0.721064 | 1.436217 | -2.073527 | 0.452794 |

| | A | B | C |
|---|---|---|---|
| d | -0.721064 | 1.436217 | -2.073527 |
| e | 0.400573 | 1.644355 | -0.021278 |
| f | -0.282458 | -0.657392 | -0.091122 |

| | B | C |
|---|---|---|
| a | 1.185075 | 0.093787 |
| b | -0.317633 | -0.047595 |
| c | 0.205571 | -1.191746 |
| d | 1.436217 | -2.073527 |
| e | 1.644355 | -0.021278 |
| f | -0.657392 | -0.091122 |

| | A | B | C | D |
|---|---|---|---|---|
| a | -0.829219 | 1.185075 | 0.093787 | -0.442140 |
| c | 1.195465 | 0.205571 | -1.191746 | -0.836474 |

| | 0 | 2 | 4 | 6 |
|---|---|---|---|---|
| 0 | 0.176708 | -0.734049 | -0.874521 | 0.013537 |
| 2 | 1.809582 | 0.802905 | -0.563674 | -0.466175 |
| 4 | 0.813012 | -0.131666 | 1.373226 | -0.568180 |

| | 4 | 6 |
|---|---|---|
| 2 | -0.563674 | -0.466175 |
| 4 | 1.373226 | -0.568180 |
| 6 | -0.467455 | 1.028096 |
| 8 | 0.156377 | -0.368254 |

| | 2 | 6 |
|---|---|---|
| 2 | 0.802905 | -0.466175 |
| 6 | -1.760254 | 1.028096 |
| 10 | -1.020584 | 1.987550 |

| | 0 | 2 | 4 | 6 |
|---|---|---|---|---|
| 2 | 1.809582 | 0.802905 | -0.563674 | -0.466175 |
| 4 | 0.813012 | -0.131666 | 1.373226 | -0.568180 |

| | 2 | 4 |
|---|---|---|
| 0 | -0.734049 | -0.874521 |
| 2 | 0.802905 | -0.563674 |
| 4 | -0.131666 | 1.373226 |
| 6 | -1.760254 | -0.467455 |
| 8 | -1.629683 | 0.156377 |
| 10 | -1.020584 | -0.194566 |

| |
|---|
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |

| | B |
|---|---|
| 0 | -0.491934 |
| 1 | -0.758957 |
| 2 | 1.793034 |
| 3 | -0.330006 |
| 4 | 1.362746 |

| | A | B |
|---|---|---|
| 4 | 0.238833 | 1.362746 |

| | A | B | C | D |
|---|---|---|---|---|
| b | 0.206097 | 0.325348 | -0.811762 | 0.696057 |
| c | 1.369032 | 1.861469 | 0.355490 | 0.416873 |
| d | 0.028375 | 0.855487 | 0.998617 | -1.899382 |

| | A | B |
|---|---|---|
| a | -0.889154 | -0.228248 |
| b | 0.206097 | 0.325348 |
| c | 1.369032 | 1.861469 |
| d | 0.028375 | 0.855487 |
| e | -0.344703 | 1.783202 |
| f | -0.660587 | 0.034734 |

| | A | B |
|---|---|---|
| a | -0.889154 | -0.228248 |
| b | 0.206097 | 0.325348 |
| c | 1.369032 | 1.861469 |
| d | 0.028375 | 0.855487 |
| e | -0.344703 | 1.783202 |
| f | -0.660587 | 0.034734 |

| | A | B |
|---|---|---|
| a | 1 | 4 |
| c | 3 | 6 |
