{ "cells": [ { "cell_type": "code", "execution_count": null, "id": "f1931205-8c05-40ca-b266-c0f14e26cff3", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [], "source": [ "# Install the necessary dependencies\n", "\n", "import os\n", "import sys\n", "import numpy as np\n", "import pandas as pd\n", "!{sys.executable} -m pip install --quiet jupyterlab_myst ipython" ] }, { "cell_type": "markdown", "id": "c35520d3", "metadata": { "tags": [ "remove-cell" ] }, "source": [ "---\n", "license:\n", " code: MIT\n", " content: CC-BY-4.0\n", "github: https://github.com/ocademy-ai/machine-learning\n", "venue: By Ocademy\n", "open_access: true\n", "bibliography:\n", " - https://raw.githubusercontent.com/ocademy-ai/machine-learning/main/open-machine-learning-jupyter-book/references.bib\n", "---" ] }, { "cell_type": "markdown", "id": "70c2694f-98d3-4846-a4d2-a88ac4da4a56", "metadata": {}, "source": [ "# Data Selection" ] }, { "cell_type": "markdown", "id": "3ca294fb-5274-4c7b-b293-a374877b524b", "metadata": {}, "source": [ "## Overview\n", "\n", "In this section, we'll focus on how to slice, dice, and generally get and set subsets of Pandas objects." ] }, { "cell_type": "markdown", "id": "281fa7e2", "metadata": {}, "source": [ "## Selection by label" ] }, { "cell_type": "markdown", "id": "8cfbc1d9-62b9-4f12-a249-0fb7af77d6f3", "metadata": { "attributes": { "classes": [ "warning" ], "id": "" } }, "source": [ "Whether a setting operation returns a copy or a reference may depend on the context. Setting through chained indexing is sometimes called `chained assignment` and should be avoided." ] }, { "cell_type": "markdown", "id": "9dd00162-d4ac-4b84-9da4-9fe7e36cbcb5", "metadata": { "attributes": { "classes": [ "warning" ], "id": "" } }, "source": [ "`.loc` is strict when you present slicers that are not compatible (or convertible) with the index type. For example, using integers to slice a `DatetimeIndex` will raise a `TypeError`."
] }, { "cell_type": "code", "execution_count": 2, "id": "19faf0a0", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [], "source": [ "dfl = pd.DataFrame(np.random.randn(5, 4),\n", " columns=list('ABCD'),\n", " index=pd.date_range('20130101', periods=5))" ] }, { "cell_type": "code", "execution_count": null, "id": "5cd6165e", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" }, "tags": [ "raises-exception" ] }, "outputs": [], "source": [ "dfl.loc[2:3]" ] }, { "cell_type": "markdown", "id": "f2b699ce-d01f-4afa-8323-c43b9df24b38", "metadata": { "attributes": { "classes": [ "warning" ], "id": "" } }, "source": [ "String-like values used in a slice can be converted to the type of the index, which leads to natural slicing." ] }, { "cell_type": "code", "execution_count": null, "id": "3f5fb2f0", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>A</th><th>B</th><th>C</th><th>D</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>2013-01-02</th><td>1.481584</td><td>-1.691289</td><td>-0.086724</td><td>-0.393754</td></tr>\n",
"    <tr><th>2013-01-03</th><td>0.476774</td><td>0.605450</td><td>-0.091083</td><td>-1.410096</td></tr>\n",
"    <tr><th>2013-01-04</th><td>0.035828</td><td>-0.095133</td><td>1.377407</td><td>0.495220</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " A B C D\n", "2013-01-02 1.481584 -1.691289 -0.086724 -0.393754\n", "2013-01-03 0.476774 0.605450 -0.091083 -1.410096\n", "2013-01-04 0.035828 -0.095133 1.377407 0.495220" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfl.loc['20130102':'20130104']" ] }, { "cell_type": "code", "execution_count": null, "id": "abe5968b-ffe5-4302-9918-81a1d97ed568", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "markdown", "id": "f3c046d5-19dc-47cb-828f-880f008d02d4", "metadata": { "attributes": { "classes": [ "warning" ], "id": "" } }, "source": [ "Pandas raises a `KeyError` when indexing with a list that contains missing labels." ] }, { "cell_type": "markdown", "id": "29221dda", "metadata": {}, "source": [ "Pandas provides a suite of methods for **purely label-based indexing**. This is a strict inclusion-based protocol. Every label asked for must be in the index, or a `KeyError` will be raised. When slicing, both the start bound **AND** the stop bound are included, if present in the index. Integers are valid labels, but they refer to the label **and not the position**.\n", "\n", "The `.loc` attribute is the primary access method. The following are valid inputs:\n", "\n", "- A single label, e.g. `5` or `'a'` (note that `5` is interpreted as a label of the index, not as an integer position along the index).\n", "\n", "- A list or array of labels `['a', 'b', 'c']`.\n", "\n", "- A slice object with labels `'a':'f'` (note that, contrary to usual Python slices, both the start and the stop are included when present in the index).\n", "\n", "- A boolean array.\n", "\n", "- A `callable`." ] }, { "cell_type": "code", "execution_count": null, "id": "8a174f11", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "c 0.152695\n", "d -0.615396\n", "e 0.203773\n", "f 1.487611\n", "dtype: float64" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s1 = pd.Series(np.random.randn(6), index=list('abcdef'))\n", "s1\n", "s1.loc['c':]" ] }, { "cell_type": "code", "execution_count": null, "id": "b276bd82-797f-4eb6-8886-51153d771bb0", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "11e56acc", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.8417073794042274" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s1.loc['b']" ] }, { "cell_type": "code", "execution_count": null, "id": "74a7ae51-b334-4d5f-b9a2-e2080958663f", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "markdown", "id": "eb2dbf2d-cdd9-42e4-b374-fc7944f1996f", "metadata": {}, "source": [ "Note that setting works as well:" ] }, { "cell_type": "code", "execution_count": null, "id": "8fe78c41", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a -0.221293\n", "b 1.841707\n", "c 0.000000\n", "d 0.000000\n", "e 0.000000\n", "f 0.000000\n", "dtype: float64" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s1.loc['c':] = 0\n", "s1" ] }, { "cell_type": "code", "execution_count": null, "id": "e32f82e4-6b3e-48a7-ab56-c6ea820274e5", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "cfb25d9f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>A</th><th>B</th><th>C</th><th>D</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>a</th><td>-0.829219</td><td>1.185075</td><td>0.093787</td><td>-0.442140</td></tr>\n",
"    <tr><th>b</th><td>-0.473605</td><td>-0.317633</td><td>-0.047595</td><td>-1.409355</td></tr>\n",
"    <tr><th>d</th><td>-0.721064</td><td>1.436217</td><td>-2.073527</td><td>0.452794</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " A B C D\n", "a -0.829219 1.185075 0.093787 -0.442140\n", "b -0.473605 -0.317633 -0.047595 -1.409355\n", "d -0.721064 1.436217 -2.073527 0.452794" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = pd.DataFrame(np.random.randn(6, 4),\n", " index=list('abcdef'),\n", " columns=list('ABCD'))\n", "df1\n", "df1.loc[['a', 'b', 'd'], :]" ] }, { "cell_type": "code", "execution_count": null, "id": "de1a7123-2c8e-4910-b435-cdd489baff5b", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "markdown", "id": "0493a65b-5915-4119-a2a8-00b0b8a728b0", "metadata": {}, "source": [ "Accessing via label slices:" ] }, { "cell_type": "code", "execution_count": null, "id": "2934e9e8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>A</th><th>B</th><th>C</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>d</th><td>-0.721064</td><td>1.436217</td><td>-2.073527</td></tr>\n",
"    <tr><th>e</th><td>0.400573</td><td>1.644355</td><td>-0.021278</td></tr>\n",
"    <tr><th>f</th><td>-0.282458</td><td>-0.657392</td><td>-0.091122</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " A B C\n", "d -0.721064 1.436217 -2.073527\n", "e 0.400573 1.644355 -0.021278\n", "f -0.282458 -0.657392 -0.091122" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.loc['d':, 'A':'C']" ] }, { "cell_type": "markdown", "id": "29f4ac9b-3d4b-4199-a04a-b9ec886b26f6", "metadata": {}, "source": [ "For getting a cross-section using a label (equivalent to `df.xs('a')`):" ] }, { "cell_type": "code", "execution_count": null, "id": "ccbffe12", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "A -0.829219\n", "B 1.185075\n", "C 0.093787\n", "D -0.442140\n", "Name: a, dtype: float64" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.loc['a']" ] }, { "cell_type": "code", "execution_count": null, "id": "c9570d12-8020-4328-94e8-91266619e666", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "markdown", "id": "589a2a99", "metadata": {}, "source": [ "For getting values with a boolean array:" ] }, { "cell_type": "code", "execution_count": null, "id": "e60fdddf", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "A False\n", "B True\n", "C True\n", "D False\n", "Name: a, dtype: bool" ] }, "execution_count": 88, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.loc['a'] > 0" ] }, { "cell_type": "code", "execution_count": null, "id": "4a9f2648-9f92-4077-a7ec-00836c2f28fd", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "d6226934", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>B</th><th>C</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>a</th><td>1.185075</td><td>0.093787</td></tr>\n",
"    <tr><th>b</th><td>-0.317633</td><td>-0.047595</td></tr>\n",
"    <tr><th>c</th><td>0.205571</td><td>-1.191746</td></tr>\n",
"    <tr><th>d</th><td>1.436217</td><td>-2.073527</td></tr>\n",
"    <tr><th>e</th><td>1.644355</td><td>-0.021278</td></tr>\n",
"    <tr><th>f</th><td>-0.657392</td><td>-0.091122</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " B C\n", "a 1.185075 0.093787\n", "b -0.317633 -0.047595\n", "c 0.205571 -1.191746\n", "d 1.436217 -2.073527\n", "e 1.644355 -0.021278\n", "f -0.657392 -0.091122" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.loc[:, df1.loc['a'] > 0]" ] }, { "cell_type": "code", "execution_count": null, "id": "f8ae65cd-dbea-4f40-a464-7b07554b9b11", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "markdown", "id": "0e52a617", "metadata": {}, "source": [ "When indexing with a boolean array, NA values are treated as `False`:" ] }, { "cell_type": "code", "execution_count": null, "id": "0ca93c29", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "<BooleanArray>\n", "[True, False, True, False, <NA>, False]\n", "Length: 6, dtype: boolean" ] }, "execution_count": 92, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mask = pd.array([True, False, True, False, pd.NA, False], dtype=\"boolean\")\n", "mask" ] }, { "cell_type": "code", "execution_count": null, "id": "fd577bd5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
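<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>A</th><th>B</th><th>C</th><th>D</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>a</th><td>-0.829219</td><td>1.185075</td><td>0.093787</td><td>-0.442140</td></tr>\n",
"    <tr><th>c</th><td>1.195465</td><td>0.205571</td><td>-1.191746</td><td>-0.836474</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>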
" ], "text/plain": [ " A B C D\n", "a -0.829219 1.185075 0.093787 -0.442140\n", "c 1.195465 0.205571 -1.191746 -0.836474" ] }, "execution_count": 93, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1[mask]" ] }, { "cell_type": "code", "execution_count": null, "id": "4f1b5f67-5c56-4e47-8953-4d6383f283e1", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "markdown", "id": "2ff30b9c", "metadata": {}, "source": [ "For getting a value explicitly:" ] }, { "cell_type": "code", "execution_count": null, "id": "7e425a66", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-0.8292186214151204" ] }, "execution_count": 95, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.loc['a', 'A'] # this is also equivalent to ``df1.at['a','A']``" ] }, { "cell_type": "code", "execution_count": null, "id": "50e88f3d-07f0-443d-994c-d7fb36c4dc7a", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "markdown", "id": "b29c0cd3", "metadata": {}, "source": [ "## Slicing with labels\n", "\n", "When using `.loc` with slices, if both the start and the stop labels are present in the index, then elements located between the two (including them) are returned:" ] }, { "cell_type": "code", "execution_count": null, "id": "2bd13eab", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3 b\n", "2 c\n", "5 d\n", "dtype: object" ] }, "execution_count": 97, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = pd.Series(list('abcde'), index=[0, 3, 2, 5, 4])\n", "s.loc[3:5]" ] }, { "cell_type": "code", "execution_count": null, "id": "63081450-8216-403c-8b53-04b2cc18e442", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "markdown", "id": "0a1f8d46", "metadata": {}, "source": [ "If at least one of the two is absent, but the index is sorted, and can be compared against start and stop labels, then slicing will still work as expected, by selecting labels which rank between the two:" ] }, { "cell_type": "code", "execution_count": null, "id": "a08caf62", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 a\n", "2 c\n", "3 b\n", "4 e\n", "5 d\n", "dtype: object" ] }, "execution_count": 99, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.sort_index()" ] }, { "cell_type": "code", "execution_count": null, "id": "7d665bb1-9bd1-4826-9a0f-f13496d64549", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "a5f5d2ba", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2 c\n", "3 b\n", "4 e\n", "5 d\n", "dtype: object" ] }, "execution_count": 101, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.sort_index().loc[1:6]" ] }, { "cell_type": "code", "execution_count": null, "id": "81114a6f-4511-4f2e-990b-c7edd5e4cf86", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "markdown", "id": "5115e1d2", "metadata": {}, "source": [ "However, if at least one of the two is absent and the index is not sorted, an error will be raised (since doing otherwise would be computationally expensive, as well as potentially ambiguous for mixed-type indexes). For instance, in the above example, `s.loc[1:6]` would raise `KeyError`." ] }, { "cell_type": "code", "execution_count": null, "id": "318b8e37", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3 b\n", "2 c\n", "5 d\n", "dtype: object" ] }, "execution_count": 103, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = pd.Series(list('abcdef'), index=[0, 3, 2, 5, 4, 2])\n", "s.loc[3:5]" ] }, { "cell_type": "code", "execution_count": null, "id": "537dd0b6-b4fc-468b-88a4-5d828eba5ed8", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "markdown", "id": "ce05682d", "metadata": {}, "source": [ "\n", "Also, if the index has duplicate labels and either the start or the stop label is duplicated, an error will be raised. For instance, in the above example, `s.loc[2:5]` would raise a `KeyError`.\n", "\n", "## Selection by position" ] }, { "cell_type": "markdown", "id": "099c8fa7-d8df-4304-a513-1b142c1021d5", "metadata": { "attributes": { "classes": [ "warning" ], "id": "" } }, "source": [ "Whether a setting operation returns a copy or a reference may depend on the context. Setting through chained indexing is sometimes called `chained assignment` and should be avoided." ] }, { "cell_type": "markdown", "id": "9c2e1dab", "metadata": {}, "source": [ "Pandas provides a suite of methods for purely integer-based indexing. The semantics closely follow Python and NumPy slicing. Indexing is 0-based. When slicing, the start bound is included, while the upper bound is excluded. Trying to use a non-integer, even a valid label, will raise an `IndexError`.\n", "\n", "The `.iloc` attribute is the primary access method. The following are valid inputs:\n", "\n", "- An integer, e.g. `5`.\n", "\n", "- A list or array of integers `[4, 3, 0]`.\n", "\n", "- A slice object with ints `1:7`.\n", "\n", "- A boolean array.\n", "\n", "- A `callable`." ] }, { "cell_type": "code", "execution_count": null, "id": "e7b93cb1", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 -0.124201\n", "2 1.294954\n", "4 -0.793453\n", "dtype: float64" ] }, "execution_count": 105, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s1 = pd.Series(np.random.randn(5), index=list(range(0, 10, 2)))\n", "s1\n", "s1.iloc[:3]" ] }, { "cell_type": "code", "execution_count": null, "id": "24d4de8c-5c42-484b-89d7-e21ebb0ba7c3", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "fe63cdf3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-0.9449480012698949" ] }, "execution_count": 107, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s1.iloc[3]" ] }, { "cell_type": "code", "execution_count": null, "id": "ed15834b-fd14-4000-bbdb-0eb86a214984", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "markdown", "id": "4ac478c2", "metadata": {}, "source": [ "Note that setting works as well:" ] }, { "cell_type": "code", "execution_count": null, "id": "9c4e8129", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 0.000000\n", "2 0.000000\n", "4 0.000000\n", "6 -0.944948\n", "8 0.385288\n", "dtype: float64" ] }, "execution_count": 109, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s1.iloc[:3] = 0\n", "s1" ] }, { "cell_type": "code", "execution_count": null, "id": "5b793d9f-5ddb-4121-8218-8a5eda713eab", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "markdown", "id": "56ced074", "metadata": {}, "source": [ "With a DataFrame, select via integer slicing:" ] }, { "cell_type": "code", "execution_count": null, "id": "3d55d682", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>0</th><th>2</th><th>4</th><th>6</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>0.176708</td><td>-0.734049</td><td>-0.874521</td><td>0.013537</td></tr>\n",
"    <tr><th>2</th><td>1.809582</td><td>0.802905</td><td>-0.563674</td><td>-0.466175</td></tr>\n",
"    <tr><th>4</th><td>0.813012</td><td>-0.131666</td><td>1.373226</td><td>-0.568180</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " 0 2 4 6\n", "0 0.176708 -0.734049 -0.874521 0.013537\n", "2 1.809582 0.802905 -0.563674 -0.466175\n", "4 0.813012 -0.131666 1.373226 -0.568180" ] }, "execution_count": 111, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = pd.DataFrame(np.random.randn(6, 4),\n", " index=list(range(0, 12, 2)),\n", " columns=list(range(0, 8, 2)))\n", "df1\n", "df1.iloc[:3]" ] }, { "cell_type": "code", "execution_count": null, "id": "172e44bf-8faf-42a1-b9a7-3adab79b97d1", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "b5427ec6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>4</th><th>6</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>2</th><td>-0.563674</td><td>-0.466175</td></tr>\n",
"    <tr><th>4</th><td>1.373226</td><td>-0.568180</td></tr>\n",
"    <tr><th>6</th><td>-0.467455</td><td>1.028096</td></tr>\n",
"    <tr><th>8</th><td>0.156377</td><td>-0.368254</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " 4 6\n", "2 -0.563674 -0.466175\n", "4 1.373226 -0.568180\n", "6 -0.467455 1.028096\n", "8 0.156377 -0.368254" ] }, "execution_count": 113, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.iloc[1:5, 2:4]" ] }, { "cell_type": "markdown", "id": "550715ab", "metadata": {}, "source": [ "Select via integer list:" ] }, { "cell_type": "code", "execution_count": null, "id": "d86dd6d1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
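<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>2</th><th>6</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>2</th><td>0.802905</td><td>-0.466175</td></tr>\n",
"    <tr><th>6</th><td>-1.760254</td><td>1.028096</td></tr>\n",
"    <tr><th>10</th><td>-1.020584</td><td>1.987550</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>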
" ], "text/plain": [ " 2 6\n", "2 0.802905 -0.466175\n", "6 -1.760254 1.028096\n", "10 -1.020584 1.987550" ] }, "execution_count": 114, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.iloc[[1, 3, 5], [1, 3]]" ] }, { "cell_type": "code", "execution_count": null, "id": "a5e2a6ba-671b-4aab-b63d-5ab4ee92501f", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "8528cc39", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
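<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>0</th><th>2</th><th>4</th><th>6</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>2</th><td>1.809582</td><td>0.802905</td><td>-0.563674</td><td>-0.466175</td></tr>\n",
"    <tr><th>4</th><td>0.813012</td><td>-0.131666</td><td>1.373226</td><td>-0.568180</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>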
" ], "text/plain": [ " 0 2 4 6\n", "2 1.809582 0.802905 -0.563674 -0.466175\n", "4 0.813012 -0.131666 1.373226 -0.568180" ] }, "execution_count": 116, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.iloc[1:3, :]" ] }, { "cell_type": "code", "execution_count": null, "id": "178d6f69-464f-464e-ad45-fac857b9a370", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "f9288433", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>2</th><th>4</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>-0.734049</td><td>-0.874521</td></tr>\n",
"    <tr><th>2</th><td>0.802905</td><td>-0.563674</td></tr>\n",
"    <tr><th>4</th><td>-0.131666</td><td>1.373226</td></tr>\n",
"    <tr><th>6</th><td>-1.760254</td><td>-0.467455</td></tr>\n",
"    <tr><th>8</th><td>-1.629683</td><td>0.156377</td></tr>\n",
"    <tr><th>10</th><td>-1.020584</td><td>-0.194566</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " 2 4\n", "0 -0.734049 -0.874521\n", "2 0.802905 -0.563674\n", "4 -0.131666 1.373226\n", "6 -1.760254 -0.467455\n", "8 -1.629683 0.156377\n", "10 -1.020584 -0.194566" ] }, "execution_count": 118, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.iloc[:, 1:3]" ] }, { "cell_type": "code", "execution_count": null, "id": "71859ce4-7ad5-4bea-9df2-f5929c0c2470", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "eb3f25f3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.8029050594558378" ] }, "execution_count": 120, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.iloc[1, 1] # this is also equivalent to ``df1.iat[1,1]``" ] }, { "cell_type": "code", "execution_count": null, "id": "5dad7d1a-0bf5-40d8-a4ef-2c3e573ae6fc", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "markdown", "id": "6cb0234e", "metadata": {}, "source": [ "\n", "To get a cross-section by integer position (here, the same row as the label-based `df1.xs(2)`):" ] }, { "cell_type": "code", "execution_count": null, "id": "cc95030f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1.809582\n", "2 0.802905\n", "4 -0.563674\n", "6 -0.466175\n", "Name: 2, dtype: float64" ] }, "execution_count": 122, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.iloc[1]" ] }, { "cell_type": "code", "execution_count": null, "id": "bfa6df43-353d-4ba4-94a0-e65c9a659468", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "markdown", "id": "bc5305b8", "metadata": {}, "source": [ "Out-of-range slice indexes are handled gracefully just as in Python/NumPy." ] }, { "cell_type": "code", "execution_count": null, "id": "0c635e2f", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "['a', 'b', 'c', 'd', 'e', 'f']" ] }, "execution_count": 124, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = list('abcdef') # these are allowed in Python/NumPy.\n", "x" ] }, { "cell_type": "code", "execution_count": null, "id": "bae9b708", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "['e', 'f']" ] }, "execution_count": 125, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[4:10]" ] }, { "cell_type": "code", "execution_count": null, "id": "ccb95b2c", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 126, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x[8:10]" ] }, { "cell_type": "code", "execution_count": null, "id": "fcaaeb73", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "0 a\n", "1 b\n", "2 c\n", "3 d\n", "4 e\n", "5 f\n", "dtype: object" ] }, "execution_count": 127, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = pd.Series(x)\n", "s" ] }, { "cell_type": "code", "execution_count": null, "id": "19e7f165", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4 e\n", "5 f\n", "dtype: object" ] }, "execution_count": 128, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.iloc[4:10]" ] }, { "cell_type": "code", "execution_count": null, "id": "3b612356-7774-472e-849e-0f3dc267b578", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", 
"\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "2a25cc5c", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" } }, "outputs": [ { "data": { "text/plain": [ "Series([], dtype: object)" ] }, "execution_count": 130, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s.iloc[8:10]" ] }, { "cell_type": "markdown", "id": "23aa8371", "metadata": {}, "source": [ "Note that using slices that go out of bounds can result in an empty axis (e.g. an empty DataFrame being returned)." ] }, { "cell_type": "code", "execution_count": null, "id": "f9024d15", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" }, "tags": [] }, "outputs": [], "source": [ "dfl = pd.DataFrame(np.random.randn(5, 2), columns=list('AB'))" ] }, { "cell_type": "code", "execution_count": null, "id": "5837f585", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
0
1
2
3
4
\n", "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: []\n", "Index: [0, 1, 2, 3, 4]" ] }, "execution_count": 132, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfl.iloc[:, 2:3]" ] }, { "cell_type": "code", "execution_count": null, "id": "4b81ac82-5d47-4410-90b9-040f0dac662b", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "d0e19553", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
          B
0 -0.491934
1 -0.758957
2  1.793034
3 -0.330006
4  1.362746
\n", "
" ], "text/plain": [ " B\n", "0 -0.491934\n", "1 -0.758957\n", "2 1.793034\n", "3 -0.330006\n", "4 1.362746" ] }, "execution_count": 134, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfl.iloc[:, 1:3]" ] }, { "cell_type": "code", "execution_count": null, "id": "39dab713-a3f6-4189-bad9-cba564f56951", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "f91ab868", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
          A         B
4  0.238833  1.362746
\n", "
" ], "text/plain": [ " A B\n", "4 0.238833 1.362746" ] }, "execution_count": 136, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfl.iloc[4:6]" ] }, { "cell_type": "code", "execution_count": null, "id": "220aa5af-5003-45e9-87cf-c4f5d0ac6d93", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "markdown", "id": "59c65e15", "metadata": {}, "source": [ "\n", "A single indexer that is out of bounds raises an `IndexError`, as does a list of indexers in which any element is out of bounds." ] }, { "cell_type": "code", "execution_count": null, "id": "f3496be2", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" }, "tags": [ "raises-exception" ] }, "outputs": [], "source": [ "dfl.iloc[[4, 5, 6]]" ] }, { "cell_type": "code", "execution_count": null, "id": "7b081f89", "metadata": { "attributes": { "classes": [ "code-cell" ], "id": "" }, "tags": [ "raises-exception" ] }, "outputs": [], "source": [ "dfl.iloc[:, 4]" ] }, { "cell_type": "markdown", "id": "b3fe22e7", "metadata": {}, "source": [ "## Selection by callable\n", "\n", "`.loc`, `.iloc`, and `[]` indexing all accept a `callable` as the indexer. The `callable` must be a function of one argument (the calling Series or DataFrame) that returns valid output for indexing." ] }, { "cell_type": "code", "execution_count": null, "id": "72420538", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
          A         B         C         D
b  0.206097  0.325348 -0.811762  0.696057
c  1.369032  1.861469  0.355490  0.416873
d  0.028375  0.855487  0.998617 -1.899382
\n", "
" ], "text/plain": [ " A B C D\n", "b 0.206097 0.325348 -0.811762 0.696057\n", "c 1.369032 1.861469 0.355490 0.416873\n", "d 0.028375 0.855487 0.998617 -1.899382" ] }, "execution_count": 140, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = pd.DataFrame(np.random.randn(6, 4),\n", " index=list('abcdef'),\n", " columns=list('ABCD'))\n", "df1\n", "df1.loc[lambda df: df['A'] > 0, :]" ] }, { "cell_type": "code", "execution_count": null, "id": "7206088f-3aa5-4392-9982-cadec553e616", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "ab18a18f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
          A         B
a -0.889154 -0.228248
b  0.206097  0.325348
c  1.369032  1.861469
d  0.028375  0.855487
e -0.344703  1.783202
f -0.660587  0.034734
\n", "
" ], "text/plain": [ " A B\n", "a -0.889154 -0.228248\n", "b 0.206097 0.325348\n", "c 1.369032 1.861469\n", "d 0.028375 0.855487\n", "e -0.344703 1.783202\n", "f -0.660587 0.034734" ] }, "execution_count": 142, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.loc[:, lambda df: ['A', 'B']]" ] }, { "cell_type": "code", "execution_count": null, "id": "2166496e-975d-4539-a3b6-54cedd012e73", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "aeb4a77e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
          A         B
a -0.889154 -0.228248
b  0.206097  0.325348
c  1.369032  1.861469
d  0.028375  0.855487
e -0.344703  1.783202
f -0.660587  0.034734
\n", "
" ], "text/plain": [ " A B\n", "a -0.889154 -0.228248\n", "b 0.206097 0.325348\n", "c 1.369032 1.861469\n", "d 0.028375 0.855487\n", "e -0.344703 1.783202\n", "f -0.660587 0.034734" ] }, "execution_count": 144, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1.iloc[:, lambda df: [0, 1]]" ] }, { "cell_type": "code", "execution_count": null, "id": "e8fe3be5-15de-4036-ab8a-d6483abf265f", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "ec331b54", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a -0.889154\n", "b 0.206097\n", "c 1.369032\n", "d 0.028375\n", "e -0.344703\n", "f -0.660587\n", "Name: A, dtype: float64" ] }, "execution_count": 146, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1[lambda df: df.columns[0]]" ] }, { "cell_type": "code", "execution_count": null, "id": "31840764-a775-4e5f-8023-6c4762005ff6", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "markdown", "id": "861f0e5e", "metadata": {}, "source": [ "\n", "You can use callable indexing in `Series`." ] }, { "cell_type": "code", "execution_count": null, "id": "d4e60491", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "b 0.206097\n", "c 1.369032\n", "d 0.028375\n", "Name: A, dtype: float64" ] }, "execution_count": 148, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1['A'].loc[lambda s: s > 0]" ] }, { "cell_type": "code", "execution_count": null, "id": "1d7a46f1-98ce-4d87-924a-288812c6b4ed", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "markdown", "id": "12d2d96d", "metadata": {}, "source": [ "\n", "### Combining positional and label-based indexing\n", "\n", "If you wish to get the 0th and the 2nd elements from the index in the `'A'` column, you can do:" ] }, { "cell_type": "code", "execution_count": null, "id": "978312bb", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a 1\n", "c 3\n", "Name: A, dtype: int64" ] }, "execution_count": 150, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfd = pd.DataFrame({'A': [1, 2, 3],\n", " 'B': [4, 5, 6]},\n", " index=list('abc'))\n", "dfd\n", "dfd.loc[dfd.index[[0, 2]], 'A']" ] }, { "cell_type": "code", "execution_count": null, "id": "a8844d1c-fdc5-4c85-923c-092ac6367692", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "markdown", "id": "11210c0d", "metadata": {}, "source": [ "\n", "The same selection can be expressed with `.iloc` by explicitly looking up the integer locations of the labels and then indexing by position." ] }, { "cell_type": "code", "execution_count": null, "id": "2e7e25d2", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "a 1\n", "c 3\n", "Name: A, dtype: int64" ] }, "execution_count": 152, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfd.iloc[[0, 2], dfd.columns.get_loc('A')]" ] }, { "cell_type": "code", "execution_count": null, "id": "48f7feb0-9334-441f-893a-42815523e739", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "markdown", "id": "d6c36e79", "metadata": {}, "source": [ "\n", "To get the positions of multiple labels at once, use `.get_indexer`:" ] }, { "cell_type": "code", "execution_count": null, "id": "7c0b22e6", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
   A  B
a  1  4
c  3  6
\n", "
" ], "text/plain": [ " A B\n", "a 1 4\n", "c 3 6" ] }, "execution_count": 154, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfd.iloc[[0, 2], dfd.columns.get_indexer(['A', 'B'])]" ] }, { "cell_type": "code", "execution_count": null, "id": "c0924629-67d8-43b6-a435-d91bb8bf6408", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import HTML\n", "\n", "display(\n", " HTML(\n", " \"\"\"\n", "\n", "\n", "
\n", "
\n", "

Let's visualize it! 🎥

\n", "
\n", " \n", "
\n", "
\n", "
\n", "\n", "\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "markdown", "id": "d97622c3-93dd-44af-94cb-9b4e4401b11b", "metadata": {}, "source": [ "## Acknowledgments\n", "\n", "Thanks to the [Pandas user guide](https://pandas.pydata.org/docs/user_guide/index.html), which contributes the majority of the content in this chapter." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.18" } }, "nbformat": 4, "nbformat_minor": 5 }