42.37. Exploring for answers

This assignment continues the previous section’s assignment, in which we briefly looked at the dataset. Now we will take a deeper look at the data.

Again, the question the client wants answered is: Do yellow taxi passengers in New York City tip drivers more in the winter or the summer?

Your team is in the Analyzing stage of the Data Science Lifecycle, where you are responsible for performing exploratory data analysis (EDA) on the dataset. You have been provided with a notebook and a dataset containing 200 taxi transactions from January and July 2019.

42.37.1. Instructions

The data below comes from the New York City Taxi & Limousine Commission (TLC). Refer to the dataset’s data dictionary and user guide for more information about the data.
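A useful first EDA step is to check the file against the data dictionary: which columns are present, whether any values are missing, and whether the trips really span January and July. The sketch below is a minimal example that assumes the CSV uses the TLC yellow-taxi column names `tpep_pickup_datetime`, `payment_type`, `fare_amount`, and `tip_amount`; the helper `inspect_taxi_data` and the `EXPECTED_COLUMNS` list are hypothetical and not part of the assignment.

```python
import pandas as pd

# Assumed column names, taken from the TLC yellow-taxi data dictionary;
# adjust this list if taxi.csv uses different headers.
EXPECTED_COLUMNS = ["tpep_pickup_datetime", "payment_type", "fare_amount", "tip_amount"]


def inspect_taxi_data(df: pd.DataFrame) -> dict:
    """Hypothetical helper: a first structural pass over the taxi data."""
    # Expected columns that are absent from the file
    missing_columns = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    # Count of missing (NaN) values, keeping only columns that have any
    null_counts = df.isna().sum()
    null_counts = null_counts[null_counts > 0].to_dict()
    # Rows per pickup month (1 = January, 7 = July), if the column exists
    rows_per_month = {}
    if "tpep_pickup_datetime" in df.columns:
        months = pd.to_datetime(df["tpep_pickup_datetime"]).dt.month
        rows_per_month = months.value_counts().sort_index().to_dict()
    return {
        "n_rows": len(df),
        "missing_columns": missing_columns,
        "null_counts": null_counts,
        "rows_per_month": rows_per_month,
    }
```

In the notebook, calling `inspect_taxi_data(df)` on the loaded DataFrame, together with pandas’ built-in `df.info()` and `df.describe()`, gives a quick picture of column types, value ranges, and gaps before any tipping questions are asked.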

Use some of the techniques in this section to do your own EDA in the notebook (add cells if you’d like), and answer the following questions:

  • What other influences in the data could affect the tip amount?

  • What columns will most likely not be needed to answer the client’s question?

  • Based on what has been provided so far, does the data seem to provide any evidence of seasonal tipping behavior?
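For the first two questions, one way to look for other influences on tips is to compare tips across payment methods and to check how `tip_amount` correlates with the other numeric columns. The TLC data dictionary notes that `tip_amount` is populated for credit-card tips and that cash tips are not included, so payment method is an obvious candidate. The sketch below assumes the TLC column names and payment codes; the helper `tip_influences` is hypothetical. Correlation only measures linear association, and for ID or code columns such as `VendorID`, `RatecodeID`, or `PULocationID` it is not meaningful, which may also hint at columns that are not needed.

```python
import pandas as pd

# Payment codes as listed in the TLC yellow-taxi data dictionary
PAYMENT_TYPES = {1: "Credit card", 2: "Cash", 3: "No charge",
                 4: "Dispute", 5: "Unknown", 6: "Voided trip"}


def tip_influences(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Hypothetical helper: relate tips to factors other than season."""
    data = df.copy()
    # Tip as a fraction of the fare; non-positive fares become NaN
    data["tip_pct"] = data["tip_amount"] / data["fare_amount"].where(data["fare_amount"] > 0)
    # Mean tip and tip fraction per payment method
    data["payment"] = data["payment_type"].map(PAYMENT_TYPES)
    by_payment = data.groupby("payment")[["tip_amount", "tip_pct"]].mean()
    # Pearson correlation of tip_amount with the remaining numeric columns;
    # the coded payment_type and the derived tip_pct are excluded
    numeric = data.select_dtypes(include="number").drop(
        columns=["payment_type", "tip_pct"], errors="ignore")
    correlations = numeric.corr()["tip_amount"].drop("tip_amount")
    return by_payment, correlations.sort_values(ascending=False)
```

If cash trips show tips near zero, that likely reflects how tips are recorded rather than how generous cash-paying passengers are, which is worth keeping in mind for the seasonal comparison.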

Use the cells below to do your own exploratory data analysis:

import pandas as pd

path = '../../data/taxi.csv'

# Load the CSV file into a DataFrame
df = pd.read_csv(path)

# Print the DataFrame
print(df)
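For the seasonal question itself, one approach is to group trips by pickup month and compare average and median tips. The sketch below assumes the TLC column names `tpep_pickup_datetime`, `payment_type`, `fare_amount`, and `tip_amount`; the helper `tips_by_month` is hypothetical. By default it keeps only credit-card trips (`payment_type == 1`), because cash tips are not included in `tip_amount` and would otherwise be counted as zero. With only 200 trips, any difference between January and July may be noise; the sketch is descriptive and does not test statistical significance.

```python
import pandas as pd


def tips_by_month(df: pd.DataFrame, credit_card_only: bool = True) -> pd.DataFrame:
    """Hypothetical helper: compare tipping between pickup months."""
    data = df.copy()
    # Cash tips are not recorded, so optionally keep only credit-card trips
    if credit_card_only:
        data = data[data["payment_type"] == 1].copy()
    # Month name of the pickup time, e.g. "January" or "July"
    data["month"] = pd.to_datetime(data["tpep_pickup_datetime"]).dt.month_name()
    # Tip relative to fare; non-positive fares become NaN and are skipped
    data["tip_pct"] = data["tip_amount"] / data["fare_amount"].where(data["fare_amount"] > 0)
    return data.groupby("month").agg(
        trips=("tip_amount", "count"),
        mean_tip=("tip_amount", "mean"),
        median_tip=("tip_amount", "median"),
        mean_tip_pct=("tip_pct", "mean"),
    )
```

In the notebook, `tips_by_month(df)` can be run on the DataFrame loaded above; comparing it with `tips_by_month(df, credit_card_only=False)` shows how much the answer depends on the payment-method filter.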

42.37.2. Rubric

| Exemplary | Adequate | Needs Improvement |
| --- | --- | --- |
| - | - | - |

42.37.3. Acknowledgments

Thanks to Microsoft for creating the open-source course Data Science for Beginners, which inspired most of the content in this chapter.