42.89. Introduction#

Climate Prediction-Random Forest is a model that predicts future climate conditions from a set of climate variables. It is trained on a dataset of past climate observations and uses a random forest, an ensemble of decision trees whose individual predictions are averaged, to learn the relationships between those variables. Because each tree can split on different variables and thresholds, the forest can capture complex, nonlinear patterns in the data.

42.89.1. Importing Libraries#

# Pandas is used for data manipulation
import pandas as pd

# Use numpy to convert to arrays
import numpy as np

# Import tools needed for visualization

import matplotlib.pyplot as plt
%matplotlib inline

42.89.2. Data Exploration#

# Reading the data to a dataframe
df = pd.read_csv('')
# displaying first 5 rows
df.head()
   year  month  day  week  temp_2  temp_1  average  actual  friend
0  2019      1    1   Fri      45      45     45.6      45      29
1  2019      1    2   Sat      44      45     45.7      44      61
2  2019      1    3   Sun      45      44     45.8      41      56
3  2019      1    4   Mon      44      41     45.9      40      53
4  2019      1    5  Tues      41      40     46.0      44      41
# the shape of our features
df.shape
(348, 9)
# column names
df.columns
Index(['year', 'month', 'day', 'week', 'temp_2', 'temp_1', 'average', 'actual',
       'friend'],
      dtype='object')
# checking for null values
df.isnull().sum()
year       0
month      0
day        0
week       0
temp_2     0
temp_1     0
average    0
actual     0
friend     0
dtype: int64

No column contains null values, so no rows need to be dropped or imputed.

42.89.3. One-Hot Encoding#

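One-hot encoding replaces a categorical column with one binary indicator column per category, so that a model expecting numeric input can use it. Here the week column (the day of the week) becomes seven columns, week_Fri through week_Wed, while the numeric columns are left unchanged.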

# One-hot encode categorical features
df = pd.get_dummies(df)
# displaying first 5 rows after encoding
df.head()
   year  month  day  temp_2  temp_1  average  actual  friend  week_Fri  week_Mon  week_Sat  week_Sun  week_Thurs  week_Tues  week_Wed
0  2019      1    1      45      45     45.6      45      29      True     False     False     False       False      False     False
1  2019      1    2      44      45     45.7      44      61     False     False      True     False       False      False     False
2  2019      1    3      45      44     45.8      41      56     False     False     False      True       False      False     False
3  2019      1    4      44      41     45.9      40      53     False      True     False     False       False      False     False
4  2019      1    5      41      40     46.0      44      41     False     False     False     False       False       True     False
print('Shape of features after one-hot encoding:', df.shape)
Shape of features after one-hot encoding: (348, 15)
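To make the effect of one-hot encoding concrete, the following minimal sketch applies pd.get_dummies to a tiny frame. The three rows are made up for illustration; only the column names mirror the dataset above.

```python
import pandas as pd

# Hypothetical three-row frame; the values are illustrative, not from the dataset.
toy = pd.DataFrame({"temp_1": [45, 44, 41], "week": ["Fri", "Sat", "Fri"]})

# get_dummies leaves numeric columns untouched and expands each
# string column into one indicator column per distinct value.
encoded = pd.get_dummies(toy)

print(encoded.columns.tolist())
# ['temp_1', 'week_Fri', 'week_Sat']

# Cast to int so the result reads the same regardless of the indicator dtype.
print(encoded["week_Fri"].astype(int).tolist())
# [1, 0, 1]
```

The same arithmetic explains the shape change above: the single week column is replaced by seven indicator columns, so 9 columns become 9 - 1 + 7 = 15.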

42.89.4. Features and Labels#

# Labels are the values we want to predict
labels = df['actual']

# Remove the labels from the features
df = df.drop('actual', axis = 1)

# Saving feature names for later use
feature_list = list(df.columns)

42.89.5. Train Test Split#

# Using Scikit-learn to split data into training and testing sets
from sklearn.model_selection import train_test_split

# Split the features and labels into training and testing sets
train_features, test_features, train_labels, test_labels = train_test_split(df,
                                                                            labels,
                                                                            test_size = 0.20,
                                                                            random_state = 42)
print('Training Features Shape:', train_features.shape)
print('Training Labels Shape:', train_labels.shape)
print('Testing Features Shape:', test_features.shape)
print('Testing Labels Shape:', test_labels.shape)
Training Features Shape: (278, 14)
Training Labels Shape: (278,)
Testing Features Shape: (70, 14)
Testing Labels Shape: (70,)
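The shapes printed above follow from how train_test_split sizes the two sets. The sketch below reproduces them on synthetic arrays with the same dimensions (the zeros and the row-index labels are placeholders, not the real data) and checks that features and labels stay aligned after shuffling.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins with the same row and column counts as the dataset.
y = np.arange(348)
X = np.zeros((348, 14))
X[:, 0] = y  # store the row index in the first column to track alignment

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)

# The test set size is rounded up: ceil(0.20 * 348) = 70, leaving 278 rows.
print(X_train.shape, X_test.shape)

# Features and labels are shuffled with the same permutation.
print(np.array_equal(X_train[:, 0], y_train))
```

Passing the features and the labels in the same call is what keeps each row paired with its own label.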

42.89.6. Training the Forest#

# Import the model we are using
from sklearn.ensemble import RandomForestRegressor

# Instantiate model 
rf = RandomForestRegressor(n_estimators= 1000, random_state=42)

# Train the model on training data
rf.fit(train_features, train_labels);
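As a rough illustration of what a fitted regression forest does at prediction time, the sketch below trains a small forest on synthetic data and checks that its output matches the average of its individual trees. The data, the 25-tree size, and the variable names are assumptions made for this example only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression problem (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

forest = RandomForestRegressor(n_estimators=25, random_state=42)
forest.fit(X, y)

# Each fitted tree is exposed in forest.estimators_.
per_tree = np.stack([tree.predict(X[:5]) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)

# The forest's prediction is the mean of the per-tree predictions.
print(np.allclose(manual_average, forest.predict(X[:5])))
```

Averaging many trees that were each trained on a different bootstrap sample tends to reduce the variance of any single tree, which is the main reason for using a larger n_estimators.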

42.89.7. Make Predictions on Test Data#

# Use the forest's predict method on the test data
predictions = rf.predict(test_features)

# Calculate the absolute errors
errors = abs(predictions - test_labels)

# Print out the mean absolute error (mae)
print('Mean Absolute Error:', round(np.mean(errors), 2), 'degrees.')
Mean Absolute Error: 3.78 degrees.
# Calculate mean absolute percentage error (MAPE)
mape = 100 * (errors / test_labels)

# Calculate and display accuracy
accuracy = 100 - np.mean(mape)
print('Accuracy:', round(accuracy, 2), '%.')
Accuracy: 94.02 %.
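The two metrics above can be checked by hand on a small case. The sketch below uses three made-up temperature pairs and compares the manual calculation with scikit-learn's mean_absolute_error and mean_absolute_percentage_error (the latter returns a fraction rather than a percentage).

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error

# Made-up values for illustration.
actual = np.array([40.0, 50.0, 80.0])
predicted = np.array([44.0, 45.0, 80.0])

errors = np.abs(predicted - actual)      # absolute errors: 4, 5, 0
mae = errors.mean()                      # (4 + 5 + 0) / 3 = 3.0
mape = 100 * np.mean(errors / actual)    # 100 * (0.1 + 0.1 + 0) / 3
accuracy = 100 - mape

print(round(mae, 2), round(mape, 2), round(accuracy, 2))

# Cross-check against scikit-learn's implementations.
print(np.isclose(mae, mean_absolute_error(actual, predicted)))
print(np.isclose(mape, 100 * mean_absolute_percentage_error(actual, predicted)))
```

Note that "accuracy" here simply means 100 minus the MAPE; it is not a classification accuracy, and the percentage error is undefined whenever an actual value is zero.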

42.89.8. Visualizing a Single Decision Tree#

[Figure: Decision Tree]
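One way to inspect a single tree from a forest is sketched below. It is not necessarily how the figure in this section was produced: it uses synthetic data, two hypothetical feature names borrowed from the dataset's columns, and a shallow max_depth so the printed rules stay short.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import export_text

# Synthetic data; the feature names are placeholders for illustration.
rng = np.random.default_rng(0)
feature_names = ["temp_1", "average"]
X = rng.uniform(30, 90, size=(100, 2))
y = 0.7 * X[:, 0] + 0.3 * X[:, 1]

# A small, shallow forest keeps each tree readable.
small_forest = RandomForestRegressor(n_estimators=10, max_depth=2, random_state=42)
small_forest.fit(X, y)

# Pull out one fitted tree and print its split rules as text.
tree = small_forest.estimators_[5]
rules = export_text(tree, feature_names=feature_names)
print(rules)
```

For a graphical version, sklearn.tree.plot_tree(tree, feature_names=feature_names, filled=True) draws the same tree with matplotlib. With the real model, the saved feature_list would play the role of feature_names.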

42.89.9. Your turn! 🚀#

You can practice your random-forest skills by following the assignment Climate Prediction-Random Forest.

42.89.10. Acknowledgments#

Thanks to Kaggle for creating the open-source course Climate Prediction-Random Forest, which contributes some of the content in this chapter.