42.96. Sign Language Digits Classification with CNN#

42.96.1. Load libraries#

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense, Dropout, MaxPool2D, Conv2D, Flatten
from keras.optimizers import Adam
from keras.preprocessing.image import ImageDataGenerator

42.96.2. Load data from numpy file#

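import io
import urllib.request


def load_npy(url):
    # np.load cannot read from a URL directly, so download the bytes first
    with urllib.request.urlopen(url) as response:
        return np.load(io.BytesIO(response.read()))


X = load_npy(
    "https://static-1300131294.cos.ap-shanghai.myqcloud.com/data/deep-learning/cnn/sign-language-digits-classification-with-cnn/X.npy"
)
y = load_npy(
    "https://static-1300131294.cos.ap-shanghai.myqcloud.com/data/deep-learning/cnn/sign-language-digits-classification-with-cnn/Y.npy"
)
# reshape X to (samples, height, width, channels)
X = X.reshape(-1, 64, 64, 1)
print("X Shape:", X.shape)
print("Y Shape:", y.shape)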
plt.figure(figsize=(20, 6))
for i, j in enumerate([0, 205, 411, 617, 823, 1030, 1237, 1444, 1650, 1858]):
    plt.subplot(2, 5, i + 1)
    plt.subplots_adjust(top=2, bottom=1)
    plt.imshow(X[j].reshape(64, 64))
    plt.title(np.argmax(y[j]))
    plt.axis("off")
  • As you can see, the labels do not match the images, so we will first reorganize them.

  • Each image is 64x64 pixels.

  • There are 2062 images in the dataset.

# decode the one-hot labels into class indices
list_y = np.argmax(y, axis=1)
count = pd.Series(list_y).value_counts()
print(count)
plt.figure(figsize=(10, 5))
sns.countplot(x=list_y)
plt.show()
  • We have a balanced dataset.
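
One way to put a number on "balanced" is to compare the smallest and largest class counts. The sketch below does this on a small made-up one-hot array; `toy_labels` is illustrative data, not the real Y.npy, and the helper name `class_balance` is hypothetical.

```python
import numpy as np


def class_balance(one_hot):
    """Return per-class counts and the min/max count ratio for one-hot labels."""
    class_ids = np.argmax(one_hot, axis=1)  # one-hot row -> class index
    counts = np.bincount(class_ids, minlength=one_hot.shape[1])
    ratio = counts.min() / counts.max()  # 1.0 means perfectly balanced
    return counts, ratio


# Toy data for illustration only: 3 classes with 4, 5 and 5 samples.
toy_ids = np.array([0] * 4 + [1] * 5 + [2] * 5)
toy_labels = np.eye(3)[toy_ids]

counts, ratio = class_balance(toy_labels)
print(counts.tolist(), round(float(ratio), 2))  # [4, 5, 5] 0.8
```

A ratio close to 1.0 suggests that plain accuracy is a reasonable metric; a much smaller ratio would argue for class weights or a stratified split.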

42.96.3. Preparing Data#

We will re-organize data to match labels and images correctly.

  • 204-409 => 0

  • 822-1028 => 1

  • 1649-1855 => 2

  • 1443-1649 => 3

  • 1236-1443 => 4

  • 1855-2062 => 5

  • 615-822 => 6

  • 409-615 => 7

  • 1028-1236 => 8

  • 0-204 => 9

X_organized = np.concatenate(
    (
        X[204:409, :],
        X[822:1028, :],
        X[1649:1855, :],
        X[1443:1649, :],
        X[1236:1443, :],
        X[1855:2062, :],
        X[615:822, :],
        X[409:615, :],
        X[1028:1236, :],
        X[0:204, :],
    ),
    axis=0,
)
plt.figure(figsize=(20, 6))
for i, j in enumerate([0, 205, 411, 617, 823, 1030, 1237, 1444, 1650, 1858]):
    plt.subplot(2, 5, i + 1)
    plt.subplots_adjust(top=2, bottom=1)
    plt.imshow(X_organized[j].reshape(64, 64))
    plt.title(np.argmax(y[j]))
    plt.axis("off")
  • Now labels and images are matched correctly.
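
The slices in the mapping above vary in length, from 204 to 208 images. Whether the label blocks in y have exactly the same lengths in the same order is not verified here, so a cautious alternative is to build the labels from the slices themselves. The sketch below is one possible approach under that assumption, not the original author's method; `SLICES`, `build_labels` and `y_organized` are hypothetical names.

```python
import numpy as np

# (start, stop) of each digit's images in the original X, in digit order 0..9,
# copied from the mapping above.
SLICES = [
    (204, 409), (822, 1028), (1649, 1855), (1443, 1649), (1236, 1443),
    (1855, 2062), (615, 822), (409, 615), (1028, 1236), (0, 204),
]


def build_labels(slices, num_classes=10):
    """Return one-hot labels whose block lengths match the given slices."""
    lengths = [stop - start for start, stop in slices]
    class_ids = np.repeat(np.arange(len(slices)), lengths)
    return np.eye(num_classes)[class_ids]


y_organized = build_labels(SLICES)
print(y_organized.shape)  # (2062, 10)
print([stop - start for start, stop in SLICES])
```

Labels built this way line up with X_organized row by row, so they could be passed to train_test_split in place of y.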

42.96.4. Train Test Split#

x_train, x_test, y_train, y_test = train_test_split(
    X_organized, y, test_size=0.2, random_state=42
)

print("x_train shape:", x_train.shape)
print("x_test shape:", x_test.shape)
print("y_train shape:", y_train.shape)
print("y_test shape:", y_test.shape)
  • The training and test sets are now ready, so we can start building the CNN model.
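
The shapes printed above can be anticipated by hand. With a float test_size, scikit-learn rounds the test count up and gives the remainder to the training set. The helper name `split_sizes` below is hypothetical and only reproduces that arithmetic.

```python
import math


def split_sizes(n_samples, test_size):
    """Mirror how train_test_split sizes the two sets for a float test_size."""
    n_test = math.ceil(n_samples * test_size)  # test count is rounded up
    n_train = n_samples - n_test  # the rest goes to training
    return n_train, n_test


n_train, n_test = split_sizes(2062, 0.2)
print(n_train, n_test)  # 1649 413
```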

42.96.5. Implementation of CNN#

42.96.6. Data Augmentation With Keras API#

Data augmentation is a technique that generates new training samples without changing the labels of the images. New samples are produced by altering properties of the images such as brightness, rotation, or zoom level. In the Keras API this is done with the ImageDataGenerator class, which takes the augmentation parameters, transforms the images accordingly, and returns the new samples. One detail is important: ImageDataGenerator returns only the transformed images, not the originals alongside them. The model therefore trains on images that differ from the original dataset, which helps it generalize better.

In the CNN implementation we will use data augmentation to change the rotation and zoom level of the images. We chose these parameters with simple logic: think of the test data we might encounter in real life. We do not always hold our hand at exactly 90 degrees, so some rotational variation is quite likely when using sign language. Likewise, the zoom level of a photo may vary. We therefore expect that a more general dataset built with these two parameters will train the model better. Let’s take a closer look at these parameters.

  • rotation_range: The degree range for random rotations. Each image is rotated by a random angle drawn from [-rotation_range, +rotation_range].

  • zoom_range: The percentage of the zoom can be a single float or a range as an array or tuple. If a float is specified, then the range for the zoom will be [1-value, 1+value].

We will apply data augmentation with these parameters.

  • rotation_range = 45

  • zoom_range = 0.5
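
To make these two values concrete, the sketch below shows the ranges they translate to. It uses plain NumPy rather than calling Keras, and assumes that the rotation angle is drawn uniformly from [-rotation_range, +rotation_range] degrees and the zoom factor uniformly from the zoom interval. The function names are hypothetical.

```python
import numpy as np


def zoom_bounds(zoom_range):
    """Expand a float zoom_range into (lower, upper); pass a pair through."""
    if isinstance(zoom_range, (int, float)):
        return (1 - zoom_range, 1 + zoom_range)
    lower, upper = zoom_range
    return (lower, upper)


def sample_augmentation(rotation_range, zoom_range, rng):
    """Draw one random rotation angle (degrees) and one zoom factor."""
    angle = rng.uniform(-rotation_range, rotation_range)
    lower, upper = zoom_bounds(zoom_range)
    # Keras samples a separate factor per axis; one is drawn here for brevity.
    zoom = rng.uniform(lower, upper)
    return angle, zoom


rng = np.random.default_rng(0)
print(zoom_bounds(0.5))  # (0.5, 1.5)
angle, zoom = sample_augmentation(45, 0.5, rng)
print(-45 <= angle <= 45, 0.5 <= zoom <= 1.5)  # True True
```

So with these settings an image may be tilted by up to 45 degrees in either direction and scaled by a factor between 0.5 and 1.5.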

Before continuing to the CNN implementation, let’s look at some samples to see the effect of data augmentation on the dataset.

def show_new_samples(new_images):
    # draw ten batches from the generator and show the first image of each
    plt.figure(figsize=(20, 6))
    for i in range(10):
        plt.subplot(2, 5, i + 1)
        image = next(new_images)
        plt.imshow(image[0].reshape(64, 64))
        plt.axis("off")

    plt.show()

42.96.7. Changing zoom level#

datagen = ImageDataGenerator(zoom_range=0.5)
new_images = datagen.flow(x_train, batch_size=250)
show_new_samples(new_images)

42.96.8. Changing rotation#

datagen = ImageDataGenerator(rotation_range=45)
new_images = datagen.flow(x_train, batch_size=250)
show_new_samples(new_images)

42.96.9. Changing rotation and zoom#

datagen = ImageDataGenerator(zoom_range=0.5, rotation_range=45)
new_images = datagen.flow(x_train, batch_size=1)
show_new_samples(new_images)

42.96.10. Model Implementation#

model = Sequential()

model.add(
    Conv2D(
        filters=32,
        kernel_size=(9, 9),
        padding="same",
        activation="relu",
        input_shape=(64, 64, 1),
    )
)
model.add(MaxPool2D(pool_size=(5, 5)))
model.add(Dropout(0.2))

model.add(Conv2D(filters=64, kernel_size=(7, 7), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(4, 4), strides=(3, 3)))
model.add(Dropout(0.2))

model.add(Conv2D(filters=128, kernel_size=(5, 5), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(3, 3), strides=(2, 2)))
model.add(Dropout(0.2))

model.add(Flatten())
model.add(Dropout(0.2))
model.add(Dense(256, activation="relu"))
model.add(Dense(10, activation="softmax"))

# `lr` is deprecated in recent Keras releases; use `learning_rate` instead
optimizer = Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)

model.compile(
    optimizer=optimizer, loss="categorical_crossentropy", metrics=["accuracy"]
)

datagen = ImageDataGenerator(zoom_range=0.5, rotation_range=45)
datagen.fit(x_train)

history = model.fit(
    datagen.flow(x_train, y_train, batch_size=250),
    epochs=100,
    validation_data=(x_test, y_test),
)
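
To see how the three convolution and pooling blocks shrink the 64x64 input, the spatial sizes can be worked out by hand. The sketch below assumes the Keras defaults for this model: stride-1 convolutions with "same" padding keep the size, and MaxPool2D uses "valid" padding with a stride equal to the pool size unless one is given. The helper names are hypothetical.

```python
def conv_same(size):
    """A stride-1 convolution with 'same' padding keeps the spatial size."""
    return size


def max_pool(size, pool, stride=None):
    """Output size of max pooling with 'valid' padding."""
    stride = pool if stride is None else stride  # stride defaults to pool size
    return (size - pool) // stride + 1


size = 64
size = max_pool(conv_same(size), pool=5)  # block 1
print(size)  # 12
size = max_pool(conv_same(size), pool=4, stride=3)  # block 2
print(size)  # 3
size = max_pool(conv_same(size), pool=3, stride=2)  # block 3
print(size)  # 1
flat_features = size * size * 128  # Flatten output with 128 filters
print(flat_features)  # 128
```

Under these assumptions the Flatten layer passes 128 features to the Dense(256) layer, which can be checked against the output of model.summary().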

42.96.11. Conclusion#

plt.figure(figsize=(10, 5))
plt.plot(history.history["val_loss"], color="b", label="validation loss")
plt.title("Validation Loss")
plt.xlabel("Number of Epochs")
plt.ylabel("Loss")
plt.legend()
plt.show()
y_predict = model.predict(x_test)
y_predict_classes = np.argmax(y_predict, axis=1)
y_true = np.argmax(y_test, axis=1)
confusion_mtx = confusion_matrix(y_true, y_predict_classes)
plt.figure(figsize=(10, 10))
sns.heatmap(confusion_mtx, annot=True, fmt="d")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.title("Confusion Matrix")
plt.show()
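
The heatmap is easier to interpret alongside a per-class summary. The sketch below computes the fraction of each true class that was predicted correctly (recall) and the overall accuracy from a confusion matrix whose rows are true labels, as in scikit-learn. The matrix `toy_confusion` is made up for illustration and is not a result from this model; `per_class_recall` is a hypothetical helper.

```python
import numpy as np


def per_class_recall(confusion):
    """Fraction of each true class predicted correctly (diagonal / row sum)."""
    confusion = np.asarray(confusion, dtype=float)
    return np.diag(confusion) / confusion.sum(axis=1)


# Made-up 3-class matrix for illustration; rows are true labels.
toy_confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
recall = per_class_recall(toy_confusion)
print(recall.tolist())  # [0.8, 0.9, 1.0]
overall = float(np.trace(np.asarray(toy_confusion)) / np.sum(toy_confusion))
print(overall)  # 0.9
```

Applying the same function to confusion_mtx would show which digits the model confuses most often.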

42.97. Acknowledgments#

Thanks to Görkem Günay for creating sign-language-digits-classification-with-cnn. It inspires the majority of the content in this chapter.