{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [ "hide-cell" ] }, "outputs": [], "source": [ "# Install the necessary dependencies\n", "\n", "import os\n", "import sys\n", "!{sys.executable} -m pip install --quiet numpy pandas matplotlib scikit-learn jupyterlab-myst ipython seaborn ipywidgets" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "from sklearn.svm import SVC\n", "from sklearn.datasets import make_circles\n", "from mpl_toolkits import mplot3d\n", "from ipywidgets import interact, fixed\n", "from sklearn.datasets import make_blobs\n", "\n", "sns.set()" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "remove-cell" ] }, "source": [ "---\n", "license:\n", " code: MIT\n", " content: CC-BY-4.0\n", "github: https://github.com/ocademy-ai/machine-learning\n", "venue: By Ocademy\n", "open_access: true\n", "bibliography:\n", " - https://raw.githubusercontent.com/ocademy-ai/machine-learning/main/open-machine-learning-jupyter-book/references.bib\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Kernel method" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "SVMs are a powerful and flexible class of algorithms used for classification and regression. In this section, we will explore the intuition behind SVMs and their use in classification problems.\n", "To start with, let's understand the basic concept of SVMs. \n", "Support Vector Machines (SVMs) are supervised learning algorithms that can be used for classification and regression tasks. SVMs try to find the best decision boundary that separates data points of different classes. The decision boundary is chosen such that it maximizes the margin between the data points of different classes.\n", "The margin is defined as the minimum distance between the decision boundary and the closest data points of each class. This makes SVMs very robust to outliers as the decision boundary is chosen based on the data that is closest to it.\n", "In classification problems, the goal is to find a decision boundary that separates the data into two or more classes. SVMs can be used for binary classification, where the classes are only two, or for multiclass classification, where there are more than two classes." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "\n", "A demo of SVM. [source]\n", "
\n" ], "text/plain": [ "\n", "\n", "A demo of SVM. [source]\n", "
\n", "\"\"\"\n", " )\n", ")\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "tags": [ "hide-input" ] }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "A demo of SVM. [source]\n", "
\n" ], "text/plain": [ "\n", "\n", "A demo of SVM. [source]\n", "
\n", "\"\"\"\n", " )\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Motivating Support Vector Machines" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this section we will consider differentiated classification: rather than modelling each category, we simply find a line or curve or flowform that separates these categories from each other." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\n", "\n", "video of RBF kernel. [source]\n", "
\n" ], "text/plain": [ "\n", "\n", "video of RBF kernel. [source]\n", "
\n", "\"\"\"\n", " )\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Support Vector Regression (SVR)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "SVR regression (Squeeze-and-VectorRegression), also known as Squeeze Regression, is an optimisation problem. The basic idea is to find a regression plane such that the variables in all data sets are closest to a point on this plane, so that all points on the plane are correlated.\n", "Objective function:\n", "In SVR regression, the objective function is to minimise the two-parametric number of weights while keeping the points in each training set as far away as possible from the support vector on one side of their own category." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/svm/svr1.jpeg\n", "---\n", "name: support vector regression\n", "width: 90%\n", "---\n", "An illustration of support vector regression\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SVM v.s. logistic regression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarities: Both are linear classification algorithms\n", "Differences:\n", "1. Different loss functions\n", "LR: based on the assumption of \"given x and parameters, y obeys binomial distribution\", derived from the maximum likelihood estimation\n", "SVM: standard representation of hinge loss + L2 regularization, based on the principle of geometric interval maximization\n", "2. Support vector machines only consider local points near the interval boundary, whereas logistic regression considers the global (points far from the boundary line also play a role in determining the boundary line). Support vector machines do not cause changes in the separation hyperplane by changing the non-support vector samples\n", "3. SVM's loss function is self-regularising, which is why SVM is a structural risk minimisation algorithm!!! And LR must additionally add a regular term to the loss function !!! Structural risk minimisation, meaning seeking a balance between training error and model complexity to prevent overfitting.\n", "4. Optimization methods: LR is generally based on gradient descent, SVM on SMO\n", "5. For non-linear separable problems, SVM is more scalable than LR\n", "\n", "\n", "https://www.geeksforgeeks.org/differentiate-between-support-vector-machine-and-logistic-regression/\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SVR v.s. linear regression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The problem of overfitting is a very tricky one in the field of statistics. In this case, many machine learning methods, such as least squares (OLSE), will perform poorly. Support vector machines (SVR), on the other hand, can minimise the overfitting problem. SVR allows for non-linear fitting when there is enough data for training.\n", "A final issue to consider is OLSE linear regression. While linear regression is effective for most problems, it is not always effective for some special cases. For example, in OLSE linear regression, there is some bias in fitting the variables linearly because the model is not hyperplane. In contrast, SVR allows for non-linear fitting problems." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Support Vector Machine Summary" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The basic model of SVM: is defined as a linear classifier with maximum interval on the feature space. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "## SVR v.s. linear regression" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Overfitting is a very tricky problem in statistics, and in its presence methods such as ordinary least squares (OLS) can perform poorly. Support vector regression (SVR), on the other hand, can mitigate the overfitting problem, and it allows non-linear fitting when there is enough data for training.\n", "A final issue to consider is the linearity of OLS regression. While linear regression is effective for most problems, it is not always effective in some special cases: when the true relationship between the variables is not well approximated by a hyperplane, a linear OLS fit is systematically biased. In contrast, SVR with a non-linear kernel can handle such non-linear fitting problems." ] },
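{ "cell_type": "markdown", "metadata": {}, "source": [ "The contrast is easy to reproduce. Below, a hypothetical cubic relationship is fitted with both OLS and an RBF-kernel SVR; the straight OLS line is systematically biased, while the SVR follows the curve. The data-generating function and noise level are arbitrary choices for illustration." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.linear_model import LinearRegression\n", "from sklearn.svm import SVR\n", "\n", "# A non-linear relationship that a straight OLS line cannot capture.\n", "rng = np.random.RandomState(1)\n", "X = np.sort(6 * rng.rand(60, 1) - 3, axis=0)\n", "y = X.ravel() ** 3 + 3 * rng.randn(60)\n", "\n", "ols = LinearRegression().fit(X, y)\n", "svr = SVR(kernel=\"rbf\", C=100).fit(X, y)\n", "\n", "print(\"OLS R^2:\", ols.score(X, y))\n", "print(\"SVR R^2:\", svr.score(X, y))" ] },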
\n" ], "text/plain": [ "\n", "\n", "math of SVM. [source]\n", "
\n", "\"\"\"\n", " )\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Your turn! 🚀\n", "You can follow this [assignment](../assignments/ml-advanced/kernel-method/kernel-method-assignment-1.ipynb) to practise Support Vector Machines with examples." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Acknowledgement\n", "\n", "Thanks for [jakevdp](https://jakevdp.github.io/PythonDataScienceHandbook/05.07-support-vector-machines.html), for which the code part is licenced under MIT licence." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.16" } }, "nbformat": 4, "nbformat_minor": 4 }