How to use SVM for data classification
TRANSCRIPT
How to use SVM for classification problems
Yiwei Chen, 2016.10
import numpy as np
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

dataset = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.1, stratify=dataset.target)

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X_train)

param_grid = {
    "C": np.logspace(-5, 15, num=6, base=2),
    "gamma": np.logspace(-13, 3, num=5, base=2)
}
grid = GridSearchCV(
    estimator=SVC(kernel="rbf", max_iter=10000000),
    param_grid=param_grid, cv=5)
grid.fit(X_scaled, y_train)

clf = SVC(kernel="rbf", C=grid.best_params_["C"],
          gamma=grid.best_params_["gamma"], max_iter=10000000)
clf.fit(X_scaled, y_train)

novel_X = np.array([[5.9, 3.2, 3.9, 1.5]])
novel_X_scaled = scaler.transform(novel_X)
print(novel_X_scaled)
print(clf.predict(novel_X_scaled))

X_test_scaled = scaler.transform(X_test)
print(clf.predict(X_test_scaled))
print(clf.score(X_test_scaled, y_test))
If you can already read the previous two pages, you can stop reading these slides here.
There are many ways to learn,
and the purposes of learning differ too.
[Slide: example items labeled "sweet" / "not sweet"]
Learn Mother Nature (the hidden rules behind what we observe) from experience
These slides focus on
Supervised classification
[Diagram: Mother Nature decides which items are sweet and which are not. Train: from items with known labels (sweet / not sweet) you train a model. Predict: given a new item whose label is unknown (??), the model predicts sweet / not sweet.]
Supervised Classification
● You have training data: some objects/things + their classes
● You train a model; when a new object comes in later, the model predicts its class
There can be two classes (sweet / not sweet: binary classification) or more (Taiwanese / Japanese / Korean: multi-class classification)
Support Vector Machine (SVM)
● You have training data: vectors + their classes
● You train a model, which is a function; when a new vector comes in later, the model predicts its class
There can be two classes (sweet / not sweet: binary classification) or more (Taiwanese / Japanese / Korean: multi-class classification)
[Diagram: train: labeled vectors such as (1.2, 0, 0, 1, …, 57) O, (8.7, 1, 0, 0, …, -3) X, (2.4, 1, 0, 0, …, 22) O, (0.3, 0, 1, 0, …, 33) X, … are used to learn a function ƒ (the model). Predict: the model maps a new vector such as (1.2, 0, 1, …, 8) to O or X.]
Feature engineering
● Convert objects into vectors, always in the same way
● Size: 8 cm or 80 mm?
● red/yellow/green: (1,0,0) / (0,1,0) / (0,0,1)
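A minimal sketch of this kind of feature engineering (the raw_items list, the COLORS order, and the to_vector helper are illustrative, not from the slides): sizes are converted to one common unit and colors are one-hot encoded, so every object becomes a numeric vector of the same length.

import numpy as np

# Hypothetical raw objects: (size, color). Sizes come in mixed units,
# so everything is normalized to centimeters before building vectors.
raw_items = [("8cm", "red"), ("80mm", "green"), ("7.5cm", "yellow")]

COLORS = ["red", "yellow", "green"]  # fixed order -> (1,0,0)/(0,1,0)/(0,0,1)

def to_vector(size, color):
    value, unit = float(size[:-2]), size[-2:]
    size_cm = value / 10.0 if unit == "mm" else value  # always use cm
    one_hot = [1.0 if color == c else 0.0 for c in COLORS]
    return [size_cm] + one_hot

X = np.array([to_vector(s, c) for s, c in raw_items])
print(X)  # every item is now a vector like (8.0, 1, 0, 0)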
There are many methods for solving supervised classification problems
● SVM
● Decision trees
● Neural networks
● Deep learning
● …
They can solve supervised classification problems;
that does not mean supervised classification is all they can do.
Agenda
● Supervised classification
● Support Vector Machine
● Software environment
● Use Support Vector Machines
[Diagram (repeated): labeled vectors are used to train a model ƒ; the model then predicts O or X for a new vector.]
Support Vector Machine ??
Example: two-dimensional vectors, two classes
[Diagram: 2-D points (Feature 1 vs. Feature 2) from two classes; training produces a model (a function) that separates them.]
Support Vector Machine ??
Example: two-dimensional vectors, two classes
[Diagram: the model predicts which class each new point (?) belongs to.]
Maximum Margin
Properties of SVM
● Distance related
● The wider the separation, the better (maximum margin)
Characteristics of SVM
● Distance related
● The wider the separation, the better (maximum margin)
● Parameterized
○ The decision boundary can be curved
○ Misclassification is allowed, but penalized
Training with different parameters gives different results ...
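A small sketch of that effect (the toy 2-D data below is made up for illustration): the same training set fit with different C and gamma values gives different models and different training accuracies.

import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: two slightly overlapping blobs (illustrative only).
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

# The same RBF SVM behaves differently for different (C, gamma):
# large C / large gamma bends the boundary to fit the training set more closely.
for C, gamma in [(0.1, 0.01), (1, 1), (100, 10)]:
    clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X, y)
    print(C, gamma, "training accuracy:", clf.score(X, y))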
Agenda
● Supervised classification
● Support Vector Machine
● Software environment
● Use Support Vector Machines
If you use Python:
[Diagram: software stack. scikit-learn (sklearn) provides SVM, decision trees, ...; numpy provides arrays, ...; scipy provides statistics such as variance, ...; all of it runs on Python.]
Anaconda: all your wishes granted at once
● An open-source scientific platform running on Python
○ Linux / OSX / Windows
● Installs everything you can think of
● Fast. No thinking required.
● https://www.continuum.io/anaconda-overview
Agenda
● Supervised classification
● Support Vector Machine
● Software environment
● Use Support Vector Machines
[Diagram (repeated): train a model ƒ from labeled vectors, then use it to predict O or X for a new vector.]
General workflow
Decide on an evaluation criterion + a baseline predictor → train → deploy and predict
● Accuracy
○ Training accuracy
○ Testing accuracy
● precision, recall, Type I / Type II error, AUC, …
Before doing any training, decide how you are going to evaluate the results!
Evaluation
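As a sketch of how these numbers can be computed (the label arrays below are made up for illustration; the metric functions are from sklearn.metrics):

from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score

# Hypothetical true labels, hard predictions, and scores for a binary problem.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
y_score = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]  # e.g. decision_function output

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_score))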
● Simple and easy: guess with your eyes closed
● Used for comparison (would you even know if your model did worse than the baseline?)
Baseline predictor
[Diagram: train a baseline predictor on ALL the training data.]
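As a sketch of such a baseline (DummyClassifier is one readily available option in scikit-learn, not something the slides themselves use): it simply predicts the most frequent class, i.e. guessing with your eyes closed.

from sklearn import datasets
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

dataset = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.1, stratify=dataset.target)

# Baseline: always predict the most frequent class seen in training.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)
print("baseline accuracy:", baseline.score(X_test, y_test))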
Workflow for using SVM
Decide on an evaluation criterion + a baseline predictor
Training: prepare data → scale features → search for the best parameters → train the model
Prediction: prepare data → scale features → predict
dataset = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.1, stratify=dataset.target)

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X_train)

param_grid = {
    "C": np.logspace(-5, 15, num=6, base=2),
    "gamma": np.logspace(-13, 3, num=5, base=2)
}
grid = GridSearchCV(
    estimator=SVC(kernel="rbf", max_iter=10000000),
    param_grid=param_grid, cv=5)
grid.fit(X_scaled, y_train)

clf = SVC(kernel="rbf", C=grid.best_params_["C"],
          gamma=grid.best_params_["gamma"], max_iter=10000000)
clf.fit(X_scaled, y_train)

novel_X = np.array([[5.9, 3.2, 3.9, 1.5]])
novel_X_scaled = scaler.transform(novel_X)
print(novel_X_scaled)
print(clf.predict(novel_X_scaled))

X_test_scaled = scaler.transform(X_test)
print(clf.predict(X_test_scaled))
print(clf.score(X_test_scaled, y_test))
1. Data preparation
● Transform object → vector
● Whole training data at once
○ X in numpy.array (2-D) or scipy.sparse.csr_matrix
○ y in numpy.array
(1.2, 0, 57) O
(8.7, 1, 22) X
(2.4, 1, -3) O

X = np.array([[2.4, 1, -3], [8.7, 1, 22], [1.2, 0, 57]])
y = np.array([1, 0, 1])
2. Feature Scaling
[Diagram: min-max scaling of the training data. The first feature ranges 0.3 ~ 10.3 and is mapped by (n−0.3) ×0.1 into 0 ~ 1; the second feature already lies in 0 ~ 1 and is mapped by (n+0) ×1. For example, (1.2, 0, 0, …) becomes (0.09, 0, 0, …) and (8.7, 1, 0, …) becomes (0.84, 1, 0, …).]
2. Feature Scaling
[Diagram (repeated): the same min-max scaling, produced by the code below.]
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
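For reference, a sketch of what MinMaxScaler computes: each feature is mapped by (x − min) / (max − min), with min and max taken per column from the data it was fit on (the values below are illustrative).

import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[0.3, 0.0], [10.3, 1.0], [5.3, 0.5]])  # illustrative values

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# The same result computed by hand with the per-column min and max:
manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(np.allclose(X_scaled, manual))  # True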
3. Search for the best parameter
param_grid = {
    "C": np.logspace(-5, 15, num=6, base=2),
    "gamma": np.logspace(-13, 3, num=5, base=2)
}

grid = GridSearchCV(
    estimator=SVC(kernel="rbf", max_iter=10000000),
    param_grid=param_grid, cv=5)

grid.fit(X_scaled, y_train)
3. Search for the best (??) C and γ
3. What is “best”?
[Diagram: a model trained on the labeled items (sweet / not sweet); for the new item marked ??, you do not yet know the answer.]
3. Search for the best - validation
[Diagram: hold out part of the labeled data and treat it as new, unseen data; train on the rest, then validate the model against the held-out labels (sweet / not sweet).]
3. Search for the best - cross-validation
Cross-validation (CV): each fold validates in turn
[Diagram: the training data is split into folds; each fold is used for validation in turn while the remaining folds are used for training.]
Given C=12, γ=34, the validation accuracy = 0.56
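That number can be computed directly for one candidate setting; a minimal sketch, assuming the X_scaled and y_train built earlier in these slides (C=12 and gamma=34 are just placeholders, as on the slide):

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# 5-fold cross-validation accuracy for one fixed (C, gamma) candidate,
# using the scaled training data from the earlier steps.
scores = cross_val_score(SVC(kernel="rbf", C=12, gamma=34), X_scaled, y_train, cv=5)
print("validation accuracy:", np.mean(scores))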
3. Search for the best parameter - Grid
[Diagram: a grid of candidate values for C and γ; each (C, γ) pair is evaluated by cross-validation.]
3. Search for the best parameter
param_grid = {
    "C": np.logspace(-5, 15, num=6, base=2),
    "gamma": np.logspace(-13, 3, num=5, base=2)
}

grid = GridSearchCV(
    estimator=SVC(kernel="rbf", max_iter=10000000),
    param_grid=param_grid, cv=5)

grid.fit(X_scaled, y_train)
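After grid.fit finishes, the winning setting and its cross-validated accuracy can be read back; a short sketch continuing from the grid object above (the attribute names are standard GridSearchCV attributes):

# The parameter combination with the highest mean validation accuracy,
# and that accuracy itself (averaged over the 5 folds).
print(grid.best_params_)   # e.g. {"C": ..., "gamma": ...}
print(grid.best_score_)

# grid.cv_results_ holds the mean validation score for every grid point.
for params, score in zip(grid.cv_results_["params"],
                         grid.cv_results_["mean_test_score"]):
    print(params, score)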
4. Train Model
Use the best parameters found by CV to train
clf = SVC(kernel="rbf", C=grid.best_params_["C"],
          gamma=grid.best_params_["gamma"], max_iter=10000000)
clf.fit(X_scaled, y_train)
Predict novel data
● Scaling
● Predict
novel_X = np.array([[5.9, 3.2, 3.9, 1.5]])
novel_X_scaled = scaler.transform(novel_X)
print(clf.predict(novel_X_scaled))
Scale Training Data
[Diagram (repeated): the min-max scaling learned from the training data: (n−0.3) ×0.1 for the first feature, (n+0) ×1 for the second, mapping each feature into 0 ~ 1.]
Scale Testing Data
[Diagram: the testing data is scaled with the same transformation learned from the training data ((n−0.3) ×0.1 and (n+0) ×1); the scaled values can fall outside 0 ~ 1, e.g. (100, 0, 0, …) becomes (9.97, 0, 0, …) and (-0.7, 1, 1, …) becomes (-0.1, 1, 1, …).]
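A tiny sketch of why this matters (the numbers are the illustrative ones from the diagrams): the scaler keeps the min and max it learned from the training data, so transforming test data reuses them, and the results are not clipped to 0 ~ 1.

import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train_demo = np.array([[0.3], [10.3]])   # training feature ranges 0.3 ~ 10.3
X_test_demo = np.array([[100.0], [-0.7]])  # test values outside that range

scaler = MinMaxScaler()
scaler.fit(X_train_demo)                   # learns min=0.3, max=10.3

print(scaler.transform(X_test_demo))       # ~[[9.97], [-0.1]]; not clipped to 0 ~ 1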
dataset = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.1, stratify=dataset.target)

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X_train)

param_grid = {
    "C": np.logspace(-5, 15, num=6, base=2),
    "gamma": np.logspace(-13, 3, num=5, base=2)
}
grid = GridSearchCV(
    estimator=SVC(kernel="rbf", max_iter=10000000),
    param_grid=param_grid, cv=5)
grid.fit(X_scaled, y_train)

clf = SVC(kernel="rbf", C=grid.best_params_["C"],
          gamma=grid.best_params_["gamma"], max_iter=10000000)
clf.fit(X_scaled, y_train)

novel_X = np.array([[5.9, 3.2, 3.9, 1.5]])
novel_X_scaled = scaler.transform(novel_X)
print(novel_X_scaled)
print(clf.predict(novel_X_scaled))

X_test_scaled = scaler.transform(X_test)
print(clf.predict(X_test_scaled))
print(clf.score(X_test_scaled, y_test))
Agenda
● Supervised classification
● Support Vector Machine
● Software environment
● Use Support Vector Machines
Takeaway…
[Diagram (repeated): train a model from items labeled sweet / not sweet, then use the model to predict whether a new, unlabeled item is sweet.]
Workflow for using SVM
Evaluation criteria + Baseline predictor
Training: prepare data → scale features → search best param (CV on grid) → train model
Prediction: prepare data → scale features → predict
Once you know how to operate the microwave correctly ...
● Data collection (sourcing the ingredients)
● Model evaluation monitoring (are the customers satisfied?)
● Feature engineering (preparing the ingredients)
● Model update from novel data (keeping up with the times)
● Training / prediction at large scale (large amounts of ingredients)
● A robust pipeline that integrates all of these (running a restaurant)
Happy Training!
More materials
“Support” Vectors?
Maximum Margin
Why scaling?
Model Serialization
http://scikit-learn.org/stable/modules/model_persistence.html
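A minimal sketch in the spirit of that page, assuming the clf and scaler trained earlier and using joblib (the file name is arbitrary):

import joblib

# Save the trained classifier and the fitted scaler together,
# so that future data can be scaled the same way before prediction.
joblib.dump({"scaler": scaler, "clf": clf}, "svm_iris.joblib")

loaded = joblib.load("svm_iris.joblib")
print(loaded["clf"].predict(loaded["scaler"].transform([[5.9, 3.2, 3.9, 1.5]])))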