How to use SVM for data classification
TRANSCRIPT
How to use SVM for classification problems
Yiwei Chen, 2016.10
import numpy as np
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

dataset = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.1, stratify=dataset.target)

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X_train)

param_grid = {
    "C": np.logspace(-5, 15, num=6, base=2),
    "gamma": np.logspace(-13, 3, num=5, base=2)
}
grid = GridSearchCV(
    estimator=SVC(kernel="rbf", max_iter=10000000),
    param_grid=param_grid, cv=5)
grid.fit(X_scaled, y_train)

clf = SVC(kernel="rbf", C=grid.best_params_["C"],
          gamma=grid.best_params_["gamma"], max_iter=10000000)
clf.fit(X_scaled, y_train)

novel_X = np.array([[5.9, 3.2, 3.9, 1.5]])
novel_X_scaled = scaler.transform(novel_X)
print(novel_X_scaled)
print(clf.predict(novel_X_scaled))

X_test_scaled = scaler.transform(X_test)
print(clf.predict(X_test_scaled))
print(clf.score(X_test_scaled, y_test))
If you can already read the previous two pages, you can stop reading these slides here.
There are many ways to learn,
and the purposes of learning differ too.
[Slide: example items labeled "sweet" / "not sweet"]
Learn Mother Nature (the hidden rules behind what we observe) from experience
These slides focus on
Supervised classification
[Diagram: Mother Nature decides which items are sweet and which are not. Train: from items with known labels (sweet / not sweet) you train a model. Predict: given a new item whose label is unknown (??), the model predicts sweet / not sweet.]
Supervised Classification
● You have training data: some objects/things + their classes
● You train a model; when a new object comes in later, the model predicts its class
There can be two classes (sweet / not sweet: binary classification) or more (Taiwanese / Japanese / Korean: multi-class classification)
Support Vector Machine (SVM)
● You have training data: vectors + their classes
● You train a model, which is a function; when a new vector comes in later, the model predicts its class
There can be two classes (sweet / not sweet: binary classification) or more (Taiwanese / Japanese / Korean: multi-class classification)
[Diagram: train: labeled vectors such as (1.2, 0, 0, 1, …, 57) O, (8.7, 1, 0, 0, …, -3) X, (2.4, 1, 0, 0, …, 22) O, (0.3, 0, 1, 0, …, 33) X, … are used to learn a function ƒ (the model). Predict: the model maps a new vector such as (1.2, 0, 1, …, 8) to O or X.]
Feature engineering
● Convert objects into vectors, always in the same way
● Size: 8 cm or 80 mm?
● red/yellow/green: (1,0,0) / (0,1,0) / (0,0,1)
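A minimal sketch of this kind of feature engineering (the raw_items list, the COLORS order, and the to_vector helper are illustrative, not from the slides): sizes are converted to one common unit and colors are one-hot encoded, so every object becomes a numeric vector of the same length.

import numpy as np

# Hypothetical raw objects: (size, color). Sizes come in mixed units,
# so everything is normalized to centimeters before building vectors.
raw_items = [("8cm", "red"), ("80mm", "green"), ("7.5cm", "yellow")]

COLORS = ["red", "yellow", "green"]  # fixed order -> (1,0,0)/(0,1,0)/(0,0,1)

def to_vector(size, color):
    value, unit = float(size[:-2]), size[-2:]
    size_cm = value / 10.0 if unit == "mm" else value  # always use cm
    one_hot = [1.0 if color == c else 0.0 for c in COLORS]
    return [size_cm] + one_hot

X = np.array([to_vector(s, c) for s, c in raw_items])
print(X)  # every item is now a vector like (8.0, 1, 0, 0)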
There are many methods for solving supervised classification problems
● SVM
● Decision trees
● Neural networks
● Deep learning
● …
They can solve supervised classification problems;
that does not mean supervised classification is all they can do.
Agenda
● Supervised classification
● Support Vector Machine
● Software environment
● Use Support Vector Machines
[Diagram (repeated): labeled vectors are used to train a model ƒ; the model then predicts O or X for a new vector.]
Support Vector Machine ??
Example: two-dimensional vectors, two classes
[Diagram: 2-D points (Feature 1 vs. Feature 2) from two classes; training produces a model (a function) that separates them.]
Support Vector Machine ??
Example: two-dimensional vectors, two classes
[Diagram: the model predicts which class each new point (?) belongs to.]
Maximum Margin
Properties of SVM
● Distance related
● The wider the separation, the better (maximum margin)
Characteristics of SVM
● Distance related
● The wider the separation, the better (maximum margin)
● Parameterized
○ The decision boundary can be curved
○ Misclassification is allowed, but penalized
Training with different parameters gives different results ...
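A small sketch of that effect (the toy 2-D data below is made up for illustration): the same training set fit with different C and gamma values gives different models and different training accuracies.

import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: two slightly overlapping blobs (illustrative only).
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

# The same RBF SVM behaves differently for different (C, gamma):
# large C / large gamma bends the boundary to fit the training set more closely.
for C, gamma in [(0.1, 0.01), (1, 1), (100, 10)]:
    clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X, y)
    print(C, gamma, "training accuracy:", clf.score(X, y))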
Agenda
● Supervised classification
● Support Vector Machine
● Software environment
● Use Support Vector Machines
If you use Python:
[Diagram: software stack. scikit-learn (sklearn) provides SVM, decision trees, ...; numpy provides arrays, ...; scipy provides statistics such as variance, ...; all of it runs on Python.]
Anaconda: all your wishes granted at once
● An open-source scientific platform running on Python
○ Linux / OSX / Windows
● Installs everything you can think of
● Fast. No thinking required.
● https://www.continuum.io/anaconda-overview
Agenda
● Supervised classification
● Support Vector Machine
● Software environment
● Use Support Vector Machines
[Diagram (repeated): train a model ƒ from labeled vectors, then use it to predict O or X for a new vector.]
General workflow
Decide on an evaluation criterion + a baseline predictor → train → deploy and predict
● Accuracy
○ Training accuracy
○ Testing accuracy
● precision, recall, Type I / Type II error, AUC, …
Before doing any training, decide how you are going to evaluate the results!
Evaluation
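As a sketch of how these numbers can be computed (the label arrays below are made up for illustration; the metric functions are from sklearn.metrics):

from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score

# Hypothetical true labels, hard predictions, and scores for a binary problem.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
y_score = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]  # e.g. decision_function output

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_score))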
● Simple and easy: guess with your eyes closed
● Used for comparison (would you even know if your model did worse than the baseline?)
Baseline predictor
[Diagram: train a baseline predictor on ALL the training data.]
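As a sketch of such a baseline (DummyClassifier is one readily available option in scikit-learn, not something the slides themselves use): it simply predicts the most frequent class, i.e. guessing with your eyes closed.

from sklearn import datasets
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

dataset = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.1, stratify=dataset.target)

# Baseline: always predict the most frequent class seen in training.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_train, y_train)
print("baseline accuracy:", baseline.score(X_test, y_test))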
Workflow for using SVM
Decide on an evaluation criterion + a baseline predictor
Training: prepare data → scale features → search for the best parameters → train the model
Prediction: prepare data → scale features → predict
dataset = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.1, stratify=dataset.target)

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X_train)

param_grid = {
    "C": np.logspace(-5, 15, num=6, base=2),
    "gamma": np.logspace(-13, 3, num=5, base=2)
}
grid = GridSearchCV(
    estimator=SVC(kernel="rbf", max_iter=10000000),
    param_grid=param_grid, cv=5)
grid.fit(X_scaled, y_train)

clf = SVC(kernel="rbf", C=grid.best_params_["C"],
          gamma=grid.best_params_["gamma"], max_iter=10000000)
clf.fit(X_scaled, y_train)

novel_X = np.array([[5.9, 3.2, 3.9, 1.5]])
novel_X_scaled = scaler.transform(novel_X)
print(novel_X_scaled)
print(clf.predict(novel_X_scaled))

X_test_scaled = scaler.transform(X_test)
print(clf.predict(X_test_scaled))
print(clf.score(X_test_scaled, y_test))
1. Data preparation
● Transform object → vector
● Whole training data at once
○ X in numpy.array (2-D) or scipy.sparse.csr_matrix
○ y in numpy.array
(1.2, 0, 57) O
(8.7, 1, 22) X
(2.4, 1, -3) O

X = np.array([[2.4, 1, -3], [8.7, 1, 22], [1.2, 0, 57]])
y = np.array([1, 0, 1])
2. Feature Scaling
[Diagram: min-max scaling of the training data. The first feature ranges 0.3 ~ 10.3 and is mapped by (n−0.3) ×0.1 into 0 ~ 1; the second feature already lies in 0 ~ 1 and is mapped by (n+0) ×1. For example, (1.2, 0, 0, …) becomes (0.09, 0, 0, …) and (8.7, 1, 0, …) becomes (0.84, 1, 0, …).]
2. Feature Scaling
[Diagram (repeated): the same min-max scaling, produced by the code below.]
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
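For reference, a sketch of what MinMaxScaler computes: each feature is mapped by (x − min) / (max − min), with min and max taken per column from the data it was fit on (the values below are illustrative).

import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[0.3, 0.0], [10.3, 1.0], [5.3, 0.5]])  # illustrative values

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# The same result computed by hand with the per-column min and max:
manual = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(np.allclose(X_scaled, manual))  # True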
3. Search for the best parameter
param_grid = {
    "C": np.logspace(-5, 15, num=6, base=2),
    "gamma": np.logspace(-13, 3, num=5, base=2)
}

grid = GridSearchCV(
    estimator=SVC(kernel="rbf", max_iter=10000000),
    param_grid=param_grid, cv=5)

grid.fit(X_scaled, y_train)
3. Search for the best (??) C and γ
3. What is “best”?
[Diagram: a model trained on the labeled items (sweet / not sweet); for the new item marked ??, you do not yet know the answer.]
3. Search for the best - validation
[Diagram: hold out part of the labeled data and treat it as new, unseen data; train on the rest, then validate the model against the held-out labels (sweet / not sweet).]
3. Search for the best - cross-validation
Cross-validation (CV): each fold validates in turn
[Diagram: the training data is split into folds; each fold is used for validation in turn while the remaining folds are used for training.]
Given C=12, γ=34, the validation accuracy = 0.56
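That number can be computed directly for one candidate setting; a minimal sketch, assuming the X_scaled and y_train built earlier in these slides (C=12 and gamma=34 are just placeholders, as on the slide):

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# 5-fold cross-validation accuracy for one fixed (C, gamma) candidate,
# using the scaled training data from the earlier steps.
scores = cross_val_score(SVC(kernel="rbf", C=12, gamma=34), X_scaled, y_train, cv=5)
print("validation accuracy:", np.mean(scores))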
3. Search for the best parameter - Grid
[Diagram: a grid of candidate values for C and γ; each (C, γ) pair is evaluated by cross-validation.]
3. Search for the best parameter
param_grid = {
    "C": np.logspace(-5, 15, num=6, base=2),
    "gamma": np.logspace(-13, 3, num=5, base=2)
}

grid = GridSearchCV(
    estimator=SVC(kernel="rbf", max_iter=10000000),
    param_grid=param_grid, cv=5)

grid.fit(X_scaled, y_train)
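After grid.fit finishes, the winning setting and its cross-validated accuracy can be read back; a short sketch continuing from the grid object above (the attribute names are standard GridSearchCV attributes):

# The parameter combination with the highest mean validation accuracy,
# and that accuracy itself (averaged over the 5 folds).
print(grid.best_params_)   # e.g. {"C": ..., "gamma": ...}
print(grid.best_score_)

# grid.cv_results_ holds the mean validation score for every grid point.
for params, score in zip(grid.cv_results_["params"],
                         grid.cv_results_["mean_test_score"]):
    print(params, score)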
4. Train Model
Use the best parameters found by CV to train
clf = SVC(kernel="rbf", C=grid.best_params_["C"],
          gamma=grid.best_params_["gamma"], max_iter=10000000)
clf.fit(X_scaled, y_train)
Predict novel data
● Scaling
● Predict
novel_X = np.array([[5.9, 3.2, 3.9, 1.5]])
novel_X_scaled = scaler.transform(novel_X)
print(clf.predict(novel_X_scaled))
Scale Training Data
[Diagram (repeated): the min-max scaling learned from the training data: (n−0.3) ×0.1 for the first feature, (n+0) ×1 for the second, mapping each feature into 0 ~ 1.]
Scale Testing Data
[Diagram: the testing data is scaled with the same transformation learned from the training data ((n−0.3) ×0.1 and (n+0) ×1); the scaled values can fall outside 0 ~ 1, e.g. (100, 0, 0, …) becomes (9.97, 0, 0, …) and (-0.7, 1, 1, …) becomes (-0.1, 1, 1, …).]
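A tiny sketch of why this matters (the numbers are the illustrative ones from the diagrams): the scaler keeps the min and max it learned from the training data, so transforming test data reuses them, and the results are not clipped to 0 ~ 1.

import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train_demo = np.array([[0.3], [10.3]])   # training feature ranges 0.3 ~ 10.3
X_test_demo = np.array([[100.0], [-0.7]])  # test values outside that range

scaler = MinMaxScaler()
scaler.fit(X_train_demo)                   # learns min=0.3, max=10.3

print(scaler.transform(X_test_demo))       # ~[[9.97], [-0.1]]; not clipped to 0 ~ 1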
dataset = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    dataset.data, dataset.target, test_size=0.1, stratify=dataset.target)

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X_train)

param_grid = {
    "C": np.logspace(-5, 15, num=6, base=2),
    "gamma": np.logspace(-13, 3, num=5, base=2)
}
grid = GridSearchCV(
    estimator=SVC(kernel="rbf", max_iter=10000000),
    param_grid=param_grid, cv=5)
grid.fit(X_scaled, y_train)

clf = SVC(kernel="rbf", C=grid.best_params_["C"],
          gamma=grid.best_params_["gamma"], max_iter=10000000)
clf.fit(X_scaled, y_train)

novel_X = np.array([[5.9, 3.2, 3.9, 1.5]])
novel_X_scaled = scaler.transform(novel_X)
print(novel_X_scaled)
print(clf.predict(novel_X_scaled))

X_test_scaled = scaler.transform(X_test)
print(clf.predict(X_test_scaled))
print(clf.score(X_test_scaled, y_test))
Agenda
● Supervised classification
● Support Vector Machine
● Software environment
● Use Support Vector Machines
Takeaway…
[Diagram (repeated): train a model from items labeled sweet / not sweet, then use the model to predict whether a new, unlabeled item is sweet.]
Workflow for using SVM
Evaluation criteria + Baseline predictor
Training: prepare data → scale features → search best param (CV on grid) → train model
Prediction: prepare data → scale features → predict
Once you know how to operate the microwave correctly ...
● Data collection (sourcing the ingredients)
● Model evaluation monitoring (are the customers satisfied?)
● Feature engineering (preparing the ingredients)
● Model update from novel data (keeping up with the times)
● Training / prediction at large scale (large amounts of ingredients)
● A robust pipeline that integrates all of these (running a restaurant)
Happy Training!
More materials
“Support” Vectors?
Maximum Margin
Why scaling?
Model Serialization
http://scikit-learn.org/stable/modules/model_persistence.html
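A minimal sketch in the spirit of that page, assuming the clf and scaler trained earlier and using joblib (the file name is arbitrary):

import joblib

# Save the trained classifier and the fitted scaler together,
# so that future data can be scaled the same way before prediction.
joblib.dump({"scaler": scaler, "clf": clf}, "svm_iris.joblib")

loaded = joblib.load("svm_iris.joblib")
print(loaded["clf"].predict(loaded["scaler"].transform([[5.9, 3.2, 3.9, 1.5]])))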