kmeans with canopy clustering

k-means, canopy clustering
Machine Learning Study (2015-03-05)
BDS Research Lab, Platform Research Team, 정성현 (Senior Researcher)

Classification: supervised learning

Training set: labeled examples of ClassA and ClassB

Decision boundary: separates ClassA from ClassB

New input: classified as ClassB

Clustering: unsupervised learning

Training set: unlabeled examples, grouped into ClusterA and ClusterB

k-means clustering

Input:
- $K$ (number of clusters)
- Training set $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$

Initial cluster centroids

k-means clustering

Repeat {
    for $i$ = 1 to $m$
        $c^{(i)}$ := index (from 1 to $K$) of cluster centroid closest to $x^{(i)}$
    for $k$ = 1 to $K$
        $\mu_k$ := average (mean) of points assigned to cluster $k$
}
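The two steps of the loop above (assign each point to its closest centroid, then move each centroid to the mean of its points) can be sketched in Python as follows. This is a minimal illustration, not the presenter's demo code: the function name `kmeans`, the `iterations` and `seed` parameters, the Euclidean distance, and the choice to keep the old centroid when a cluster becomes empty are all assumptions.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means on a list of equal-length tuples (illustrative sketch)."""
    rng = random.Random(seed)
    # Initialize centroids by picking k training examples at random.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(iterations):
        # Cluster assignment step: index of the closest centroid per point.
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing, so the run has converged
        assignment = new_assignment
        # Move-centroid step: mean of the points assigned to each cluster.
        for j in range(k):
            members = [p for p, c in zip(points, assignment) if c == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(
                    sum(xs) / len(members) for xs in zip(*members)
                )
    return assignment, centroids
```

Calling `kmeans(points, 2)` on a list of 2-D tuples returns one cluster index per point plus the final centroids.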

$c^{(i)}$ = index of cluster (1, 2, …, $K$) to which example $x^{(i)}$ is currently assigned

$\mu_k$ = cluster centroid $k$ ($\mu_k \in \mathbb{R}^n$)

$\mu_{c^{(i)}}$ = cluster centroid of the cluster to which example $x^{(i)}$ has been assigned

K-means optimization objective
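The objective itself did not survive in this transcript. Assuming the standard notation of the Coursera course cited in the references ($x^{(i)}$ is the $i$-th of $m$ training examples, $c^{(i)}$ is the index of its assigned cluster, $\mu_k$ is the centroid of cluster $k$), the distortion cost that k-means minimizes is usually written as:

```latex
J\left(c^{(1)},\ldots,c^{(m)},\mu_1,\ldots,\mu_K\right)
  = \frac{1}{m}\sum_{i=1}^{m}\left\lVert x^{(i)} - \mu_{c^{(i)}} \right\rVert^{2},
\qquad
\min_{\substack{c^{(1)},\ldots,c^{(m)} \\ \mu_1,\ldots,\mu_K}}
  J\left(c^{(1)},\ldots,c^{(m)},\mu_1,\ldots,\mu_K\right)
```

The assignment step lowers $J$ with the centroids held fixed, and the move-centroid step lowers $J$ with the assignments held fixed.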

Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \ldots, \mu_K \in \mathbb{R}^n$

K-means algorithm

Should have $K < m$ (fewer clusters than training examples). Randomly pick $K$ training examples. Set $\mu_1, \ldots, \mu_K$ equal to these $K$ examples.

Random initialization

Local optima

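For i = 1 to 100 {
    Randomly initialize K-means.
    Run K-means. Get $c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K$.
    Compute cost function (distortion) $J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K)$
}

Pick clustering that gave lowest cost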

Random initialization
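The restart loop above can be sketched as below. It is an illustration under assumptions, not the demo code from the talk: `kmeans`, `distortion`, and `best_of_restarts` are hypothetical names, distance is Euclidean, and the cost is the mean squared distance of each point to its assigned centroid.

```python
import math
import random


def kmeans(points, k, rng, iterations=100):
    """One k-means run from a random start (hypothetical helper)."""
    centroids = rng.sample(points, k)  # pick k training examples as centroids
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for j in range(k):
            members = [p for p, c in zip(points, assignment) if c == j]
            if members:
                centroids[j] = tuple(
                    sum(xs) / len(members) for xs in zip(*members)
                )
    return assignment, centroids


def distortion(points, assignment, centroids):
    """Mean squared distance from each point to its assigned centroid."""
    total = sum(
        math.dist(p, centroids[c]) ** 2 for p, c in zip(points, assignment)
    )
    return total / len(points)


def best_of_restarts(points, k, restarts=100, seed=0):
    """Run k-means `restarts` times and keep the lowest-cost clustering."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        assignment, centroids = kmeans(points, k, rng)
        cost = distortion(points, assignment, centroids)
        if best is None or cost < best[0]:
            best = (cost, assignment, centroids)
    return best
```

A single run can stop in a local optimum when two initial centroids land in the same natural group; keeping the lowest-cost result over many restarts makes that outcome much less likely.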

Elbow method:

[Plot: cost function against $K$ (no. of clusters), $K$ = 1 to 8. Choose K=3]

[Plot: cost function against $K$ (no. of clusters), $K$ = 1 to 9]

Choosing the value of K
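To draw an elbow plot, one needs the cost for each candidate number of clusters. The sketch below computes that curve; the names `kmeans`, `distortion`, and `elbow_curve`, the number of restarts per candidate, and the Euclidean distance are assumptions made for illustration.

```python
import math
import random


def kmeans(points, k, rng, iterations=100):
    """One k-means run from a random start (hypothetical helper)."""
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for j in range(k):
            members = [p for p, c in zip(points, assignment) if c == j]
            if members:
                centroids[j] = tuple(
                    sum(xs) / len(members) for xs in zip(*members)
                )
    return assignment, centroids


def distortion(points, assignment, centroids):
    """Mean squared distance from each point to its assigned centroid."""
    total = sum(
        math.dist(p, centroids[c]) ** 2 for p, c in zip(points, assignment)
    )
    return total / len(points)


def elbow_curve(points, ks, restarts=50, seed=0):
    """Map each candidate k to the lowest distortion found over restarts."""
    rng = random.Random(seed)
    curve = {}
    for k in ks:
        curve[k] = min(
            distortion(points, *kmeans(points, k, rng))
            for _ in range(restarts)
        )
    return curve
```

Plotting `curve[k]` against `k` and looking for the point where the drop in cost flattens out gives the elbow; when the curve has no clear bend, the method does not single out one value.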

DEMO k-means

Demo Starbucks area clustering

K=5:
- Seoul, Gyeonggi
- Gangwon
- Busan, Gyeongsang
- Jeolla, Jeju
- Chungcheong

K=3:
- Seoul, Gyeonggi, Gangwon
- Chungcheong, Jeolla, Jeju
- Busan, Gyeongsang

Demo Local optima

Demo Find Elbow

[Plot x-axis: # of K, from 2 to 7]

Canopy Clustering: thresholds T1, T2 (T1 > T2)

Select one of the samples as a cluster center.

Inside T2: the sample is a member of the cluster and cannot become a cluster center.

Outside T2 but inside T1: the sample is a member of the cluster and could also become a cluster center itself.
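The T1/T2 rules above can be sketched as follows. This is a simplified illustration rather than Mahout's implementation: the function name `canopy`, the Euclidean distance, taking the first remaining sample as the next center, and measuring membership against all points are assumptions.

```python
import math


def canopy(points, t1, t2):
    """Return a list of (center, members) canopies. Requires t1 > t2."""
    if t1 <= t2:
        raise ValueError("t1 must be larger than t2")
    candidates = list(points)  # samples that may still become a center
    canopies = []
    while candidates:
        # Select one of the remaining samples as a cluster center.
        center = candidates.pop(0)
        # Inside T1: member of this canopy (canopies may overlap).
        members = [p for p in points if math.dist(center, p) < t1]
        # Inside T2: can no longer become a center, so drop it from the list.
        candidates = [p for p in candidates if math.dist(center, p) >= t2]
        canopies.append((center, members))
    return canopies
```

Each pass needs only one distance computation per point, and the number of canopies follows from T1 and T2 instead of being chosen up front.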

Canopy Clustering: overlap

A sample that falls in the overlap of several canopies is assigned to the closest cluster center.
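Resolving overlap by distance can be sketched in a few lines. The function name `assign_to_closest` and the Euclidean distance are assumptions; the canopy centers are taken as given.

```python
import math


def assign_to_closest(points, centers):
    """For each point, return the index of the nearest canopy center."""
    return [
        min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        for p in points
    ]
```

After this step every sample belongs to exactly one cluster, even if it lay inside several canopies.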

Finding the perfect k using canopy clustering

[Figure panels: sample data; canopies with > 5% of population; canopies with > 10% of population]

SEEDING K-MEANS CENTROIDS USING CANOPY GENERATION
- Mahout in Action -
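One way to read the two ideas above is: keep only the canopies that hold more than some share of the population, use their count as k, and use their centers as the initial k-means centroids. The sketch below follows that reading. It is not Mahout's API: `canopy_centers`, `kmeans_from`, and the `min_fraction` parameter are hypothetical names, and the distance is Euclidean.

```python
import math


def canopy_centers(points, t1, t2, min_fraction=0.0):
    """Centers of canopies holding more than min_fraction of all points."""
    candidates = list(points)
    centers = []
    while candidates:
        center = candidates.pop(0)
        size = sum(1 for p in points if math.dist(center, p) < t1)
        candidates = [p for p in candidates if math.dist(center, p) >= t2]
        if size > min_fraction * len(points):
            centers.append(center)  # well-populated canopy: keep as a seed
    return centers


def kmeans_from(points, centroids, iterations=100):
    """k-means started from the given centroids instead of random ones."""
    centroids = list(centroids)
    k = len(centroids)
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for j in range(k):
            members = [p for p, c in zip(points, assignment) if c == j]
            if members:
                centroids[j] = tuple(
                    sum(xs) / len(members) for xs in zip(*members)
                )
    return assignment, centroids
```

A typical call would be `seeds = canopy_centers(points, t1, t2, min_fraction=0.10)` followed by `kmeans_from(points, seeds)`, with `len(seeds)` playing the role of k. Because the seeds already sit near dense regions, k-means tends to need fewer iterations than from a random start.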

DEMO canopy, canopy & k-means

Demo Starbucks area clustering

[k-means] K=5, elapsed time=2.57 sec
[canopy] canopy[K]=5, elapsed time=0.012 sec

Demo K-means with canopy

[k-means] K=5, elapsed time=2.57 sec
[k-means + canopy] canopy[K]=5, elapsed time=1.89 sec

Demo Canopy clustering to find user POI (Point Of Interest)

One month of GPS data

All canopies

Canopies with > 10% of population

Reference
- Machine Learning (Andrew Ng), Clustering chapter: http://www.coursera.com
- Mahout: http://mahout.apache.org
- Mahout in Action

THANKS
