KDD2017 Paper Introduction: Group-Keeping Solution Path Algorithm for Sparse Regression with Automatic Feature Grouping


KDD2017 Paper Introduction

Group-Keeping Solution Path Algorithm for Sparse Regression with Automatic Feature Grouping

Data Science Group

Takahiro Yoshinaga

Paper Overview

Genre: Sparse modeling; specifically OSCAR, a method that automatically extracts the group structure among features.

Background / problem: No algorithm was known for tuning the hyperparameters while keeping the group structure intact.

Main contribution: Proposes OscarGKPath, an algorithm that tunes the parameters while keeping the group structure.

Note: Since it searches only in directions that do not break the group structure, it is not searching for the true optimum.

Conclusion: Compared with a grid search over all parameters, the proposed algorithm is far faster to compute with no loss of accuracy.

Contents

• Introduction to sparse modeling

• Review for OSCAR

• Proposal : OscarGKPath

• Results

• Summary and Discussion


Sparse Modeling

• A machine learning method for high-dimensional data: genetic data, medical images, ...

• Important task: feature selection

• In sparse modeling, features with non-zero coefficients are said to be "selected"

Feature Selection

• Conventional feature selection: AIC/BIC with stepwise search, which is equivalent to $L_0$-norm regularization:

$$\min_{\beta} \sum_{i=1}^{l} \left( y_i - x_i^T \beta \right)^2 + \lambda \sum_{j} I(\beta_j \neq 0)$$

(the penalty counts the number of selected features; discontinuous and non-convex)

• Lasso: the $L_1$ norm is the convex envelope of the $L_0$ norm on $[-1, 1]$, giving a continuous, convex approximation:

$$\min_{\beta} \sum_{i=1}^{l} \left( y_i - x_i^T \beta \right)^2 + \lambda \|\beta\|_1$$

[Figure: the $L_0$ and $L_1$ penalties plotted on $[-1, 1]$]
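To make the contrast concrete, here is a minimal NumPy sketch (not from the slides; the function names and arguments are illustrative) that evaluates the $L_0$-penalized objective and its Lasso relaxation:

import numpy as np

def l0_objective(beta, X, y, lam):
    # Squared loss + lam * (number of selected features); discontinuous in beta
    return np.sum((y - X @ beta) ** 2) + lam * np.count_nonzero(beta)

def lasso_objective(beta, X, y, lam):
    # Squared loss + lam * L1 norm, the convex surrogate of the L0 penalty
    return np.sum((y - X @ beta) ** 2) + lam * np.sum(np.abs(beta))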

Variation of Sparse Modeling

• Lasso

• Elastic net

• SCAD

• Adaptive Lasso

• Fused Lasso

• Generalized Lasso

• (Non-overlapping/Overlapping) Group Lasso

• Clustered Lasso

OSCAR: extracts group structure automatically; a generalization of Lasso that respects consistency in feature selection


OSCAR (Octagonal Shrinkage and Clustering Algorithm for Regression)

• Formulation (constrained form):

$$\min_{\beta} \frac{1}{2} \sum_{i=1}^{l} \left( y_i - x_i^T \beta \right)^2 \quad \text{s.t.} \quad \|\beta\|_1 + c \sum_{j>k} \max\{|\beta_j|, |\beta_k|\} \le t$$

where $c \ge 0$ and $t \ge 0$ are tuning parameters.

• By the method of Lagrange multipliers, the equivalent regularized form is

$$F(\beta, \lambda_1, \lambda_2) = \frac{1}{2} \sum_{i=1}^{l} \left( y_i - x_i^T \beta \right)^2 + \lambda_1 \|\beta\|_1 + \lambda_2 \sum_{j>k} \max\{|\beta_j|, |\beta_k|\}$$

where $\lambda_1 \ge 0$ and $\lambda_2 \ge 0$ are regularization parameters; the penalty combines the $L_1$ norm with a pairwise $L_\infty$ norm.

Variables are assumed normalized and/or standardized.
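For reference, the regularized objective $F$ can be evaluated as in the following sketch (illustrative code, not the paper's implementation). It uses the fact that the $O(d^2)$ pairwise $L_\infty$ sum reduces to a weighted sum over the sorted $|\beta_j|$, the same ordering idea the re-formulation below relies on:

import numpy as np

def oscar_objective(beta, X, y, lam1, lam2):
    # 0.5 * squared loss + lam1 * L1 + lam2 * sum_{j>k} max(|b_j|, |b_k|)
    loss = 0.5 * np.sum((y - X @ beta) ** 2)
    a = np.abs(beta)
    # The i-th smallest |beta_j| (0-indexed) is the pairwise max exactly
    # i times, so the pairwise term equals sum_i i * sort(a)[i].
    pairwise = np.sum(np.arange(len(a)) * np.sort(a))
    return loss + lam1 * np.sum(a) + lam2 * pairwise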

Pictorial Image

• Solutions for correlated data (example: two features) as the regularization parameters $\lambda_1, \lambda_2$ vary [Zeng and Figueiredo, 2013]

[Figure: constraint regions of the $L_1$ norm, the $L_\infty$ norm, and the combined $L_1 + L_\infty$ penalty (octagonal)]

Lasso vs OSCAR

• OSCAR: the grouping structure is built into the formulation

Data: Facebook Comment Dataset

Hyperparameter Tuning

• Solution path (example: Lasso)

[Figure: Lasso solution path over the regularization parameter]

Adjustment hyper parameter

• OSCAR No group-keeping solution path algorithm

OSCAR Path

Regularized parameter

Proposal

• OscarGKPath: a group-keeping solution path algorithm for OSCAR

[Figure: OSCAR solution path over the regularization parameter]


OSCAR revisited

• Re-formulation (for a fixed group structure and coefficient order):

$$\min_{\theta} \frac{1}{2} \sum_{i=1}^{l} \left( y_i - \tilde{x}_i^T \theta \right)^2 + \sum_{g=1}^{G} w_g \theta_g \quad \text{s.t.} \quad 0 \le \theta_1 < \theta_2 < \cdots < \theta_G$$

where $\tilde{x}_i = (\tilde{x}_{i1}, \tilde{x}_{i2}, \cdots, \tilde{x}_{iG})$ with $\tilde{x}_{ig} = \sum_{j \in \mathcal{G}_g} \mathrm{sign}(\beta_j)\, x_{ij}$,

$w_g = \sum_{j \in \mathcal{G}_g} \left( \lambda_1 + (o(j) - 1)\, \lambda_2 \right)$, $o(j) \in \{1, \cdots, d\}$ is the ascending rank of $|\beta_j|$, $\mathcal{G}_g \subseteq \{1, \cdots, d\}$,

and we define the active set:

$$A = \{ g \in \{1, \cdots, G\} \mid \theta_g > 0 \}, \qquad \bar{A} = \{1, \cdots, G\} \setminus A$$
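As an illustration of the re-formulation, the sketch below derives the group magnitudes $\theta_g$ and weights $w_g$ from a given $\beta$ (hypothetical helper; it assumes $o(j)$ is the ascending rank of $|\beta_j|$ and that features sharing a magnitude form one group):

import numpy as np

def regroup(beta, lam1, lam2):
    a = np.abs(beta)
    # Ascending ranks o(j) in {1, ..., d}; ties broken arbitrarily
    # (the per-group weight sum is invariant to the tie-breaking)
    ranks = np.empty(len(a), dtype=int)
    ranks[np.argsort(a)] = np.arange(1, len(a) + 1)
    # Features with equal |beta_j| share a group; theta_g is the common magnitude
    theta, inverse = np.unique(a, return_inverse=True)
    w = np.zeros(len(theta))
    for j in range(len(a)):
        # w_g = sum over j in group g of (lam1 + (o(j) - 1) * lam2)
        w[inverse[j]] += lam1 + (ranks[j] - 1) * lam2
    return theta, w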

Input parameters

• Direction of change for $(\lambda_1, \lambda_2)$: $d = (d_1, d_2)^T$, with $\Delta\lambda = d\, \Delta\eta$, where $\Delta\eta$ is determined by the algorithm

• Accuracy $\epsilon$: the proposed algorithm is an approximate one

• Interval of $\eta$: $[\underline{\eta}, \bar{\eta}]$

OscarGKPath : Algorithm

• Uses the optimality condition for OSCAR (details omitted here)

• Uses a "termination condition" (next slide)

• Uses the dual problem to bound the duality gap

• Proved: every solution on the solution path satisfies the duality gap bound

Termination condition

1. A regression coefficient becomes zero: $\Delta\eta_A$

2. The order of the regression coefficients changes: $\Delta\eta_O$

※ The optimality condition of OSCAR assumes a given order of the coefficients

3. $\eta$ reaches $\bar{\eta}$: $\bar{\eta} - \eta$

The step size is $\Delta\eta_{\max} = \min\{\Delta\eta_A, \Delta\eta_O, \bar{\eta} - \eta\}$
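The step-size rule itself is one line; a minimal sketch (hypothetical function name; the event steps $\Delta\eta_A$ and $\Delta\eta_O$ are assumed to be given by the optimality condition):

def max_group_keeping_step(eta, eta_bar, d_eta_A, d_eta_O):
    # Advance eta only to the first event that would break the current
    # group structure: a coefficient hitting zero (d_eta_A), an order
    # change (d_eta_O), or the end of the interval (eta_bar - eta).
    return min(d_eta_A, d_eta_O, eta_bar - eta)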


Setup

• Data sets:

• 5-fold cross-validation

• Direction: $d = (1, 0.5)^T, (1, 1)^T, (1, 2)^T$

• $\eta$: $-4 \le \log_2 \eta \le 15$; OscarGKPath: 10 trials

• "Batch Search": search on a uniform grid of 20 points linearly spaced by 0.1 ($\times 400 \times 5$)

• Duality gap: $G(\theta(\eta), d_1 \eta, d_2 \eta) \le \varepsilon = 0.1 \times F(\beta^*, d_1, d_2)$

• In a single trial, "Batch Search" produces only a limited solution path

Batch Search vs OscarGKPath

• Shorter computation time

• Accuracy is maintained

Data: Right Ventricle Dataset

[Figure: computation time and accuracy, Grid Search vs Proposal]


Summary and Discussion

Genre: Sparse modeling; specifically OSCAR, a method that automatically extracts the group structure among features.

Background / problem: No algorithm was known for tuning the hyperparameters while keeping the group structure intact.

Main contribution: Proposes OscarGKPath, an algorithm that tunes the parameters while keeping the group structure.

Note: Since it searches only in directions that do not break the group structure, it is not searching for the true optimum.

Conclusion: Compared with a grid search over all parameters, the proposed algorithm is far faster to compute with no loss of accuracy.

Remarks (personal): Which grouping is desirable is outside the scope of the algorithm, so the conditions under which it is applicable may be limited.