03. linear regression

Jeonghun Yoon


TRANSCRIPT

Page 1: 03. linear regression

Jeonghun Yoon

Page 2: 03. linear regression

Last time..... the Naive Bayes Classifier

$$\arg\max_{y} P(x_1, \ldots, x_d \mid y)\, P(y) = \arg\max_{y} \prod_{i=1}^{d} P(x_i \mid y)\, P(y)$$

This is the product of the probability that class $y$ occurs and the probabilities that the elements $x_i$ of the feature vector (words, in the document example) appear in the data labeled with class $y$ in the test set.

ex) To decide whether (I, love, you) is spam or not, we compare

the product of the proportion of spam in the test set and the probabilities that I, love, and you occur in documents labeled spam,

with

the product of the proportion of ham in the test set and the probabilities that I, love, and you occur in documents labeled ham.
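As a quick illustration of this comparison (the class proportions and word probabilities below are made-up toy numbers, not values from the slides), a minimal sketch in Python:

```python
# Toy sketch of the Naive Bayes comparison above.
# All numbers are hypothetical illustration values, not data from the slides.
priors = {"spam": 0.4, "ham": 0.6}                       # proportion of each class
word_probs = {
    "spam": {"I": 0.20, "love": 0.05, "you": 0.25},      # P(word | spam)
    "ham":  {"I": 0.15, "love": 0.10, "you": 0.20},      # P(word | ham)
}

def naive_bayes_score(words, label):
    """P(y) * prod_i P(x_i | y) for a single class label y."""
    score = priors[label]
    for w in words:
        score *= word_probs[label][w]
    return score

message = ("I", "love", "you")
scores = {label: naive_bayes_score(message, label) for label in priors}
print(scores, "->", max(scores, key=scores.get))          # pick the larger product
```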

Page 3: 03. linear regression

Points we did not fully cover last time...

1. Laplacian Smoothing (see the appendix)

2. MLE / MAP

Page 4: 03. linear regression

Bayes' Rule

$$p(\theta \mid \mathbb{x}) = \frac{p(\mathbb{x} \mid \theta)\, p(\theta)}{p(\mathbb{x})}$$

posterior: $p(\theta \mid \mathbb{x})$, likelihood: $p(\mathbb{x} \mid \theta)$, prior: $p(\theta)$

Posterior probability: the probability of the parameter, computed after the observations have been observed.

Prior probability: the probability of the parameter, before the observations are observed.

Likelihood: the probability that the observations occur, given the value of the parameter.

Page 5: 03. linear regression

Maximum Likelihood Estimate

The likelihood is defined as follows: the probability of obtaining the observed data set $\mathbb{x} = (x_1, \ldots, x_n)$ given the parameter $\theta$,

$$\mathcal{L}(\theta) = p(\mathbb{x} \mid \theta)$$

$p(\mathbb{x} \mid \theta)$ is a function of $\theta$; it is not a pdf of $\theta$.

Page 6: 03. linear regression

The Maximum Likelihood Estimate is defined as follows: the MLE is the $\theta$ that maximizes the probability of obtaining the observed data set $\mathbb{x} = (x_1, \ldots, x_n)$.

$$\hat{\theta} = \arg\max_{\theta} \mathcal{L}(\theta) = \arg\max_{\theta}\, p(\mathbb{x} \mid \theta)$$

[Figure: the likelihood values $p(\mathbb{x} \mid \theta_1)$, $p(\mathbb{x} \mid \theta_2)$, $p(\mathbb{x} \mid \theta_3)$ for the observed data $\mathbb{x} = (x_1, \ldots, x_n)$; $p(\mathbb{x} \mid \theta_2)$ is the largest, so $\hat{\theta} = \theta_2$.]
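A minimal numerical sketch of this idea, using a toy coin-flip example of my own (not from the slides): evaluate the Bernoulli likelihood over a grid of candidate $\theta$ values and take the largest.

```python
import numpy as np

# Toy illustration of MLE (hypothetical data, not from the slides):
# observed coin flips, 1 = heads, 0 = tails.
x = np.array([1, 0, 1, 1, 0, 1, 1, 1])

thetas = np.linspace(0.01, 0.99, 99)          # candidate parameter values
# Bernoulli likelihood L(theta) = prod_i theta^x_i * (1 - theta)^(1 - x_i)
likelihood = np.array([np.prod(t ** x * (1 - t) ** (1 - x)) for t in thetas])

theta_mle = thetas[np.argmax(likelihood)]
print(theta_mle, x.mean())   # the grid maximum matches the analytic MLE, mean(x)
```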

Page 7: 03. linear regression

When we know the likelihood function $p(\mathbb{x} \mid \theta)$ and the prior $p(\theta)$, we can obtain the value of the posterior by Bayes' rule.

$$p(\theta \mid \mathbb{x}) \propto p(\mathbb{x} \mid \theta)\, p(\theta)$$

Maximum A Posteriori Estimate

$$p(\theta \mid \mathbb{x}) = \frac{p(\mathbb{x} \mid \theta)\, p(\theta)}{p(\mathbb{x})}$$

posterior: $p(\theta \mid \mathbb{x})$, likelihood: $p(\mathbb{x} \mid \theta)$, prior: $p(\theta)$

Page 8: 03. linear regression

Likelihood $p(\mathbb{x} \mid \theta)$

Prior $p(\theta)$

Posterior $p(\theta \mid \mathbb{x}) \propto p(\mathbb{x} \mid \theta)\, p(\theta)$

Page 9: 03. linear regression

Likelihood $p(\mathbb{x} \mid \theta)$

Prior $p(\theta)$

Posterior $p(\theta \mid \mathbb{x}) \propto p(\mathbb{x} \mid \theta)\, p(\theta)$

Page 10: 03. linear regression

$$\hat{\theta} = \arg\max_{\theta}\, p(\theta \mid \mathbb{x})$$

Likelihood $p(\mathbb{x} \mid \theta)$

Prior $p(\theta)$

Posterior $p(\theta \mid \mathbb{x}) \propto p(\mathbb{x} \mid \theta)\, p(\theta)$
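Continuing the same toy coin example (again my own illustration, with an assumed Beta(2, 2) prior), the MAP estimate simply multiplies the likelihood by a prior before taking the argmax:

```python
import numpy as np

# Toy MAP sketch (hypothetical data and prior, not from the slides).
x = np.array([1, 0, 1, 1, 0, 1, 1, 1])          # observed coin flips
thetas = np.linspace(0.01, 0.99, 99)

likelihood = np.array([np.prod(t ** x * (1 - t) ** (1 - x)) for t in thetas])
# Beta(2, 2) prior on theta, expressing a mild preference for a fair coin.
prior = thetas * (1 - thetas)

posterior = likelihood * prior                   # p(theta|x) ∝ p(x|theta) p(theta)
theta_map = thetas[np.argmax(posterior)]
theta_mle = thetas[np.argmax(likelihood)]
print(theta_mle, theta_map)                      # the prior pulls the MAP toward 0.5
```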

Page 11: 03. linear regression

Regression

Page 12: 03. linear regression

Example 1)

I am the CEO of a large shoe company, and I have many branch stores.

Now I want to open a new branch. In which region should I open it?

It would be a great help if I could estimate the expected profit of the regions where I would like to open a new branch!

The data I have are the profits of each branch and the population of the region where each branch is located.

Solution: Linear Regression!

With it, once I know the population of a new region, I can estimate the expected profit of that region.

Page 13: 03. linear regression

Example 2)

I have just moved to Pittsburgh, and I want to get an apartment at the most reasonable price. The following are the things I consider when buying a home: square feet, the number of bedrooms, the distance to school... What would the price of a house with the size and number of bedrooms I want actually be?

Page 14: 03. linear regression

① Given an input $x$ we would like to compute an output $y$. (When I input the size of the house I want and the number of rooms, compute the predicted house price.)

② For example: 1) Predict height from age (height = $y$, age = $x$). 2) Predict Google's price from Yahoo's price (Google's price = $y$, Yahoo's price = $x$).

$$y = \theta_0 + \theta_1 x$$

In other words, if we find the line $y = \theta_0 + \theta_1 x$ from the existing data (learning, training), then when a new value $x_{new}$ is given, we can predict the corresponding value of $y$ (prediction)!

Page 15: 03. linear regression

Input: size of the house ($x_1$), number of rooms ($x_2$), distance to school ($x_3$), .....

$(x_1, x_2, \ldots, x_n)$: feature vector

Output: house price ($y$)

$$y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$$

Learned (trained) from the training set.

Page 16: 03. linear regression

Simple Linear Regression

Page 17: 03. linear regression

๐‘ฆ๐‘– = ๐œƒ0 + ๐œƒ1๐‘ฅ๐‘– + ๐œ€๐‘–

๐‘–๋ฒˆ์งธ ๊ด€์ฐฐ์  ๐‘ฆ๐‘– , ๐‘ฅ๐‘– ๊ฐ€ ์ฃผ์–ด์กŒ์„ ๋•Œ ๋‹จ์ˆœ ํšŒ๊ท€ ๋ชจํ˜•์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

๐œ–3

๐œ–๐‘– : ๐‘–๋ฒˆ์งธ ๊ด€์ฐฐ์ ์—์„œ ์šฐ๋ฆฌ๊ฐ€ ๊ตฌํ•˜๊ณ ์ž ํ•˜๋Š” ํšŒ๊ท€์ง์„ ๊ณผ ์‹ค์ œ ๊ด€์ฐฐ๋œ ๐‘ฆ๐‘–์˜ ์ฐจ์ด (error)

์šฐ๋ฆฌ๋Š” ์˜ค๋ฅ˜์˜ ํ•ฉ์„ ๊ฐ€์žฅ ์ž‘๊ฒŒ ๋งŒ๋“œ๋Š” ์ง์„ ์„ ์ฐพ๊ณ  ์‹ถ๋‹ค. ์ฆ‰ ๊ทธ๋ ‡๊ฒŒ ๋งŒ๋“œ๋Š” ๐œฝ๐ŸŽ์™€ ๐œฝ๐Ÿ์„ ์ถ”์ •ํ•˜๊ณ  ์‹ถ๋‹ค ! How!! ์ตœ์†Œ ์ œ๊ณฑ ๋ฒ•! (Least Squares Method)

min ๐‘ฆ๐‘– โˆ’ ๐œƒ0 + ๐œƒ1๐‘ฅ๐‘–2

๐‘–

= ๐‘š๐‘–๐‘› ๐œ–๐‘–2

๐‘–

๐‘ฆ = ๐œƒ0 + ๐œƒ1๐‘ฅ

์‹ค์ œ ๊ด€์ธก ๊ฐ’ ํšŒ๊ท€ ์ง์„ ์˜ ๊ฐ’(์ด์ƒ์ ์ธ ๊ฐ’)

์ข…์† ๋ณ€์ˆ˜ ์„ค๋ช… ๋ณ€์ˆ˜, ๋…๋ฆฝ ๋ณ€์ˆ˜

Page 18: 03. linear regression

$$\min_{\theta_0, \theta_1} \sum_i \bigl(y_i - (\theta_0 + \theta_1 x_i)\bigr)^2 = \min \sum_i \epsilon_i^2$$

($y_i$: actual observed value; $\theta_0 + \theta_1 x_i$: value on the regression line, the ideal value.)

How can we estimate the $\theta_0$, $\theta_1$ that best satisfy the expression above? (Call these estimates $\hat{\theta}_0$, $\hat{\theta}_1$.)

- Normal Equation

- Steepest Gradient Descent

Page 19: 03. linear regression

What is the normal equation?

To find a maximum or a minimum, we differentiate the given expression and find the values that make the derivative zero.

$$\min_{\theta_0, \theta_1} \sum_i \bigl(y_i - (\theta_0 + \theta_1 x_i)\bigr)^2$$

First, differentiate with respect to $\theta_0$:

$$\frac{\partial}{\partial \theta_0} \sum_i \bigl(y_i - (\theta_0 + \theta_1 x_i)\bigr)^2 = -2\sum_i \bigl(y_i - (\theta_0 + \theta_1 x_i)\bigr) = 0$$

Next, differentiate with respect to $\theta_1$:

$$\frac{\partial}{\partial \theta_1} \sum_i \bigl(y_i - (\theta_0 + \theta_1 x_i)\bigr)^2 = -2\sum_i \bigl(y_i - (\theta_0 + \theta_1 x_i)\bigr)\, x_i = 0$$

We just need to find the $\theta_0$, $\theta_1$ that satisfy these two equations. When we have a system of 2 equations in 2 unknowns like this, we call the system the normal equations.

Page 20: 03. linear regression

The normal equation form

๐•ฉ๐‘– = 1, ๐‘ฅ๐‘–๐‘‡, ฮ˜ = ๐œƒ0, ๐œƒ1

๐‘‡, ๐•ช = ๐‘ฆ1, ๐‘ฆ2, โ€ฆ , ๐‘ฆ๐‘›๐‘‡ , ๐‘‹ =

11โ€ฆ

๐‘ฅ1๐‘ฅ2โ€ฆ

1 ๐‘ฅ๐‘›

, ๐•– = (๐œ–1, โ€ฆ , ๐œ–๐‘›) ๋ผ๊ณ  ํ•˜์ž.

๐•ช = ๐‘‹ฮ˜ + ๐•–

๐‘ฆ1 = ๐œƒ0 + ๐œƒ1๐‘ฅ1 + ๐œ–1

๐‘ฆ2 = ๐œƒ0 + ๐œƒ1๐‘ฅ2 + ๐œ–2

.......

๐‘ฆ๐‘›โˆ’1 = ๐œƒ0 + ๐œƒ1๐‘ฅ๐‘›โˆ’1 + ๐œ–๐‘›โˆ’1

๐‘ฆ๐‘› = ๐œƒ0 + ๐œƒ1๐‘ฅ๐‘› + ๐œ–๐‘›

๐‘›๊ฐœ์˜ ๊ด€์ธก ๊ฐ’ (๐‘ฅ๐‘– , ๐‘ฆ๐‘–)์€ ์•„๋ž˜์™€ ๊ฐ™์€ ํšŒ๊ท€ ๋ชจํ˜•์„ ๊ฐ€์ง„๋‹ค๊ณ  ๊ฐ€์ •ํ•˜์ž.

๐‘ฆ1๐‘ฆ2๐‘ฆ3โ€ฆ๐‘ฆ๐‘›

=

111โ€ฆ

๐‘ฅ1๐‘ฅ2๐‘ฅ3โ€ฆ

1 ๐‘ฅ๐‘›

๐œƒ0๐œƒ1

+

๐œ–1๐œ–2๐œ–3โ€ฆ๐œ–๐‘›

Page 21: 03. linear regression

๐œ–๐‘—2

๐‘›

๐‘—=1

= ๐•–๐‘‡๐•– = ๐•ช โˆ’ ๐‘‹ฮ˜ ๐‘‡(๐•ช โˆ’ ๐‘‹ฮ˜)

= ๐•ช๐‘‡๐•ช โˆ’ ฮ˜๐‘‡๐‘‹๐‘‡๐•ช โˆ’ ๐•ช๐‘‡๐‘‹ฮ˜ + ฮ˜๐‘‡๐‘‹๐‘‡๐‘‹ฮ˜ = ๐•ช๐‘‡๐•ช โˆ’ 2ฮ˜๐‘‡๐‘‹๐‘‡๐•ช + ฮ˜๐‘‡๐‘‹๐‘‡๐‘‹ฮ˜

1 by 1 ํ–‰๋ ฌ์ด๋ฏ€๋กœ ์ „์น˜ํ–‰๋ ฌ์˜ ๊ฐ’์ด ๊ฐ™๋‹ค!

๐œ•(๐•–๐‘‡๐•–)

๐œ•ฮ˜= ๐ŸŽ

๐œ•(๐•–๐‘‡๐•–)

๐œ•ฮ˜= โˆ’2๐‘‹๐‘‡๐•ช + 2๐‘‹๐‘‡๐‘‹ฮ˜ = ๐ŸŽ

๐‘‹๐‘‡๐‘‹๐šฏ = ๐‘‹๐‘‡๐•ช ๐šฏ = ๐‘‹๐‘‡๐‘‹ โˆ’1๐‘‹๐‘‡๐•ช ห†

์ •๊ทœ๋ฐฉ์ •์‹

๐•ช = ๐‘‹ฮ˜ + ๐•– ๐•– = ๐•ช โˆ’ ๐‘‹ฮ˜

Minimize ๐œ–๐‘—2

๐‘›

๐‘—=1

Page 22: 03. linear regression

What is Gradient Descent?

In machine learning, the parameters (for linear regression, $\theta_0$, $\theta_1$) are usually vectors of tens to hundreds of dimensions. Moreover, the objective function (for linear regression, $\sum \epsilon_i^2$) is not always guaranteed to be differentiable everywhere.

So there are quite a few situations where the solution cannot be obtained from a single closed-form derivation.

In such cases we use a numerical method that starts from an initial solution and improves it iteratively. (Differentiation is used.)

Page 23: 03. linear regression

What is Gradient Descent?

Set an initial solution $\alpha_0$ and $t = 0$.

Is $\alpha_t$ satisfactory?

- Yes: stop and return $\hat{\alpha} = \alpha_t$.

- No: update $\alpha_{t+1} = U(\alpha_t)$, set $t = t + 1$, and check again.

Page 24: 03. linear regression

What is Gradient Descent?

Gradient Descent

From the current position, find the direction in which the slope descends most steeply, move a little in that direction, and take that as the new position.

By repeating this process, it seeks the lowest point (the minimum).

Gradient Ascent

From the current position, find the direction in which the slope ascends most steeply, move a little in that direction, and take that as the new position.

By repeating this process, it seeks the highest point (the maximum).

Page 25: 03. linear regression

What is Gradient Descent?

Gradient Descent

๐›ผ๐‘ก+1 = ๐›ผ๐‘ก โˆ’ ๐œŒ๐œ•๐ฝ

๐œ•๐›ผ ๐›ผ๐‘ก

๐ฝ =๋ชฉ์ ํ•จ์ˆ˜

๐œ•๐ฝ

๐œ•๐›ผ ๐›ผ๐‘ก: ๐›ผ๐‘ก์—์„œ์˜ ๋„ํ•จ์ˆ˜

๐œ•๐ฝ

๐œ•๐›ผ์˜ ๊ฐ’

๐›ผ๐‘ก ๐›ผ๐‘ก+1

โˆ’๐๐‘ฑ

๐๐œถ ๐œถ๐’•

๐๐‘ฑ

๐๐œถ ๐œถ๐’•

๐›ผ๐‘ก์—์„œ์˜ ๋ฏธ๋ถ„๊ฐ’์€ ์Œ์ˆ˜์ด๋‹ค.

๊ทธ๋ž˜์„œ ๐œ•J

๐œ•ฮฑ ฮฑt ๋ฅผ ๋”ํ•˜๊ฒŒ ๋˜๋ฉด

์™ผ์ชฝ์œผ๋กœ ์ด๋™ํ•˜๊ฒŒ ๋œ๋‹ค.

๊ทธ๋Ÿฌ๋ฉด ๋ชฉ์ ํ•จ์ˆ˜์˜ ๊ฐ’์ด ์ฆ๊ฐ€ํ•˜๋Š”

๋ฐฉํ–ฅ์œผ๋กœ ์ด๋™ํ•˜๊ฒŒ ๋œ๋‹ค.

๋”ฐ๋ผ์„œ ๐œ•J

๐œ•ฮฑ ฮฑt๋ฅผ ๋นผ์ค€๋‹ค.

๊ทธ๋ฆฌ๊ณ  ์ ๋‹นํ•œ ๐œŒ๋ฅผ ๊ณฑํ•ด์ฃผ์–ด์„œ ์กฐ๊ธˆ๋งŒ

์ด๋™ํ•˜๊ฒŒ ํ•œ๋‹ค.

โˆ’๐†๐๐‘ฑ

๐๐œถ ๐œถ๐’•

Page 26: 03. linear regression

What is Gradient Descent?

Gradient Descent

๐›ผ๐‘ก+1 = ๐›ผ๐‘ก โˆ’ ๐œŒ๐œ•๐ฝ

๐œ•๐›ผ ๐›ผ๐‘ก

Gradient Ascent

๐›ผ๐‘ก+1 = ๐›ผ๐‘ก + ๐œŒ๐œ•๐ฝ

๐œ•๐›ผ ๐›ผ๐‘ก

๐ฝ =๋ชฉ์ ํ•จ์ˆ˜

๐œ•๐ฝ

๐œ•๐›ผ ๐›ผ๐‘ก: ๐›ผ๐‘ก์—์„œ์˜ ๋„ํ•จ์ˆ˜

๐œ•๐ฝ

๐œ•๐›ผ์˜ ๊ฐ’

Gradient Descent, Gradient Ascent๋Š” ์ „ํ˜•์ ์ธ Greedy algorithm์ด๋‹ค.

๊ณผ๊ฑฐ ๋˜๋Š” ๋ฏธ๋ž˜๋ฅผ ๊ณ ๋ คํ•˜์ง€ ์•Š๊ณ  ํ˜„์žฌ ์ƒํ™ฉ์—์„œ ๊ฐ€์žฅ ์œ ๋ฆฌํ•œ ๋‹ค์Œ ์œ„์น˜๋ฅผ ์ฐพ์•„

Local optimal point๋กœ ๋๋‚  ๊ฐ€๋Šฅ์„ฑ์„ ๊ฐ€์ง„ ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด๋‹ค.
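A minimal sketch of the descent loop above on a simple one-dimensional objective; the objective $J(\alpha) = (\alpha - 3)^2$ and the step size $\rho$ are my own assumptions for illustration:

```python
# Toy 1-D gradient descent: minimize J(alpha) = (alpha - 3)^2.
def dJ(alpha):
    """Derivative of the toy objective J(alpha) = (alpha - 3)^2."""
    return 2.0 * (alpha - 3.0)

alpha = 0.0        # initial solution alpha_0
rho = 0.1          # step size
for t in range(100):
    step = rho * dJ(alpha)
    alpha = alpha - step                 # alpha_{t+1} = alpha_t - rho * dJ/dalpha
    if abs(step) < 1e-8:                 # "is alpha_t satisfactory?" stopping check
        break

print(alpha)       # converges near the minimizer alpha = 3
```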

Page 27: 03. linear regression

Let $\mathbb{x}_i = (1, x_i)^T$, $\Theta = (\theta_0, \theta_1)^T$, $\mathbb{y} = (y_1, y_2, \ldots, y_n)^T$, and define the objective

$$J(\Theta) = \frac{1}{2}\sum_{i=1}^{n} \bigl(\theta_0 + \theta_1 x_i - y_i\bigr)^2 = \frac{1}{2}\sum_{i=1}^{n} \bigl(\Theta^T \mathbb{x}_i - y_i\bigr)^2$$

$J(\Theta)$ is what we differentiate, and it also serves as the criterion for stopping gradient descent.

$$\theta_0^{t+1} = \theta_0^{t} - \alpha \left.\frac{\partial}{\partial \theta_0} J(\Theta)\right|_{\Theta^t}, \qquad \theta_1^{t+1} = \theta_1^{t} - \alpha \left.\frac{\partial}{\partial \theta_1} J(\Theta)\right|_{\Theta^t}$$

That is, substitute the $t$-th value of $\theta_0$ into the expression obtained by differentiating $J(\Theta)$ with respect to $\theta_0$, then subtract ($\alpha$ times) this value from $\theta_0^t$; likewise for $\theta_1$.

Page 28: 03. linear regression

With the same definitions $\mathbb{x}_i = (1, x_i)^T$, $\Theta = (\theta_0, \theta_1)^T$, $\mathbb{y} = (y_1, \ldots, y_n)^T$,

$$J(\Theta) = \frac{1}{2}\sum_{i=1}^{n} \bigl(\theta_0 + \theta_1 x_i - y_i\bigr)^2 = \frac{1}{2}\sum_{i=1}^{n} \bigl(\Theta^T \mathbb{x}_i - y_i\bigr)^2$$

Gradient of $J(\Theta)$:

$$\frac{\partial}{\partial \theta_0} J(\Theta) = \sum_{i=1}^{n} \bigl(\Theta^T \mathbb{x}_i - y_i\bigr)\cdot 1, \qquad \frac{\partial}{\partial \theta_1} J(\Theta) = \sum_{i=1}^{n} \bigl(\Theta^T \mathbb{x}_i - y_i\bigr)\, x_i$$

$$\nabla J(\Theta) = \left(\frac{\partial}{\partial \theta_0} J(\Theta),\ \frac{\partial}{\partial \theta_1} J(\Theta)\right)^T = \sum_{i=1}^{n} \bigl(\Theta^T \mathbb{x}_i - y_i\bigr)\, \mathbb{x}_i$$

Page 29: 03. linear regression

๐•ฉ๐‘– = 1, ๐‘ฅ๐‘–๐‘‡, ฮ˜ = ๐œƒ0, ๐œƒ1

๐‘‡, ๐•ช = ๐‘ฆ1, ๐‘ฆ2, โ€ฆ , ๐‘ฆ๐‘›๐‘‡ , ๐‘‹ =

11โ€ฆ

๐‘ฅ1๐‘ฅ2โ€ฆ

1 ๐‘ฅ๐‘›

, ๐•– = (๐œ–1, โ€ฆ , ๐œ–๐‘›) ๋ผ๊ณ  ํ•˜์ž.

๐œƒ0๐‘ก+1 = ๐œƒ0

๐‘ก โˆ’ ๐›ผ (ฮ˜๐‘‡๐•ฉ๐‘– โˆ’ ๐‘ฆ๐‘–)

๐‘›

๐‘–=1

1 ๋‹จ, ์ด ๋•Œ์˜ ฮ˜์ž๋ฆฌ์—๋Š”

๐‘ก๋ฒˆ์งธ์— ์–ป์–ด์ง„ ฮ˜๊ฐ’์„ ๋Œ€์ž…ํ•ด์•ผ ํ•œ๋‹ค.

๐œƒ1๐‘ก+1 = ๐œƒ1

๐‘ก โˆ’ ๐›ผ ฮ˜๐‘‡๐•ฉ๐‘– โˆ’ ๐‘ฆ๐‘– ๐‘ฅ๐‘–

๐‘›

๐‘–=1

Page 30: 03. linear regression

Steepest Descent

Page 31: 03. linear regression

Steepest Descent

Pros: easy to implement, conceptually clean, guaranteed convergence.

Cons: often slow converging.

$$\Theta^{t+1} = \Theta^{t} - \alpha \sum_{i=1}^{n} \bigl\{(\Theta^{t})^T \mathbb{x}_i - y_i\bigr\}\, \mathbb{x}_i$$

Normal Equations

Pros: a single-shot algorithm! Easiest to implement.

Cons: need to compute the pseudo-inverse $(X^T X)^{-1}$, expensive, numerical issues (e.g., the matrix is singular..), although there are ways to get around this ...

$$\hat{\Theta} = (X^T X)^{-1} X^T \mathbb{y}$$

Page 32: 03. linear regression

Multivariate Linear Regression

Page 33: 03. linear regression

๐’š = ๐œฝ๐ŸŽ + ๐œฝ๐Ÿ๐’™๐Ÿ + ๐œฝ๐Ÿ๐’™๐Ÿ +โ‹ฏ+ ๐œฝ๐’๐’™๐’

๋‹จ์ˆœ ์„ ํ˜• ํšŒ๊ท€ ๋ถ„์„์€, input ๋ณ€์ˆ˜๊ฐ€ 1. ๋‹ค์ค‘ ์„ ํ˜• ํšŒ๊ท€ ๋ถ„์„์€, input ๋ณ€์ˆ˜๊ฐ€ 2๊ฐœ ์ด์ƒ.

(e.g., predict Google's stock price from Yahoo's stock price and Microsoft's stock price)

Page 34: 03. linear regression

๐’š = ๐œฝ๐ŸŽ + ๐œฝ๐Ÿ๐’™๐Ÿ๐Ÿ + ๐œฝ๐Ÿ๐’™๐Ÿ

๐Ÿ’ + ๐

์˜ˆ๋ฅผ ๋“ค์–ด, ์•„๋ž˜์™€ ๊ฐ™์€ ์‹์„ ์„ ํ˜•์œผ๋กœ ์ƒ๊ฐํ•˜์—ฌ ํ’€ ์ˆ˜ ์žˆ๋Š”๊ฐ€?

๋ฌผ๋ก , input ๋ณ€์ˆ˜๊ฐ€ polynomial(๋‹คํ•ญ์‹)์˜ ํ˜•ํƒœ์ด์ง€๋งŒ, coefficients ๐œƒ๐‘–๊ฐ€ ์„ ํ˜•(linear)์ด๋ฏ€๋กœ ์„ ํ˜• ํšŒ๊ท€ ๋ถ„์„์˜ ํ•ด๋ฒ•์œผ๋กœ ํ’€ ์ˆ˜ ์žˆ๋‹ค.

๐šฏ = ๐‘‹๐‘‡๐‘‹ โˆ’1๐‘‹๐‘‡๐•ช ห†

๐œƒ0, ๐œƒ1, โ€ฆ , ๐œƒ๐‘›๐‘‡
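A small sketch of this trick on toy data of my own: build the polynomial terms as columns of the design matrix and reuse the same normal-equation solution.

```python
import numpy as np

# Toy data (hypothetical): y = 1 + 2*x1^2 - 3*x2^4 + noise.
rng = np.random.default_rng(2)
x1 = rng.uniform(-1, 1, size=200)
x2 = rng.uniform(-1, 1, size=200)
y = 1.0 + 2.0 * x1**2 - 3.0 * x2**4 + rng.normal(0, 0.05, size=200)

# The model is linear in theta, so the "features" are simply x1^2 and x2^4.
X = np.column_stack([np.ones_like(x1), x1**2, x2**4])

theta_hat = np.linalg.inv(X.T @ X) @ X.T @ y   # same normal equation as before
print(theta_hat)                               # approximately [1, 2, -3]
```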

Page 35: 03. linear regression

General Linear Regression

Page 36: 03. linear regression

๐’š = ๐œฝ๐ŸŽ + ๐œฝ๐Ÿ๐’™๐Ÿ + ๐œฝ๐Ÿ๐’™๐Ÿ +โ‹ฏ+ ๐œฝ๐’๐’™๐’ ์ค‘ ํšŒ๊ท€ ๋ถ„์„

์ผ๋ฐ˜ ํšŒ๊ท€ ๋ถ„์„ ๐’š = ๐œฝ๐ŸŽ + ๐œฝ๐Ÿ๐’ˆ๐Ÿ(๐’™๐Ÿ) + ๐œฝ๐Ÿ๐’ˆ๐Ÿ(๐’™๐Ÿ) + โ‹ฏ+ ๐œฝ๐’๐’ˆ๐’(๐’™๐’)

๐‘”๐‘—๋Š” ๐‘ฅ๐‘— ๋˜๋Š”

(๐‘ฅโˆ’๐œ‡๐‘—)

2๐œŽ๐‘— ๋˜๋Š”

1

1+exp(โˆ’๐‘ ๐‘—๐‘ฅ)๋“ฑ์˜ ํ•จ์ˆ˜๊ฐ€ ๋  ์ˆ˜ ์žˆ๋‹ค.

์ด๊ฒƒ๋„ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ์„ ํ˜• ํšŒ๊ท€ ํ’€์ด ๋ฐฉ๋ฒ•์œผ๋กœ ๋ฌธ์ œ๋ฅผ ํ’€ ์ˆ˜ ์žˆ๋‹ค.
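A minimal sketch of general linear regression with sigmoid basis functions (one of the $g_j$ examples above); the data and the slopes $s_j$ are my own toy choices:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Toy data (hypothetical): a smooth step-like relationship.
rng = np.random.default_rng(3)
x = rng.uniform(-5, 5, size=200)
y = 0.5 + 2.0 * sigmoid(1.5 * x) + rng.normal(0, 0.05, size=200)

# Design matrix built from basis functions g_j(x): a constant and sigmoids
# with a few assumed slopes s_j. The model stays linear in theta.
slopes = [0.5, 1.0, 2.0]
X = np.column_stack([np.ones_like(x)] + [sigmoid(s * x) for s in slopes])

theta_hat = np.linalg.pinv(X) @ y   # least-squares fit, still a linear problem
print(theta_hat)
```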

Page 37: 03. linear regression

๐‘ค๐‘‡ = (๐‘ค0, ๐‘ค1, โ€ฆ , ๐‘ค๐‘›)

๐œ™ ๐‘ฅ๐‘–๐‘‡= ๐œ™0 ๐‘ฅ

๐‘– , ๐œ™1 ๐‘ฅ๐‘– , โ€ฆ , ๐œ™๐‘› ๐‘ฅ

๐‘–

Page 38: 03. linear regression

๐‘ค๐‘‡ = (๐‘ค0, ๐‘ค1, โ€ฆ , ๐‘ค๐‘›)

๐œ™ ๐‘ฅ๐‘–๐‘‡= ๐œ™0 ๐‘ฅ

๐‘– , ๐œ™1 ๐‘ฅ๐‘– , โ€ฆ , ๐œ™๐‘› ๐‘ฅ

๐‘–

normal equation

Page 39: 03. linear regression

[ ์ž๋ฃŒ์˜ ๋ถ„์„ ]

โ‘  ๋ชฉ์  : ์ง‘์„ ํŒ”๊ธฐ ์›ํ•จ. ์•Œ๋งž์€ ๊ฐ€๊ฒฉ์„ ์ฐพ๊ธฐ ์›ํ•จ.

โ‘ก ๊ณ ๋ คํ•  ๋ณ€์ˆ˜(feature) : ์ง‘์˜ ํฌ๊ธฐ(in square feet), ์นจ์‹ค์˜ ๊ฐœ์ˆ˜, ์ง‘ ๊ฐ€๊ฒฉ

Page 40: 03. linear regression

(Source: http://aimotion.blogspot.kr/2011/10/machine-learning-with-python-linear.html)

③ Caution: the size of the house and the number of bedrooms differ greatly in scale. For example, a house may be 4000 square feet while the number of bedrooms is 3. That is, the features in the data differ greatly in scale. In this case, normalize the feature values; then Gradient Descent converges to the result quickly.

④ How to normalize

- Compute the mean of the feature, then subtract the mean from every data value of that feature.

- Divide the mean-subtracted values by the standard deviation of that feature. (scaling)

If this is unclear, recall how, in high school, we converted a normal distribution into the standard normal distribution. One reason for using the standard normal distribution was that it lets us easily compare two different distributions that would otherwise be difficult or impossible to compare.

$$Z = \frac{X - \mu}{\sigma}, \qquad \text{if } X \sim N(\mu, \sigma^2) \text{ then } Z \sim N(0, 1)$$
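A minimal sketch of this feature normalization, on a hypothetical feature matrix (the sizes and bedroom counts are made up):

```python
import numpy as np

# Toy feature matrix: column 0 = size in square feet, column 1 = bedrooms.
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0],
              [1416.0, 2.0],
              [4000.0, 5.0]])

mu = X.mean(axis=0)           # per-feature mean
sigma = X.std(axis=0)         # per-feature standard deviation

X_norm = (X - mu) / sigma     # subtract the mean, then scale by the std deviation
print(X_norm.mean(axis=0))    # ~0 for every feature
print(X_norm.std(axis=0))     # ~1 for every feature
```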

Page 41: 03. linear regression

1. http://www.cs.cmu.edu/~epxing/Class/10701/Lecture/lecture5-LiR.pdf

2. http://www.cs.cmu.edu/~10701/lecture/RegNew.pdf

3. Regression Analysis, 3rd ed. (박성현)

4. Pattern Recognition (오일석)

5. Mathematical Statistics, 3rd ed. (전명식)

Page 42: 03. linear regression

Laplacian Smoothing

Multinomial random variable $z$: $z$ can take the values 1 through $k$.

We have $m$ independent observations $z^{(1)}, \ldots, z^{(m)}$ as a test set.

From these observations, we want to estimate $p(z = i)$ for $i = 1, \ldots, k$.

The estimate (MLE) is

$$\hat{p}(z = j) = \frac{\sum_{i=1}^{m} I\{z^{(i)} = j\}}{m}$$

where $I\{\cdot\}$ is the indicator function; we estimate using the frequency within the observed values.

One thing to keep in mind is that what we are estimating is the parameter $p(z = i)$ of the population; we are merely using the test set (or sample) to estimate it.

For example, if $z^{(i)} \neq 3$ for all $i = 1, \ldots, m$, then $\hat{p}(z = 3) = 0$.

Statistically, this is not a good idea: setting a population parameter to 0 just because the value does not appear in the sample is a bad idea. (A weakness of MLE.)

Page 43: 03. linear regression

To overcome this,

① the numerator must not become 0, and

② the estimates must sum to 1: $\sum_{j=1}^{k} \hat{p}(z = j) = 1$ (∵ probabilities must sum to 1).

Therefore, let

$$\hat{p}(z = j) = \frac{\sum_{i=1}^{m} I\{z^{(i)} = j\} + 1}{m + k}$$

① holds: even if the value $j$ does not appear in the test set, the corresponding estimate is not 0.

② holds: let $n_j$ be the number of data with $z^{(i)} = j$. Then $\hat{p}(z = 1) = \frac{n_1 + 1}{m + k}, \ldots, \hat{p}(z = k) = \frac{n_k + 1}{m + k}$, and adding up all the estimates gives 1.

This is exactly Laplacian smoothing.

Intuitively, it amounts to adding the assumption that each of the values $z$ can take, from 1 to $k$, can occur uniformly.