Linear Regression - kangwoncs.kangwon.ac.kr/~parkce/seminar/2015_machinelearning/03...
Machine Learning
2015.06.06.
Linear Regression
Issues
Issues
⢠https://www.facebook.com/Architecturearts/videos/1107531579263808/
⢠â8ìŽì§ëŠ¬ 조칎ìê² ë°ìŽí°ë² ìŽì€(DB)ê° ë¬Žììžì§ 3ì€ì 묞ì¥ìŒë¡ ì€ëª íìì€â
⢠6ê°ìëì ìµë 25ë²ìŽë ëë ë©Žì ìíì ê±°ì³ êµ¬êžë¬(êµ¬êž ì§ìì ìŒì»«ë ë§)ê° ë íë¥ ì 0.25%. íë²ëëë³Žë€ 25ë°° ë€ìŽê°êž° ìŽë µë€.
⢠âì°ëŠ¬ë â구êžë€ìŽâ(Being Googley) ìžì¬ë€ë§ ëœëë€â⢠íì¬ì ëê° ë€ë¥ž ê°ì¹ë ì¬ë¥ì ê°ì žë€ ì€ ì ìëì§
⢠ìë¡ìŽ ì§ìì ë°ìë€ìŒ ì€ ìë ì§ì ìž ê²žì·ì ì°íšì ê°ì·ëì§
⢠굎ë¬ë€ëë ì°ë Ʞ륌 ì€ì€ë¡ ì€ë ìë°ì ìž ì¬ëìžì§
⢠ë§ì겜 ì±ë¥ì ê°ì íëë ë¬ì ì°ì£Œì ì ìë ê² ë«ë€ë ìì â묞ì·ì±í¹â ì¶ì²: ì€ììŒë³Ž
ð ðððð ð¶ 4
Issues
⢠ì€ëŠ¬ìœë°žëŠ¬ì ì€íížì âë¡ìœëªší°ëžë©ì€â ìŽììž(39) ëíë âêž°ì êž°ì ìì 몚ëê° ëê°ì 귌묎ìê°ì ì±ì°ë ê²ë³Žë€ ìµê³ ì ì€ë ¥ì ê°ì§ 1êž ê°ë°ìë€ìŽ ìµê³ ìì±ê³Œë¥Œ ëŒ ì ìëë¡ íë ê² ë ì€ìíë€.â
⢠âìŽë€ìŽ ìŽì§íì§ ìëë¡ ë¶ì¡ì ëë €ë©Ž ê³ ì¡ì°ëŽ ìžì, âìì â ê°ì íë¬ì€ ìíì ê°ì¹ë¥Œ ë ì€ìŒ íë€ë ê²ì€ëŠ¬ìœë°žëŠ¬ì 볎ížì ìž ë¶ìêž°â
⢠http://www.washingtonpost.com/graphics/business/robots/
ì¶ì²: ì€ììŒë³Ž
ð ðððð ð¶ 5
Issues
Linear Regression
⢠ììì ë°ìŽí°ê° ìì ë, ë°ìŽí° ìì§ ê°ì ìêŽêŽê³ë¥Œ ê³ ë €íë ê²
ì¹êµ¬ 1 ì¹êµ¬ 2 ì¹êµ¬ 3 ì¹êµ¬ 4 ì¹êµ¬ 5
í€ 160 165 170 170 175
ëªžë¬Žê² 50 50 55 50 60
ð ðððð ð¶ 7
Linear Regression
⢠ìŠ, íê· ë¬žì ë..
⢠ìì¹í 목ì ê°ì ììž¡íë ë°©ë²
⢠목ì ê°ì ëí ë°©ì ì íì⢠íê· ë°©ì ì(Regression equation)
â¢ ì§ ê°ì ìêž° ìíŽ ìëì ê°ì ë°©ì ìì ìŽì©
⢠Ex) ì§ ê° = 0.125 * íì + 0.5 * ìê¹ì§ì 거늬
⢠âíìâì âìê¹ì§ì 거늬â ì ë ¥ ë°ìŽí°
⢠âì§ ê°â ì¶ì ë°ìŽí°
⢠0.125ì 0.5ì ê° íê· ê°ì€ì¹(Regression weight)
⢠ì¬ìì¹êµ¬ì 몞묎ê²ë¥Œ ì¶ì íêž° ìíì¬..
⢠Ex) ëªžë¬Žê² = 0.05 * í€
⢠âí€â ì ë ¥ ë°ìŽí°
⢠â몞묎ê²â ì¶ì ë°ìŽí°
⢠0.05 íê· ê°ì€ì¹
Hypothesis
y = wx + b
  x (input data): height
  y (estimated data): weight
  w (regression weight): slope
  b: bias (y-intercept)
Hypothesis
[Figure: three scatter plots of the same data with different candidate lines y = wx + b. Source: Andrew Ng]
y = wx + b
Hypothesis
ŷ = w_0 + Σ_i w_i x_i
ŷ = w_0 × 1 + Σ_{i=1}^{n} w_i x_i
ŷ = Σ_{i=0}^{n} w_i x_i = wᵀx        (with x_0 = 1, the bias folds into the sum)
y = wx + b   →   y = wᵀx

Variable  | Description
J(θ), r   | Cost function, residual (r)
y         | Instance label vector
ŷ, h(x)   | Hypothesis
w_0, b    | Bias (b), y-intercept
x_i       | Feature vector, x_0 = 1
W         | Weight set (w_1, w_2, w_3, …, w_n)
X         | Feature set (x_1, x_2, x_3, …, x_n) (generalization)
Regression: statistical example
⢠몚ì§ëš: ì íµêž°ê°ì ë°ë¥ž ë¹í믌 Cì íꎎë
⢠ë 늜 ë³ì Xê° ì£ŒìŽì¡ì ëYì ëí êž°ë ê°
ì íµêž°ê° (ìŒ) : X 15 20 25 30 35
ë¹í믌 C íꎎë (mg):Y
05
101520
1520253035
3035404550
5055606570
5560657075
ðŠ = ð€ð¥ + ð + ð
ðŠ = ðð¥ + ð
ð: disturbance term, error variable
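A quick sketch confirming that the conditional mean E[Y | X = x] of the table above is itself linear in x (the dict layout is just one convenient encoding of the table):

```python
# Vitamin C loss observations (mg) grouped by shelf time X (days), from the table.
data = {15: [0, 15, 30, 50, 55],
        20: [5, 20, 35, 55, 60],
        25: [10, 25, 40, 60, 65],
        30: [15, 30, 45, 65, 70],
        35: [20, 35, 50, 70, 75]}

for x, ys in sorted(data.items()):
    print(x, sum(ys) / len(ys))   # means 30, 35, 40, 45, 50 -> E[Y|X=x] = x + 15
```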
Regression: statistical example
Random variable of Y
Residual
[Figure: the true model vs. the estimated model, with residuals r_1 … r_5 marking the gaps between the true data and the estimated data]
Residual: r (= e)
• The smaller the residual, the better
• The difference between the true data and the estimated data
• The difference between the true model and the estimated model
y = wx + b,   s.t.  min(r)
Least Square Error (LSE)
[Figure: a fitted line with residuals r_1 … r_5]
r = y − h_θ(x)        (residual)
r_i = y_i − ŷ_i
r = Σ_i (y_i − ŷ_i)
Σ_i r_i² = Σ_{i=1}^{n} (y_i − ŷ_i)²       ← least square
r = Σ_{i=1}^{n} (y_i − w x_i − b)²
r = (1/2) Σ_{i=1}^{n} (y_i − w x_i − b)²  = J(θ)   "cost function"
(the factor 1/2 is added for convenience when differentiating; y_i − w x_i − b is the residual)
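The cost above transcribes directly into a few lines; this sketch reuses the friends' height/weight table from earlier, and the (w, b) tried at the end is an arbitrary guess:

```python
# J(theta) = 1/2 * sum_i (y_i - (w*x_i + b))^2, using the friends' data from earlier.
heights = [160, 165, 170, 170, 175]   # x: input data
targets = [50, 50, 55, 50, 60]        # y: true weights

def cost(w, b):
    return 0.5 * sum((y - (w * x + b)) ** 2 for x, y in zip(heights, targets))

print(cost(0.6, -50.0))   # 27.5 for this hypothetical (w, b); smaller is better
```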
Cost Function
[Figure: left, the hypothesis h(x) plotted against the data (for fixed θ, a function of x); right, the cost J plotted as a function of the parameter θ. Source: Andrew Ng]
h(x_1) = h_θ(x_1) = θ_1 x_1 = 1
J(θ_1) = y_1 − h(x_1) = 1 − 1 = 0 = r
∴ min J(θ) == min r
Training
⢠Residualìì€ì¬ìŒíš LSEìê°ììµìííŽìŒíš
⢠2ì°šíšìíëììµìê°(minimum)ìê°ì§
â¢ ê° wìëíì ííšìê°ì°šìììµìê°ììììì
⢠ìŠ, ì ììµìê°(global minimum)ììììì
⢠ìŽìµìê°ìì°Ÿêž°ìíŽêž°ìžêž°íê°(gradient descent)ìì¬ì©
ðœ(ð) =1
2
ð=1
ð
ðŠð âð€ðð¥ð â ð 2
Minimum!!
Training: Gradient
• The vector of first-order partial derivatives with respect to each variable
• Direction of the vector: the direction in which f(·) increases most steeply
• Magnitude of the vector: the rate of increase, i.e., the slope
• For a multivariable function f(x_1, x_2, …, x_n), the gradient of f is

∇f = (∂f/∂x_1, ∂f/∂x_2, …, ∂f/∂x_n)

• Using the gradient, a multivariable scalar function f has a linear approximation near a point x_0 (via a Taylor expansion):

f(x) = f(x_0) + ∇f(x_0)·(x − x_0) + o(‖x − x_0‖)
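As a sanity-check sketch of the definition above, finite differences approximate each partial derivative; the function f below is an arbitrary example, not one from the slides:

```python
# Central finite differences approximate each partial derivative in the gradient.
def numerical_gradient(f, x, h=1e-6):
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

f = lambda v: v[0] ** 2 + 3 * v[1]          # an arbitrary example f(x1, x2)
print(numerical_gradient(f, [1.0, 2.0]))    # ~[2.0, 3.0] = (df/dx1, df/dx2)
```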
Training: Gradient Descent
⢠Formula
ð ð+1 = ðð â ððð»ð ðð , ð ⥠0
ðð: ðððððððð ððð¡ð
⢠Algorithm
ððððð ðððð¡ ð, ð¡âððð âððð ð, ðð ð ð â ð + 1
ð â ð â ðð»ð ðððððð ðð»ð ð < 0
ðððððð ðððð
ì¶ì²: wikipedia
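A runnable sketch of this pseudocode, assuming a one-dimensional x so the stopping norm is a simple absolute value; γ and ε are illustrative choices:

```python
# x_{i+1} = x_i - gamma * grad_f(x_i); stop once the step gamma*grad_f is below eps.
def gradient_descent(grad_f, x0, gamma=0.1, eps=1e-8, max_iter=10000):
    x = x0
    for _ in range(max_iter):
        step = gamma * grad_f(x)
        x -= step
        if abs(step) < eps:
            break
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2*(x - 3); converges to ~3.0.
print(gradient_descent(lambda x: 2 * (x - 3), x0=0.0))
```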
Training: Gradient Descent
min J(θ) = (1/2) Σ_{i=1}^{n} (y_i − w x_i)²
∂J(θ)/∂w = Σ_{i=1}^{n} (y_i − w x_i)(−x_i)      • the derivative with respect to w (a vector in general)
x_{i+1} = x_i − γ_i ∇f(x_i),   i ≥ 0
• Weight update:  w ← w − γ ∂r/∂w
Find the w that minimizes r!! (a runnable sketch follows)
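A sketch of the weight update on this slide for the bias-free model J(θ) = (1/2) Σ (y_i − w x_i)²; the data are made up so the true w is known to be 2:

```python
# w <- w - gamma * dJ/dw for the bias-free model: J = 1/2 * sum (y_i - w*x_i)^2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]          # generated from y = 2x, so w should approach 2

w, gamma = 0.0, 0.01
for _ in range(1000):
    dJ_dw = sum((y - w * x) * (-x) for x, y in zip(xs, ys))   # the slide's derivative
    w -= gamma * dJ_dw
print(w)   # ~2.0
```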
Training: Gradient Descent
[Figures (a sequence of slides): successive gradient-descent steps; left, the hypothesis for fixed θ (a function of x); right, the cost as a function of the parameters θ. Source: Andrew Ng]
Training: Solution Derivation
⢠ë¶ìì ë°©ë²(analytic method)⢠ðœ(ð)륌 ê° ëªšëž íëŒë¯ží°ë€ë¡ ížë¯žë¶í íì ê·ž 결곌륌 0ìŒë¡
íì¬ ì°ëŠœë°©ì ì íìŽ
⢠ð ð¥ = ð€ð¥ + ð ìž ê²œì°ìë ëªšëž íëŒë¯ží° ð€ì ðë¡ ížë¯žë¶
ðð
ðð€=
ð=1
ð
ðŠð â ð€ðð¥ð â ð (âð¥ð) = 0
ðð
ðð=
ð=1
ð
ðŠð â ð€ðð¥ð â ð (â1) = 0
ð€ìëíížë¯žë¶
ðìëíížë¯žë¶
Training: Solution Derivation
∂r/∂b = Σ_{i=1}^{n} (y_i − w x_i − b)(−1) = 0        (partial derivative w.r.t. b)
Σ_{i=1}^{n} y_i − w Σ_{i=1}^{n} x_i − n·b = 0
Σ_{i=1}^{n} y_i − w Σ_{i=1}^{n} x_i = n·b
b = ȳ − w·x̄       (dividing both sides by n)
Training: Solution Derivation
∂r/∂w = Σ_{i=1}^{n} (y_i − w x_i − b)(−x_i) = 0      (partial derivative w.r.t. w)
0 = Σ_{i=1}^{n} (y_i x_i − w x_i x_i − b x_i)
0 = Σ_{i=1}^{n} (y_i x_i − w x_i x_i − (ȳ − w x̄) x_i)        (substituting b = ȳ − w x̄)
0 = Σ_{i=1}^{n} (y_i x_i − w x_i x_i − ȳ x_i + w x̄ x_i)
w Σ_{i=1}^{n} (x_i x_i − x̄ x_i) = Σ_{i=1}^{n} (y_i x_i − ȳ x_i)
w = [Σ_{i=1}^{n} (x_i x_i − x̄ x_i)]⁻¹ Σ_{i=1}^{n} (y_i x_i − ȳ x_i)

b = ȳ − w·x̄
(The next step adds terms such as Σ(x̄ x̄ − x̄ x_i), which equal 0, because summing a value over all instances gives the same result as adding its mean n times.)
Training: Solution Derivation
∂r/∂w = Σ_{i=1}^{n} (y_i − w x_i − b)(−x_i) = 0      (partial derivative w.r.t. w)
w = [Σ_{i=1}^{n} (x_i x_i − x̄ x_i)]⁻¹ Σ_{i=1}^{n} (y_i x_i − ȳ x_i)
w = [Σ_{i=1}^{n} (x_i x_i − x̄ x_i + (x̄ x̄ − x̄ x_i))]⁻¹ Σ_{i=1}^{n} (y_i x_i − ȳ x_i + (ȳ x̄ − y_i x̄))    (the added terms sum to 0)
w = [Σ_{i=1}^{n} (x_i − x̄)(x_i − x̄)]⁻¹ Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ)
w = var(x)⁻¹ · cov(x, y)

Solution:
w = [Σ_{i=1}^{n} (x_i − x̄)²]⁻¹ Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ)
b = ȳ − w·x̄
(a runnable sketch follows)
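The closed-form solution applied to the friends' height/weight table from earlier (a sketch; x_mean and y_mean are the sample means x̄ and ȳ):

```python
# w = sum (x_i - x_mean)(y_i - y_mean) / sum (x_i - x_mean)^2,  b = y_mean - w * x_mean
xs = [160, 165, 170, 170, 175]   # heights
ys = [50, 50, 55, 50, 60]        # weights

x_mean = sum(xs) / len(xs)       # 168.0
y_mean = sum(ys) / len(ys)       # 53.0
w = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
     / sum((x - x_mean) ** 2 for x in xs))
b = y_mean - w * x_mean
print(w, b)   # ~0.615, ~-50.4: weight ~ 0.615 * height - 50.4
```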
Training: Algorithm
Regression: other problems
Regression: Multiple variables
⢠ì¹êµ¬ì ëí ì ë³Žê° ë§ì 겜ì°
í€ ëìŽ ë°í¬êž° ë€ëŠ¬êžžìŽ 몞묎ê²
ì¹êµ¬1 160 17 230 80 50
ì¹êµ¬2 165 20 235 85 50
ì¹êµ¬3 170 21 240 85 55
ì¹êµ¬4 170 24 245 90 60
ì¹êµ¬5 175 26 250 90 60
Features Label
Instance â ð
â ð¥ = ð€0ð¥0 +ð€1ð¥1 + ð€2ð¥2 + ð€3ð¥3 + ð€4ð¥4 + ð€5ð¥5Hypothesis:
ð€0, ð€1, ð€2, ð€3, ð€4, ð€5Parameters:
ð¥0, ð¥1, ð¥2, ð¥3, ð¥4, ð¥5Features:
ðŠð¥1 ð¥2 ð¥3 ð¥4
ð1
ð2
ð3
ð4
ð5
Regression: Multiple variables
⢠Hypothesis:
⢠Parameters:
⢠Features:
⢠Cost function:
â ð¥ = ð€ðð¥ = ð€0ð¥0 + ð€1ð¥1 + ð€2ð¥2 +â¯+ð€ðð¥ð
ð€0, ð€1, ð€2, ð€3, ð€4, ⊠, ð€ð
ð¥0, ð¥1, ð¥2, ð¥3, ð¥4, ⊠, ð¥ð
â âð+1
â âð+1
ðœ ð€0, ð€1, ⊠, ð€ð =1
2
ð=1
ð
ðŠð â â(ð¥ð)2
ð¥ =
ð¥0ð¥1ð¥2ð¥3âŠð¥ð
â âð+1 ð€ =
ð€0
ð€1
ð€2
ð€3
âŠð€ð
â âð+1
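A sketch of the vector form using numpy; X stacks one row per instance from the friends table (with x_0 = 1 prepended), and the all-zero w is just a starting point:

```python
import numpy as np

# One row per friend: [x0=1, height, age, foot size, leg length]; label y = weight.
X = np.array([[1, 160, 17, 230, 80],
              [1, 165, 20, 235, 85],
              [1, 170, 21, 240, 85],
              [1, 170, 24, 245, 90],
              [1, 175, 26, 250, 90]], dtype=float)
y = np.array([50, 50, 55, 60, 60], dtype=float)

w = np.zeros(X.shape[1])           # parameters w0..w4 (w0 acts as the bias b)
h = X @ w                          # hypothesis h(x) = w^T x for every instance at once
J = 0.5 * np.sum((y - h) ** 2)     # the cost J(w0, ..., wn)
print(h, J)                        # all-zero predictions, J = 0.5 * sum(y^2)
```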
Multiple variables: Gradient descent
⢠Gradient descent
ððœ(ð)
ðð€=
ð=1
ð
ðŠð â ð€ðð¥ð (âð¥ð)
Standard (n=1), n: num. of features
Repeat {
}
ð€0 = ð€0 â ð
ð=1
ð
ðŠð âð€ðð¥ð
âð¥ðð â âð¥ð0 = 1
ð€1 = ð€1 â ð
ð=1
ð
ðŠð âð€ðð¥ð âð¥ð1
Multiple (n>=1)
Repeat {
}
ð€ð = ð€ð â ð
ð=1
ð
ðŠð â ð€ðð¥ð âð¥ðð
ð€0 = ð€0 â ð
ð=1
ð
ðŠð â ð€ðð¥ð âð¥ð0
ð€1 = ð€1 â ð
ð=1
ð
ðŠð âð€ðð¥ð âð¥ð1
ð€2 = ð€2 â ð
ð=1
ð
ðŠð âð€ðð¥ð âð¥ð2
âŠ
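In vector form, the simultaneous update of every w_j collapses to one line; this sketch uses a tiny made-up dataset generated from y = 1 + 2x so the expected result is known:

```python
import numpy as np

# Simultaneous update of every w_j: w <- w - gamma * sum_i (y_i - w^T x_i)(-x_i).
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])             # x0 = 1 plus a single feature x1
y = np.array([1.0, 3.0, 5.0])          # generated from y = 1 + 2*x1

w, gamma = np.zeros(2), 0.1
for _ in range(2000):
    residual = y - X @ w               # (y_i - w^T x_i) for all instances
    w -= gamma * -(X.T @ residual)     # all w_j updated in one step
print(w)                               # ~[1.0, 2.0]
```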
Multiple variables: Feature scaling
⢠Feature scaling
⢠ê°ê°ì ìì§ ê° ë²ìë€ìŽ ìë¡ ë€ëŠâ¢ í€: 160~175, ëìŽ: 17~26, ë° í¬êž°: 230~250, ë€ëŠ¬ êžžìŽ:
80~90
⢠Gradient descent í ë ìµì ê°ìŒë¡ ìë Žíëë° ì€ë걞늌
í€ ëìŽ ë°í¬êž° ë€ëŠ¬êžžìŽ 몞묎ê²
ì¹êµ¬1 160 17 230 80 50
ì¹êµ¬2 165 20 235 85 50
ì¹êµ¬3 170 21 240 85 55
ì¹êµ¬4 170 24 245 90 60
ì¹êµ¬5 175 26 250 90 60
Multiple variables: Feature scaling
⢠Feature scaling
⢠ìì§ ê° ë²ìê° ë묎 컀ì 귞늌곌 ê°ìŽ 믞ë¶ì ë§ìŽ íê² ëš, ìŠ iterationì ë§ìŽ ìííê² ëš
⢠ì륌 ë€ìŽâ¢ ìŽ ì ë ì°šìŽì ìì§ë€ì êŽì°®ì
â¢ ìŽ ì ë ì°šìŽì ìì§ë€ìŽ 묞ì
â0.5 †ð¥1 †0.5
â2 †ð¥2 †3
â1000 †ð¥1 †2000
0 †ð¥2 †5000
Multiple variables: Feature scaling
⢠Feature scaling
⢠ë°ëŒì ìì§ ê° ë²ì륌 â1 †ð¥ð †1 ì¬ìŽë¡ ì¬ì ì
Feature scaling
⢠Scaling: ð¥ð: ðððð¡ð¢ðð ððð¡ð
ðð: ððððð ðð ðððð¡ð¢ðð ððð¡ðð ðð = max ðððð¡. â min(ðððð¡. )
ðð = 230 †ð¥ð †250â range: 250 â 230 = 20
ð¥ð â ðððð ðð: ðððð ðð ðððð¡ð¢ðð ððð¡ðð
ð¥ð â 240
20ðð = 240
Example
ð¥1 = 230 â230 â 240
20= â0.5
ð¥5 = 230 â250 â 240
20= 0.5
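The scaling rule (x_i − μ_i)/s_i applied to the foot-size column; this sketch reproduces the −0.5 and 0.5 of the example above:

```python
# (x_i - mu) / s applied to the foot-size feature from the friends table.
foot_sizes = [230, 235, 240, 245, 250]

mu = sum(foot_sizes) / len(foot_sizes)     # mean: 240.0
s = max(foot_sizes) - min(foot_sizes)      # range: 250 - 230 = 20
scaled = [(x - mu) / s for x in foot_sizes]
print(scaled)   # [-0.5, -0.25, 0.0, 0.25, 0.5] -- matches the example above
```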
Multiple variables: Feature scaling
⢠Feature scaling
⢠Feature scalingì íµíì¬ ì ê·í
⢠ê°ëší ì°ì°
⢠결êµì Gradient descentê° ë¹ ë¥Žê² ìë Ží ì ìì
Linear Regression: Normal equation
⢠ììì ë€ë€ë ë°©ë²ì ë€íìì ìŽì©í ë¶ìì ë°©ë²
⢠ë¶ìì ë°©ë²ì ê³ ì°š íšìë ë€ë³ì íšìê° ëë©Ž ê³ì°ìŽìŽë €ì
⢠ë°ëŒì ëìì ë°©ë²ìŒë¡ ì ê·Œ Normal equation
ë¶ìì ë°©ë²:
⢠Gradient Descent íì
ðì many iteration íì
⢠ðìŽë§ìŒë©Žì¢ìì±ë¥
Such as, ð training examples, ð features
ëìì ë°©ë²:
⢠Gradient Descent íììì
ðìmany iteration íììì
⢠ððð â1ìê³ì°ë§íì ð(ð3)
⢠ðìŽë§ìŒë©Žìëë늌
Linear Regression: Normal equation
Examples (with x_0 = 1 prepended to each row):

x_0 | Size (feet²) | Number of bedrooms | Number of floors | Age of home (years) | Price ($1000)
1   | 2104         | 5                  | 1                | 45                  | 460
1   | 1416         | 3                  | 2                | 40                  | 232
1   | 1534         | 3                  | 2                | 30                  | 315
1   |  852         | 2                  | 1                | 36                  | 178

θ = (w_0, w_1, w_2, w_3, w_4)ᵀ,   ∴ Xθ = y
Linear Regression: Normal equation
(same examples as above)
θ = (XᵀX)⁻¹ Xᵀ y,   Xθ = ŷ
Linear Regression: Normal equation
Why is θ = (XᵀX)⁻¹Xᵀy the model that minimizes the sum of squared errors, and how is it derived?
r = y − ŷ  →  ‖y − Xθ‖²
Find the θ satisfying min(‖y − Xθ‖²)
∴ Differentiate with respect to θ and set the result to 0:
−2Xᵀ(y − Xθ) = 0
−2Xᵀy + 2XᵀXθ = 0
2XᵀXθ = 2Xᵀy
XᵀXθ = Xᵀy
∴ θ = (XᵀX)⁻¹Xᵀy
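A numpy sketch of the normal equation for the housing table above. One caveat the slides don't mention: with only four examples and five parameters, XᵀX is singular, so the sketch uses numpy's least-squares routine (a pseudo-inverse) rather than a literal (XᵀX)⁻¹:

```python
import numpy as np

# Housing examples: rows [1, size, bedrooms, floors, age], labels price ($1000).
X = np.array([[1, 2104, 5, 1, 45],
              [1, 1416, 3, 2, 40],
              [1, 1534, 3, 2, 30],
              [1,  852, 2, 1, 36]], dtype=float)
y = np.array([460, 232, 315, 178], dtype=float)

# X^T X is singular here (4 examples, 5 parameters), so lstsq computes the
# minimum-norm least-squares solution instead of an explicit inverse.
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta)
print(X @ theta)   # reproduces y exactly here, since the system is underdetermined
```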
References
⢠https://class.coursera.org/ml-007/lecture
⢠http://deepcumen.com/2015/04/linear-regression-2/
⢠http://www.aistudy.com/math/regression_lee.htm
⢠http://en.wikipedia.org/wiki/Linear_regression
QA
Thank you.
박천음, 박찬민, 최재혁, 박세빈, 이수정
Cheoneum Park, Kangwon National University
Email: [email protected]