

Monitoring Nonlinear Profiles with Random Effects by

Nonparametric Regression

Jyh-Jen Horng Shiau Institute of Statistics

National Chiao Tung University(交通大學統計所 洪志真 )

Sept. 25, 2009

NCTS Industrial Statistics Research Group Seminar

2

Outline

• Introduction

• Linear Profile Monitoring
  – Fixed Effects vs. Random Effects

• Nonlinear Profile Monitoring
  – Parametric Regression vs. Nonparametric Regression
  – Fixed Effects vs. Random Effects
    Phase I Monitoring, Phase II Monitoring, Examples

• Conclusions

3

Introduction

4

SPC: Variables vs. Profiles

Classical SPC: using one or multiple quality characteristics (a single univariate or multivariate variable) to measure the process quality.

However, in many situations, the response of interest is not a single variable but a function of one or more explanatory variables. This functional response is called a profile.

• Profile Monitoring

5

Other Terms for Profiles

• Waveform Signal (Jin and Shi, 2001)
• Signature (Gardner et al., 1997)

Example: Vertical Board Density Profile Data from Walker and Wright (JQT, 2002)

24 profiles of vertical density, each profile consists of 314 measurements.

6

Profile Monitoring

[Figure: k sample profiles (j = 1, 2, …, k) collected over time, each plotting the response (Y) against the explanatory variable (X), with n > 2 observations in each profile (n = 10 in the illustration).]

The objective is to monitor functional data over time.

7

Applications: Dissolving Process of Aspartame

An example of a product characterized by a profile is aspartame, an artificial sweetener. An important characteristic of the product is the amount of aspartame that can be dissolved per liter of water at different temperatures. (Kang and Albin, 2000).

8

Semiconductor Gateoxide Thickness Surface

By Gardner et al. (1997)

Fig. (a) shows the gate-oxide thickness surface of a wafer that was processed under fault-free conditions. Fig. (b) shows the gate-oxide thickness surface of a wafer processed under known equipment faults. X and Y are the distances from the center of the wafer. Not only is there an apparent decrease in thickness from (a) to (b), but there is also a change in spatial pattern.

9

Tonnage Stamping Process

The figure shows the complicated form of a stamping force profile given by Jin and Shi (1999). Different local features need to be monitored in each interval. Jin and Shi used the term "waveform signals" to refer to profiles.

10

Bioassay Dose Response Curve

A dose-response curve, given by Williams, Birch, Woodall, and Ferry (2006), for different doses and different time periods.

Notation: $M_i$ is the median response of the untreated specimens at sampling period $i$; observations are indexed by the ith profile, jth dose, and kth replication.

12

Two Approaches

Parametric Regression
• Fit a parametric model of known form to each profile
• Monitor each parameter with a separate chart, or use a multivariate chart based on the vectors of parameter estimates.

Nonparametric Regression
• Smooth each profile
• Use various metrics to detect changes in profile shape.

13

Linear Profile Monitoring

Fixed-effect model:

$Y = A_0 + A_1 X + \varepsilon, \quad X_l < X < X_h,$

where the errors $\varepsilon$ are independent and normally distributed with mean 0 and variance $\sigma^2$.

• Kang and Albin (2000)
• Kim, Mahmoud, and Woodall (2003)
• Mahmoud and Woodall (2004)
• Mahmoud (2004)

14

Linear Profiles with Fixed-Effect Model

Kang and Albin (2000)

Monitor slope and intercept jointly with a multivariate $T^2$ chart.

Treat the residuals between the sample and reference lines as a rational subgroup and monitor the residuals with a combined EWMA/R chart.

Phase-I statistics are dependent, so the control limits cannot be determined directly from the marginal distributions.

15

Linear Profiles with Fixed-Effect Model

Kim et al. (2003) considered the same model but coded the X-values by centering, so that the least-squares estimators of the intercept and slope are independent.

EWMA3: three EWMA charts used together

• A two-sided EWMA to monitor the intercept
• A two-sided EWMA to monitor the slope
• A one-sided EWMA to monitor the error variance

16

Why Random Effects?

Under the fixed-effect model, the batch effect, changes in humidity or temperature, the characteristics of the measuring equipment, etc., are all absorbed into the error term. This may not be appropriate, since these time-varying factors may affect the intercept and slope of the linear profile.

By their nature, these hard-to-control factors should be considered common causes of variation.

The model should therefore allow profile-to-profile variation.

17

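Linear Profiles with Random Effects (Shiau, Lin, and Chen, 2006)

For the ith observation of the jth profile, assume

$Y_{ij} = A_{0j} + A_{1j} x_i + \varepsilon_{ij},$

where $A_{0j} \sim N(\alpha_0, \sigma_0^2)$, $A_{1j} \sim N(\alpha_1, \sigma_1^2)$, $\varepsilon_{ij} \sim N(0, \sigma_e^2)$, and $A_{0j}$, $A_{1j}$, and $\varepsilon_{ij}$ are mutually independent.

Let the set points $x_i$ be pre-coded so that $\bar{x} = 0$. The estimates for the jth profile are

$\hat{\alpha}_{0j} = \bar{y}_j, \quad \hat{\alpha}_{1j} = S_{xy(j)} / S_{xx}, \quad \hat{\sigma}_{ej}^2 = \sum_{i=1}^{n} (y_{ij} - \hat{y}_{ij})^2 / (n - 2),$

where

$S_{xy(j)} = \sum_{i=1}^{n} x_i (y_{ij} - \bar{y}_j) = \sum_{i=1}^{n} x_i y_{ij}, \quad S_{xx} = \sum_{i=1}^{n} x_i^2.$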
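As a rough illustration of the per-profile estimates with centered set points, the following Python sketch computes the intercept, slope, and error-variance estimates for one profile. The function name `fit_coded_profile` and the toy data are hypothetical and not taken from the talk.

```python
import numpy as np

def fit_coded_profile(x, y):
    """Estimates for one linear profile when the set points are centered."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if abs(x.mean()) > 1e-9:
        raise ValueError("set points must be coded so that mean(x) = 0")
    n = x.size
    s_xx = np.sum(x ** 2)
    a0 = y.mean()                       # intercept estimate: profile mean
    a1 = np.sum(x * y) / s_xx           # slope estimate: S_xy(j) / S_xx
    resid = y - (a0 + a1 * x)
    var_e = np.sum(resid ** 2) / (n - 2)  # error-variance estimate
    return a0, a1, var_e

# Toy noise-free profile (assumed data): y = 3 + 2x on centered set points.
x = np.arange(1, 11) - 5.5
y = 3.0 + 2.0 * x
a0, a1, var_e = fit_coded_profile(x, y)
```

Because the set points are centered, the intercept estimate reduces to the profile mean and the slope estimate needs no correction for the mean of y.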

18

A Simulated Example

$y_{ij} = A_{0j} + A_{1j} x_i + \varepsilon_{ij},$

where $A_{0j} \sim N(3, 0.09)$, $A_{1j} \sim N(2, 0.09)$, and $\varepsilon_{ij} \sim N(0, 1)$.

[Figure: simulated profiles, $y_{ij}$ plotted against $x_i$.]

19

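Phase II Method

In Phase II, it is usually assumed that $\alpha_0, \alpha_1, \sigma_0^2, \sigma_1^2, \sigma_e^2$ are known.

Adopt the combined-chart approach. Since $\hat{\alpha}_{0j}, \hat{\alpha}_{1j}, \hat{\sigma}_{ej}^2$ are mutually independent, set the individual false-alarm rate of each chart at $\alpha^* = 1 - (1 - \alpha)^{1/3}$ to achieve an overall false-alarm rate of $\alpha$.

The control limits for $\hat{\alpha}_{0j}, \hat{\alpha}_{1j}, \hat{\sigma}_{ej}^2$ are

$UCL_0 = \alpha_0 + Z_{\alpha^*/2}\, s_0, \quad LCL_0 = \alpha_0 - Z_{\alpha^*/2}\, s_0,$

$UCL_1 = \alpha_1 + Z_{\alpha^*/2}\, s_1, \quad LCL_1 = \alpha_1 - Z_{\alpha^*/2}\, s_1,$

$UCL_e = \sigma_e^2\, \chi^2_{\alpha^*,\, n-2} / (n - 2),$

where $s_0^2 = \sigma_0^2 + \sigma_e^2 / n$ and $s_1^2 = \sigma_1^2 + \sigma_e^2 / S_{xx}$.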
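A minimal Python sketch of the Phase II control limits follows. It assumes that the limits are centered at the known means with half-width $Z_{\alpha^*/2}$ times $s_0$ or $s_1$, where $s_0^2 = \sigma_0^2 + \sigma_e^2/n$ and $s_1^2 = \sigma_1^2 + \sigma_e^2/S_{xx}$. The overall false-alarm rate 0.0027 and the 50 equally spaced, centered set points are assumptions made for illustration, as is the function name `phase2_limits`.

```python
import numpy as np
from scipy.stats import norm, chi2

def phase2_limits(alpha0, alpha1, var0, var1, var_e, x, overall=0.0027):
    """Control limits for the intercept, slope, and error-variance charts."""
    x = np.asarray(x, dtype=float)
    n, s_xx = x.size, np.sum(x ** 2)
    a_star = 1 - (1 - overall) ** (1 / 3)   # individual false-alarm rate
    z = norm.ppf(1 - a_star / 2)
    s0 = np.sqrt(var0 + var_e / n)          # sd of the intercept estimate
    s1 = np.sqrt(var1 + var_e / s_xx)       # sd of the slope estimate
    return {
        "A0": (alpha0 - z * s0, alpha0 + z * s0),
        "A1": (alpha1 - z * s1, alpha1 + z * s1),
        "var_e_ucl": var_e * chi2.ppf(1 - a_star, n - 2) / (n - 2),
    }

# Assumed design: 50 equally spaced set points centered at zero.
x = np.arange(1, 51) - 25.5
limits = phase2_limits(3.0, 2.0, 0.09, 0.09, 1.0, x)
```

Splitting the overall rate as $1 - (1 - \alpha)^{1/3}$ relies on the three estimates being mutually independent, as stated above.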

20

When the Fixed-effect Model is Mistakenly Used

| | Real (random effect) | Misuse (fixed effect) |
|---|---|---|
| Upper control limit of A0 | 4.101054 | 3.469491 |
| Lower control limit of A0 | 1.898946 | 2.530509 |
| Upper control limit of A1 | 2.996472 | 2.032534 |
| Lower control limit of A1 | 1.003528 | 1.967466 |
| Upper control limit of error variance | 1.759881 | 1.759881 |
| ARL0 | 370.370370 | 1.078406 |

Using the wrong model causes an enormous number of false alarms: with an in-control ARL of about 1.08, 92.73% of in-control profiles trigger a false alarm.
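The in-control ARL under the misused fixed-effect limits can be checked with a short calculation. The sketch below assumes an overall false-alarm rate of 0.0027 and 50 equally spaced centered set points (so that $S_{xx} = 10412.5$). Neither value is stated in the talk, although both appear consistent with the tabulated limits. The helper `signal_prob` is hypothetical.

```python
import numpy as np
from scipy.stats import norm

def signal_prob(half_width, true_sd):
    """P(estimate falls outside center +/- half_width) for a normal estimate."""
    return 2 * norm.sf(half_width / true_sd)

n, s_xx = 50, 10412.5            # assumed design (not given in the talk)
var0, var1, var_e = 0.09, 0.09, 1.0
overall = 0.0027                 # assumed overall false-alarm rate
a_star = 1 - (1 - overall) ** (1 / 3)
z = norm.ppf(1 - a_star / 2)

# Limits built as if there were no profile-to-profile variation.
hw0 = z * np.sqrt(var_e / n)
hw1 = z * np.sqrt(var_e / s_xx)

# True standard deviations of the estimates under the random-effect model.
sd0 = np.sqrt(var0 + var_e / n)
sd1 = np.sqrt(var1 + var_e / s_xx)

p0 = signal_prob(hw0, sd0)       # intercept chart
p1 = signal_prob(hw1, sd1)       # slope chart
p_any = 1 - (1 - p0) * (1 - p1) * (1 - a_star)  # variance chart is unaffected
arl0 = 1 / p_any
```

Most of the false alarms come from the slope chart, whose misused limits are far narrower than the true spread of the slope estimates.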

21

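Linear Profiles with Random Effects: Phase I Estimation

Under the random-effect model with coded $x_i$'s,

(i) $\hat{\alpha}_{01}, \ldots, \hat{\alpha}_{0k}$ are i.i.d. $N(\alpha_0, \sigma_0^2 + \sigma_e^2 / n)$.

(ii) $\hat{\alpha}_{11}, \ldots, \hat{\alpha}_{1k}$ are i.i.d. $N(\alpha_1, \sigma_1^2 + \sigma_e^2 / S_{xx})$.

(iii) $(n-2)\hat{\sigma}_{e1}^2 / \sigma_e^2, \ldots, (n-2)\hat{\sigma}_{ek}^2 / \sigma_e^2$ are i.i.d. $\chi^2_{n-2}$.

These statistics are mutually independent.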

22

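Phase-I Monitoring Statistics

The three monitoring statistics:

$T_{0j} = \dfrac{(\hat{\alpha}_{0j} - \hat{\alpha}_0)^2}{\hat{s}_0^2}, \quad T_{1j} = \dfrac{(\hat{\alpha}_{1j} - \hat{\alpha}_1)^2}{\hat{s}_1^2}, \quad T_{ej} = \dfrac{\hat{\sigma}_{ej}^2}{\hat{\sigma}_e^2},$

where

$\hat{\alpha}_0 = \dfrac{1}{k} \sum_{j=1}^{k} \hat{\alpha}_{0j}, \quad \hat{s}_0^2 = \dfrac{1}{k-1} \sum_{j=1}^{k} (\hat{\alpha}_{0j} - \hat{\alpha}_0)^2,$

$\hat{\alpha}_1 = \dfrac{1}{k} \sum_{j=1}^{k} \hat{\alpha}_{1j}, \quad \hat{s}_1^2 = \dfrac{1}{k-1} \sum_{j=1}^{k} (\hat{\alpha}_{1j} - \hat{\alpha}_1)^2,$

$\hat{\sigma}_e^2 = \sum_{j=1}^{k} \sum_{i=1}^{n} (y_{ij} - \hat{y}_{ij})^2 / \bigl(k(n-2)\bigr),$

and $\hat{s}_0^2$ and $\hat{s}_1^2$ estimate $s_0^2 = \sigma_0^2 + \sigma_e^2 / n$ and $s_1^2 = \sigma_1^2 + \sigma_e^2 / S_{xx}$.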

23

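Bonferroni Method

If we control each individual false-alarm rate at level $\alpha_k = \alpha / k$, then the overall false-alarm rate for the $k$ profiles in the Phase I historical data set is controlled at level $\alpha$.

Control limits using the Bonferroni Method:

$UCL_{T_{0j}} = \dfrac{(k-1)^2}{k}\, \mathrm{Beta}_{\alpha',\, \frac{1}{2},\, \frac{k-2}{2}}, \quad UCL_{T_{1j}} = \dfrac{(k-1)^2}{k}\, \mathrm{Beta}_{\alpha',\, \frac{1}{2},\, \frac{k-2}{2}},$

$UCL_{T_{ej}} = k\, \mathrm{Beta}_{\alpha',\, \frac{n-2}{2},\, \frac{(k-1)(n-2)}{2}},$

where $\alpha' = 1 - (1 - \alpha_k)^{1/3}$.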
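The Phase I statistics and their Bonferroni-adjusted limits can be sketched in Python as below. The sketch assumes that the beta subscripts denote upper percentage points, that the per-chart rate is $1 - (1 - \alpha/k)^{1/3}$, and that the pooled error variance equals the average of the per-profile variance estimates. The function names and the overall rate 0.05 are hypothetical.

```python
import numpy as np
from scipy.stats import beta

def phase1_statistics(a0, a1, var_e):
    """T_0j, T_1j, T_ej from per-profile intercept, slope, variance estimates."""
    a0, a1, var_e = (np.asarray(v, dtype=float) for v in (a0, a1, var_e))
    t0 = (a0 - a0.mean()) ** 2 / a0.var(ddof=1)
    t1 = (a1 - a1.mean()) ** 2 / a1.var(ddof=1)
    te = var_e / var_e.mean()   # pooled variance = mean of per-profile variances
    return t0, t1, te

def bonferroni_ucls(k, n, overall=0.05):
    """Upper control limits for (T_0j, T_1j) and for T_ej."""
    a_k = overall / k                    # Bonferroni level per profile
    a_prime = 1 - (1 - a_k) ** (1 / 3)   # split across the three charts
    ucl_t = (k - 1) ** 2 / k * beta.ppf(1 - a_prime, 0.5, (k - 2) / 2)
    ucl_e = k * beta.ppf(1 - a_prime, (n - 2) / 2, (k - 1) * (n - 2) / 2)
    return ucl_t, ucl_e

# Toy usage with simulated in-control estimates (assumed values).
rng = np.random.default_rng(1)
k, n = 20, 10
a0 = rng.normal(3.0, 0.3, k)
a1 = rng.normal(2.0, 0.3, k)
var_e = rng.chisquare(n - 2, k) / (n - 2)
t0, t1, te = phase1_statistics(a0, a1, var_e)
ucl_t, ucl_e = bonferroni_ucls(k, n)
flagged = np.where((t0 > ucl_t) | (t1 > ucl_t) | (te > ucl_e))[0]
```

The beta limits arise because each statistic compares one profile with a mean and variance estimated from all k profiles, including itself, so the statistics are bounded and dependent.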

24

Evaluation Criteria for Phase I Methods

Main concern in evaluating Phase I methods
• Effectiveness in detecting out-of-control profiles correctly.

Commonly used criterion: "signal probability"
• Includes both true and false alarms.

Proposed criteria
• "True-alarm rate": the rate of detecting real out-of-control profiles.
• "False-alarm rate": the rate of claiming in-control profiles out of control.

25

Bonferroni vs. Multiple FDR

The Multiple FDR method
• An extension of FDR (false discovery rate)

The Multiple FDR method has more detection power than the Bonferroni method, especially when the historical data contain more out-of-control profiles.

The tradeoff is a slightly larger false-alarm rate, which is still very small (less than 0.003).

26

Other Related Works

Mahmoud, M. A. and Woodall, W. H. (2004). "Phase I Analysis of Linear Profiles with Calibration Applications". Technometrics, Nov. 2004.

Mahmoud, M. A., Parker, P. A., Woodall, W. H., and Hawkins, D. M. (2006). "A Change Point Method for Linear Profile Data". Quality and Reliability Engineering International, 2006.

Staudhammer, C. L., LeMay, V. M., Kozak, R. A., and Maness, T. C. (2005). "Mixed-Model Development for Real-Time Statistical Process Control Data in Wood Products Manufacturing". FBMIS, Volume 1, 2005, 19-35.

Wang, K. and Tsung, F. (2005). "Using Profile Techniques for a Data-rich Environment with Huge Sample Size". Quality and Reliability Engineering International, 21, 7, 677-688.

Jensen, W. A., Birch, J. B., and Woodall, W. H. (2006). "Profile Monitoring via Linear Mixed Models". JSM 2006 Online Program.

27

Nonlinear Profile Monitoring

by Parametric Regression

28

Related Works

Jensen, W. A., Woodall, W. H., and Birch, J. B. (2003). "Phase I Monitoring of Nonlinear Profiles".

Ding, Y., Zeng, L., and Zhou, S. (2005). "Phase I Analysis for Monitoring Nonlinear Profile Signals in Manufacturing Processes". Journal of Quality Technology, 38(3), 199-216.

Jensen, W. A. and Birch, J. B. (2006). "Profile Monitoring via Nonlinear Mixed Models". Technical Report.

Williams, J. D., Birch, J. B., Woodall, W. H., and Ferry, N. M. (2006). "Statistical Monitoring of Heteroscedastic Dose-Response Profiles from High-throughput Screening". JSM 2006 Online Program.

Shiau, J.-J. H., Yen, C.-L., and Feng, Y.-W. (2006). "A New Robust Phase I Analysis for Monitoring of Nonlinear Profiles". Technical Report.

29

Nonlinear Profile Monitoring via Nonparametric Regression

• Fixed Effects
• Random Effects

30

Nonparametric Regression

Consider the following nonparametric regression model:

$y_i = m(x_i) + \varepsilon_i, \quad i = 1, \ldots, n,$

where $m(x)$ is a smooth regression curve and the $\varepsilon_i$'s are i.i.d. normal variates with mean zero and common variance $\sigma^2$.

With B-spline regression, the model is replaced by

$y_i = \sum_{l=1}^{b} c_l B_{l,k}(x_i) + \varepsilon_i, \quad i = 1, \ldots, n,$

where $c_l$ is the unknown coefficient of the lth B-spline basis function $B_{l,k}$, to be estimated from data.

Estimate $c_l$ by least squares:

$\min_{c} \sum_{i=1}^{n} \Bigl( y_i - \sum_{l=1}^{b} c_l B_{l,k}(x_i) \Bigr)^2.$
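A least-squares B-spline fit of this kind can be sketched with SciPy's `make_lsq_spline`. The knot placement (equally spaced interior knots), the cubic degree, and the sine test curve are assumptions chosen for illustration. Note that SciPy's `k` is the polynomial degree, which may differ by one from the order convention used for $B_{l,k}$ in the talk.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

def fit_bspline(x, y, n_interior=5, degree=3):
    """Least-squares B-spline regression with equally spaced interior knots."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    interior = np.linspace(x[0], x[-1], n_interior + 2)[1:-1]
    # Boundary knots are repeated degree + 1 times, as SciPy requires.
    t = np.r_[[x[0]] * (degree + 1), interior, [x[-1]] * (degree + 1)]
    return make_lsq_spline(x, y, t, k=degree)

# Toy noisy profile (assumed data).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
truth = np.sin(2 * np.pi * x)
y = truth + rng.normal(scale=0.1, size=x.size)

spline = fit_bspline(x, y)
y_hat = spline(x)            # smoothed profile
n_basis = spline.c.size      # b = number of B-spline basis functions
```

With 5 interior knots and cubic pieces, the fit uses b = 9 basis functions, so the residuals have n − b = 41 degrees of freedom.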

31

Nonparametric Fixed-Effect Model: Shiau and Weng (2004)

Simulated example:

$m(x) = I + M e^{N(x-1)^2},$

where $I$, $M$, $N$ are fixed constants.

Apply B-spline regression to each sample profile to obtain the residuals and their mean:

$e_{ij} = y_{ij} - \hat{y}_{ij}, \quad i = 1, 2, \ldots, n, \qquad \bar{e}_j = \dfrac{1}{n} \sum_{i=1}^{n} e_{ij}.$

Monitor mean shifts: EWMA chart

The EWMA statistic of the jth profile with smoothing constant $\lambda$:

$z_j = \lambda \bar{e}_j + (1 - \lambda) z_{j-1}.$

32

Nonparametric Fixed Effect Model

Monitor variation change: R chart

The R statistic of the jth profile (use range):

$R_j = \max_i (e_{ij}) - \min_i (e_{ij}).$

Another chart for variation change: EWMSD

The EWMSD statistic:

$v_j = \lambda s_j + (1 - \lambda) v_{j-1},$

where $s_j = \sqrt{\sum_{i=1}^{n} e_{ij}^2 / (n - b)}$.
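The residual-based statistics (EWMA of the mean residual, the range R, and the EWMSD) can be computed as in the sketch below. It takes $s_j$ to be the residual standard deviation with n − b degrees of freedom, which is an assumed reading of the slide. The starting values `z0` and `v0`, the smoothing constant, and the function name are also assumptions, and control limits are not included.

```python
import numpy as np

def residual_charts(residuals, n_basis, lam=0.2, z0=0.0, v0=1.0):
    """EWMA, R, and EWMSD statistics from a (profiles x points) residual array."""
    e = np.asarray(residuals, dtype=float)
    n = e.shape[1]
    e_bar = e.mean(axis=1)                       # mean residual per profile
    r = e.max(axis=1) - e.min(axis=1)            # R statistic (range)
    s = np.sqrt(np.sum(e ** 2, axis=1) / (n - n_basis))  # assumed form of s_j
    z, v = [], []
    z_prev, v_prev = z0, v0
    for eb, sj in zip(e_bar, s):
        z_prev = lam * eb + (1 - lam) * z_prev   # EWMA recursion
        v_prev = lam * sj + (1 - lam) * v_prev   # EWMSD recursion
        z.append(z_prev)
        v.append(v_prev)
    return np.array(z), r, np.array(v)

# Toy residuals for two profiles with five points each (assumed data).
e = np.array([[1.0, -1.0, 1.0, -1.0, 0.0],
              [2.0, -2.0, 2.0, -2.0, 0.0]])
z, r, v = residual_charts(e, n_basis=1, lam=0.2, z0=0.0, v0=1.0)
```

In this toy input the residuals average to zero, so the EWMA stays at its starting value while the range and EWMSD respond to the doubled spread of the second profile.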

33

Fixed vs. Random Effects

Fixed-effect model
• No profile-to-profile (subject-to-subject) variation
• The function $m(x)$ is a fixed function, the same for each profile.

Random-effect model
• There exists profile-to-profile variation caused by common causes.
• The profile function $m(x)$ is a random function.
• Profiles are modeled as realizations of a stochastic process with a mean curve and a covariance function.

34

Nonparametric Random-Effect Model

• Shiau, J.-J. H., Huang, H.-L., Lin, S.-H., and Tsai, M.-Y. (2009). “Monitoring Nonlinear Profiles with Random Effects by Nonparametric Regression”. Communications in Statistics-Theory and Methods. 38, 1664-1679.

35

Nonparametric Random-Effect Model

Adopt the random-effect model to capture the additional variability often observed in profile data.

Motivating example: aspartame. Original model to generate aspartame profiles:

$Y_{ij} = I_i + M_i e^{N_i (x_j - 1)^2} + \varepsilon_{ij}, \quad i = 1, \ldots, n; \; j = 1, \ldots, p,$

where

$I_i \sim$ i.i.d. $N(1, 0.2^2)$,
$M_i \sim$ i.i.d. $N(15, 1)$,
$N_i \sim$ i.i.d. $N(-1.5, 0.3^2)$,
$\varepsilon_{ij} \sim$ i.i.d. $N(0, 0.3^2)$.

$I_i$, $M_i$, and $N_i$ are random variables! They represent common-cause variations among profiles.
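To make the generating model concrete, the sketch below simulates aspartame-like profiles with random $I_i$, $M_i$, $N_i$ and measurement noise. The evaluation grid `x` and the number of profiles are hypothetical, since the talk does not list them here, and the negative mean of $N_i$ is assumed.

```python
import numpy as np

def simulate_aspartame(n_profiles, x, rng):
    """Profiles Y_ij = I_i + M_i * exp(N_i * (x_j - 1)^2) + eps_ij."""
    x = np.asarray(x, dtype=float)
    shape = (n_profiles, 1)                 # one random draw per profile
    I = rng.normal(1.0, 0.2, size=shape)
    M = rng.normal(15.0, 1.0, size=shape)
    N = rng.normal(-1.5, 0.3, size=shape)
    eps = rng.normal(0.0, 0.3, size=(n_profiles, x.size))
    return I + M * np.exp(N * (x - 1.0) ** 2) + eps

rng = np.random.default_rng(2009)
x = np.linspace(0.0, 4.0, 20)     # assumed set points
profiles = simulate_aspartame(50, x, rng)
```

Each row shares one draw of (I, M, N), so rows differ from one another by more than measurement noise, which is the profile-to-profile variation the random-effect model is meant to capture.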

36

37

Original Model

38

Problems with the Original Model

• Not Gaussian

• Covariance matrix depends on M and N

• Too complicated to analyze

39

Stochastic Gaussian Process Model

• Gaussian process with
  – Mean function
  – Covariance function $G(s, t)$
• In-control process

40

Out-of-control Process

• When the mean function is shifted, say,

41

Data Smoothing

The profile vectors $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{ip})'$, $i = 1, \ldots, n$, are multivariate normal with mean vector $\mu$, where the covariance matrix $\Sigma$ represents the profile-to-profile variations.

Preprocessing: smooth each profile.
• Smoothing splines or B-spline regression
• Other smoothing techniques: kernel smoothing, local polynomial smoothing, wavelets

After the sample profiles are de-noised (i.e., the effects of the measurement errors $\varepsilon$ are eliminated), we have the smoothed profiles.

42

Principal Component Analysis (PCA)

Method: apply principal component analysis (PCA) to the smoothed profiles to obtain the principal modes of variation.

PCA finds an orthogonal matrix $B$ such that

$B' \Sigma B = \Lambda$, where $B'B = BB' = I$ and $\Lambda$ is diagonal.

Eigen-analysis: the eigenvectors are the principal components.

43

Phase I Monitoring (1)

• A set of $n$ historical profiles is available
• Smooth each profile
• Apply PCA to the sample covariance matrix
  – Eigenvectors are principal components (PC)
  – PC-score

44

Phase I Monitoring (2)

• Select "effective" principal components by
  – Total variation explained: choose the first $K$ such that the proportion of total variation explained reaches a desired level
  – Parsimoniousness

• Score vector of the ith profile: $S_i = (S_{i1}, \ldots, S_{iK})'$
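The PCA step and the choice of K can be sketched in Python as follows. The sketch assumes that the PC-scores are projections of the mean-centered smoothed profiles onto the eigenvectors and that K is the smallest number of components whose cumulative share of the eigenvalue total reaches the desired level. The 90% level and the function names are illustrative assumptions.

```python
import numpy as np

def pca(profiles):
    """Eigen-analysis of the sample covariance matrix of (n x p) profiles."""
    Y = np.asarray(profiles, dtype=float)
    mean = Y.mean(axis=0)
    cov = np.cov(Y, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # sort by decreasing variance
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    scores = (Y - mean) @ eigvecs              # PC-scores, one row per profile
    return mean, eigvals, eigvecs, scores

def choose_k(eigvals, level=0.9):
    """Smallest K whose cumulative explained variation reaches `level`."""
    ratio = np.cumsum(eigvals) / np.sum(eigvals)
    return min(int(np.searchsorted(ratio, level)) + 1, len(eigvals))

# Toy data standing in for smoothed profiles (assumed).
rng = np.random.default_rng(3)
Y = rng.normal(size=(30, 6)) * np.array([5.0, 3.0, 1.0, 0.5, 0.2, 0.1])
mean, eigvals, eigvecs, scores = pca(Y)
K = choose_k(eigvals, level=0.9)
score_vectors = scores[:, :K]                  # S_i = (S_i1, ..., S_iK)'
```

The sample covariance of the scores is diagonal with the eigenvalues on the diagonal, which is what makes separate charts for individual PC-scores reasonable.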

45

Phase I Monitoring (3)

• Hotelling $T^2$ statistics

• The usual sample mean and sample covariance matrix of the score vectors

46

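Phase I Monitoring (4)

• Since score vectors are asymptotically multivariate normal, we have approximately

$\dfrac{n}{(n-1)^2}\, T_i^2 \sim \mathrm{Beta}\Bigl(\dfrac{K}{2}, \dfrac{n-K-1}{2}\Bigr)$

• Upper control limit of the $T^2$ chart:

$UCL = \dfrac{(n-1)^2}{n}\, \mathrm{Beta}_{1-\alpha,\, \frac{K}{2},\, \frac{n-K-1}{2}}$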
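A sketch of the Phase I Hotelling $T^2$ chart on the score vectors is given below. It assumes the usual quadratic form based on the sample mean and sample covariance matrix of the scores, and it applies a Bonferroni adjustment (level α/n per profile) to the beta quantile. The overall level 0.05 and the function name are assumptions.

```python
import numpy as np
from scipy.stats import beta

def phase1_t2(scores, overall=0.05):
    """Hotelling T^2 per profile and a Bonferroni-adjusted beta UCL."""
    S = np.asarray(scores, dtype=float)
    n, K = S.shape
    centered = S - S.mean(axis=0)
    cov = np.cov(S, rowvar=False).reshape(K, K)   # reshape handles K = 1
    cov_inv = np.linalg.inv(cov)
    t2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    quantile = beta.ppf(1 - overall / n, K / 2, (n - K - 1) / 2)
    ucl = (n - 1) ** 2 / n * quantile
    return t2, ucl

# Toy in-control score vectors (assumed data).
rng = np.random.default_rng(4)
scores = rng.normal(size=(50, 3))
t2, ucl = phase1_t2(scores)
out_of_control = np.where(t2 > ucl)[0]
```

Profiles flagged in `out_of_control` would typically be examined and, if an assignable cause is found, removed before the limits are recomputed.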

47

Phase I Monitoring (5)

Note that the monitoring statistics across curves in Phase I are not independent, so the prescribed overall false-alarm rate cannot be achieved from the marginal distribution of the monitoring statistics.

We can adopt the Bonferroni approach to control the overall false-alarm rate (i.e., type I error) at level $\alpha$.

48

Phase II Monitoring (1)

In Phase II, we usually assume that $\Sigma_0$ is known. In practice, $\Sigma_0$ is estimated by the sample covariance matrix of the Phase-I in-control profiles.

Apply PCA to $\Sigma_0$ to obtain the eigenvalues and the eigenvectors $v_1, \ldots, v_p$.

Choose the $K$ effective PCs $v_1, \ldots, v_K$.

49

Phase II Monitoring (2)

For a new incoming profile, first smooth it, then project it onto the $K$ PCs to obtain $K$ independent PC-scores.

If the process is in control, $S_r \sim N(v_r' \mu_0, \lambda_r)$.

50

Individual PC-score Charts

rth PC-score chart

Monitoring statistic: $S_r$

Control limits: $v_r' \mu_0 \pm Z_{\alpha/2} \sqrt{\lambda_r}$

51

A Combined Chart

• Signals when any of the K individual charts signals
• Equivalent to monitoring the statistic:

• Control limits:

where the individual false-alarm rate is set at $\alpha' = 1 - (1 - \alpha)^{1/K}$ so that the overall false-alarm rate is $\alpha$.
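The combined chart can be sketched as below, assuming that in control each PC-score is normal with mean $v_r'\mu_0$ and variance $\lambda_r$, so that each individual chart uses symmetric normal limits at the adjusted level. The function name and the overall rate 0.0027 are assumptions.

```python
import numpy as np
from scipy.stats import norm

def combined_chart_signal(scores, centers, eigvals, overall=0.0027):
    """Signal if any of the K standardized PC-scores exceeds its limit."""
    scores = np.asarray(scores, dtype=float)
    centers = np.asarray(centers, dtype=float)
    eigvals = np.asarray(eigvals, dtype=float)
    K = eigvals.size
    a_ind = 1 - (1 - overall) ** (1 / K)     # individual false-alarm rate
    z = norm.ppf(1 - a_ind / 2)              # common standardized limit
    standardized = np.abs(scores - centers) / np.sqrt(eigvals)
    return bool(standardized.max() > z), z

centers, eigvals = np.zeros(3), np.ones(3)
in_control, z = combined_chart_signal(np.zeros(3), centers, eigvals)
shifted, _ = combined_chart_signal(np.array([0.0, 0.0, 10.0]), centers, eigvals)
```

Under these assumptions, signalling when any individual chart signals is the same as comparing the largest standardized absolute score with a single limit.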

52

A $T^2$ Chart

• Monitoring statistic

• Follows a chi-square distribution with K degrees of freedom

• Upper control limit:

53

Performance Evaluation for Phase II

• Average Run Length (ARL)

• Mean shift from $\mu_0$ to $\mu_0 + \delta$

• $p$: probability of detecting the shift

• ARL = 1/p
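As an illustration of ARL = 1/p, the sketch below evaluates the ARL of a chi-square-based $T^2$ chart under a mean shift δ. It assumes that the Phase II $T^2$ statistic is the sum of squared standardized PC-scores, so that under a shift it follows a noncentral chi-square distribution with noncentrality $\sum_r (v_r'\delta)^2/\lambda_r$. The false-alarm rate 0.0027, the toy eigen-structure, and the function name are assumptions.

```python
import numpy as np
from scipy.stats import chi2, ncx2

def arl_t2(delta, eigvecs, eigvals, alpha=0.0027):
    """ARL = 1/p for a chi-square T^2 chart on K PC-scores."""
    V = np.asarray(eigvecs, dtype=float)      # p x K matrix of retained PCs
    lam = np.asarray(eigvals, dtype=float)
    K = lam.size
    ncp = float(np.sum((V.T @ np.asarray(delta, dtype=float)) ** 2 / lam))
    ucl = chi2.ppf(1 - alpha, df=K)
    # Detection probability: central chi-square if there is no shift.
    p = chi2.sf(ucl, K) if ncp == 0 else ncx2.sf(ucl, K, ncp)
    return 1.0 / p

# Toy setting: p = 4 points, K = 2 retained PCs (assumed).
V = np.eye(4)[:, :2]
lam = np.array([2.0, 1.0])
arl_in_control = arl_t2(np.zeros(4), V, lam)
arl_shifted = arl_t2(np.array([2.0, 0.0, 0.0, 0.0]), V, lam)
```

A shift orthogonal to the retained PCs leaves the noncentrality at zero in this sketch, so the chart would not detect it any faster than a false alarm.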

54

• Individual chart

• Combined chart

• A $T^2$ chart

55

More PC-scores, More detecting power?

In 50 curves, there is one outlier with $I$ shifted from 1 to 1 + 5 × 0.2.

[Figure: detecting power and percentage of explanation.]

56

A Simulated Aspartame Example--Phase I Monitoring

57

58

59

ARL Comparisons--Phase II Monitoring

60

A Case Study--VDP Example

61

62

Conclusions

• We propose and discuss monitoring schemes for nonlinear profiles based on PCA:
  – Phase I
    • Hotelling $T^2$ control chart
  – Phase II
    • individual PC-score charts
    • combined chart
    • $T^2$ chart

63

Conclusions

• When the shift corresponds to a mode of variation that a particular principal component represents
  – use the individual PC-score chart for better power

• Unfortunately, this ideal situation is rare in practice.

64

Conclusions

• The $T^2$ chart performs somewhat better than the combined chart in terms of the average run length, but the difference is not large.

• However, by providing charts for all of the effective components, the combined chart gives more clues for finding assignable causes than the $T^2$ chart.

65

Conclusions

• The degree of smoothness in the data smoothing step has a great impact on the result of the subsequent PCA step.

• A high degree of smoothness leads to high total explanation power of the first few principal components
  – For B-spline regression, # B-spline bases = # principal components with nonzero eigenvalues.

• If the underlying profiles (i.e., without noise) are fairly smooth, then the data dimension can be well reduced by PCA.

66

Conclusions

• Profile monitoring has become a popular and promising area of research in statistical process control in recent years.

• At the same time, functional data analysis (FDA) is also attracting growing attention and finding many applications.

• We believe many techniques developed for FDA may be extended to developing new profile monitoring techniques in SPC.

67

More Recent Works

• Two master theses of 2009

• Monitoring profiles by their Data Depths of PC-scores

69

Other Related Works

Jin, J. and Shi, J. (2001). "Automatic Feature Extraction of Waveform Signals for In-Process Diagnostic Performance Improvement". Journal of Intelligent Manufacturing, 12, 257-268.

Lada, E. K., Lu, J.-C., and Wilson, J. R. (2002). "A Wavelet-Based Procedure for Process Fault Detection". IEEE Transactions on Semiconductor Manufacturing, 15, 79-90.

Jeong, M. K., Lu, J.-C., and Wang, N. (2006). "Wavelet-Based SPC Procedure for Complicated Functional Data". International Journal of Production Research, 44(4), 729-744.

Zhou, S., Sun, B., and Shi, J. (2006). "An SPC Monitoring System for Cycle-Based Waveform Signals Using Haar Transform". IEEE Transactions on Automation Science and Engineering, 3(1).

70

71

72

73

74

Phase I Monitoring of Nonlinear Profiles

James D. Williams, William H. Woodall and Jeffrey B. Birch, 2003

• Method 1 (sample covariance matrix) does not take into account the sequential sampling structure of the data:
  – The overall probability of detecting a shift in the mean vector will decrease (see Sullivan and Woodall, 1996)
  – Should not be used

• Method 2 (successive differences) accounts for the sequential sampling scheme and gives a more robust estimate of the covariance matrix

• In the VDP example, Methods 1 and 2 gave the same result because
  – There was no apparent shift in the mean vector
  – There were only about two outliers

• Method 3 (intra-profile pooling) should be used when there is no profile-to-profile common cause variability

• Comparison of the three methods:
  – Method 1 assumes all variability is due to common cause
  – Method 3 assumes that no variability is due to common cause
  – Method 2 is somewhere in the middle
