4-failure mode analysis


Upload: aheza-desire

Post on 08-Feb-2016


Page 1: 4-Failure Mode Analysis

Failure Analysis of Engineering Systems

Instructor: Professor Steve Maher

Module 5:

Scripture of the Module

Some review of Module 3

8 – Failure Mode Assessment and Assignment (FMA&A)

9 – Pedigree Analysis

10 – Change Analysis

Page 2: 4-Failure Mode Analysis

Scripture of the Module

“The plans of the diligent lead to profit as surely as haste leads to poverty.”

- Proverbs 21:5

Failure Analysis of Engineering Systems ENGR 5323


Page 3: 4-Failure Mode Analysis

Assignment

Read Chapters 8, 9, and 10 of Systems Failure Analysis

Do quizzes on Bb as they appear

Quiz next week at the beginning of class


Page 4: 4-Failure Mode Analysis

Some Module 3 “Leftovers”

Page 5: 4-Failure Mode Analysis

From Module 3 Quiz

We have 1000 parts that have run an average of 800 hours each. 20 of them have failed. What is the failure rate?

$$\text{MTBF} = \frac{\text{total service hours}}{\text{number of failures}} = \frac{800 \times 1000}{20} = 40{,}000\ \text{hr per failure}$$

$$\text{Failure rate} = \lambda = \frac{1}{\text{MTBF}} = \frac{1}{40{,}000} = 0.000025 = 2.5 \times 10^{-5}\ \text{failures/hr}$$
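The same arithmetic can be scripted, which helps when the fleet data changes. This is a minimal sketch that assumes a constant failure rate (the same assumption behind λ = 1/MTBF); the function names are illustrative, not from the textbook.

```python
def mtbf(total_service_hours: float, failures: int) -> float:
    """MTBF = total service hours / number of failures."""
    if failures <= 0:
        raise ValueError("MTBF is undefined when no failures have occurred")
    return total_service_hours / failures


def failure_rate_from_mtbf(mtbf_hours: float) -> float:
    """Constant failure rate: lambda = 1 / MTBF, in failures per hour."""
    return 1.0 / mtbf_hours


# Quiz data: 1000 parts, average of 800 hours each, 20 failures.
total_hours = 1000 * 800
m = mtbf(total_hours, 20)
lam = failure_rate_from_mtbf(m)
print(f"MTBF = {m:,.0f} hr/failure, lambda = {lam:.1e} failures/hr")
# prints: MTBF = 40,000 hr/failure, lambda = 2.5e-05 failures/hr
```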


Page 6: 4-Failure Mode Analysis

From Module 3 Quiz

For the parts in Question 17, what is the probability that a part will run to 1000 hours without failing?

$$P_S = e^{-\lambda t} = e^{-(2.5 \times 10^{-5})(1000)} = e^{-0.025} \approx 0.9753$$

i.e., a 97.53% chance of running that long without failing.

Some more info:

$$P_F = 1 - P_S \approx 0.0247 = 2.47\%$$

chance of failing; i.e., 1000 × 0.0247 ≈ 25 parts will fail by 1000 hours of operation, or ~5 more (beyond the 20 already failed) between 800 and 1000 hours.
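The reliability calculation above can be checked numerically. This sketch assumes, as the quiz answer does, an exponential (constant failure rate) model and that all 1000 parts are run to 1000 hours; the variable names are illustrative.

```python
import math


def prob_survival(lam: float, t: float) -> float:
    """Reliability under a constant failure rate: P_S = exp(-lambda * t)."""
    return math.exp(-lam * t)


lam = 2.5e-5          # failures/hr, from the previous question
n_parts = 1000
already_failed = 20   # failures observed by ~800 hours

p_s = prob_survival(lam, 1000)   # chance a part reaches 1000 hours
p_f = 1 - p_s                    # chance it fails before 1000 hours
expected_failed = n_parts * p_f  # expected failures by 1000 hours
print(f"P_S = {p_s:.4f}, P_F = {p_f:.4f}")
print(f"~{expected_failed:.0f} failed by 1000 h, "
      f"~{expected_failed - already_failed:.0f} more after 800 h")
```

Running it prints P_S = 0.9753 and P_F = 0.0247, then about 25 failures by 1000 hours and about 5 more after 800 hours, matching the hand calculation.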


Page 7: 4-Failure Mode Analysis

From Module 3 Quiz

Fig 7.1: Event B has a failure rate of $10^{-4}$ failures/hr. The part is operated for 100 hours. The probability of event C happening is 0.005. What is the probability that command event A will occur?

OR gate, so

$$P_A = P_B + P_C - P_B P_C$$

$$P_B = 1 - e^{-\lambda t} = 1 - e^{-(10^{-4})(100)} = 1 - e^{-0.01} \approx 0.00995$$

$$P_A = 0.00995 + 0.005 - (0.00995)(0.005) \approx 0.0149$$

≈ 0.015, or a 1.5% chance of Event A happening.
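A short script reproduces the OR-gate result. It assumes, as the formula P_B + P_C − P_B·P_C does, that events B and C are independent; the helper names are hypothetical.

```python
import math


def p_fail(lam: float, t: float) -> float:
    """Probability of failure by time t under a constant failure rate: 1 - exp(-lambda * t)."""
    return 1.0 - math.exp(-lam * t)


def or_gate(p1: float, p2: float) -> float:
    """Two independent inputs into an OR gate: P1 + P2 - P1*P2."""
    return p1 + p2 - p1 * p2


p_b = p_fail(1e-4, 100)   # event B: 1e-4 failures/hr for 100 hr
p_c = 0.005               # event C: probability given directly
p_a = or_gate(p_b, p_c)
print(f"P_B = {p_b:.5f}, P_A = {p_a:.4f}")  # P_B = 0.00995, P_A = 0.0149
```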


Page 8: 4-Failure Mode Analysis

From Module 3 Quiz

Fig 7.3: The system is operated for 100 hours. The failure rate for B is $2 \times 10^{-5}$ failures/hr. The failure rate for C is $5 \times 10^{-6}$ failures/hr. What is the probability that command event A will occur?

AND gate, so

$$P_A = P_B P_C$$

$$P_B = 1 - e^{-\lambda_B t} = 1 - e^{-(2 \times 10^{-5})(100)} \approx 0.002$$

$$P_C = 1 - e^{-\lambda_C t} = 1 - e^{-(5 \times 10^{-6})(100)} \approx 0.0005$$

$$P_A \approx (0.002)(0.0005) = 10 \times 10^{-7} = 1 \times 10^{-6}$$

or about a 0.0001% chance of Event A happening.
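The AND-gate case can be checked the same way. This sketch again assumes B and C are independent, which is what makes P_A = P_B·P_C valid; the helper names are illustrative.

```python
import math


def p_fail(lam: float, t: float) -> float:
    """Probability of failure by time t under a constant failure rate: 1 - exp(-lambda * t)."""
    return 1.0 - math.exp(-lam * t)


def and_gate(p1: float, p2: float) -> float:
    """Two independent inputs into an AND gate: both must occur, P1 * P2."""
    return p1 * p2


t = 100                  # operating hours
p_b = p_fail(2e-5, t)    # about 0.002
p_c = p_fail(5e-6, t)    # about 0.0005
p_a = and_gate(p_b, p_c)
print(f"P_A = {p_a:.1e}")  # about 1e-6
```

Without the rounding used on the slide, the exact product is slightly below 1 × 10⁻⁶, which is why the printout is formatted to one significant figure.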

Page 9: 4-Failure Mode Analysis

Failure Mode Assessment and Assignment (FMA&A)

Page 10: 4-Failure Mode Analysis

Berk’s Overall FA Process


Designate a team

Gather all related information

Review and define problem

Identify all potential failure causes

List causes in FMA & A

Converge on root cause

Determine Corrective Actions

Implement Corrective Actions

Assess Corrective Actions

Evaluate for Preventive Actions

Incorporate FA Findings

Page 11: 4-Failure Mode Analysis

What is FMA&A?

FMA&A = Failure Mode Assessment and Assignment

It is a tool to help manage the evaluation of each of the hypothesized failure causes.

It is generally a table; the textbook uses 4 columns:
– Event number
– Description of each hypothesized failure cause
– Likelihood assessment of each cause (updated as data becomes available)
– Actions necessary to evaluate the cause and status of the evaluation (sometimes separate columns)

See Table 8.1: FMA&A for light bulb example

A spreadsheet (e.g., MS Excel) is an excellent tool for this; a word-processing tool (e.g., MS Word) can also be used
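Because the FMA&A is just a table, it maps naturally onto a small record type that can be exported for a spreadsheet. The sketch below is one possible layout, not the textbook's format: the class name, column headers, and the two lamp-failure rows are hypothetical and are not copied from Table 8.1.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import List

# Four columns as described above; the exact header wording is an assumption.
COLUMNS = ["Event Number", "Hypothesized Failure Cause", "Assessment", "Assignment"]


@dataclass
class FmaaRow:
    """One row of an FMA&A table: a single hypothesized failure cause."""
    event_number: str
    description: str
    assessment: str = "Unknown"                            # default until evaluated
    assignment: List[str] = field(default_factory=list)  # actions and status notes


def to_csv(rows: List[FmaaRow]) -> str:
    """Render the table as CSV text that a spreadsheet can open."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(COLUMNS)
    for row in rows:
        writer.writerow([row.event_number, row.description,
                         row.assessment, "; ".join(row.assignment)])
    return buf.getvalue()


# Hypothetical rows for a lamp failure (not taken from Table 8.1).
rows = [
    FmaaRow("1", "Filament fatigue", assignment=["Examine filament under magnification"]),
    FmaaRow("2", "Supply voltage surge", assignment=["Review line-voltage records"]),
]
print(to_csv(rows))
```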


Page 12: 4-Failure Mode Analysis

Hypothesized Failure Causes

Each row of the table is a hypothesized failure cause

Each of the causes is briefly described

Can develop hypothesized causes with any method, then “map” them to a row/column

List causes or inducing events only
– In FTA terms, do not list command events
– Focus on basic failures, human errors, normal events, inhibiting conditions, and undeveloped events (using FTA terms)

Typically a repeat of the previous activity of identifying potential causes (described in Modules 2-3)
– Easier to work with a table than with diagram(s)
– Saves time, less confusion


Page 13: 4-Failure Mode Analysis

Event Number Column

Each of the hypothesized failure causes is numbered

This is used for tracking and organizational purposes

Can develop hypothesized causes with any method
– Textbook uses an FTA example
– Each team or individual can choose a numbering system

List and assign numbers to causes or inducing events only
– In FTA terms, do not list or number command events
– Focus the list and numbering on basic failures, human errors, normal events, inhibiting conditions, and undeveloped events (using FTA terms)
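The rule above (carry over only causes or inducing events from the fault tree, then number them) can be sketched as a simple filter. The event kinds mirror the FTA terms on this slide; the class, function, and lamp events are hypothetical illustrations, and sequential integers are just one possible numbering system.

```python
from dataclasses import dataclass
from typing import Dict, List

# FTA event kinds that belong in the FMA&A; command events are excluded.
INDUCING_KINDS = {"basic failure", "human error", "normal event",
                  "inhibiting condition", "undeveloped event"}


@dataclass
class FtaEvent:
    label: str
    kind: str   # "command event" or one of INDUCING_KINDS


def number_causes(events: List[FtaEvent]) -> Dict[int, str]:
    """Keep only inducing events and give each a sequential event number."""
    causes = [e for e in events if e.kind in INDUCING_KINDS]
    return {i: e.label for i, e in enumerate(causes, start=1)}


# Hypothetical fault-tree events for a lamp that fails to light.
events = [
    FtaEvent("Lamp fails to light", "command event"),
    FtaEvent("No current to filament", "command event"),
    FtaEvent("Filament breaks", "basic failure"),
    FtaEvent("Switch left off", "human error"),
    FtaEvent("Building power outage", "undeveloped event"),
]
print(number_causes(events))
# {1: 'Filament breaks', 2: 'Switch left off', 3: 'Building power outage'}
```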


Page 14: 4-Failure Mode Analysis

Assessment Column

Describes the assessment for each of the hypothesized failure causes

Default for each cause is “Unknown”

As analysis proceeds, each cause will be updated using terms such as:
– “Unlikely” = evaluation showed no problem or no lead
– “Likely” = cause likely found but not conclusive
– “Confirmed” = we found it! (At least, the objective evidence indicates it)

For FA (i.e., the failure has already occurred), probabilities do not really matter that much…
– They can be used as a guide to prioritize evaluation
– They should NOT be used by themselves to update status
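The distinction above (probabilities may order the work, but only evidence changes a row's assessment) can be enforced in code. This is a minimal sketch under that assumption; the class, the probability estimates, and the evidence text are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

ASSESSMENTS = ("Unknown", "Unlikely", "Likely", "Confirmed")


@dataclass
class Cause:
    number: int
    description: str
    est_probability: float       # rough prior; only used to order the work
    assessment: str = "Unknown"
    evidence: str = ""


def evaluation_order(causes: List[Cause]) -> List[Cause]:
    """Unresolved causes, highest estimated probability first (a guide, not a verdict)."""
    still_open = [c for c in causes if c.assessment in ("Unknown", "Likely")]
    return sorted(still_open, key=lambda c: c.est_probability, reverse=True)


def update_assessment(cause: Cause, new_assessment: str, evidence: str) -> None:
    """Change a row's assessment only when objective evidence is recorded with it."""
    if new_assessment not in ASSESSMENTS:
        raise ValueError(f"unrecognized assessment: {new_assessment!r}")
    if not evidence.strip():
        raise ValueError("an assessment change must cite objective evidence")
    cause.assessment = new_assessment
    cause.evidence = evidence


causes = [
    Cause(1, "Filament fatigue", 0.2),
    Cause(2, "Supply voltage surge", 0.6),
    Cause(3, "Loose socket contact", 0.1),
]
update_assessment(causes[2], "Unlikely", "Socket contact resistance measured within limits")
print([c.number for c in evaluation_order(causes)])  # [2, 1]
```

Note that cause 3 drops out of the work queue only because evidence was recorded, not because its estimated probability was low.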


Page 15: 4-Failure Mode Analysis

Assignment Column

Defines the actions necessary to evaluate each of the hypothesized failure causes

Review each hypothesized cause (i.e., row by row) and determine the actions necessary to evaluate it
– Evaluation needs to be objective (i.e., fact-based), not subjective (i.e., opinion-based)
– Focus on ruling the cause in or out
– Be careful of jumping to conclusions, and of ruling causes out too quickly and without evaluation

Best if each action has ONE owner and a DUE DATE

Status is updated as the actions are completed (in this column or an additional one)
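One way to keep the "one owner, one due date" discipline is to track each action as its own record and list the overdue ones before each meeting. This is a sketch under that assumption; the field names, owners, dates, and finding are hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Action:
    event_number: int     # FMA&A row this action evaluates
    task: str
    owner: str            # exactly one accountable person
    due: date
    done: bool = False
    finding: Optional[str] = None


def complete(action: Action, finding: str) -> None:
    """Close an action and record what was found, so the row's status can be updated."""
    action.done = True
    action.finding = finding


def overdue(actions: List[Action], today: date) -> List[Action]:
    """Open actions past their due date, e.g. for review at the next team meeting."""
    return [a for a in actions if not a.done and a.due < today]


# Hypothetical actions, owners, and dates.
actions = [
    Action(1, "Section filament and inspect", "Owner A", date(2024, 3, 1)),
    Action(2, "Pull line-voltage logs", "Owner B", date(2024, 3, 8)),
]
complete(actions[0], "No anomaly found")
late = overdue(actions, today=date(2024, 3, 10))
print([a.event_number for a in late])  # [2]
```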


Page 16: 4-Failure Mode Analysis

Point of Emphasis

Do not touch any hardware or software from the failed system until you have defined an organized, systematic, and objective manner in which to proceed


Page 17: 4-Failure Mode Analysis

Follow-On Activities (Team)

Meet regularly; frequency is determined by priority, severity, and urgency
– High profile: at least daily
– Low profile: at least weekly is recommended
– Use the FMA&A to guide the meeting

Execute actions that are assigned and update status
– Include findings in the Assignment column
– Update the Assessment and Assignment columns based on the data
– Clearly indicate items completed and ruled out (e.g., shading the row)

Distribute updates to team and stakeholders on a regular basis (e.g. after each team meeting)


Page 18: 4-Failure Mode Analysis

Individual Approaches

Suggest you use FMA&A or a similar format
– Some organizations use an FA Log or similar tool
– You want something to capture your thoughts, planned actions, and status updates

Update someone regularly; frequency is determined by priority, severity, and urgency
– Go no more than 2 weeks between updates; weekly is recommended
– Use the FMA&A or FA Log to guide the meeting

Execute planned actions and update status
– Document findings as you go
– Adjust plans based on the data
– Clearly indicate items completed and ruled out

Have updates ready to distribute when needed


Page 19: 4-Failure Mode Analysis

Performing the Evaluations

Pedigree analysis (Ch 9)

Change analysis (Ch 10)

Analytical equipment (Ch 11)

Mechanical and electronic component failures (Ch 12)

Leaks (Ch 13)

Contamination (Ch 14)

Design Analysis (Ch 15)

Statistical Considerations (Ch 16)

Design of Experiments (Ch 17)


Page 20: 4-Failure Mode Analysis

Group Activity

Page 21: 4-Failure Mode Analysis

* Discuss the scenario (pp. 72-73)
* Document answers to the questions in a file
* Email the file to me (one per team)
* After emailing, take a 5-10 min break
* Reconvene at about ____

Page 22: 4-Failure Mode Analysis

Which approach does your organization (or do you) follow?

Do you think your failure analysis approach needs to change?

If so, what can you do to initiate a change?


Page 23: 4-Failure Mode Analysis

Pedigree Analysis

Page 24: 4-Failure Mode Analysis

Berk’s Overall FA Process


Designate a team

Gather all related information

Review and define problem

Identify all potential failure causes

List causes in FMA & A

Converge on root cause

Determine Corrective Actions

Implement Corrective Actions

Assess Corrective Actions

Evaluate for Preventive Actions

Incorporate FA Findings

Page 25: 4-Failure Mode Analysis

Overall Process Flow for Diagnosing Root Cause of a Failure


Confirm the Failure

Characterize the Failure

Isolate the Failure

Isolate the Defect

Identify the Defect

Determine Root Cause

Page 26: 4-Failure Mode Analysis

What Is a Pedigree? And How Do You Analyze It?

Product or System Pedigree = essentially the history of the product
– Describes the design of the product
– How it was built
– That it was built in accordance with specs, codes, etc.

Documents in a pedigree:
– Records of how it was built
– Records of material used
– Conformance to drawing and material requirements

Will a suspect condition be revealed by an analysis of the pedigree?


Page 27: 4-Failure Mode Analysis

Value of Reviewing the Pedigree

If it addresses the suspect area, examine the pedigree to see if there is something suspicious
– Anomalies in test results
– Nonconformities found in inspections
– Missing items or documents

If it does not address the suspect area, maybe the pedigree should
– Recommend adding it for future builds
– This can be part of corrective action to prevent future failures


Page 28: 4-Failure Mode Analysis

Examining the Pedigree

Purchase orders

Nonconformance documentation

Inspection records

Test data

Calibration data

Drawings and specifications

Drawing changes

Work instructions

Certificates of conformance
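A pedigree review limited to the suspect areas can be framed as a lookup over just the relevant document types, reporting what is missing and what carries anomalies. The dictionary layout, the `anomalies` key, and the sample records below are hypothetical; real pedigree data will live in whatever quality system the organization uses.

```python
from typing import Dict, List, Tuple


def review_pedigree(pedigree: Dict[str, dict],
                    relevant_docs: List[str]) -> Tuple[List[str], List[Tuple[str, list]]]:
    """Check only the document types tied to hypothesized causes.

    Returns (missing, flagged): documents not found, and documents with recorded anomalies.
    """
    missing, flagged = [], []
    for doc in relevant_docs:
        record = pedigree.get(doc)
        if record is None:
            missing.append(doc)
        elif record.get("anomalies"):
            flagged.append((doc, record["anomalies"]))
    return missing, flagged


# Hypothetical pedigree records for one unit.
pedigree = {
    "Inspection records": {"anomalies": ["Dimension out of tolerance, dispositioned 'use as is'"]},
    "Test data": {"anomalies": []},
    "Purchase orders": {"anomalies": []},
}
# Only the areas related to the hypothesized causes, not the entire pedigree.
relevant = ["Inspection records", "Test data", "Certificates of conformance"]
missing, flagged = review_pedigree(pedigree, relevant)
print(missing)   # ['Certificates of conformance']
print(flagged)
```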


Page 29: 4-Failure Mode Analysis

Surprisingly…

Shipped systems do not always meet all of their requirements
– Many pedigree reviews reveal that the product had/has a problem
– In some cases, the pedigree is directly related to the failure

Not necessary to check the ENTIRE pedigree
– That may be a massive undertaking
– Only review areas that relate to the hypothesized causes

Pedigree can be suspect
– Errors, omissions, or even fraud can happen
– Certificates of conformance are not a guarantee


Page 30: 4-Failure Mode Analysis

Example: Tragedy in Hawaii

Tour plane caught fire and crashed

Oil leaking into engine caused the fire

Oil filter gasket had melted

Gasket was made of wrong material

Maintenance, filter spec, and gasket spec were fine…

Gasket manufacturer noted different material

Certificate of conformance was missing

Gasket packing slip and certificate did not match

The mismatch slipped through, and the nonconforming oil filter gasket was used, leading to the accident
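The Hawaii case turned on receiving paperwork that did not agree. A receiving-inspection cross-check of that kind might be sketched as below. It assumes each document records a single material identifier; the function, the dictionary keys, and the material codes are hypothetical and are not details of the actual investigation.

```python
from typing import List, Optional


def check_receipt(spec_material: str, packing_slip: dict,
                  certificate: Optional[dict]) -> List[str]:
    """Cross-check the material named on receiving paperwork; return any discrepancies."""
    issues = []
    slip_material = packing_slip.get("material")
    if certificate is None:
        issues.append("certificate of conformance missing")
    elif certificate.get("material") != slip_material:
        issues.append("packing slip and certificate disagree")
    if slip_material != spec_material:
        issues.append("packing slip material does not match the specification")
    return issues


# Hypothetical material codes.
print(check_receipt("MAT-A", {"material": "MAT-B"}, {"material": "MAT-A"}))
print(check_receipt("MAT-A", {"material": "MAT-A"}, None))
print(check_receipt("MAT-A", {"material": "MAT-A"}, {"material": "MAT-A"}))  # []
```

A check like this only catches disagreements on paper; as the following slide notes, a clean-looking record may still call for independent verification.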


Page 31: 4-Failure Mode Analysis

Non-conformance Does Happen

Anomalous certificates of conformance are not uncommon
– Typically not outright fraud
– Most are human error

Sometimes nonconforming material or a nonconforming system ships anyway

Sometimes everything looks fine but something is suspicious
– Follow-up independent verification may be needed
– Additional inspections, testing, etc. can be sought


Page 32: 4-Failure Mode Analysis

Change Analysis

Page 33: 4-Failure Mode Analysis

What is Change Analysis?

If a system WAS working, what changed?

Need to determine if a change occurred and if the change induced the failure.

Options:
– Nothing changed!
– Failure was happening, but not observed
– Failure occurs within normal statistical variation
– Change occurred, but unrelated to failure
– A change induced the failure
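One way to test the "normal statistical variation" option is to ask how likely the observed failure count would be if the old failure rate still held. This sketch assumes failures follow a Poisson process at the baseline rate (the constant-failure-rate model from Module 3); all numbers are hypothetical.

```python
import math


def poisson_tail(k: int, mu: float) -> float:
    """P(X >= k) for X ~ Poisson(mu): one minus the CDF up to k - 1."""
    cdf = sum(math.exp(-mu) * mu ** i / math.factorial(i) for i in range(k))
    return 1.0 - cdf


baseline_rate = 2.5e-5      # failures/hr before the suspected change (hypothetical)
unit_hours = 200_000        # fleet hours in the recent period (hypothetical)
observed = 9                # failures seen in that period (hypothetical)

expected = baseline_rate * unit_hours   # about 5 failures expected if nothing changed
p = poisson_tail(observed, expected)
print(f"expected {expected:.0f}, observed {observed}, "
      f"P(>= {observed} by chance) = {p:.3f}")
```

Here the chance of seeing 9 or more failures with no change is roughly 7%, so the jump alone is weak evidence of a change under a conventional 5% threshold. It is a prompt to keep investigating, not a verdict either way.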


Page 34: 4-Failure Mode Analysis

Things That Can Change

Design

Manufacturing Process

Test and Inspection

Environment

Lot Changes (Manufacturing variation)

Aging

Supplier Changes


Page 35: 4-Failure Mode Analysis

Design Changes

Controlled design change

“Redlined” design change

Rejected material: “use as is” or “repair”

Outsourced components and subassemblies


Page 36: 4-Failure Mode Analysis

Process Changes

Work or Build Instructions
– Many companies do not have instructions
– Imprecise work instructions
– Little rigor on changes
– Changes to equipment, tooling, settings not documented
– People not following the instructions

Investigate the documentation for changes

Investigate for non-documented changes

SPC (statistical process control) can help and provide a starting point
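As one example of how SPC data can point to an undocumented process change, a p-chart (fraction-defective control chart) flags samples outside 3-sigma limits around the average fraction defective. This sketch is a simplification: it computes limits from all samples, including the suspect one, whereas limits are usually set from a stable baseline period. The sample data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def p_chart_limits(defectives: Sequence[int],
                   sample_sizes: Sequence[int]) -> Tuple[float, List[Tuple[float, float]]]:
    """Center line p-bar and 3-sigma (LCL, UCL) for each sample of a p-chart."""
    p_bar = sum(defectives) / sum(sample_sizes)
    limits = []
    for n in sample_sizes:
        sigma = math.sqrt(p_bar * (1 - p_bar) / n)
        limits.append((max(0.0, p_bar - 3 * sigma), p_bar + 3 * sigma))
    return p_bar, limits


def out_of_control(defectives: Sequence[int], sample_sizes: Sequence[int]) -> List[int]:
    """Indices of samples whose fraction defective falls outside the limits."""
    _, limits = p_chart_limits(defectives, sample_sizes)
    return [i for i, (d, n, (lcl, ucl)) in enumerate(zip(defectives, sample_sizes, limits))
            if not lcl <= d / n <= ucl]


# Hypothetical daily samples of 200 units each from one process step.
defects = [2, 3, 1, 2, 4, 2, 11, 3]
sizes = [200] * len(defects)
print(out_of_control(defects, sizes))  # [6]: day 7 is worth checking for a process change
```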


Page 37: 4-Failure Mode Analysis

Test and Inspection Changes

Investigate for issues in testing
– Failures returned to manufacturing
– Reworked systems or components
– Results out of the ordinary
– Changes in the test process

Investigate inspection processes and results
– Change of inspectors
– Change of instructions
– Items noted but system continued anyway

Changes in test or inspection equipment

Page 38: 4-Failure Mode Analysis

Environmental Changes

Temperature and humidity issues
– Curing or drying of materials
– Non-environmentally controlled processes
– Investigate if failure correlates to temperature/humidity

Storage
– Investigate changes in environment or procedure
– Epoxies and raw material may be sensitive
– Moving locations can induce changes

Shipping
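A first check on whether failures correlate with temperature or humidity is a correlation coefficient between the environmental reading and the failure count for each build. This sketch computes Pearson's r by hand (Python 3.10+ also offers `statistics.correlation`); the data are hypothetical. A strong r only suggests where to look, because correlation does not establish cause, and five points is a very small sample.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: relative humidity (%) on each build day vs. failures per build.
humidity = [30, 40, 50, 60, 70]
failures = [1, 1, 2, 3, 4]
print(f"r = {pearson_r(humidity, failures):.2f}")  # r = 0.97
```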


Page 39: 4-Failure Mode Analysis

Lot and Supplier Changes

Manufacturing has normal variation
– Sometimes failures correlate to supplier lots
– May be related to material distributions
– Investigate if failure correlates to supplier lots
– Need to understand the supplier’s processes

Aging

Suppliers can change materials or designs
– Purchased supplies may still meet specs
– Investigate for changes that affect the system
– Information can be difficult and sensitive to get
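To investigate whether failures correlate to supplier lots, a simple first step is to tabulate the failure fraction per lot. This sketch assumes each unit's record carries its lot identifier and a failed/not-failed flag; the lot IDs and counts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def failure_fraction_by_lot(
        records: Iterable[Tuple[str, bool]]) -> Dict[str, Tuple[int, int, float]]:
    """Group (lot_id, failed) records and return {lot_id: (failures, units, fraction)}."""
    counts = defaultdict(lambda: [0, 0])   # lot_id -> [failures, units]
    for lot_id, failed in records:
        counts[lot_id][1] += 1
        if failed:
            counts[lot_id][0] += 1
    return {lot: (f, n, f / n) for lot, (f, n) in counts.items()}


# Hypothetical field data: 50 units from each of two supplier lots.
records = ([("LOT-1", False)] * 48 + [("LOT-1", True)] * 2 +
           [("LOT-2", False)] * 40 + [("LOT-2", True)] * 10)
for lot, (f, n, frac) in sorted(failure_fraction_by_lot(records).items()):
    print(f"{lot}: {f}/{n} failed ({frac:.0%})")
# LOT-1: 2/50 failed (4%)
# LOT-2: 10/50 failed (20%)
```

A gap like this would justify asking the supplier what differed between the lots, though with small counts it should be checked for statistical significance before drawing conclusions.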


Page 40: 4-Failure Mode Analysis

Example from Textbook:

CBU-87/B Cluster Bomb