1

What is RAMS, and how can RAMS methodology be used in maintenance planning?

Definitions, requirements, RAMS targets and methods
RAMS in connection with maintenance planning for rolling stock

Jørn Vatn
NTNU

2

Basic definitions

RAMS = Reliability, Availability, Maintainability and Safety

In Norwegian: “Sikkerhet, pålitelighet og vedlikeholdstilpasning” (safety, reliability and maintainability)

3

Def: Apportionment (fordeling av systemkrav)

A process whereby the RAMS elements for a system are sub-divided between the various items which comprise the system to provide individual targets.

In this definition the term “RAMS elements” can usually be interpreted as “targets” or “requirements” for Reliability, Availability, Maintainability and Safety. The overall RAMS targets (e.g. risk acceptance criteria) have to be apportioned to the individual system elements so that these elements can be constructed in a way that allows the overall target to be achieved.

4

Def: Availability (Tilgjengelighet)

The ability of a product to be in a state to perform a required function under given conditions at a given instant of time or over a given time interval assuming that the required external resources are provided.

Availability is related to specific failed states/failure modes (see fig. 3 of EN 50126-1) of functions that the system is supposed to provide. Considering only the subset of safety-related failure modes, the direct influence of availability on safety becomes obvious.

Availability depends on the quality and the design of a system and is related to the ratio of the mean time to maintain (restore), MTTM, and the mean time between failures, MTBF.

Prior to the determination of the availability, the system boundaries have to be defined in order to decide whether external resources (e.g. the supplied power) are part of the system.
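As a hedged illustration of the ratio mentioned above: if MTBF is read as the mean up time between failures and MTTM as the mean down time per failure (this reading and the symbol A are assumptions, not taken from the slide), the long-run average availability is commonly written as

```latex
A \;=\; \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTM}}
  \;=\; \frac{1}{1 + \mathrm{MTTM}/\mathrm{MTBF}}
```

With the purely illustrative values MTBF = 1000 h and MTTM = 10 h this gives A ≈ 0.990, so the availability is governed by the ratio MTTM/MTBF rather than by either quantity alone.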

5

Def: Failure rate (sviktintensitet)

The conditional probability of failure in (t, t+Δt) given the product has not failed up to time t divided by Δt.

Notation:
- z(t) = failure rate
- z(t)⋅Δt ≈ probability of failure in the next time interval (Δt), given survival up to time t
- z(t) typically has the shape of a “bath tub curve”
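The verbal definition above corresponds to the standard limit below. The symbols T (time to failure), f(t) (probability density) and R(t) = Pr(T > t) (survivor function) are introduced here for illustration and do not appear on the slide.

```latex
z(t) \;=\; \lim_{\Delta t \to 0}
      \frac{\Pr\left(t < T \le t + \Delta t \mid T > t\right)}{\Delta t}
     \;=\; \frac{f(t)}{R(t)}
```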

6

Def: Effective failure rate

The effective failure rate, λE(τ), is the expected number of failures per unit time given that the product is maintained at intervals of length τ.

Note that the failure rate, z(t), is an inherent property of the product and not affected by the maintenance, whereas the effective failure rate, λE(τ), depends on the preventive maintenance program.

7

Preventive maintenance and the bath tub curve

- By preventively replacing the unit at age τ, the effective failure rate λE(τ) is reduced
- Shorter replacement intervals (τ) give a lower effective failure rate
- The challenge is to balance the lower failure rate against increased maintenance and other negative effects
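A minimal sketch of how the effective failure rate could be computed, assuming a Weibull lifetime and an age replacement policy (replace at failure or at age τ, whichever comes first). The model choice, the function names and the example parameters are assumptions for illustration, not the method used on the slides. Under these assumptions a renewal argument gives λE(τ) = F(τ) / ∫₀^τ R(t) dt.

```python
import math


def weibull_scale(mttf, alpha):
    """Scale parameter that gives the requested MTTF for shape alpha."""
    return mttf / math.gamma(1.0 + 1.0 / alpha)


def survivor(t, mttf, alpha):
    """R(t) = Pr(T > t) for a Weibull distributed lifetime."""
    return math.exp(-((t / weibull_scale(mttf, alpha)) ** alpha))


def effective_failure_rate(tau, mttf, alpha, steps=2000):
    """Expected number of failures per unit time under age replacement at tau.

    Each renewal cycle ends at failure or at age tau, so
    lambda_E(tau) = F(tau) / integral_0^tau R(t) dt.
    """
    h = tau / steps
    # Trapezoidal rule for the expected cycle length.
    area = 0.5 * (survivor(0.0, mttf, alpha) + survivor(tau, mttf, alpha))
    area += sum(survivor(i * h, mttf, alpha) for i in range(1, steps))
    area *= h
    return (1.0 - survivor(tau, mttf, alpha)) / area


if __name__ == "__main__":
    # Hypothetical component: MTTF = 7 time units, ageing parameter 3.
    for tau in (1.0, 2.0, 5.0):
        print(tau, effective_failure_rate(tau, mttf=7.0, alpha=3.0))
```

With an ageing parameter above 1 the computed rate falls as τ is shortened; with an ageing parameter of exactly 1 (constant failure rate) preventive replacement has no effect and the result stays at 1/MTTF.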

8

Def: Hazard (farekilde, faresituasjon)

- A physical situation with a potential for human injury (EN 50126)
- A condition that could lead to an accident (EN 50129)

9

Def: Maintenance (vedlikehold)

The combination of all technical and administrative actions, including supervision actions, intended to retain a product in, or restore it to, a state in which it can perform a required function.

Preventive maintenance (forebyggende vedlikehold): The maintenance carried out at predetermined intervals or according to prescribed criteria and intended to reduce the probability of failure or the degradation of the functioning of an item.

Corrective maintenance (korrektivt vedlikehold): The maintenance carried out after fault recognition and intended to put an item into a state in which it can perform a required function.

10

Def: Maintainability (Vedlikeholdstilpasning)

The probability that a given active maintenance action, for an item under given conditions of use, can be carried out within a stated time interval when the maintenance is performed under stated conditions and using stated procedures and resources.

Maintainability is an intrinsic property of the system that has to be designed in before the system is developed; EN 50126 classifies it as a system condition.

11

Maintenance

- Maintenance is an important means to maintain the inherent reliability of components and systems, such as safety instrumented systems
- A structured approach to maintenance is a prerequisite for minimizing maintenance-related accidents
- Studies have shown that up to 70% of accidents could be traced back to deficiencies in the maintenance

12

Maintenance and safety

Maintenance-related safety problems could be due to:
1. Lack of preventive maintenance, which increases the number of safety-critical failures of components
2. Lack of corrective maintenance, which leaves the system in an unsafe condition
3. Accidents during maintenance, due to e.g. working on pressurised systems, high voltage accidents, cutting incidents etc.
4. Latent failures introduced during maintenance, e.g. forgetting to reset systems, errors in assembly after maintenance etc.

13

Railway authority

The body with the overall accountability to a Regulator for operating a railway system.

NOTE: Railway authority accountabilities for the overall system or its parts and lifecycle activities are sometimes split between one or more bodies or entities. For example:
- the owner(s) of one or more parts of the system assets and their purchasing agents
- the operator of the system
- the maintainer(s) of one or more parts of the system

Such splits are based on either statutory instruments or contractual agreements. Such responsibilities should therefore be clearly stated at the earliest stages of a system lifecycle.

To clarify the term, it is emphasised that a “railway authority” in the sense of EN 50126 is NOT the regulator or the government.

14

Comparison of terms (duty holders)

EN 50126                     EU Safety Directive                          Examples
railway authority            infrastructure manager, railway undertaking  JBV, NSB, Bergen Bybane, FlyToget, CargoNet, Oslo sporveier
safety regulatory authority  safety authority                             SJT
railway support industry     supplier, manufacturing industry             Bombardier, Alstom, BaneService, MANTENA

15

Def: Reliability (pålitelighet)

The probability that an item can perform a required function under given conditions for a given time interval (t1, t2).

An item can be a single component, a subsystem or a system. The reliability is dependent on the quality of the components of the item (inherent reliability) and is related to the mean time to failure, MTTF (for non-repairable components/subsystems/systems), and the mean time between failures, MTBF (for repairable components/subsystems/systems), respectively.

16

Def: Risk (risiko)

The probable rate of occurrence of a hazard causing harm and the degree of severity of that harm.

Note: This is often misinterpreted to mean: “The probable rate of occurrence of a hazard that may cause harm and the degree of severity of that harm.”

The problem is that the occurrence of a hazard is not equivalent to an occurrence of harm. In order to make risks comparable with each other, it is important to consider the probability that a hazard actually leads to harm. For example, if the barriers at a level crossing do not close when commanded (hazard), this does not automatically lead to a crash between a train and a car (i.e. accident or occurrence of harm).

Use risk matrices with care; the frequency dimension should always reflect the occurrence of the consequence listed on the other axis, i.e. the most likely (or alternatively the “worst case”) consequence.
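The distinction between hazard rate and harm rate in the level crossing example can be sketched as follows. The function name and all numbers are hypothetical and chosen only to show the arithmetic; they are not data from the presentation.

```python
def harm_rate(hazard_rate, p_harm_given_hazard):
    """Rate of harm = rate of hazard occurrences x Pr(harm | hazard)."""
    return hazard_rate * p_harm_given_hazard


# Hypothetical numbers, for illustration only (not from the source).
barrier_failures_per_year = 0.1    # hazard: barriers do not close when commanded
p_collision_given_failure = 0.01   # a car is on the crossing when the train passes

collisions_per_year = harm_rate(barrier_failures_per_year, p_collision_given_failure)
print(f"{collisions_per_year:.4f} collisions per year")
```

In a risk matrix, the frequency entered should be the harm rate (here the collision rate), not the rate at which the barriers fail.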

17

Risk approach in JBV/NSB maintenance model

19

Def: Safety (sikkerhet)

Freedom from unacceptable risk of harm.
- This could be misleading, because the aspect “harm” is already included in the term “risk” as defined above
- To avoid misunderstandings, the shortened definition “freedom from unacceptable risk” is more appropriate

Note:
- Safety according to this definition is a state, and it does not make sense to express e.g. the level of safety in more than two levels
- If we do not believe in the use of acceptance criteria, the above definition does not make sense

20

Def: Safety integrity (sikkerhetsintegritet)

The likelihood of a system satisfactorily performing the required safety functions under all the stated conditions within a stated period of time.
- Generally, safety relies on adequate measures to prevent or tolerate faults (as safeguards against systematic failure) as well as on adequate measures to control random failures
- In this sense, safety integrity means to match the qualitative measures (to avoid systematic failures) with the quantitative targets (to control random failures)

21

Def: Safety integrity level (SIL)

- Safety integrity levels are primarily defined for safety instrumented functions/systems, e.g., the signalling system
- Usually four safety integrity levels are defined, where each level defines a range for the required safety integrity for the safety instrumented function to achieve the necessary risk reduction

Note:
- In IEC 61508 (the generic standard) SIL is only defined for explicit safety instrumented functions, whereas EN 50126 has a broader scope
- The guideline to EN 50126 uses tolerable hazard rate (THR) as a concept related to SIL
- Different apportionment principles exist regarding THRs

22

Risk reduction: General concept from IEC 61508

[Figure: risk scale with increasing risk, marking residual risk, tolerable risk and EUC risk.]
- Necessary risk reduction: from EUC risk down to tolerable risk
- Actual risk reduction: from EUC risk down to residual risk
- Risk reduction achieved by all safety-related systems and external risk reduction facilities, made up of:
  - Partial risk covered by E/E/PE safety-related systems
  - Partial risk covered by other technology safety-related systems
  - Partial risk covered by external risk reduction facilities

23

Def: Tolerable risk (akseptabel risiko)

The maximum level of risk of a product that is acceptable to the Railway Authority.
- The railway authority is responsible for providing risk acceptance criteria to the railway support industry
- Note that tolerable risk is defined neither by the regulator nor by the government

24

Use of acceptance criteria

- The use of acceptance criteria has been questioned in recent years (cf. several journal papers by Terje Aven, UiS)
- Acceptance criteria are often defined as unconditional limits without including costs, benefits etc.
- The Norwegian “PLL=11” criterion is one such example (PLL = potential loss of life, here fatalities per year)
- It is based on historical values, and does not include comparison to other societal risk elements, cost considerations etc.

25

PLL = 11 ↔ German MEM principle

- PLL=11 is a historical level in Norway
- Three main contributions (one third each):
  - Level crossings
  - Persons hit by the train in the track
  - Travellers
- Assuming one million travellers each year, the individual risk is in the order of 4⋅10^-6 per year
- This is (much) lower than the level argued by the German MEM principle, which accepts an individual risk of 10^-5

Claim: A doubling of the risk level for passengers would not be unacceptable
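The arithmetic behind the individual risk figure and the claim can be checked with a few lines. The inputs are the ones stated on the slide (PLL = 11, one third attributed to travellers, one million travellers per year, MEM level 10^-5); the variable and function names are illustrative.

```python
PLL_TOTAL = 11            # fatalities per year, historical level (slide)
TRAVELLER_SHARE = 1 / 3   # one third of the PLL attributed to travellers (slide)
TRAVELLERS = 1_000_000    # assumed number of travellers per year (slide)
MEM_LIMIT = 1e-5          # individual risk per year accepted by the MEM principle (slide)


def individual_risk(pll, share, population):
    """Average individual fatality risk per year for one exposed group."""
    return pll * share / population


risk = individual_risk(PLL_TOTAL, TRAVELLER_SHARE, TRAVELLERS)
print(f"individual risk ~ {risk:.1e} per year")
print(f"doubled risk    ~ {2 * risk:.1e} per year")
print("doubled risk below MEM level:", 2 * risk < MEM_LIMIT)
```

The result is about 3.7⋅10^-6 per year, which rounds to the 4⋅10^-6 quoted above, and twice that value is still below 10^-5.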

26

Visions, targets and acceptance criteria

- Visions are ideal goals
- Targets are those values for HSE indicators (e.g., # of fatalities) we believe are realistic to achieve
  - The indicators are random quantities, hence unexpectedly high values could occur also in a situation with a high safety level (low risk)
- Risk acceptance criteria are limiting values for high risk
- Personally I would accept at least PLL 20 for a sustainable railway transportation system, but recall:
  - The zero vision remains
  - Target values would most likely be in the order of PLL=5

EN 50126 mixes targets and acceptance criteria

27

Acceptable risk and the ALARP principle

[Figure: ALARP diagram, with risk increasing from the bottom (negligible risk) to the top.]
- Unacceptable region: risk cannot be justified except in extraordinary circumstances
- The ALARP or tolerability region (risk is undertaken only if a benefit is desired):
  - Tolerable only if risk reduction is impracticable or its cost is grossly disproportionate to the improvement gained
  - Tolerable if cost of reduction would exceed the improvement gained
- Broadly acceptable region (no need for detailed work to demonstrate ALARP): necessary to maintain assurance that risk remains at this level
- Negligible risk

28

Is apportionment possible?

- EN 50126 defines apportionment as a process whereby the RAMS elements for a system are sub-divided between the various items which comprise the system to provide individual targets
- In the Norwegian offshore industry such an attempt has failed, and the general recommendation is rather to define target values (SIL) for the various instrumented safety functions based on historical safety performance of such systems (no apportionment process)
- The OLF guideline (http://www.olf.no/?23661.pdf) discusses these challenges

29

Bow tie and risk analysis methods

Causal analysis:
- Fault tree analysis
- Reliability block diagrams
- Influence diagrams
- FMECA
- Reliability data sources

Undesired event:
- Checklists
- Preliminary hazard analysis
- FMECA
- HAZOP
- Event data sources

Consequence analysis:
- Event tree analysis
- Consequence models
- Reliability assessment
- Evacuation models
- Fire & explosion models
- Simulation
- Hydraulic models
- Traffic flow models

30

Maintenance and RAMS

- To optimize the maintenance program with respect to reliability and cost, we establish models that link the component reliability to the maintenance level
- A very common approach (cf. EN 50126 – 6.4.3.3) to establishing a well-documented maintenance program is reliability centred maintenance (RCM), comprising the following main elements:
  - Functional Failure Analysis (FFA)
  - Failure Mode and Effect Analysis (FMEA)
  - RCM decision logic for assignment of PM tasks
  - Optimization of maintenance intervals

31

What is RCM?

- RCM is a method for maintenance planning developed in the sixties within the aircraft industry and later adapted to several other industries and military branches
- A major advantage of the RCM analysis process is a structured and traceable approach to determining the optimal type of preventive maintenance (PM)
- The main focus is on preventive strategies, but the results from the analysis may also be used in relation to corrective maintenance strategies, spare part optimization, and logistic support considerations

32

The seven main questions in RCM

1. What are the system functions and the associated performance standards?
2. How can the system fail to fulfil these functions?
3. What can cause a functional failure?
4. What happens when a failure occurs?
5. What might the consequence be when the failure occurs?
6. What can be done to detect and prevent the failure?
7. What should be done when a suitable preventive task cannot be found?

33

The main objectives of an RCM analysis process are to:

- Identify effective maintenance tasks
- Evaluate these tasks by some cost–benefit analysis
- Prepare a plan for carrying out the identified maintenance tasks at optimal intervals

34

PM programme for rolling stock based on RCM

- NSB has used RCM for almost two decades in order to establish a preventive maintenance program for the rolling stock
- A traditional RCM approach has been applied, with main focus on:
  - Functional failure analysis (FFA)
  - Failure mode and effect analysis (FMEA/FMECA)
  - Use of RCM decision logic for assignment of maintenance tasks
- In the last three years, formalized methods for interval optimization have also been introduced by means of the OptiRCM tool

35

Structure of functional failure analysis

Function: ....
Function: Home signal
Function: Departure light signal
Description: Five lamp signals, with 3 main signals and 2 pre-signals

Functional failures:
- Wrong signal picture
- Missing signal picture
- Unclear signal picture
- Does not prevent contact hazard in case of earth fault
- etc.

E/HHHHHH

MSIs (maintenance significant items):
- Signal mast
- Brands
- Background shade
- Earth conductor
- Signal lantern
- Lamp
- Lens
- Transformer
- etc.

36

Structure of functional failure analysis

FMECA

37

FMEA example: Red light bulb

Component:
- Red light bulb, main signal

Functions:
- Give the engine driver a signal to “STOP”
- Make it possible to allow a green light from the other direction

Failure mode:
- No light from the light bulb

Failure causes:
- Burnt-out filament, short circuit, wire failure, lamp socket

Failure effects:
- Safety: May lead to train-train collision
- Punctuality: Not able to set green light from the other side, causing delays

39

RCM Decision logic

1. Does a failure-alerting measurable indicator exist?
   - Yes: Is continuous monitoring feasible?
     - Yes: Continuous on-condition task (CCT)
     - No: Scheduled on-condition task (SCT)
   - No: go to 2
2. Is ageing parameter α > 1?
   - Yes: Is overhaul feasible?
     - Yes: Scheduled overhaul (SOH)
     - No: Scheduled replacement (SRP)
   - No: go to 3
3. Is the function hidden?
   - Yes: Scheduled function test (SFT)
   - No: No PM activity found (RTF)
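One possible reading of the decision logic above, written as a function. The branch order (indicator first, then ageing, then hidden function) is reconstructed from the flowchart labels and should be treated as an assumption; the function and argument names are hypothetical.

```python
def rcm_task(indicator_exists, continuous_monitoring_feasible,
             ageing_alpha, overhaul_feasible, function_hidden):
    """Return the task code suggested by the assumed RCM decision logic.

    CCT/SCT: continuous/scheduled on-condition task
    SOH/SRP: scheduled overhaul/replacement
    SFT: scheduled function test, RTF: no PM activity found
    """
    if indicator_exists:
        # A measurable indicator warns of the failure: use condition monitoring.
        return "CCT" if continuous_monitoring_feasible else "SCT"
    if ageing_alpha > 1:
        # The item wears out, so time-based renewal can reduce failures.
        return "SOH" if overhaul_feasible else "SRP"
    if function_hidden:
        # No ageing, but failures are not evident during normal operation.
        return "SFT"
    return "RTF"


if __name__ == "__main__":
    # Hypothetical item: no warning indicator, clear ageing, no overhaul possible.
    print(rcm_task(False, False, ageing_alpha=5.0,
                   overhaul_feasible=False, function_hidden=False))
```

Under this reading, an ageing item with no warning indicator and no feasible overhaul ends up with scheduled replacement.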

41

Interval optimization

- When relevant maintenance tasks are assigned by use of the RCM decision logic, the next step is to determine when and how often to carry out the maintenance
- Maintenance optimization usually requires mathematical models that balance the benefits against the cost and other inconveniences of the maintenance, i.e., we need:
  1. Component models showing the effective failure rate as a function of the maintenance interval
  2. System models relating component performance to the system performance (overall RA(M)S performance)
  3. Cost assignment, i.e., cost of preventive maintenance, corrective maintenance, and loss of system performance
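The three ingredients can be combined into a cost rate per unit time. The expression below is one common way to write it and is an assumption for illustration, not a formula quoted from the slides; the cost symbols C_PM, C_CM and C_TOP,j are introduced here.

```latex
C(\tau) \;=\; \frac{C_{\mathrm{PM}}}{\tau}
  \;+\; \lambda_E(\tau)\left[\, C_{\mathrm{CM}}
  + \sum_{j} \Pr\left(\mathrm{TOP}_j \mid \text{failure}\right) C_{\mathrm{TOP},j} \right]
```

The first term is the preventive maintenance cost per unit time, λE(τ) comes from the component model, the probabilities Pr(TOP_j | failure) come from the system model, and the cost terms come from the cost assignment.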

42

Note, at least two approaches

1. Define target values for each component, and find the longest maintenance interval that ensures that the component, with this maintenance interval, fulfils the requirement
   - For example a risk matrix approach
   - A target value approach is recommended in OLF GL 70

2. Establish an objective function to optimize, and find the maintenance interval that gives the best performance according to the objective function
   - I.e., an explicit balance of benefits and disadvantages of maintenance
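The first approach (target value) can be sketched as a search for the longest interval that still meets the target. The sketch assumes that the effective failure rate does not decrease as the interval grows; the power-law rate function and the target value are hypothetical stand-ins, not a model from the presentation.

```python
def largest_interval(rate_fn, target, lo, hi, tol=1e-6):
    """Longest interval in [lo, hi] with rate_fn(interval) <= target.

    Uses bisection and assumes rate_fn is non-decreasing in the interval.
    """
    if rate_fn(lo) > target:
        raise ValueError("target cannot be met even at the shortest interval")
    if rate_fn(hi) <= target:
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rate_fn(mid) <= target:
            lo = mid   # target still met: try a longer interval
        else:
            hi = mid   # target violated: shorten the interval
    return lo


def toy_rate(tau):
    """Hypothetical effective failure rate, increasing in the interval tau."""
    return 0.001 * tau ** 2


tau_max = largest_interval(toy_rate, target=0.004, lo=0.01, hi=10.0)
print(f"longest admissible interval ~ {tau_max:.3f}")
```

The second approach replaces the target by an explicit objective function and minimises it over the interval.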

43

Visualization

Target

44

Models and computerized tools

- A huge number of component models for the effective failure rate may be required, e.g., models for age/block replacement, models for safety instrumented systems (PFD models), PF models for condition monitoring etc.
- Computerized tools are a prerequisite for modelling, and several computerized tools for optimization of the maintenance program exist
- At SINTEF/NTNU we recommend:
  - MANIFER for the qualitative part of the RCM analysis
  - OptiRCM for the quantitative part of the interval optimization

45

OptiRCM input screen

46

Input to the optimization model

Reliability parameters:
- Failure rate / MTTF without maintenance
- Aging parameter
- PF-interval

Cost figures:
- Corrective maintenance cost
- Inspection cost
- Preventive maintenance cost

TOP events (safety and punctuality):
- Barriers and barrier probability against the TOP event

47

Other parameters and assumptions

- For a given TOP event, generic probability distributions for each end consequence are specified only once
  - This is a simplification, since the impact of e.g. a derailment depends on the component causing the event
  - This simplification makes the optimization process manageable
- It is assumed that we are in the “ALARP” region, hence optimization could be carried out without constraints
- Personal injury and fatality costs are treated explicitly
- TØI figures are used as a basis, i.e., VPF (Value of Prevented Fatality) in the order of 3 mil. Euro

48

Some results related to VPF figures

- A survey has been conducted (mainly among travellers in Norway) where normative issues have been addressed
- 112 respondents
- One aspect has been to investigate whether the so-called “utilitarian” ethical philosophy is in accordance with our values
  - The utilitarian approach could be seen as an argument for maximizing total utility in terms of minimizing the number of fatalities as a result of accidents
  - This may be in conflict with ideas like “gross-accident aversion”, “priority for children”, “priority for exposed groups” etc.

49

Some figures

Average “willingness” to use more resources against:
- Children = 1.7
- Gross accidents = 1.4
- Exposed groups = 1.8
- Public transportation vs car transportation = 1.4
- Innocent travellers vs irresponsible drivers = 1.7

Related to the TØI model (VPF ≈ 3 mil Euro):
- Public investment: Increase the VPF by a factor 1.8
- My own willingness to pay: Increase the VPF by a factor 1.7

50

Values describing the red bulb example

Quantity                            Value
MTTF                                7 years
α (aging)                           5
TOP Safety                          Collision Train-Train
Probability of TOP | Comp. failure  10^-6
VPF-Cost | TOP                      15⋅10^6 Euro
Material Cost | TOP                 6⋅10^6 Euro
TOP Punctuality                     Full Stop
Train minutes delay | TOP           15
Delay time Cost | TOP               450 Euro
PM Cost                             15 Euro per PM
CM Cost                             35 Euro per CM
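A rough sketch of how a cost curve could be built from the safety-related values above. It is not the OptiRCM model: it assumes a Weibull lifetime with age replacement, uses only the safety TOP event, and leaves out the punctuality terms because the probability of a full stop given a failure is not stated. All function names are hypothetical.

```python
import math

MTTF = 7.0            # years (slide)
ALPHA = 5.0           # ageing parameter (slide)
PM_COST = 15.0        # Euro per PM (slide)
CM_COST = 35.0        # Euro per CM (slide)
P_TOP = 1e-6          # Pr(collision | component failure) (slide)
VPF_COST = 15e6       # Euro given TOP (slide)
MATERIAL_COST = 6e6   # Euro given TOP (slide)

# Assumption: Weibull lifetime with scale chosen to match the MTTF.
SCALE = MTTF / math.gamma(1.0 + 1.0 / ALPHA)


def survivor(t):
    """R(t) = Pr(T > t) for the assumed Weibull lifetime."""
    return math.exp(-((t / SCALE) ** ALPHA))


def effective_failure_rate(tau, steps=1000):
    """Failures per year under age replacement at tau (renewal argument)."""
    h = tau / steps
    inner = sum(survivor(i * h) for i in range(1, steps))
    area = h * (0.5 * (survivor(0.0) + survivor(tau)) + inner)
    return (1.0 - survivor(tau)) / area


def cost_per_year(tau):
    """PM cost rate plus expected failure-related cost rate (safety TOP only)."""
    failure_cost = CM_COST + P_TOP * (VPF_COST + MATERIAL_COST)
    return PM_COST / tau + effective_failure_rate(tau) * failure_cost


def best_interval(lo=0.5, hi=15.0, step=0.05):
    """Grid search for the replacement interval with the lowest cost rate."""
    n = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n + 1)]
    return min(grid, key=cost_per_year)


if __name__ == "__main__":
    tau = best_interval()
    print(f"interval ~ {tau:.2f} years, cost ~ {cost_per_year(tau):.2f} Euro/year")
```

Under these assumptions the cost curve has the usual shape: frequent replacement is dominated by the PM cost, rare replacement by the failure-related cost, with a minimum in between. The numbers should not be expected to match the original tool.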

51

Cost as a function of the maintenance interval

52

Rolling stock example

FMECA element         Value
Component             Buzzer (signal)
Function              Give door closing signal
Failure mode          No signal
Failure cause         Loose cable
Maintenance task      Inspection/replacement
TOP                   Entrance accident
Pr(TOP|comp failure)  0.001
:                     :

53

Results

Quantities            Historical values  New values
Inspection interval   0.09               0.2
Replacement interval  5.4                2.3
PLL                   7.1E-4             1.3E-4
Safety cost (VPF)     22 355             3 962
Punctuality cost      5 919              1 049
PM cost               17 753             20 154
CM cost               179                14
Total cost            46 212             25 181

Improvement: 21 030 (45.51%)

All figures are related to million train km (NOK)
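The improvement figure can be checked from the total cost row, reading it as 46 212 (historical) and 25 181 (new). This reading of the flattened table is an assumption, and the variable names are illustrative.

```python
historical_total = 46_212   # total cost, historical values (as read from the slide)
new_total = 25_181          # total cost, new values (as read from the slide)

improvement = historical_total - new_total
pct = 100 * improvement / historical_total
print(improvement, f"{pct:.2f}%")
```

The difference comes out one unit above the 21 030 quoted on the slide, which is consistent with rounding in the underlying figures; the percentage agrees with the quoted 45.51%.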