1
What is RAMS, and how can RAMS methodology be used in maintenance planning?
Definitions, requirements, RAMS targets and methods
RAMS in connection with maintenance planning for rolling stock
Jørn Vatn, NTNU
2
Basic definitions
RAMS = Reliability, Availability, Maintainability and Safety
In Norwegian “Sikkerhet, pålitelighet og vedlikeholdstilpasning”
3
Def: Apportionment (fordeling av systemkrav)
A process whereby the RAMS elements for a system are sub-divided between the various items which comprise the system to provide individual targets.
In this definition the term “RAMS elements” can usually be interpreted as “targets” or “requirements” for Reliability, Availability, Maintainability and Safety. The overall RAMS targets (e.g. risk acceptance criteria) have to be apportioned to the individual system elements in order to enable these elements to be constructed in a way that allows the overall target to be achieved.
4
Def: Availability (Tilgjengelighet)
The ability of a product to be in a state to perform a required function under given conditions at a given instant of time or over a given time interval assuming that the required external resources are provided.
Availability is related to specific failed states/failure modes (see fig. 3 of EN 50126-1) of functions that the system is supposed to provide. Considering only the subset of safety-related failure modes, the direct influence of availability on safety becomes obvious.
Availability depends on the quality and the design of a system and is related to the ratio of the mean time to maintain (restore) MTTM and the mean time between failures MTBF.
Prior to the determination of the availability, the system boundaries have to be defined to be able to decide whether external resources (e.g. the supplied power) are part of the system.
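The relation between availability, MTTM and MTBF mentioned above is often written as a steady-state ratio. The expression below is a common textbook form, not a formula quoted from EN 50126, and it assumes that MTBF counts only the up time between failures (restoration time excluded):

```latex
A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTM}}
  = \frac{1}{1 + \mathrm{MTTM}/\mathrm{MTBF}}
```

With purely illustrative numbers, MTBF = 1000 hours and MTTM = 10 hours give A = 1000/1010 ≈ 0.990.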
5
Def: Failure rate (sviktintensitet)
The conditional probability of failure in (t, t+Δt) given the product has not failed up to time t divided by Δt.
Notation:
z(t) = failure rate
z(t)⋅Δt ≈ probability of failure in the next time interval (Δt), given survival up to time t
z(t) = “bath tub curve”
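In symbols, the definition above can be written as follows. The names T (time to failure), f(t) (probability density) and R(t) (survivor function) are introduced here for illustration and do not appear on the slide:

```latex
z(t) = \lim_{\Delta t \to 0}
       \frac{\Pr(t < T \le t + \Delta t \mid T > t)}{\Delta t}
     = \frac{f(t)}{R(t)}
```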
6
Def: Effective failure rate
The effective failure rate, λE(τ), is the expected number of failures per unit time given that the product is maintained at intervals of length τ.
Note that the failure rate, z(t), is an inherent property of the product and not affected by the maintenance, whereas the effective failure rate, λE(τ), depends on the preventive maintenance program
7
Preventive maintenance and the bath tub curve
By preventively replacing the unit at age τ, the effective failure rate λE(τ) is reduced
Shorter replacement intervals (τ) give a lower effective failure rate
The challenge is to balance the lower failure rate against increased maintenance and other negative effects
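A minimal sketch of how the effective failure rate can depend on the replacement interval. It assumes Weibull-distributed lifetimes with ageing parameter α > 1 and an interval τ that is short compared with the MTTF, in which case the expected number of failures per interval is roughly (λτ)^α. The function name and the numerical values are illustrative assumptions, not taken from the slides.

```python
import math


def effective_failure_rate(tau: float, mttf: float, alpha: float) -> float:
    """Approximate effective failure rate for replacement interval tau.

    Assumes Weibull lifetimes, R(t) = exp(-(lam * t) ** alpha), and that
    tau is small compared with the MTTF, so that the expected number of
    failures in one interval is close to (lam * tau) ** alpha.
    """
    lam = math.gamma(1.0 + 1.0 / alpha) / mttf  # Weibull scale from the MTTF
    return lam ** alpha * tau ** (alpha - 1.0)


# Illustrative values only
MTTF_YEARS = 7.0
ALPHA = 5.0

for tau in (1.0, 2.0, 4.0):
    rate = effective_failure_rate(tau, MTTF_YEARS, ALPHA)
    print(f"tau = {tau:.0f} years -> about {rate:.2e} failures per year")
```

With α = 1 (no ageing) the expression reduces to 1/MTTF for any τ, which reflects that preventive replacement does not help a component that does not age.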
8
Def: Hazard (farekilde, faresituasjon)
A physical situation with a potential for human injury (EN 50126)
A condition that could lead to an accident (EN 50129)
9
Def: Maintenance (vedlikehold)
The combination of all technical and administrative actions, including supervision actions, intended to retain a product in, or restore it to, a state in which it can perform a required function
Preventive maintenance (forebyggende vedlikehold)
The maintenance carried out at predetermined intervals or according to prescribed criteria and intended to reduce the probability of failure or the degradation of the functioning of an item
Corrective maintenance (korrektivt vedlikehold)
The maintenance carried out after fault recognition and intended to put an item into a state in which it can perform a required function
10
Def: Maintainability (Vedlikeholdstilpasning)
The probability that a given active maintenance action, for an item under given conditions of use, can be carried out within a stated time interval when the maintenance is performed under stated conditions and using stated procedures and resources
Maintainability is an intrinsic property of the system that has to be designed prior to its development and EN 50126 classifies it as a system condition
11
Maintenance
Maintenance is an important means to maintain the inherent reliability of components and systems, such as safety instrumented systems
A structured approach to maintenance is a prerequisite for minimizing maintenance-related accidents
Studies have shown that up to 70% of accidents could be traced back to deficiencies in the maintenance
12
Maintenance and safety
Maintenance-related safety problems could be due to:
1. Lack of preventive maintenance, which increases the number of safety critical failures of components
2. Lack of corrective maintenance, which leaves the system in an unsafe condition
3. Accidents during maintenance, e.g. due to working on pressurised systems, high voltage accidents, cutting incidents etc.
4. Latent failures introduced during maintenance, e.g. forgetting to reset systems, errors in assembly after maintenance etc.
13
Railway authority
The body with the overall accountability to a Regulator for operating a railway system
NOTE: Railway authority accountabilities for the overall system or its parts and lifecycle activities are sometimes split between one or more bodies or entities. For example:
- the owner(s) of one or more parts of the system assets and their purchasing agents
- the operator of the system
- the maintainer(s) of one or more parts of the system
Such splits are based on either statutory instruments or contractual agreements. Such responsibilities should therefore be clearly stated at the earliest stages of a system lifecycle.
To clarify the term, it is emphasised that a “railway authority” in the sense of EN 50126 is NOT the regulator or the government
14
Comparison of terms (duty holders)
EN 50126 | EU Safety Directive | Examples
railway authority | infrastructure manager, railway undertaking | JBV, NSB, Bergen Bybane, FlyToget, CargoNet, Oslo sporveier
safety regulatory authority | safety authority | SJT
railway support industry | supplier, manufacturing industry | Bombardier, Alstom, BaneService, MANTENA
15
Def: Reliability (pålitelighet)
The probability that an item can perform a required function under given conditions for a given time interval (t1, t2)
An item can be a single component, a subsystem or a system. The reliability is dependent on the quality of the components of the item (inherent reliability) and is related to the mean time to failure MTTF (for non-repairable components/subsystems/systems) and the mean time between failures MTBF (for repairable components/subsystems/systems), respectively
16
Def: Risk (risiko)
The probable rate of occurrence of a hazard causing harm and the degree of severity of that harm
Note: This is often misinterpreted to mean: “The probable rate of occurrence of a hazard that may cause harm and the degree of severity of that harm.”
The problem is that the occurrence of a hazard is not equivalent to an occurrence of harm. In order to make risks comparable with each other it is important to consider the probability that a hazard actually leads to harm. For example, if the barriers at a level crossing do not close when commanded (hazard), this does not automatically lead to a crash between a train and a car (i.e. accident or occurrence of harm)
Use risk matrices with care; the frequency dimension should always reflect the occurrence of the consequence listed on the other axis, i.e. the most likely (or alternatively the “worst case”) consequence
19
Def: Safety (sikkerhet)
Freedom from unacceptable risk of harm
This could be misleading, because the aspect “harm” is already included in the term “risk” as defined above
To avoid misunderstandings the shortened definition “freedom from unacceptable risk” is more appropriate
Note: Safety according to this definition is a state, and it does not make sense to express e.g. the level of safety in more than two levels
If we do not believe in the use of acceptance criteria, the above definition does not make sense
20
Def: Safety integrity (sikkerhetsintegritet)
The likelihood of a system satisfactorily performing the required safety functions under all the stated conditions within a stated period of time
Generally, safety relies on adequate measures to prevent or tolerate faults (as safeguards against systematic failure) as well as on adequate measures to control random failures
In this sense, safety integrity means to match the qualitative measures (to avoid systematic failures) with the quantitative targets (to control random failures)
21
Def: Safety integrity level (SIL)
Safety integrity levels are primarily defined for safety instrumented functions/systems, e.g., the signalling system
Usually four safety integrity levels are defined, where each level defines a range for the required safety integrity for the safety instrumented function to achieve the necessary risk reduction
Note:
- In IEC 61508 (the generic standard) SIL is only defined for explicit safety instrumented functions, whereas EN 50126 has a broader scope
- The guideline to EN 50126 uses tolerable hazard rate (THR) as a concept related to SIL
- Different apportionment principles exist regarding THRs
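To make "each level defines a range" concrete, the sketch below maps an average probability of failure on demand (PFD) to a SIL using the low-demand bands of the generic standard IEC 61508. The slide itself does not list the bands, and railway applications working with THRs allocate SIL differently, so this is only an illustration; the function name is hypothetical.

```python
from typing import Optional

# Low-demand mode bands from IEC 61508: average probability of failure on
# demand, lower bound inclusive and upper bound exclusive.
SIL_BANDS = [
    (4, 1e-5, 1e-4),
    (3, 1e-4, 1e-3),
    (2, 1e-3, 1e-2),
    (1, 1e-2, 1e-1),
]


def sil_from_pfd(pfd_avg: float) -> Optional[int]:
    """Return the SIL whose band contains pfd_avg, or None if outside all bands."""
    for sil, low, high in SIL_BANDS:
        if low <= pfd_avg < high:
            return sil
    return None


print(sil_from_pfd(5e-3))
```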
22
Risk reduction: General concept from IEC 61508
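[Figure: risk reduction concept from IEC 61508, drawn along an axis of increasing risk with the levels residual risk, tolerable risk and EUC risk. The necessary risk reduction spans from the EUC risk down to the tolerable risk; the actual risk reduction spans from the EUC risk down to the residual risk. The risk reduction achieved by all safety-related systems and external risk reduction facilities is made up of the partial risk covered by E/E/PE safety-related systems, the partial risk covered by other technology safety-related systems, and the partial risk covered by external risk reduction facilities.]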
23
Def: Tolerable risk (akseptabel risiko)
The maximum level of risk of a product that is acceptable to the Railway Authority
The railway authority is responsible for providing risk acceptance criteria to the railway support industry
Note that tolerable risk is defined neither by the regulator nor by the government
24
Use of acceptance criteria
The use of acceptance criteria has been questioned in recent years (cf. several journal papers by Terje Aven, UiS)
Acceptance criteria are often defined as unconditional limits without including costs, benefits etc.
The Norwegian “PLL=11” criterion is one such example
This is based on historical values, and does not include comparison to other societal risk elements, cost considerations etc.
25
PLL = 11 ↔ German MEM principle
PLL=11 is a historical level in Norway
Three main contributions (one third each):
- Level crossings
- Persons hit by the train in the track
- Travellers
Assuming one million travellers each year, the individual risk is in the order of 4⋅10^-6 per year
This is (much) lower than argued by the German MEM principle, which accepts an individual risk of 10^-5
Claim: A doubling of the risk level for passengers would not be unacceptable
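The arithmetic behind the individual risk figure and the claim above can be checked in a few lines. The sketch assumes that PLL is counted in fatalities per year and that the travellers' share is exactly one third; the variable names are illustrative.

```python
PLL_TOTAL = 11.0                 # fatalities per year (historical level)
TRAVELLER_SHARE = 1.0 / 3.0      # one of three equal contributions
TRAVELLERS_PER_YEAR = 1_000_000  # assumption stated on the slide
MEM_INDIVIDUAL_RISK = 1e-5       # per year, as quoted for the MEM principle

traveller_pll = PLL_TOTAL * TRAVELLER_SHARE
individual_risk = traveller_pll / TRAVELLERS_PER_YEAR
doubled_risk = 2.0 * individual_risk

print(f"Individual risk: {individual_risk:.1e} per year")
print(f"Doubled risk:    {doubled_risk:.1e} per year")
print(f"Doubled risk below MEM level: {doubled_risk < MEM_INDIVIDUAL_RISK}")
```

Under these assumptions the individual risk is about 3.7⋅10^-6 per year, so even twice that value stays below 10^-5.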
26
Visions, targets and acceptance criteria
Visions are ideal goals
Targets are those values for HSE indicators (e.g., # of fatalities) we believe are realistic to achieve
- The indicators are random quantities, hence unexpectedly high values could occur also in a situation with a high safety level (low risk)
Risk acceptance criteria are limiting values for high risk
Personally I would accept at least PLL 20 for a sustainable railway transportation system, but recall:
- The zero vision remains
- Target values would most likely be in the order of PLL=5
EN 50126 mixes targets and acceptance criteria
27
Acceptable risk and the ALARP principle
[Figure: the ALARP triangle, with risk increasing upwards]
Unacceptable region: Risk cannot be justified except in extraordinary circumstances
The ALARP or Tolerability region (Risk is undertaken only if a benefit is desired):
- Tolerable only if risk reduction is impracticable or its cost is grossly disproportionate to the improvement gained
- Tolerable if cost of reduction would exceed the improvement gained
Broadly acceptable region (No need for detailed work to demonstrate ALARP): Necessary to maintain assurance that risk remains at this level
Negligible risk
28
Is apportionment possible?
EN 50126 defines apportionment as a process whereby the RAMS elements for a system are sub-divided between the various items which comprise the system to provide individual targets
In the Norwegian offshore industry such an attempt has failed, and the general recommendation is rather to define target values (SIL) for the various instrumented safety functions based on the historical safety performance of such systems (no apportionment process)
The OLF guideline (http://www.olf.no/?23661.pdf) discusses these challenges
29
Bow tie and risk analysis methods
[Figure: bow tie diagram with the undesired event in the centre, causal analysis on one side and consequence analysis on the other]
Causal analysis: Fault tree analysis, Reliability block diagrams, Influence diagrams, FMECA, Reliability data sources
Undesired event: Checklists, Preliminary hazard analysis, FMECA, HAZOP, Event data sources
Consequence analysis: Event tree analysis, Consequence models, Reliability assessment, Evacuation models, Fire & explosion models, Simulation, Hydraulic models, Traffic flow models
30
Maintenance and RAMS
To optimize the maintenance program with respect to reliability and cost we establish models that link the component reliability to the maintenance level
A very common approach (cf. EN 50126, 6.4.3.3) to establish a well documented maintenance program is reliability centred maintenance (RCM), comprising the following main elements:
- Functional Failure Analysis (FFA)
- Failure Mode and Effect Analysis (FMEA)
- RCM decision logic for assignment of PM tasks
- Optimization of maintenance intervals
31
What is RCM?
RCM is a method for maintenance planning developed in the sixties within the aircraft industry and later adapted to several other industries and military branches
A major advantage of the RCM analysis process is a structured and traceable approach to determine the optimal type of preventive maintenance (PM)
The main focus is on preventive strategies, but the results from the analysis may also be used in relation to corrective maintenance strategies, spare part optimization, and logistic support considerations
32
The seven main questions in RCM
1. What are the system functions and the associated performance standards?
2. How can the system fail to fulfil these functions?
3. What can cause a functional failure?
4. What happens when a failure occurs?
5. What might the consequence be when the failure occurs?
6. What can be done to detect and prevent the failure?
7. What should be done when a suitable preventive task cannot be found?
33
The main objectives of an RCM analysis process are to:
- Identify effective maintenance tasks
- Evaluate these tasks by some cost–benefit analysis
- Prepare a plan for carrying out the identified maintenance tasks at optimal intervals
34
PM programme for rolling stock based on RCM
NSB has used RCM for almost two decades in order to establish a preventive maintenance program for the rolling stock
A traditional RCM approach has been applied with main focus on:
- Functional failure analysis (FFA)
- Failure mode and effect analysis (FMEA/FMECA)
- Use of RCM decision logic for assignment of maintenance tasks
In the last three years, formalized methods for interval optimization have also been introduced by means of the OptiRCM tool
35
Structure of functional failure analysis
Function: ....
Function: Home signal
Function: Departure light signal
Description: Five lamp signals, with 3 main signals and 2 pre-signals
Functional Failure:
- Wrong signal picture
- Missing signal picture
- Unclear signal picture
- Does not prevent contact hazard in case of earth fault
- etc
E/H: H H H H H
MSIs:
- Signal mast
- Brands
- Background shade
- Earth conductor
- Signal lantern
- Lamp
- Lens
- Transformer
- etc
36
Structure of functional failure analysis (continued)
Same structure as on the previous slide; the MSIs listed there are carried forward to the FMECA
37
FMEA example: Red light bulb
Component: Red light bulb, main signal
Functions:
- Give the engine driver a signal to “STOP”
- Enable green light to be allowed from the other direction
Failure mode: No light from the light bulb
Failure causes: Burnt-out filament, short circuit, wire failure, lamp socket
Failure effects:
- Safety: May lead to collision train-train
- Punctuality: Not able to set green light from the other side, delays
39
RCM Decision logic
Does a failure alerting measurable indicator exist?
- Yes: Is continuous monitoring feasible?
  - Yes: Continuous on-condition task (CCT)
  - No: Scheduled on-condition task (SCT)
- No: Is ageing parameter α > 1?
  - Yes: Is overhaul feasible?
    - Yes: Scheduled overhaul (SOH)
    - No: Scheduled replacement (SRP)
  - No: Is the function hidden?
    - Yes: Scheduled function test (SFT)
    - No: No PM activity found (RTF)
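One reading of the decision logic on this slide can be expressed as a small function. The branch order below (indicator first, then ageing, then hidden function) is an interpretation of the flow chart, and the function and parameter names are hypothetical.

```python
def assign_pm_task(
    indicator_exists: bool,
    continuous_monitoring_feasible: bool,
    alpha: float,
    overhaul_feasible: bool,
    function_hidden: bool,
) -> str:
    """Return the task code suggested by the RCM decision logic."""
    if indicator_exists:
        # A measurable indicator warns of the failure: on-condition tasks
        return "CCT" if continuous_monitoring_feasible else "SCT"
    if alpha > 1:
        # The component ages: overhaul if feasible, otherwise replace
        return "SOH" if overhaul_feasible else "SRP"
    if function_hidden:
        # Hidden function without ageing: test it at scheduled intervals
        return "SFT"
    # No suitable preventive task was found
    return "RTF"


# Example: an ageing component without a failure indicator and no overhaul option
print(assign_pm_task(False, False, 5.0, False, False))
```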
41
Interval optimization
When relevant maintenance tasks are assigned by use of the RCM decision logic, the next step is to determine when and how often to carry out the maintenance
Maintenance optimization usually requires mathematical models that balance the benefits against the cost and other inconveniences of the maintenance, i.e., we need:
1. Component models showing the effective failure rate as a function of the maintenance interval
2. System models relating component performance to the system performance (overall RA(M)S performance)
3. Cost assignment, i.e., cost of preventive maintenance, corrective maintenance, and loss of system performance
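The three ingredients listed above can be combined into a cost per unit time as a function of the maintenance interval τ. The expression below is one common form of such a model, given here as a sketch and not as the formula used in any particular tool; the symbols c_PM, c_CM and c_TOP,j (cost of one preventive task, one corrective task, and one occurrence of TOP event j) are introduced for illustration.

```latex
C(\tau) = \frac{c_{\mathrm{PM}}}{\tau}
        + \lambda_E(\tau)\left[\, c_{\mathrm{CM}}
        + \sum_{j} \Pr(\mathrm{TOP}_j \mid \text{failure})\; c_{\mathrm{TOP},j} \right]
```

The first term falls as τ grows, while the second term rises for an ageing component, which is why an interior optimum can exist.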
42
Note, at least two approaches
1. Define target values for each component, and find the longest maintenance interval for which the component still fulfils the requirement
- For example a risk matrix approach
- A target value approach is recommended in OLF GL 70
2. Establish an objective function to optimize, and find the maintenance interval that gives the best performance according to the objective function
- I.e., an explicit balance of benefits and disadvantages of maintenance
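A minimal sketch of the first approach (target value): given a target for the effective failure rate, solve for the longest interval that still meets it. It assumes Weibull lifetimes with α > 1 and the short-interval approximation λE(τ) ≈ λ^α τ^(α−1); the target value and parameter values are hypothetical.

```python
import math


def effective_failure_rate(tau: float, mttf: float, alpha: float) -> float:
    """Approximate effective failure rate, valid for tau well below the MTTF."""
    lam = math.gamma(1.0 + 1.0 / alpha) / mttf
    return lam ** alpha * tau ** (alpha - 1.0)


def longest_interval(target_rate: float, mttf: float, alpha: float) -> float:
    """Longest tau for which the approximate effective rate meets the target."""
    if alpha <= 1.0:
        raise ValueError("the interval only matters for ageing components (alpha > 1)")
    lam = math.gamma(1.0 + 1.0 / alpha) / mttf
    return (target_rate / lam ** alpha) ** (1.0 / (alpha - 1.0))


# Hypothetical target: at most 1e-3 failures per year for this component
tau_max = longest_interval(1e-3, mttf=7.0, alpha=5.0)
print(f"Longest admissible interval: {tau_max:.2f} years")
```

The second approach replaces the fixed target with an explicit cost function that is minimized over τ.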
44
Models and computerized tools
A huge number of component models for the effective failure rate may be required, e.g., models for age/block replacement, models for safety instrumented systems (PFD-models), PF-models for condition monitoring etc.
Computerized tools are a prerequisite for modelling, and several computerized tools for optimization of the maintenance program exist
At SINTEF/NTNU we recommend:
- MANIFER for the qualitative part of the RCM analysis
- OptiRCM for the quantitative part of the interval optimization
46
Input to the optimization model
Reliability parameters:
- Failure rate / MTTF without maintenance
- Aging parameter
- PF-interval
Cost figures:
- Corrective maintenance cost
- Inspection cost
- Preventive maintenance cost
TOP events, safety and punctuality
Barriers and barrier probability against the TOP event
47
Other parameters and assumptions
For a given TOP event, generic probability distributions for each end consequence are specified only once
- This is a simplification, since the impact of e.g. a derailment depends on the component causing the event
- This simplification makes the optimization process manageable
It is assumed that we are in the “ALARP” region, hence optimization could be carried out without constraints
Personal injury and fatality costs are treated explicitly
TØI figures are used as a basis, i.e., VPF (Value of Prevented Fatality) in the order of 3 mil. Euro
48
Some results related to VPF figures
A survey has been conducted (mainly among travellers in Norway) where normative issues have been addressed
112 respondents
One aspect has been to investigate whether the so-called “utilitarian” ethical philosophy is in accordance with our values
- The utilitarian approach could be seen as an argument for maximizing total utility in terms of minimizing the number of fatalities as a result of accidents
- This may be in conflict with ideas like “gross-accident aversion”, “priority for children”, “priority for exposed groups” etc.
49
Some figures
Average “willingness” to use more resources for:
- Children = 1.7
- Gross accidents = 1.4
- Exposed groups = 1.8
- Public transportation vs car transportation = 1.4
- Innocent travellers vs irresponsible drivers = 1.7
Related to the TØI model (VPF ≈ 3 mil Euro):
- Public investment: Increase the VPF by a factor 1.8
- My own willingness to pay: Increase the VPF by a factor 1.7
50
Values describing the red bulb example
Quantity                              Value
MTTF                                  7 years
α (aging)                             5
TOP Safety                            Collision Train-Train
Probability of TOP | Comp. failure    10^-6
VPF-Cost | TOP                        15⋅10^6 Euro
Material Cost | TOP                   6⋅10^6 Euro
TOP Punctuality                       Full Stop
Train minutes delay | TOP             15
Delay time Cost | TOP                 450 Euro
PM Cost                               15 Euro per PM
CM Cost                               35 Euro per CM
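A rough sketch of how the values on this slide could feed an interval optimization. It is not the OptiRCM model: it assumes Weibull lifetimes with the short-interval approximation for the effective failure rate, treats the 450 Euro as the total delay cost of one full stop, and assumes that every bulb failure causes a full stop (the slide gives no probability for the punctuality TOP event). All names are illustrative.

```python
import math

MTTF_YEARS = 7.0
ALPHA = 5.0
PM_COST = 15.0                    # Euro per preventive replacement
CM_COST = 35.0                    # Euro per corrective replacement
P_SAFETY_TOP = 1e-6               # Pr(collision | component failure)
SAFETY_TOP_COST = 15e6 + 6e6      # VPF cost plus material cost, Euro
P_PUNCTUALITY_TOP = 1.0           # ASSUMPTION: every failure gives a full stop
PUNCTUALITY_TOP_COST = 450.0      # ASSUMPTION: Euro per full stop

# Expected cost of one component failure, Euro
FAILURE_COST = (
    CM_COST
    + P_SAFETY_TOP * SAFETY_TOP_COST
    + P_PUNCTUALITY_TOP * PUNCTUALITY_TOP_COST
)


def effective_failure_rate(tau: float) -> float:
    """Approximate failures per year when the bulb is replaced every tau years."""
    lam = math.gamma(1.0 + 1.0 / ALPHA) / MTTF_YEARS
    return lam ** ALPHA * tau ** (ALPHA - 1.0)


def annual_cost(tau: float) -> float:
    """Expected cost per year: preventive part plus failure-related part."""
    return PM_COST / tau + effective_failure_rate(tau) * FAILURE_COST


# Simple grid search over replacement intervals from 0.05 to 7 years
candidates = [0.05 * k for k in range(1, 141)]
best_tau = min(candidates, key=annual_cost)
print(f"Best interval on the grid: {best_tau:.2f} years, "
      f"cost about {annual_cost(best_tau):.2f} Euro per year")
```

Because the approximation is only accurate for intervals well below the MTTF, the result should be read as an indication of the order of magnitude rather than a recommended interval.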
52
Rolling stock example
FMECA element            Value
Component                Buzzer (signal)
Function                 Give door closing signal
Failure mode             No signal
Failure cause            Loose cable
Maintenance task         Inspection/replacement
TOP                      Entrance accident
Pr(TOP|comp failure)     0.001
:                        :
53
Results
Quantities              Historical values    New values
Inspection interval     0.09                 0.2
Replacement interval    5.4                  2.3
PLL                     7.1E-4               1.3E-4
Safety cost (VPF)       22 355               3 962
Punctuality cost        5 919                1 049
PM cost                 17 753               20 154
CM cost                 179                  14
Total cost              46 212               25 181
Improvement: 21 030 (45.51%)
All figures relate to one million train km (costs in NOK)