Developing an Effective PM Program

The program is developed using a guided logic approach and is task-oriented rather than maintenance-process-oriented. This avoids the confusion caused by differing industry interpretations of terms such as condition monitoring, on-condition and hard time. With a task-oriented approach, the whole maintenance program for a given item can be seen at once. A decision logic tree is used to identify applicable maintenance tasks. Servicing and lubrication are included in the logic diagram to ensure that this important task category is considered each time an item is analyzed.

Maintenance Program Content

The content of the maintenance program itself consists of two groups of tasks:
  • A group of preventive maintenance tasks, including failure-finding tasks, scheduled to be accomplished at specified intervals or based on condition. The objective of these tasks is to identify and prevent deterioration below inherent safety and reliability levels by one or more of the following means:
      - Lubrication/servicing;
      - Operational/visual/automated check;
      - Inspection/functional test/condition monitoring;
      - Restoration;
      - Discard.
    It is this group of tasks that is determined by RCM analysis, i.e. it comprises the RCM-based preventive maintenance program.
  • A group of non-scheduled maintenance tasks which result from:
      - Findings from the scheduled tasks accomplished at specified intervals of time or usage;
      - Reports of malfunctions or indications of impending failure (including automated detection).
The objective of this second group of tasks is to maintain or restore the equipment to an acceptable condition in which it can perform its required function.  An effective program is one that schedules only those tasks necessary to meet the stated objectives. It does not schedule additional tasks that will increase maintenance costs without a corresponding increase in protection of the inherent level of reliability. Experience has clearly demonstrated that reliability decreases when inappropriate or unnecessary maintenance tasks are performed, due to increased incidence of maintainer-induced faults.

Reliability-Based Preventive Maintenance

This clause describes the tasks in the development of a reliability-based preventive maintenance program for both new and in-service equipment. The principal tools in developing a program are the progressive logic diagram and the task selection criteria illustrated in Table 1. This progressive logic is the basis of an evaluation technique applied to each functionally significant item (FSI) using the available technical data. The evaluations are based principally on each item's functional failures and failure causes.
The development of a reliability-based preventive maintenance program is based on the following:
- Identification of functionally significant items (FSIs);
- Identification of applicable and effective preventive maintenance tasks using the decision logic tree.

A functionally significant item is an item whose failure would affect safety or could have significant operational or economic impact in a particular operating or maintenance context. FSIs are identified from the anticipated consequences of failure, using an analytical approach and good engineering judgment. The identification follows a top-down approach: it is conducted first at the system level, then at the subsystem level and, where appropriate, down to the component level. The process should be iterative. System and subsystem boundaries and functions are identified first. This permits selection of critical systems for further analysis, which involves a more comprehensive and detailed definition of each system, its functions and its functional failures.

The procedures below outline a comprehensive set of tasks in the FSI identification process. All of these tasks should be applied to complex or new equipment. For well-established or simple equipment, however, where functions and functional degradation/failures are well recognized, the tasks listed under the heading "System Analysis" can be covered very quickly. They should nevertheless be documented to confirm that they were considered. The depth and rigor with which these tasks are applied will also vary with the complexity and novelty of the equipment.
Figure: Development tasks of a reliability-based preventive maintenance program

Information Collection

Equipment information provides the basis for the evaluation and should be assembled prior to the start of the analysis and supplemented as the need arises. The following should be included:
  • Requirements for equipment and its associated systems, including regulatory requirements;
  • Design and maintenance documentation;
  • Performance feedback, including maintenance and failure data.
Also, in order to guarantee completeness and avoid duplication, the evaluation should be based on an appropriate and logical breakdown of the equipment.

System Analysis

The tasks described below define the procedure for identifying the functionally significant items and for the subsequent selection and implementation of maintenance tasks. The tasks can be tailored to the requirements of particular industries, and the emphasis placed on each task will depend on the nature of that industry.

Identification of Systems

The objective of this task is to partition the equipment into systems, grouping the components that contribute to well-identified functions and identifying the system boundaries. Sometimes it is necessary to partition further into subsystems that perform functions critical to system performance. System boundaries need not coincide with the physical boundaries of the systems, and they may overlap.
Frequently, the equipment is already partitioned into systems through industry-specific partitioning schemes. This partitioning should be reviewed and adjusted where necessary to ensure that it is functionally oriented. The results of equipment partitioning should be documented in a master system index that identifies systems, components and boundaries.
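A master system index can be kept as simple structured data, which also makes the completeness and duplication checks mentioned under Information Collection easy to automate. The sketch below is a minimal illustration under assumed field names: `SystemEntry`, `review_partitioning` and the example systems and components are hypothetical, not taken from any standard. It flags components that no system claims (a completeness gap) and components claimed by more than one system, which may be a legitimate boundary overlap but deserves review so the same item is not analyzed twice.

```python
from dataclasses import dataclass, field


@dataclass
class SystemEntry:
    """One entry in a master system index (hypothetical layout)."""
    system_id: str
    name: str
    components: set[str] = field(default_factory=set)
    boundaries: str = ""  # free-text description of interfaces with other systems


def review_partitioning(
    index: list[SystemEntry], equipment_components: set[str]
) -> tuple[set[str], dict[str, list[str]]]:
    """Return (components no system claims, components claimed by several systems)."""
    owners: dict[str, list[str]] = {}
    for entry in index:
        for component in sorted(entry.components):
            owners.setdefault(component, []).append(entry.system_id)
    unassigned = equipment_components - set(owners)
    shared = {c: ids for c, ids in owners.items() if len(ids) > 1}
    return unassigned, shared


# Hypothetical example: two systems that share a valve at their common boundary.
index = [
    SystemEntry("S1", "Cooling water", {"pump P1", "valve V1"}, "pump suction to heat exchanger inlet"),
    SystemEntry("S2", "Heat rejection", {"valve V1", "fan F1"}, "heat exchanger to fan discharge"),
]
unassigned, shared = review_partitioning(index, {"pump P1", "valve V1", "fan F1", "motor M1"})
print(unassigned)  # {'motor M1'}
print(shared)      # {'valve V1': ['S1', 'S2']}
```

Flagging a shared component is deliberately a review prompt rather than an error, since the text allows system boundaries to overlap.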

Identification of System Functions

The objective of this task is to determine the main and auxiliary functions performed by the systems and subsystems. The use of functional block diagrams will assist in the identification of system functions. The function definition describes the actions or requirements which the system or subsystem should accomplish, sometimes in terms of performance capabilities within the specified limits. The functions should be identified for all modes of equipment operation. 
The main and auxiliary functions may be determined by reviewing design specifications, design descriptions and operating procedures, including safety, abnormal operation and emergency instructions. Functions such as testing or preparation for maintenance may be omitted if they are not considered important, but the reason for each omission must be given. The product of this task is a listing of system functions.

Selection of Systems

The objective of this task is to select and prioritize the systems to be included in the RCM program because of their significance to equipment safety, availability or economics. The methods used to select and prioritize the systems can be divided into:
  • Qualitative methods based on past history and collective engineering judgment;     
  • Quantitative methods, based on quantitative criteria, such as criticality rating, safety factors, probability of failure, failure rate, life cycle cost, etc., used to evaluate the importance of system degradation/failure on equipment safety, performance and costs. Implementation of this approach is facilitated when appropriate models and data banks exist;     
  • Combination of qualitative and quantitative methods.
The product of this task is a listing of systems ranked by criticality. The systems, together with the methods, the criteria used and the results, should be documented.
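One simple way to apply a quantitative method is a weighted score per system. The sketch below assumes each system has already been scored from 1 to 5 against a few criteria and that the weights have been agreed by the analysis team; the criterion names, weights, scores and the function `rank_systems` are all hypothetical, chosen only to show the mechanics of producing a ranked list.

```python
def rank_systems(
    scores: dict[str, dict[str, float]], weights: dict[str, float]
) -> list[tuple[str, float]]:
    """Rank systems by a weighted sum of criterion scores, highest first."""
    ranked = []
    for system, criteria in scores.items():
        # Every system must be scored on every weighted criterion.
        total = sum(weight * criteria[name] for name, weight in weights.items())
        ranked.append((system, total))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical weights and 1-5 scores, for illustration only.
weights = {"safety": 0.5, "availability": 0.3, "cost": 0.2}
scores = {
    "Cooling water": {"safety": 4, "availability": 5, "cost": 3},
    "Lighting": {"safety": 1, "availability": 2, "cost": 1},
    "Fire detection": {"safety": 5, "availability": 1, "cost": 2},
}
for system, total in rank_systems(scores, weights):
    print(f"{system}: {total:.2f}")
```

The weights and the resulting ranking should be documented together with the method, as the text requires.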

System Functional Failures and Criticality Ranking

The objective of this task is to identify system functional degradation/failures and prioritize them. The functional degradation/failures of a system for each function should be identified, ranked by criticality and documented.  Since each system functional failure may have different impacts on safety, availability or maintenance cost, it is necessary to rank and prioritize them. The ranking takes into account probability of occurrence and consequences of failure. Qualitative methods based on collective engineering judgment and based on the analysis of operating experience can be used. Quantitative methods of Simplified Failure Modes and Effects Analysis (SFMEA) or risk analysis can also be used.

The ranking is one of the most important tasks in RCM analysis. Too conservative a ranking may lead to an excessive preventive maintenance program; conversely, too low a ranking may result in excessive failures and a potential safety impact. In both cases, the resulting maintenance program will not be optimized. The outputs of this task are the following:
  • Listing of system functional degradation/failures and their characteristics.
  • Ranking list of system functional degradation/failures.
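Ranking by probability of occurrence and consequence can be sketched as a simple risk index. This is a risk-matrix style scoring chosen for illustration, not the SFMEA procedure itself; the 1 to 5 scales, the tie-breaking rule and the example failures are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionalFailure:
    description: str
    probability: int  # hypothetical scale: 1 (remote) to 5 (frequent)
    consequence: int  # hypothetical scale: 1 (negligible) to 5 (catastrophic)

    @property
    def risk_index(self) -> int:
        # Combines probability of occurrence and consequence of failure.
        return self.probability * self.consequence


def rank_functional_failures(failures: list[FunctionalFailure]) -> list[FunctionalFailure]:
    """Highest risk first; equal risk is broken by consequence, so a
    severe-but-rarer failure is listed ahead of a frequent-but-milder one."""
    return sorted(failures, key=lambda f: (f.risk_index, f.consequence), reverse=True)


failures = [
    FunctionalFailure("Reduced cooling flow", probability=4, consequence=3),
    FunctionalFailure("No cooling flow", probability=3, consequence=5),
    FunctionalFailure("External leak at seal", probability=5, consequence=3),
]
for rank, f in enumerate(rank_functional_failures(failures), start=1):
    print(rank, f.description, f.risk_index)
```

Breaking ties on consequence is one way to lean away from the "too low a ranking" risk described above; a team could equally choose another rule.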

Identification of Functionally Significant Items (FSIs)

Based on the identification of system functions, functional degradation/failures and effects, together with collective engineering judgment, a list of candidate FSIs can be developed. As noted earlier, these are items whose failures could affect safety, be undetectable during normal operation, or have significant operational or economic impact. The output of this task is a list of candidate FSIs.
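The screening rule above is an "any one criterion" test, and recording which criteria each item meets gives the documentation trail for free. A minimal sketch, assuming the engineering judgments have already been captured as yes/no flags; the attribute names, `candidate_fsis` and the example items are hypothetical.

```python
from dataclasses import dataclass

# Screening criteria from the text, mapped to hypothetical attribute names.
CRITERIA = {
    "affects_safety": "could affect safety",
    "undetectable_in_operation": "undetectable during normal operation",
    "significant_operational_impact": "significant operational impact",
    "significant_economic_impact": "significant economic impact",
}


@dataclass
class ItemAssessment:
    item: str
    affects_safety: bool = False
    undetectable_in_operation: bool = False
    significant_operational_impact: bool = False
    significant_economic_impact: bool = False


def candidate_fsis(assessments: list[ItemAssessment]) -> dict[str, list[str]]:
    """Return each candidate FSI with the criteria it meets (for documentation)."""
    candidates = {}
    for a in assessments:
        reasons = [label for attr, label in CRITERIA.items() if getattr(a, attr)]
        if reasons:  # meeting any one criterion makes the item a candidate
            candidates[a.item] = reasons
    return candidates


assessments = [
    ItemAssessment("Cooling water pump", significant_operational_impact=True,
                   significant_economic_impact=True),
    ItemAssessment("Pressure relief valve", affects_safety=True, undetectable_in_operation=True),
    ItemAssessment("Panel indicator lamp"),
]
print(candidate_fsis(assessments))
```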

Functionally Significant Item Failure Analysis

Once an FSI list has been developed, a method such as failure modes and effects analysis (FMEA) should be used to identify the following information that is necessary for the logic tree evaluation of each FSI. The following examples refer to the failure of a pump providing cooling water flow:
  • Function: the normal characteristic actions of the item (e.g. to provide cooling water flow of 100 l/s to 240 l/s to the heat exchanger).
  • Functional failure: how the item fails to perform its function (e.g. pump fails to provide required flow).   
  • Failure cause: why the functional failure occurs (e.g. bearing failure).  
  • Failure effect: what is the immediate effect and the wider consequence of each functional failure (e.g. inadequate cooling leading to over‑heating and failure of the system). 
The FSI failure analysis is intended to identify functional failures and failure causes. Failures not considered credible, such as those resulting solely from undetected manufacturing faults, unlikely failure mechanisms or unlikely external occurrences, should be recorded as having been considered, together with the factors that led to their being assessed as not credible.
Prior to applying the decision logic tree analysis to each FSI, preliminary worksheets need to be completed which clearly define the FSI, its functions, functional failures, failure causes, failure effects and any additional data pertinent to the item (e.g. manufacturer's part number, a brief description of the item, predicted or measured failure rate, hidden functions, redundancy, etc.). These worksheets should be designed to meet the user's requirements. 
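A preliminary worksheet can be modeled as a record per FSI with one row per functional failure and cause. The sketch below is one possible layout, assuming hypothetical class and field names; it fills in the pump example from above and adds two checks that follow from the text: rows must be complete before the logic tree is applied, and a failure cause judged not credible must carry the reason for that judgment.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FailureCauseRow:
    functional_failure: str
    failure_cause: str
    failure_effect: str
    credible: bool = True
    not_credible_reason: str = ""  # required when credible is False


@dataclass
class FSIWorksheet:
    fsi: str
    description: str
    function: str
    part_number: Optional[str] = None
    failure_rate: Optional[float] = None  # predicted or measured; units are a local choice
    hidden_function: bool = False
    redundancy: str = ""
    rows: list[FailureCauseRow] = field(default_factory=list)

    def incomplete_rows(self) -> list[FailureCauseRow]:
        """Rows that should be completed before the logic tree is applied."""
        return [
            r for r in self.rows
            if not (r.functional_failure and r.failure_cause and r.failure_effect)
            or (not r.credible and not r.not_credible_reason)
        ]

    def rows_for_logic_tree(self) -> list[FailureCauseRow]:
        """Only credible failure causes go forward to the decision logic tree."""
        return [r for r in self.rows if r.credible]


sheet = FSIWorksheet(
    fsi="Cooling water pump",
    description="Pump supplying cooling water to the heat exchanger",
    function="Provide cooling water flow of 100 l/s to 240 l/s to the heat exchanger",
    rows=[
        FailureCauseRow("Pump fails to provide required flow", "Bearing failure",
                        "Inadequate cooling leading to overheating and failure of the system"),
        # Hypothetical non-credible cause, deliberately missing its justification.
        FailureCauseRow("Pump fails to provide required flow",
                        "Undetected manufacturing fault", "Inadequate cooling", credible=False),
    ],
)
print(len(sheet.incomplete_rows()), len(sheet.rows_for_logic_tree()))
```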

From this analysis, the critical FSIs can be identified (i.e. those that have both significant functional effects and a high probability of failure, or have a medium probability of failure, but are judged critical or have a significantly poor maintenance record).
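The selection rule in the previous sentence can be written as a small predicate. This is one reading of that sentence, assuming the analyst has already placed each item into low, medium or high probability bands (the text does not define the band thresholds); `Probability` and `is_critical_fsi` are hypothetical names.

```python
from enum import Enum


class Probability(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def is_critical_fsi(
    significant_effects: bool,
    probability: Probability,
    judged_critical: bool = False,
    poor_maintenance_record: bool = False,
) -> bool:
    # Significant functional effects combined with a high probability of failure.
    if significant_effects and probability is Probability.HIGH:
        return True
    # Medium probability, but judged critical or with a significantly poor record.
    if probability is Probability.MEDIUM and (judged_critical or poor_maintenance_record):
        return True
    return False


print(is_critical_fsi(True, Probability.HIGH))                                  # True
print(is_critical_fsi(False, Probability.MEDIUM, poor_maintenance_record=True))  # True
print(is_critical_fsi(True, Probability.MEDIUM))                                # False
```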

Maintenance Task Selection (Decision Logic Tree Analysis)

The approach used for identifying applicable and effective preventive maintenance tasks is one that provides a logic path for addressing each FSI functional failure. The decision logic tree uses a group of sequential “YES/NO” questions to classify or characterize each functional failure.

The answers to the “YES/NO” questions determine the direction of the analysis flow and help to determine the consequences of the FSI functional failure, which may be different for each failure cause. Further progression of the analysis will ascertain if there is an applicable and effective maintenance task that will prevent or mitigate it. The resultant tasks and related intervals will form the initial scheduled maintenance program.

NOTE ‑ Proceeding with the logic tree analysis using inadequate or incomplete FSI failure information could lead to safety-critical failures due to inappropriate, omitted or unnecessary maintenance, to increased costs due to unnecessary scheduled maintenance activity, or both.

Levels of Analysis

Two levels are apparent in the decision logic.
  • The first level (questions 1, 2, 3 and 4) requires an evaluation of each functional degradation/failure for determination of the ultimate effect category, i.e. evident safety, evident operational, evident direct cost, hidden safety, hidden non‑safety or none.     
  • The second level (questions 5, 6, 7, 8 and 9, A to F, as applicable) takes the failure causes for each functional degradation/failure into account in order to select the specific type of tasks.

First Level Analysis (Determination of Effects)

The consequence of failure (which could include degradation) is evaluated at the first level using four basic questions.
Figure: Reliability decision logic tree, Level 1: effects of functional failures
NOTE ‑ The analysis should not proceed through the first level unless there is a full and complete understanding of the particular functional failure.  
Question 1 ‑ Evident or hidden functional failure? The purpose of this question is to segregate the evident and hidden functional failures and should be asked for each functional failure.   
Question 2 ‑ Direct adverse effects on operating safety? To be direct, the functional failure or resulting secondary damage should achieve its effect by itself, not in combination with other functional failures. An adverse effect on operating safety implies that damage or loss of equipment, human injury or death, or some combination of these events is a likely consequence of the failure or resulting secondary damage.  
Question 3 ‑ Hidden functional failure safety effect? This question takes into account failures in which the loss of a hidden function (whose failure is unknown to the operating personnel) does not of itself affect safety, but in combination with an additional functional failure, has an adverse effect on operating safety.
NOTE ‑ The operating personnel consist of all qualified staff who are on duty and directly involved in the use of the equipment.
Question 4 ‑ Direct adverse effect on operating capability? This question asks if the functional failure could have an adverse effect on operating capability:
  • Requiring either the imposition of operating restrictions or correction prior to further operation; or
  • Requiring the operating personnel to use abnormal or emergency procedures.
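The four first-level questions route each functional failure to an effect category. Because the Level 1 figure is not reproduced here, the routing below is inferred from the wording of the questions (Question 2 and then Question 4 on the evident path, Question 3 on the hidden path), which matches the common MSG-3-style arrangement; treat it as an assumption. The "none" outcome mentioned above is not modeled, and the enum and function names are hypothetical. The enum values mirror the second-level question groups 5 to 9.

```python
from enum import Enum


class EffectCategory(Enum):
    # Values follow the numbering of the second-level question groups.
    EVIDENT_SAFETY = 5
    EVIDENT_OPERATIONAL = 6
    EVIDENT_DIRECT_COST = 7
    HIDDEN_SAFETY = 8
    HIDDEN_NON_SAFETY = 9


def classify_effect(
    evident: bool,                       # Question 1
    direct_safety_effect: bool = False,  # Question 2 (evident path)
    hidden_safety_effect: bool = False,  # Question 3 (hidden path)
    operational_effect: bool = False,    # Question 4 (evident, non-safety path)
) -> EffectCategory:
    if evident:
        if direct_safety_effect:
            return EffectCategory.EVIDENT_SAFETY
        if operational_effect:
            return EffectCategory.EVIDENT_OPERATIONAL
        return EffectCategory.EVIDENT_DIRECT_COST
    if hidden_safety_effect:
        return EffectCategory.HIDDEN_SAFETY
    return EffectCategory.HIDDEN_NON_SAFETY


# Hypothetical: loss of pump flow is evident, not directly unsafe, but forces a restriction.
print(classify_effect(evident=True, operational_effect=True))  # EffectCategory.EVIDENT_OPERATIONAL
```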

Second Level Analysis (Effects Categories)

Applying the decision logic of the first level questions to each functional failure leads to one of five effect categories, as follows:
  • Evident safety effects (Questions 5A to 5E): This category should be approached with the understanding that a task (or tasks) is required to ensure safe operation. All questions in this category need to be asked. If no applicable and effective task results from this category analysis, redesign is mandatory.
  • Evident operational effects (Questions 6A to 6D): A task is desirable if it reduces the risk of failure to an acceptable level. If all answers in the logic process are "NO", no preventive maintenance task is generated. If operational penalties are severe, a redesign is desirable.
  • Evident direct cost effects (Questions 7A to 7D): A task is desirable if the cost of the task is less than the cost of repair. If all answers in the logic process are "NO", no preventive maintenance task is generated. If the cost penalties are severe, a redesign may be desirable.
  • Hidden function safety effects (Questions 8A to 8F): A task is required to ensure the availability necessary to avoid the safety effect of multiple failures. All questions should be asked. If no applicable and effective tasks are found, redesign is mandatory.
  • Hidden function non-safety effects (Questions 9A to 9E): A task may be desirable to assure the availability necessary to avoid the direct cost effects of multiple failures. If all answers in the logic process are "NO", no preventive maintenance task is generated. If economic penalties are severe, a redesign may be desirable.

Task Determination

Task determination is handled in a similar manner for each of the five effect categories. For task determination, the failure causes for the functional failure are applied to the second level of the logic diagram. Seven possible task resultant questions in the effect categories have been identified, although additional tasks, modified tasks or modified task definitions may be warranted, depending on the needs of particular industries.

Paralleling and Default Logic

Paralleling and default logic play an essential role at level 2 (see Figure 3.3).

Paralleling logic: Regardless of the answer to the first question regarding "lubrication/servicing", the next task selection question should be asked in all cases. When following the hidden or evident safety effects path, all subsequent questions should be asked. In the remaining categories, after the first question, a "YES" answer allows the analysis to exit the logic. (At the user's option, the analysis may advance to subsequent questions after a "YES" answer, but only if the cost of the task is equal to the cost of the failure prevented.)

Figure: Reliability decision logic tree, Level 2: effect categories and task determination

Default logic: Default logic is reflected in paths outside the safety effects areas by the arrangement of the task selection logic. In the absence of adequate information to answer "YES" or "NO" to questions in the second level, default logic dictates that a "NO" answer be given and the subsequent question be asked. As "NO" answers are generated, the only choice available is the next question, which in most cases provides a more conservative, stringent and/or costly route.

Redesign: Redesign is mandatory for failures that fall into the safety effects category (evident or hidden) and for which there are no applicable and effective tasks.
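The paralleling, default and redesign rules can be combined into a single walk over the task questions for one failure cause. The sketch below is a simplified model under stated assumptions: the category and question identifiers are hypothetical, the optional advancement after a "YES" is not modeled, and in the safety categories it simply returns every applicable task as the input to the "combination" review rather than choosing among them. In the non-safety categories an empty result means no preventive maintenance task; whether redesign is desirable there remains a judgment the function does not make.

```python
from typing import Optional

# Second-level task questions in the order they are asked (hypothetical identifiers).
TASK_QUESTIONS = [
    "lubrication_servicing",
    "operational_visual_check",  # asked only in the hidden-function categories
    "inspection_functional_check",
    "restoration",
    "discard",
]
SAFETY_CATEGORIES = {"evident_safety", "hidden_safety"}
HIDDEN_CATEGORIES = {"hidden_safety", "hidden_non_safety"}


def select_tasks(
    category: str, answers: dict[str, Optional[bool]]
) -> tuple[list[str], bool]:
    """Walk the task questions for one failure cause.

    answers maps a question to True ("YES"), False ("NO") or None (not enough
    information). Missing or None answers are treated as "NO" (default logic).
    Returns (selected tasks, redesign_mandatory).
    """
    selected: list[str] = []
    for question in TASK_QUESTIONS:
        if question == "operational_visual_check" and category not in HIDDEN_CATEGORIES:
            continue
        if answers.get(question) is not True:
            continue  # "NO" or unknown: go on to the next question
        selected.append(question)
        # Paralleling: always continue after lubrication/servicing; in the
        # safety categories every question is asked; elsewhere a "YES" exits.
        if question != "lubrication_servicing" and category not in SAFETY_CATEGORIES:
            break
    # In the safety categories, the tasks found feed the "combination" review.
    redesign_mandatory = category in SAFETY_CATEGORIES and not selected
    return selected, redesign_mandatory


print(select_tasks("evident_operational", {"lubrication_servicing": True,
                                           "inspection_functional_check": True,
                                           "restoration": True}))
# (['lubrication_servicing', 'inspection_functional_check'], False)
print(select_tasks("evident_safety", {"inspection_functional_check": None}))
# ([], True)
```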

Maintenance Tasks
Explanations of the terms used in the possible tasks are as follows: 
  • Lubrication/servicing (all categories): This involves any act of lubricating or servicing to maintain inherent design capabilities.
  • Operational/visual/automated check (hidden functional failure categories only): An operational check is a task to determine that an item is fulfilling its intended purpose. It does not require quantitative tolerances and is a failure-finding task. A visual check is an observation to determine that an item is fulfilling its intended purpose; it likewise does not require quantitative tolerances and is also a failure-finding task. The visual check could also involve interrogating electronic units that store failure data.
  • Inspection/functional check/condition monitoring (all categories): An inspection is an examination of an item against a specific standard. A functional check is a quantitative check to determine whether one or more functions of an item perform within specified limits. Condition monitoring is a task, which may be continuous or periodic, to monitor the condition of an item in operation against pre-set parameters.
  • Restoration (all categories): Restoration is the work necessary to return the item to a specific standard. Since restoration may vary from cleaning or replacement of single parts up to a complete overhaul, the scope of each assigned restoration task has to be specified.
  • Discard (all categories): Discard is the removal from service of an item at a specified life limit. Discard tasks are normally applied to so-called single-cell parts such as cartridges, canisters, cylinders, turbine disks, safe-life structural members, etc.
  • Combination (safety categories): Since this is a safety category question and a task is required, all possible avenues should be analyzed. To do this, the applicable tasks need to be reviewed, and the most effective tasks selected from that review.
  • No task (all categories): Depending on the effect, it may be decided in some situations that no task is required.
Each of the possible tasks defined above is based on its own applicability and effectiveness criteria. Table 3.4 summarizes these task selection criteria.

Task Frequencies/Intervals

In order to set a task frequency or interval, it is necessary to determine the existence of applicable operational experience data that suggest an effective interval for task accomplishment. Appropriate information may be obtained from one or more of the following:
  • Prior knowledge from other similar equipment which shows that a scheduled maintenance task has offered substantial evidence of being applicable, effective and economically worthwhile.
  • Manufacturer/supplier test data which indicate that a scheduled maintenance task will be applicable and effective for the item being evaluated. 
  • Reliability data and predictions.
Safety and cost considerations need to be addressed in establishing the maintenance intervals. Scheduled inspections and replacement intervals should coincide whenever possible, and tasks should be grouped to reduce the operational impact.  The safety replacement interval can be established from the cumulative failure distribution for the item by choosing a replacement interval that results in an extremely low probability of failure prior to replacement. Where a failure does not cause a safety hazard, but causes loss of availability, the replacement interval is established in a trade‑off process involving the cost of replacement components, the cost of failure and the availability requirement of the equipment.
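As a worked example of reading a safety replacement interval off the cumulative failure distribution, assume (this is not stated in the text) that the item's life follows a two-parameter Weibull distribution, F(t) = 1 − exp(−(t/η)^β). Setting F(t) equal to a chosen, very small target probability p and solving gives t = η(−ln(1 − p))^(1/β). The parameter values and the target p below are hypothetical; in practice β and η would come from the item's reliability data and p from the safety requirement.

```python
import math


def weibull_cdf(t: float, beta: float, eta: float) -> float:
    """Probability that the item has failed by age t (two-parameter Weibull)."""
    # -expm1(-x) == 1 - exp(-x), computed accurately for small x
    return -math.expm1(-((t / eta) ** beta))


def safe_life_interval(beta: float, eta: float, target_probability: float) -> float:
    """Age at which the cumulative probability of failure reaches the target."""
    if not 0.0 < target_probability < 1.0:
        raise ValueError("target_probability must lie strictly between 0 and 1")
    # Invert F(t) = p:  t = eta * (-ln(1 - p)) ** (1 / beta); log1p keeps precision for tiny p
    return eta * (-math.log1p(-target_probability)) ** (1.0 / beta)


# Hypothetical wear-out item: beta = 3, eta = 10 000 h, target p = 1 in 1 000
interval = safe_life_interval(beta=3.0, eta=10_000.0, target_probability=1e-3)
print(f"replace by {interval:.0f} h; F = {weibull_cdf(interval, 3.0, 10_000.0):.6f}")
```

With these assumed values the replacement age is about one tenth of the characteristic life η, which illustrates how much margin an "extremely low" failure probability demands.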
Mathematical models exist for determining task frequencies and intervals, but these models depend on the availability of the appropriate data. This data will be specific to particular industries, and the relevant industry standards and data sheets should be consulted as appropriate. If there is insufficient reliability data or no prior knowledge from other similar equipment, or if there is insufficient similarity between the previous and current systems, the task interval can initially be established only by experienced personnel using good judgment and operating experience in concert with the best available operating data and relevant cost data.
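For the non-safety trade-off, one widely used textbook model (swapped in here for illustration; the text does not prescribe a specific model) is the age-replacement policy: replace at age T or at failure, whichever comes first, and choose T to minimize the long-run cost per unit operating time, C(T) = [c_p R(T) + c_f F(T)] / ∫₀ᵀ R(t) dt, where c_p is the cost of a planned replacement and c_f the cost of a failure. The sketch assumes Weibull life and hypothetical cost figures, and it leaves out the availability requirement and repair downtime that a real analysis would add.

```python
import math


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """Probability of surviving to age t (two-parameter Weibull)."""
    return math.exp(-((t / eta) ** beta))


def cost_rate(T: float, beta: float, eta: float,
              cost_planned: float, cost_failure: float, steps: int = 2000) -> float:
    """Long-run expected cost per unit time when replacing at age T or at failure."""
    # Expected cycle length = integral of R(t) from 0 to T (trapezoidal rule).
    h = T / steps
    area = 0.5 * (weibull_reliability(0.0, beta, eta) + weibull_reliability(T, beta, eta))
    area += sum(weibull_reliability(i * h, beta, eta) for i in range(1, steps))
    area *= h
    survive = weibull_reliability(T, beta, eta)
    # Planned replacement if the item survives to T, failure replacement otherwise.
    expected_cost = cost_planned * survive + cost_failure * (1.0 - survive)
    return expected_cost / area


def best_replacement_age(beta, eta, cost_planned, cost_failure, candidates):
    """Pick the candidate age with the lowest cost rate (simple grid search)."""
    return min(candidates, key=lambda T: cost_rate(T, beta, eta, cost_planned, cost_failure))


# Hypothetical item: beta = 3, eta = 1000 h; a failure costs 10x a planned replacement.
best = best_replacement_age(3.0, 1000.0, 1.0, 10.0, range(100, 3001, 50))
print(best)  # 400
```

In this model a finite optimum exists only when failures show wear-out (β > 1) and a failure costs more than a planned replacement; for β ≤ 1 the cost rate keeps falling as T grows, which points to no scheduled replacement.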