The Eight Deadly Sins in Measuring Asset Reliability

"How many organizations in the world actually measure asset reliability accurately and use the data to adjust their maintenance strategies?" The US military? Yes, they must. Commercial airlines? Yes, they must. What about industrial plants? These are my thoughts, shared in the hope that companies will see measuring asset reliability differently after reading this post.

Sin #1: “Equipment reliability is not measured”

Most companies don’t measure mean time between failures (MTBF), even though it’s the most basic measure of reliability. MTBF is the average time an asset functions before it fails. So why don’t they measure MTBF? Let’s define reliability before we go any further.

Reliability: The ability of an item to perform a required function under stated conditions for a stated period of time
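For reference, the MTBF used throughout this post (the number of hours in operation divided by the number of failures, as spelled out under Sin #3) can be written as:

```latex
\text{MTBF} = \frac{\text{total operating time}}{\text{number of failures}}
```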
 

Sin #2: “We don’t really have failures”

“We have a Condition Monitoring / Predictive Maintenance (CM/PdM) program that detects failures before a component or asset fails catastrophically.” In this case, the definition of the term ‘failure’ must be re-examined.


More often than not, when a failure mode is detected by condition monitoring technologies, it requires some form of intrusive maintenance to rectify the problem. Your CM/PdM program may give you enough lead time to prevent a catastrophic failure, but it is a failure nevertheless, because the asset has passed point “P” on the P-F curve. And, as with all intrusive maintenance, the repair also increases the risk of maintenance-induced failure, or ‘infant mortality’. Treat all results that come from PdM work orders as failures: maybe not a total functional failure, but failure has begun.
Figure: The P-F curve, a great graphical representation of "how something fails" (source: Allied Reliability, one of the best PdM consultants).
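To make “treat PdM work orders as failures” concrete, here is a minimal sketch of counting failures from work orders. The `WorkOrder` record, its fields, and the type codes `EMERGENCY`, `PDM`, and `PM` are hypothetical; map them to whatever your CMMS/EAM uses. The MTBF line also assumes the pump runs continuously (8,760 hours a year).

```python
from dataclasses import dataclass

# Hypothetical work-order record; the field names and type codes are
# illustrative assumptions, not the schema of any particular CMMS/EAM.
@dataclass
class WorkOrder:
    asset_id: str
    wo_type: str  # e.g. "EMERGENCY", "PDM", "PM"

# Assumption for this sketch: breakdown (emergency) work orders and
# corrective work raised by condition monitoring both count as failures;
# routine preventive maintenance (PM) does not.
FAILURE_TYPES = {"EMERGENCY", "PDM"}


def count_failures(work_orders, asset_id, failure_types=FAILURE_TYPES):
    """Count work orders on one asset whose type is treated as a failure."""
    return sum(
        1
        for wo in work_orders
        if wo.asset_id == asset_id and wo.wo_type in failure_types
    )


def mtbf_hours(operating_hours, failures):
    """Operating hours divided by failures; None when no failures are recorded."""
    return operating_hours / failures if failures else None


# Made-up year of work orders for one pump.
orders = [
    WorkOrder("P-101", "PDM"),
    WorkOrder("P-101", "PDM"),
    WorkOrder("P-101", "PM"),
]
print(count_failures(orders, "P-101"))                    # 2
print(count_failures(orders, "P-101", {"EMERGENCY"}))     # 0
print(mtbf_hours(8760, count_failures(orders, "P-101")))  # 4380.0
```

Counting only emergency work orders, this pump shows zero failures and no MTBF at all, which looks perfect on paper. Counting the two PdM-driven repairs gives an MTBF of 4,380 hours.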
 
"Think about it: if a PdM tech informs management that there is a defect on the outboard bearing of a large pump, what would you do?"
Answer: "Plan" the replacement of the bearing or pump to specifications, using a procedure based on best practices. If it fails before a down day, at least you have the procedure, parts, and other requirements to restore the pump to "specifications".

Let's examine the facts: "Bearings fail randomly." Would you want to measure the MTBF of that one motor, or the MTBF of all motors of a specific size? You could measure MTBF based on when motors are checked out of the storeroom, since they are checked out because of a "failure". I have seen this done, and it works.
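A sketch of the storeroom approach for a population of motors, under two assumptions that are mine rather than the post's: every checkout of a spare motor replaces one failed motor of the same size, and every installed motor ran for the whole period. The size classes, installed counts, and checkouts are made-up data.

```python
from collections import Counter


def population_mtbf(checkouts, installed_counts, period_hours):
    """Estimate MTBF per motor size class from storeroom checkouts.

    Assumptions (for illustration only):
    - each checkout of a spare motor replaces exactly one failed motor;
    - every installed motor ran for the full period.
    """
    failures = Counter(checkouts)  # number of checkouts per size class
    result = {}
    for size, installed in installed_counts.items():
        n = failures[size]  # Counter returns 0 for sizes never checked out
        total_hours = installed * period_hours  # combined operating hours
        result[size] = total_hours / n if n else None
    return result


# Made-up data: one year (8,760 h) of checkouts for two motor sizes.
checkouts = ["50HP", "50HP", "100HP", "50HP"]
installed = {"50HP": 40, "100HP": 10}
print(population_mtbf(checkouts, installed, 8760))
# {'50HP': 116800.0, '100HP': 87600.0}
```

Pooling motors by size gives many more failure events to average over than a single motor does.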

One example: if condition monitoring detects a failure mode on an asset every 6 months, the asset becomes unavailable twice a year, even with proper planning and scheduling. By ignoring work orders raised by condition monitoring, you are merely treating the symptoms rather than going after the root cause.

What if we apply root cause analysis and decrease the failures from twice a year to once every two years? Just a thought!
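To put numbers on this example, assume the asset runs continuously, 8,760 hours a year (an assumption; the post does not give operating hours). Cutting failures from twice a year to once every two years quadruples MTBF:

```latex
\text{MTBF}_{\text{before}} = \frac{8760\ \text{h}}{2\ \text{failures}} = 4380\ \text{h},
\qquad
\text{MTBF}_{\text{after}} = \frac{2 \times 8760\ \text{h}}{1\ \text{failure}} = 17520\ \text{h} = 4 \times \text{MTBF}_{\text{before}}
```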

Sin #3: “We can’t measure MTBF in the same way for all of our assets because not all of our plant is a continuous operation”


Some assets run only on the day shift (8 hours) while others run 24 hours a day. That should not create any misunderstanding about how MTBF is measured.

The same calculation is used:

For example, if machine ‘x’ runs 8 hours a day and fails 3 times a year, and machine ‘y’ runs 24 hours a day and fails 9 times a year, the MTBF is the same for both assets: machine ‘y’ runs three times as many hours and fails three times as often.
MTBF is simply the number of hours in operation divided by the number of failures. It’s that simple. Most of the time, identifying the exact value is not as important as knowing you have a problem. Just a thought!
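The same calculation for machines ‘x’ and ‘y’, as a small sketch. It assumes both machines run every day of a 365-day year; the day count cancels out anyway, because ‘y’ runs three times as many hours and fails three times as often.

```python
import math


def mtbf(operating_hours: float, failures: int) -> float:
    """Mean time between failures: operating hours divided by failures."""
    if failures <= 0:
        raise ValueError("MTBF is undefined when no failures are recorded")
    return operating_hours / failures


DAYS_PER_YEAR = 365  # assumption: both machines run every day of the year

mtbf_x = mtbf(8 * DAYS_PER_YEAR, 3)   # day shift only, 3 failures a year
mtbf_y = mtbf(24 * DAYS_PER_YEAR, 9)  # around the clock, 9 failures a year

print(round(mtbf_x, 1), round(mtbf_y, 1))  # 973.3 973.3
print(math.isclose(mtbf_x, mtbf_y))        # True
```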
 

Sin #4: “Work orders don’t capture all emergency work”

Many companies have rules such as:

“A work order will be written only if the equipment is down for more than one hour.” 

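This rule doesn’t make sense. Say, for example, a circuit overload on a piece of equipment trips 100 times in a month. If each reset takes less than an hour, none of those trips gets a work order, and on paper the equipment looks perfectly reliable. Many times, small problems lead to major asset failure. Don’t wait until a small problem becomes a big one.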
Start tracking MTBF and you’ll be on the road to reliability. Eventually, you’ll learn to manage your assets proactively according to their health. Then, you’ll see your MTBF improve dramatically.
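A small sketch of why the one-hour rule hides this problem. The 5-minute reset time for each trip is a hypothetical value; any duration under an hour gives the same result.

```python
def recorded_failures(downtime_minutes, min_minutes=0):
    """Count stoppages that would get a work order under a
    'write a work order only if down longer than min_minutes' rule."""
    return sum(1 for m in downtime_minutes if m > min_minutes)


# Hypothetical month: the overload trips 100 times, 5 minutes per reset.
trips = [5] * 100

print(recorded_failures(trips, min_minutes=60))  # 0   (one-hour rule)
print(recorded_failures(trips))                  # 100 (every stoppage recorded)
```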

Sin #5: “Not every asset is loaded into the CMMS/EAM”

If an asset isn’t in the CMMS/EAM, you can’t write an emergency work order against it. If you’re not tracking every asset down to the component level, you can’t possibly identify any true reliability issue.

Think about it this way: if 20% of your assets eat up 80% of your resources, wouldn’t you want to identify that 20%, the bad actors? Put all of your assets in your CMMS/EAM, track MTBF, and the bad actors will become obvious.

Validating your equipment hierarchy is the first step.
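Once assets and their costs are in the CMMS/EAM, finding the bad actors is a Pareto exercise. This sketch ranks assets by maintenance cost and returns the smallest group that accounts for 80% of the spend; the asset tags and costs are made up, and you could rank by failure count or labor hours instead.

```python
def bad_actors(cost_by_asset, share=0.8):
    """Return the smallest group of assets, largest spenders first, that
    together account for at least `share` of total maintenance cost."""
    total = sum(cost_by_asset.values())
    ranked = sorted(cost_by_asset.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0.0
    for asset, cost in ranked:
        if running >= share * total:
            break  # target share already reached
        selected.append(asset)
        running += cost
    return selected


# Made-up annual maintenance cost for ten assets (total 100,000).
costs = {
    "P-101": 50_000, "C-201": 30_000, "F-301": 4_000, "M-401": 3_000,
    "V-501": 3_000, "K-601": 2_500, "B-701": 2_500, "G-801": 2_000,
    "H-901": 1_500, "T-102": 1_500,
}
print(bad_actors(costs))  # ['P-101', 'C-201']  (2 of 10 assets, 80% of cost)
```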

Sin #6: “It isn’t important to measure MTBF because other metrics provide equivalent value”

Yes, you can measure asset reliability with other metrics, but keep it simple by using MTBF. Count the number of breakdowns (the number of emergency work orders) for an asset during a given time interval, such as a week. Keep it simple at first, then advance to the next level. All we need to know is how long the equipment runs, on average, before it fails.
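A minimal sketch of counting breakdowns by week, treating each emergency work order as one breakdown. Grouping by ISO calendar week is a choice made for illustration, and the 168 run hours per week used for the average is an assumption (an asset running around the clock).

```python
from collections import Counter
from datetime import date


def weekly_breakdowns(emergency_wo_dates):
    """Count emergency work orders per ISO (year, week) for one asset."""
    counts = Counter()
    for d in emergency_wo_dates:
        iso_year, iso_week, _ = d.isocalendar()
        counts[(iso_year, iso_week)] += 1
    return dict(sorted(counts.items()))


def period_mtbf(run_hours, weekly_counts):
    """Average run time between breakdowns over the whole period."""
    failures = sum(weekly_counts.values())
    return run_hours / failures if failures else None


# Made-up emergency work-order dates for one asset over two weeks.
dates = [date(2024, 1, 2), date(2024, 1, 4), date(2024, 1, 10)]
weekly = weekly_breakdowns(dates)
print(weekly)                         # {(2024, 1): 2, (2024, 2): 1}
print(period_mtbf(2 * 168, weekly))   # 112.0
```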

MTBF is the number one measurement of equipment reliability in the world. Can other metrics be used? Yes.

Sin #7: “The maintenance organization is in such a reactive mode that there’s no time to generate any metrics.”

They’re constantly scrambling merely to react to the latest crisis. But, taking a small step in the right direction – tracking just one measure of reliability – will reveal the 20% of the assets that are burning 80% of the resources. If you start with the worst actor, you’ll be surprised at how quickly you can rise out of the reactivity quagmire. 

For example, a plant manager recently measured the MTBF for what he called his “Top 10 Critical Assets” and was shocked at the results. He expected the combined MTBF for these assets to be around eight to nine hours. In the first month of this initiative, he found that the actual MTBF was 0.7 hours.

You may find yourself in the same situation. You’ll never know the true reliability status on your plant floor until you begin measuring it.

Remember: The data is the data whether one likes it or not.

Sin #8: “There are too many other problems to worry about right now without being pressured to measure reliability, too”

I’ve heard this many times, and what it tells me is that the organization is in total reactive mode, dealing only with the problem of the hour. If 20% of your assets are taking 80% of your resources, dig yourself out by attacking the assets that cause the most pain: the “high payoff assets” that will respond to a reliability improvement initiative.

We’ve got to stop fighting fires. The characteristics of adept firefighters include:
1. High turnover of personnel (mostly in production).
2. Maintenance costs continue to rise.
3. Maintenance costs are capped before the month ends (“Don’t spend any more money this month. We’re over budget.”)
4. Every day is a new day of problems and chaos.
5. Maintenance is blamed for missing the production goals.

It isn’t easy to fight fires and initiate reliability improvement at the same time, but it can be done. Start measuring MTBF and attack the high-payoff assets.

Remember this, which I was taught in 1980 while working at Alumax Mt Holly (Alcoa Mt Holly), one of three plants in the world ever certified as achieving "World Class Maintenance Status":
 
Admit it: you cannot change a company’s culture from reactive to proactive overnight. However, you can eliminate reliability problems one major system at a time.
That’s where you’ll find a rapid return on investment. Understand this: "It is all about the money."
 
The measure of maintenance effectiveness is:
"Maintenance Cost as % of Replacement Asset Value (RAV)"

> Best of the best (Alcoa Mt Holly): 3.4%
> Typical: 6-15%
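The effectiveness measure above is a simple ratio. This sketch computes it for a made-up plant and places the result against the two benchmarks quoted above; the annual basis and the labels for the bands between and beyond those figures are assumptions.

```python
def maintenance_cost_pct_rav(annual_maintenance_cost, replacement_asset_value):
    """Maintenance cost as a percentage of replacement asset value (RAV)."""
    return 100.0 * annual_maintenance_cost / replacement_asset_value


def benchmark(pct):
    """Place a result against the figures quoted in the post.
    Only 3.4% (best) and 6-15% (typical) come from the post; the labels
    for the in-between and above-typical bands are assumptions."""
    if pct <= 3.4:
        return "best of the best"
    if pct < 6:
        return "between best and typical"
    if pct <= 15:
        return "typical"
    return "above typical range"


# Made-up plant: $4.5M annual maintenance spend, $60M replacement asset value.
pct = maintenance_cost_pct_rav(4_500_000, 60_000_000)
print(pct, benchmark(pct))  # 7.5 typical
```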

How to change:
Change people’s activities and behaviors slowly and you’ll transition to a proactive culture.

Measuring asset reliability is the key to keeping a company profitable, increasing its capacity, and reducing its maintenance cost. In a future column, we’ll present some reliability improvement ideas. Check out the results of measuring MTBF at one company: it measured MTBF on only its 900 electric motors for three years while applying a couple of known best practices.
 
Figure: Results of measuring MTBF on 900 electric motors over three years.
If you want to measure MTBF effectively, then based on my experience you should begin at the section or system level (see the equipment taxonomy from ISO 14224 below).

Once you have identified which section or system has the lowest reliability, begin measuring the components or maintainable items within that specific section or system.
 
Figure: Equipment taxonomy from ISO 14224.
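The two-step approach above (find the worst section or system, then the worst component within it) can be sketched as follows. The system and component names and the failure history are made up, and the sketch assumes a component runs whenever its system runs, so both share the same operating hours.

```python
from collections import Counter


def lowest_mtbf(failure_counts, hours):
    """Return (name, mtbf) for the item with the lowest MTBF."""
    mtbfs = {name: hours[name] / n for name, n in failure_counts.items() if n}
    worst = min(mtbfs, key=mtbfs.get)
    return worst, mtbfs[worst]


def drill_down(failures, system_hours):
    """Two-step search: worst section/system first, then worst component in it.

    failures: list of (system, component) pairs, one per failure.
    system_hours: operating hours per system.
    Assumption: a component runs whenever its system runs.
    """
    by_system = Counter(system for system, _ in failures)
    worst_system, system_mtbf = lowest_mtbf(by_system, system_hours)

    in_worst = Counter(comp for system, comp in failures if system == worst_system)
    comp_hours = {comp: system_hours[worst_system] for comp in in_worst}
    worst_comp, comp_mtbf = lowest_mtbf(in_worst, comp_hours)
    return worst_system, system_mtbf, worst_comp, comp_mtbf


# Made-up failure history for one year (8,760 h) of two systems.
history = (
    [("Cooling water", "Pump bearing")] * 3
    + [("Cooling water", "Mechanical seal")]
    + [("Compressed air", "Dryer valve")]
)
hours = {"Cooling water": 8760, "Compressed air": 8760}
print(drill_down(history, hours))
# ('Cooling water', 2190.0, 'Pump bearing', 2920.0)
```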
To sum up this post: reactive maintenance costs are extremely high, and MTBF is one of the tactical metrics that can help you make a difference. Yes, there are other methods. I am sharing this one because MTBF is the NUMBER 1 measurement of reliability.

 
About:
To all my friends: The Maintenance Community on Slack is an incredible free space where over 1,500 maintenance and reliability professionals like me share real-life experiences with each other.
 
