Key performance metrics and indicators survey highlights industry issues
Some surprising answers run counter to conventional industry wisdom
By Larry O’Brien
The recent ARC survey on key performance metrics and indicators produced some surprising answers that run counter to conventional industry wisdom. ARC believes that the financial impact of unplanned downtime in process operations had previously been underestimated. It now estimates that the total cost of unplanned downtime for all the process industries worldwide is easily in the range of $1 trillion. A large base of process automation systems installed around the world also desperately needs modernization. ARC estimates that this "aged out" installed base has grown over the past decade from $65 billion to $70 billion. These aging systems contribute to unplanned downtime and the lost profits that come with it.
In the process industries, historical downtime information has been passed on and accepted as truth. In ARC's view, many of these statistics do not represent a realistic view of the industry today. For example, the true impact of unplanned downtime on revenue and profitability has been vastly underestimated.
Most unplanned downtime occurs when the plant is in a transitional state, such as startup or a grade change, which typically involves specific operating procedures. Survey respondents also cited delayed maintenance turnarounds as a major source of unplanned downtime.
Survey respondent demographics
End users accounted for almost 34 percent of total respondents; 30 percent were suppliers, and the remainder were system integrators, consultants, original equipment manufacturers (OEMs), and others. More than 16 percent of responses came from the oil and gas sector, with approximately 12 percent from petrochemicals and 11 percent from refining. Most other process industries were also represented, including food and beverage, power generation, chemicals, life sciences, and pulp and paper.
More than 39 percent of respondents were from North America, close to 18 percent from Western Europe, over 14 percent from India, and close to 11 percent from Latin America. Other regions responding included Eastern Europe/CIS (over 5 percent) and a few responses from the Middle East, Africa, Asia, and China.
ARC also asked respondents whether they represented a single plant or site, multiple plants, or an entire company. More than 48 percent of respondents answered on behalf of their companies and 25 percent on behalf of multiple sites or regions, so the survey offers a comprehensive view of end user practices and experiences across multiple plants and sites.
When, why, and how of unplanned downtime
Today's plants are under pressure to contribute value to the company's bottom line by continuously improving asset performance. Process industries are extremely asset intensive, with manufacturing assets typically representing approximately 75 percent of the company's entire assets.
The difficulty of gaining approval for new plant construction or added capacity makes existing assets even more valuable and more important to protect. Today's business drivers focus on metrics such as return on assets (ROA) and overall equipment effectiveness (OEE), both of which contribute significantly to the overall goal of achieving operational excellence (OpX). We will discuss these later in this article, but first, let's look at the primary reason for measuring things like OEE: avoiding unplanned downtime.
Unscheduled downtime due to equipment failure, operator error, or nuisance trips is the nemesis of all manufacturers. ARC previously estimated that the cost of unscheduled shutdowns is equivalent to about 5 percent of the total output of all the process industries. Other organizations, such as the Abnormal Situation Management (ASM) Consortium, estimate that unplanned downtime costs the U.S. petrochemical industry alone $10-20 billion per year.
But what is the real impact of unplanned downtime? What is average unplanned downtime as a percentage of total production? How is unplanned downtime defined? When does it occur during normal production, and what or who is responsible? These are the questions ARC set out to answer in our survey.
Primary reasons for unplanned shutdown
ARC asked survey respondents to rank the primary reasons for unplanned downtime, starting with the most prevalent. Is most unplanned downtime due to human error, or to some extraordinary event or "act of God" such as a hurricane or flood? Other possible causes include failure of the process (e.g., a bad reaction or wrong catalyst) and equipment failure (e.g., piping, vessels, controls).
Research from organizations like the ASM Consortium indicates that operator error is the biggest contributor to plant incidents. A problem can certainly start in a piece of equipment, but if the operator does not have the right information, it can escalate into a major incident with health, safety, and environmental implications.
Our survey, however, measured unplanned downtime itself rather than incidents, because unplanned downtime does not always result in a plant incident. For example, it can occur as the result of a prolonged maintenance turnaround. Most respondents ranked failure of equipment, including piping, vessels, and controls, as the most prevalent reason for unplanned downtime. Failure in the process (e.g., a bad reaction) ranked second, with human error third. Unusual events were least to blame.
Whether an operator is presented with the relevant information pointing to the root problem and is empowered to act is another matter. In this regard, ARC agrees that operators do not typically see enough of the right information at the right time to make the right decisions about how best to control the process to avoid unplanned downtime events. Giving operators better information and training can help. In addition, operators and other appropriate personnel need to be empowered to act on that information and training.
Most common times for unplanned shutdown
According to conventional wisdom, most unplanned downtime occurs during certain operational procedures, most notably startup and shutdown. Normal plant operations are not seen as a major source of unplanned downtime events, nor are regularly scheduled planned downtime activities, such as maintenance turnarounds.
Survey responses indicate that this conventional wisdom may not be entirely correct. ARC asked survey respondents to rank the most common times for unplanned shutdown. Available options included: during normal operations, startup, shutdown, higher than normal load, extraordinary events like hurricanes and floods, shift changes, other plant transitional states (e.g., grade changes), and delays in regularly scheduled maintenance turnarounds.
On a weighted average basis, startup tied with "other transitional changes" in the process (such as grade changes) as the most common time for unplanned downtime. Maintenance turnarounds are a planned activity, but unexpected delays in these turnarounds were cited as a major source of unplanned downtime, which confirms related ARC research. Turnarounds are typically planned well in advance, in many cases as much as a year ahead. Because of this extended planning window, turnaround plans are not always updated with the available real-time, dynamic asset and plant information. Many end users also do not take advantage of intelligent device diagnostic data to maximize turnaround efficiency.
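The article does not say how ARC weighted the rankings. A common approach, assumed here purely for illustration, awards an option n points for a first-place rank among n options, n - 1 for second, and so on, then averages over respondents. The function name `weighted_rank_scores` and the two toy rankings are hypothetical, not survey data; they only show how two options can end up tied, as startup and "other transitional changes" did.

```python
from collections import defaultdict


def weighted_rank_scores(rankings: list[list[str]]) -> dict[str, float]:
    """Average weighted score per option (higher = ranked as more common).

    Each ranking lists options from most to least common. In an n-item
    ranking, first place earns n points, second place n - 1, and so on
    down to 1 point for last place. (This weighting scheme is an assumption.)
    """
    totals: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for position, option in enumerate(ranking):
            totals[option] += n - position
    return {option: total / len(rankings) for option, total in totals.items()}


# Two hypothetical respondents, not survey data.
toy_rankings = [
    ["startup", "grade change", "shutdown"],
    ["grade change", "startup", "shutdown"],
]
scores = weighted_rank_scores(toy_rankings)
print(scores)  # {'startup': 2.5, 'grade change': 2.5, 'shutdown': 1.0}
```

This sketch assumes every respondent ranks every option; if respondents may skip options, a real analysis would need a rule for unranked items.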
Surprisingly, "normal operation" was cited as one of the most common times for unplanned downtime, ranking above shutdown, operation at higher than normal loads, and shift changes. Extraordinary events such as hurricanes and floods were ranked as the least prevalent time for an unplanned plant shutdown. Many end users, particularly in the oil and gas and downstream industries, have become much more sophisticated at using real-time weather data to shut down plant operations when an impending weather-related natural disaster is predicted.
Quantifying unplanned downtime
Survey responses indicated most end users quantify unplanned downtime in terms of its effect on OEE. Percentage of uptime and impact on schedule were also cited by more than 18 percent of respondents. Close to 16 percent of respondents quantified unplanned downtime in terms of absolute time. Surprisingly few respondents quantified the monetary value of unplanned downtime, either for lost production or through its impact on maintenance costs.
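Since so few respondents put a monetary value on downtime, it may help to see that a single outage record can yield all the measures mentioned above: absolute time, percentage of uptime, lost production, and cost. The sketch below is a minimal illustration. The `DowntimeEvent` fields (production rate, margin per unit, repair cost) and the example numbers are hypothetical inputs a plant would have to supply, and valuing lost production at contribution margin is one assumed convention among several.

```python
from dataclasses import dataclass


@dataclass
class DowntimeEvent:
    hours_lost: float       # duration of the unplanned outage, hours
    rate_per_hour: float    # production rate the unit would otherwise sustain, units/h
    margin_per_unit: float  # assumed value of each lost unit (contribution margin)
    repair_cost: float      # extra maintenance spend caused by the event


def quantify(events: list[DowntimeEvent], period_hours: float) -> dict[str, float]:
    """Express the same downtime record as time, uptime %, lost output, and cost."""
    downtime = sum(e.hours_lost for e in events)
    lost_units = sum(e.hours_lost * e.rate_per_hour for e in events)
    lost_margin = sum(e.hours_lost * e.rate_per_hour * e.margin_per_unit for e in events)
    repairs = sum(e.repair_cost for e in events)
    return {
        "downtime_hours": downtime,
        "uptime_pct": 100.0 * (period_hours - downtime) / period_hours,
        "lost_units": lost_units,
        "lost_margin": lost_margin,
        "maintenance_cost": repairs,
        "total_cost": lost_margin + repairs,
    }


# Hypothetical 30-day month (720 h) with a single 36-hour outage.
report = quantify([DowntimeEvent(36.0, 100.0, 50.0, 20_000.0)], period_hours=720.0)
print(report["uptime_pct"], report["total_cost"])  # 95.0 200000.0
```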
Production lost to unplanned downtime
ARC asked survey respondents to estimate the percentage of production lost to unplanned downtime, or deferred as the result of an unexpected plant shutdown, incident, or abnormal situation. No single range emerged as a clear majority, but most responses fell in the range of 1-12 percent. Slightly more than 23 percent indicated that unplanned downtime represented 4-6 percent of production. Over 20 percent responded in the 1-3 percent range, while a similar number responded in the 10-12 percent range. Almost 18 percent responded in the 7-9 percent range. Notably, over 50 percent of all respondents reported losing more than 6 percent of production to unplanned downtime, and over a third reported losing more than 10 percent.
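A quick tally shows how these approximate figures fit together. Treating the stated shares as point values (an assumption, since the text says "slightly more than," "over," and "almost"), the four itemized bins cover about 81 percent of respondents, and the itemized bins above 6 percent cover only about 38 percent. The rest of the "over 50 percent" and "over a third" figures must therefore come from responses above 12 percent, which are not broken out here. `REPORTED_BINS` and `share_in_bins` are illustrative names.

```python
# Approximate share of respondents (percent) per "production lost" bin, taken
# as point values from the text; the bin bounds are percent of production.
REPORTED_BINS: dict[tuple[int, int], float] = {
    (1, 3): 20.0,
    (4, 6): 23.0,
    (7, 9): 18.0,
    (10, 12): 20.0,
}


def share_in_bins(bins: dict[tuple[int, int], float], low: int, high: int) -> float:
    """Sum respondent shares for bins lying entirely within [low, high]."""
    return sum(share for (lo, hi), share in bins.items() if lo >= low and hi <= high)


itemized = share_in_bins(REPORTED_BINS, 1, 12)   # every bin the text itemizes
above_six = share_in_bins(REPORTED_BINS, 7, 12)  # itemized bins above 6 percent
print(itemized, above_six, 100.0 - itemized)     # 81.0 38.0 19.0
```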
Respondents also included some interesting comments: "Significantly less production lost and higher OEE in continuous operations opposed to continuous to batch processes." Production lost to unplanned downtime varies greatly among plants in the same company, depending on the plant's maturity level, maintenance strategy, resources, etc.
Aging installed base
A large percentage of the process automation systems installed around the world are at the end of their useful life and need replacement. ARC has increased the estimated value of these systems from roughly $65 billion to $70 billion. The base includes distributed control systems (DCSs), older systems that may predate DCSs, programmable logic controller (PLC)/human-machine interface (HMI) combination systems, and process safety systems. To support this research, ARC asked survey respondents the average control system age at their plants as well as the percentage of the installed base of control systems that are at or nearing the end of their useful life.
Close to a third of respondents had systems installed that ranged from 5-10 years old; 5 percent had brand new or relatively new systems less than five years old. The largest group of responses came from those with systems that were 10-15 years old. Anything older represents systems in need of replacement. Typically, a system begins to "age out" after 20 years, with at least some components needing replacement. Chances are that HMI and workstations have been upgraded already. The 15-to-20-year-old systems represented 10 percent of respondents, and systems 20-30 years old accounted for more than 12 percent of respondents. These older systems may no longer have a valid support path or an adequate pool of experienced people to support them. At 22 percent of the total installed base, these older systems represent the biggest downtime risk and threat.
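A plant wanting to compare its own fleet against these brackets could bucket each system by age and flag those past the roughly 20-year point at which, per the text, systems begin to age out. The sketch below assumes half-open brackets (a system exactly 10 years old falls in 10-15), since the survey's ranges overlap at their edges. The fleet ages and the names `bracket` and `summarize` are hypothetical.

```python
from bisect import bisect_right

# Half-open age brackets in years: [0, 5), [5, 10), [10, 15), [15, 20), [20, 30), [30, inf).
EDGES = [5, 10, 15, 20, 30]
LABELS = ["<5", "5-10", "10-15", "15-20", "20-30", "30+"]
AGE_OUT_YEARS = 20  # rule of thumb from the text; an individual plant may differ


def bracket(age_years: float) -> str:
    """Return the label of the bracket containing age_years."""
    return LABELS[bisect_right(EDGES, age_years)]


def summarize(ages: list[float]) -> tuple[dict[str, int], float]:
    """Count systems per bracket and return the percent at or past AGE_OUT_YEARS."""
    counts = {label: 0 for label in LABELS}
    for age in ages:
        counts[bracket(age)] += 1
    aged_out = sum(1 for age in ages if age >= AGE_OUT_YEARS)
    return counts, 100.0 * aged_out / len(ages)


# Hypothetical site with eight control systems.
counts, pct_aged_out = summarize([3, 8, 12, 12, 17, 22, 26, 31])
print(counts["10-15"], pct_aged_out)  # 2 37.5
```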
ARC also asked the question, "What percentage of your installed base of control systems is at the end or nearly at the end of its useful life, requiring imminent replacement?" The largest segment (40 percent) of respondents told us that 20-40 percent of the installed base fell into this category. Thirty percent of respondents told us that 40-60 percent of their installed base needed replacement. Only a very small percentage of respondents reported that over 60 percent of the installed base needed replacement.
Process KPIs: OEE and RAV
The survey asked about the most common metrics used to track reliability and reduce unplanned downtime, including overall equipment effectiveness (OEE) and replacement asset value (RAV).
Overall equipment effectiveness
Fundamentally, OEE determines how much product is produced versus what could have been produced. OEE measures true productive manufacturing performance of a unit benchmarked against design capacity.
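No formula is given here, but OEE is conventionally computed as availability × performance × quality. The sketch below uses that standard three-factor form and, following the description above, takes the unit's design-capacity rate as the ideal rate. The three factors multiply out to good output divided by what the unit could have produced at design rate over the planned time, which is the "produced versus could have been produced" comparison. The field names and example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunPeriod:
    planned_hours: float  # time the unit was scheduled to produce
    run_hours: float      # time it actually produced (planned minus stops)
    design_rate: float    # design-capacity rate, units per hour (used as the ideal rate)
    total_units: float    # all output during run_hours
    good_units: float     # output meeting specification


def oee(p: RunPeriod) -> dict[str, float]:
    """Standard three-factor OEE: availability * performance * quality."""
    availability = p.run_hours / p.planned_hours
    performance = p.total_units / (p.run_hours * p.design_rate)
    quality = p.good_units / p.total_units
    return {
        "availability": availability,
        "performance": performance,
        "quality": quality,
        "oee": availability * performance * quality,
    }


# Hypothetical day: 2 h of stops, some rate loss, 5 percent off-spec.
result = oee(RunPeriod(planned_hours=24, run_hours=22, design_rate=100,
                       total_units=2000, good_units=1900))
print(round(result["oee"], 3))  # 0.792
```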
Many automation suppliers offer asset management and reliability solutions. Close to 58 percent of our survey respondents indicated that they use OEE as a key performance indicator (KPI) in their manufacturing operations, while more than 27 percent do not. Fifteen percent of respondents told us they use OEE in combination with other KPIs, such as energy intensity (MMBtu/lb of product), total productive maintenance, or a variation of OEE such as operating rate relative to best demonstrated rate. Overall plant availability and total maintenance cost were also cited as common industry metrics. One respondent pointed out that while OEE is used heavily, it does not always relate to plant profitability. The ideal OEE is, of course, 100 percent, but that is not achievable in the real world. ARC asked survey respondents about their target OEE. Most respondents indicated a target OEE of 85-90 percent, while almost 23 percent responded in the 80-85 percent range.
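The OEE variations and companion KPIs respondents mentioned are simple ratios as well. The sketch below computes energy intensity in MMBtu per pound of product and operating rate relative to best demonstrated rate, and checks an OEE value against the 85-90 percent band most respondents named as their target. The function names and example values are hypothetical, and the default band is only the survey's most common answer, not a recommended target.

```python
def energy_intensity(energy_mmbtu: float, product_lb: float) -> float:
    """Energy intensity in MMBtu per pound of product (lower is better)."""
    return energy_mmbtu / product_lb


def rate_vs_best_demonstrated(actual_rate: float, best_demonstrated_rate: float) -> float:
    """Operating rate as a fraction of the best rate the unit has actually sustained."""
    return actual_rate / best_demonstrated_rate


def classify_oee(value: float, target_low: float = 0.85, target_high: float = 0.90) -> str:
    """Place an OEE fraction relative to a target band (default: most common survey answer)."""
    if value < target_low:
        return "below target"
    if value > target_high:
        return "above target"
    return "within target"


# Hypothetical unit: 1,200 MMBtu for 400,000 lb of product, running at 950 of a best 1,000 lb/h.
print(energy_intensity(1200, 400_000))         # 0.003
print(rate_vs_best_demonstrated(950, 1000))    # 0.95
print(classify_oee(0.79), classify_oee(0.87))  # below target within target
```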
Replacement asset value
RAV is another commonly used KPI in the process industries; it quantifies the monetary value required to replace existing production assets. Many end users track maintenance efficiency and maintenance costs as a percentage of RAV. Just over 27 percent of total respondents indicated that they use percent of RAV, which makes it a much less common metric than OEE. Some respondents indicated that they use both ROA relative to RAV and maintenance cost relative to RAV.
Target maintenance cost as a percent of RAV
We asked those who use maintenance cost as a percent of RAV what their target is. Close to 45 percent of respondents were in the 3 percent range, while a little more than 18 percent responded in the 2 percent or 1 percent range.
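Maintenance cost as a percent of RAV is a straightforward ratio, and comparing it with a target expresses the spending gap in currency terms. In the sketch below, the 3 percent default reflects the most common survey answer rather than any benchmark; the function names and the plant figures are hypothetical.

```python
def maintenance_pct_of_rav(annual_maintenance_cost: float, rav: float) -> float:
    """Annual maintenance cost as a percentage of replacement asset value (RAV)."""
    return 100.0 * annual_maintenance_cost / rav


def spend_gap(annual_maintenance_cost: float, rav: float, target_pct: float = 3.0) -> float:
    """Spending above (positive) or below (negative) the target, in currency units."""
    return annual_maintenance_cost - target_pct * rav / 100.0


# Hypothetical plant: $14 million annual maintenance against a $400 million RAV.
print(maintenance_pct_of_rav(14e6, 400e6))  # 3.5
print(spend_gap(14e6, 400e6))               # 2000000.0
```

A negative gap is not automatically good news: it may simply reflect deferred maintenance work.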
Conclusions
ARC believes that the financial impact of unplanned downtime in process operations was previously underestimated. ARC now estimates that the total cost of unplanned downtime for all the process industries worldwide is easily in the range of $1 trillion. The primary value proposition of many of the new technologies finding their way into the process automation world today (e.g., the Industrial Internet of Things [IIoT], the cloud, and digitalization) is increased asset reliability and reduced unplanned downtime.
The root causes of unplanned downtime may be more complex than conventional wisdom has led us to believe. While human error remains a primary reason, problems in process design or with the equipment controlling the process are more likely to be the root cause. If information about such problems is not communicated to the operator in a timely and contextual fashion, how much responsibility should the operator assume? Many technologies are already available that can greatly enhance the role of the operator and help avoid unplanned downtime incidents. These include advanced operator graphics, new control room designs, better training simulation, procedural (state-based) automation, and improved alarm management and situational awareness technologies.
A large installed base of process automation systems around the world desperately needs modernization. Many end users are holding on to their old systems longer. In addition, systems that had not yet reached the end of their useful lives five to 10 years ago now remain in place and operational. Efforts to modernize and migrate the installed base have accelerated over the past couple of years, even as the market for new capital projects has been drying up.
With an older installed base desperately in need of replacement, end users' desire not to be "locked in" to a particular vendor's largely proprietary system that does not "age in place" has manifested itself in industrywide initiatives, including The Open Process Automation Forum.
End users employ many KPIs to track unplanned downtime and increase overall plant reliability, paving a path to operational excellence. Overall equipment effectiveness and maintenance cost as a percentage of replacement asset value are two primary methods for doing this, but KPIs are just part of the equation. Companies often neglect to consider the human element, including how humans view and process information and the work processes they must use.
The human role is paramount in reducing unplanned downtime. Operators generally do not see enough of the right information at the right time to make the right decisions about how best to control the process to avoid an unplanned downtime event. To avoid an incident, operators must receive better information and training to make the right decisions. Operators must also be empowered to make these types of decisions based on their training and the available information.
Many tools are available today to improve the performance of humans who interact with control systems and instrumentation. These include procedural automation, improved alarm management, advanced operator displays and graphics, and implementation of IIoT-based predictive solutions. ARC recommends that operating companies dedicate the appropriate resources needed to evaluate, implement, and maintain these tools. New approaches, such as procedural automation and IIoT-enabled technologies, can help reduce the likelihood of downtime at any time.