1 January 2007

Don't be Alarmed

Avoid unplanned downtime from alarm overload; use top techniques to improve alarm management

By Gary Goble and Todd Stauffer

Alarm management is one of the most undervalued and underused assets of process automation. With process automation systems losing $20 billion to unplanned downtime, and with 40% of it attributed to preventable human error, alarm management has never been more important. An important contributor to human error is the cacophony of alarms, which hinders the operator's ability to respond quickly and correctly. As alarm systems become less effective, they diminish the effectiveness of the entire automation system. By taking advantage of the alarm management functions of a sophisticated distributed control system (DCS), manufacturers make an investment in risk management. As a result, plants may operate closer to their limits than ever before while reducing downtime, increasing productivity, and protecting workers.

Headlines have featured loss of product, damage to equipment, and loss of life as results of poor alarm management. The explosion at the BP refinery in Texas City, Tex., in March 2005 claimed 15 lives, injured 100 people, and caused significant property damage. The total OSHA fines for this incident exceeded $20 million. The fine for the alarm system alone was more than $2 million. In addition, company management is under scrutiny for criminal negligence. The U.S. federal investigative report found "managers authorized the start-up of a key unit despite knowing that key alarms weren't working."

An alarming trend in DCS

More is not necessarily better when it comes to alarms in automated process control systems. Not long ago, when controls were hardwired to annunciator panels, engineers were selective about the number of alarms in a system. In a hardwired system, a single alarm could cost $1,000 to implement, so there were typically no more than 30 to 50 alarms per unit. Today, we often view alarms as free. We might configure a single I/O point to activate multiple alarms. A modern DCS may have more alarms configured in the system than there are measured process values.

As a result, managing these alarms is becoming an increasingly important role of the process plant operator, who must seamlessly interact with the DCS. The job of the control system is to maintain the process within the target range of performance. The job of the operator is to intervene to keep the process within its normal performance range. When processes go outside the normal range, upset conditions prevail. The ability to respond quickly, efficiently, and correctly during upset conditions can decide whether the outcome is high profits, unplanned downtime, or catastrophic failure. By taking advantage of new alarm management capabilities, manufacturers minimize unplanned downtime, increase production, and more confidently protect workers.

Industry trends elevate alarm management

Significant trends in manufacturing facilities are putting pressure on plant operators and elevating the alarm management issue. Operators are doing more than ever before, and the number of alarms can become overwhelming. Significant variations exist in the operators' education level, skill sets, and abilities. Usually the operator doesn't have adequate training time, particularly in responding to alarms and upset conditions. Many of the plant's most experienced operators, technicians, maintenance people, and control engineers are transitioning out of the workforce, leaving plant managers to deal with a brain drain on the company's knowledge base.

Industry leaders have acknowledged the growing importance of alarm management. In 2003, the User Association of Process Control Technology in Chemical and Pharmaceutical Industries (NAMUR) issued recommendation NA 102 Alarm Management. More recently, the ISA working committee SP18 issued a draft version of S18.02, a new standard intended to define the terminology and models necessary to develop an alarm system and to define the work processes required to maintain the system over time.

Alarm management best practices

To understand how a sophisticated DCS improves alarm management, you should first look over industry best practices. One of the most important references documenting alarm management best practices is the Engineering Equipment and Materials Users Association (EEMUA) Publication 191, Alarm Systems - A Guide to Design, Management and Procurement. It provides practical recommendations on alarm management based on experiences from end users and human factor studies. According to the EEMUA, you should configure alarms and alarm systems with the following basic principles in mind.

The single most important characteristic of a good alarm system design is a requirement of operator response. If an alarm condition doesn't really require the operator to take action, then there should be no alarm for that condition.

The alarm subsystem should contain the following capabilities to help end users implement alarm management best practices and follow the recommendations of the EEMUA:

  • Focus operator attention on the most important alarms.
  • Provide clear and understandable alarm messages.
  • Provide information on the recommended corrective action and allow recording of comments regarding the actions taken or the future actions required.
  • Suppress (lock) all alarms from a field device or from a process area.
  • Analyze alarm system performance metrics to identify nuisance alarms or areas requiring additional training.


Indicators of poor alarm management

There are several tell-tale signs or symptoms of poor alarm management. Manufacturing operations usually exhibit one or more of these symptoms. Some of the most common include:

  • Nuisance alarms: When alarm conditions come and go on a regular basis or intermittently
  • Alarm floods: When too many alarms are presented to the operator during abnormal situations
  • Cascading alarms: When specific alarms always occur together
  • Useless alarm messages: When alarm messages do not provide meaningful information to the operator concerning the cause of the problem or the corrective action
  • High number of high-priority alarms: When too many high-priority alarms are present in the system, causing the operator to treat some as lower priority
  • Standing alarms: When too many alarms are continuously present in the system, even during steady-state conditions, and operators ignore them
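
Several of these symptoms can be spotted in an exported alarm log. The following is a minimal sketch, not a vendor tool: the AlarmEvent record, the function names, and every threshold (three repeats in 10 minutes for a nuisance alarm, more than 10 new alarms in a 10-minute window for a flood, 24 hours for a standing alarm) are assumptions chosen for illustration and should be replaced with your site's own criteria.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlarmEvent:
    """One raised alarm from a hypothetical exported alarm log."""
    tag: str                     # source tag name
    raised_at: float             # seconds since the start of the log
    cleared_at: Optional[float]  # None means the alarm is still active


def nuisance_tags(events, min_repeats=3, window_s=600.0):
    """Tags raised at least min_repeats times within any window_s span."""
    times_by_tag = {}
    for ev in events:
        times_by_tag.setdefault(ev.tag, []).append(ev.raised_at)
    flagged = set()
    for tag, times in times_by_tag.items():
        times.sort()
        # Compare each raise with the one min_repeats - 1 places later.
        for first, last in zip(times, times[min_repeats - 1:]):
            if last - first <= window_s:
                flagged.add(tag)
                break
    return flagged


def flood_windows(events, max_alarms=10, window_s=600.0):
    """Start times of fixed windows holding more than max_alarms new alarms."""
    counts = Counter(int(ev.raised_at // window_s) for ev in events)
    return sorted(idx * window_s for idx, n in counts.items() if n > max_alarms)


def standing_tags(events, now, min_age_s=86400.0):
    """Tags that are still active and were raised at least min_age_s ago."""
    return {ev.tag for ev in events
            if ev.cleared_at is None and now - ev.raised_at >= min_age_s}
```

Running the three functions over the same log gives a quick first list of candidates for review. Cascading alarms and useless messages need process knowledge and are not covered by this sketch.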

Five ways to improve

While there are quite a few features and benefits of DCS alarm management capabilities, five categories align with the EEMUA's best practices, delivering immediate benefits to any process manufacturer and allowing personnel to:

  1. Focus on the most important alarms
  2. Suppress meaningless alarms as needed
  3. Quickly comprehend the situation based on clear, consistent, concise, and informative messages
  4. Obtain useful information regarding probable cause and recommended corrective action
  5. Evaluate system and operator performance


Focus on most important alarms

Operators run into more alarms per day than they can reasonably process, exceeding the EEMUA's recommendations by a factor of 10.

With the potential for an overabundance of alarms, one of the most important features of the DCS focuses the operator's attention on the most critical alarms. When a new alarm occurs, multiple visual and audible methods exist within the system to attract the operator's attention and indicate the importance of a particular alarm condition.

You can assign each individual alarm message within the system an alarm priority between 1 and 16 (0 means default, not assigned). EEMUA studies have shown that, to maximize operator effectiveness, you should configure no more than three different sets of alarm priorities in a system. You can use this alarm priority attribute to ensure the highest priority unacknowledged alarm is displayed at all times on a dedicated line at the top of the screen. It also provides one means of filtering alarm display lists so operators can home in on specific critical alarms. You can find a comprehensive list of alarms in the incoming alarm list, a preconfigured alarm display within the DCS. From this display, an operator can filter or sort based on any column to fine-tune the display so it shows only the most important alarms.
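
The top-line and filtering behavior can be sketched in a few lines. This is an illustration only, not the DCS's actual logic: the Alarm record and both function names are hypothetical, and the sketch assumes a larger number means a more urgent alarm, which the article does not state. Breaking ties in favor of the newest alarm is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Alarm:
    """Hypothetical alarm record; field names are illustrative."""
    tag: str
    area: str
    priority: int       # 1..16; 0 means default (not assigned)
    acknowledged: bool
    raised_at: float    # seconds since some reference time


def top_line_alarm(alarms) -> Optional[Alarm]:
    """Highest-priority unacknowledged alarm; the newest wins a tie."""
    pending = [a for a in alarms if not a.acknowledged]
    if not pending:
        return None
    return max(pending, key=lambda a: (a.priority, a.raised_at))


def filtered_list(alarms, min_priority=0, area=None):
    """Alarms at or above min_priority, optionally limited to one area,
    sorted with the highest priority first and oldest first within a priority."""
    rows = [a for a in alarms
            if a.priority >= min_priority and (area is None or a.area == area)]
    return sorted(rows, key=lambda a: (-a.priority, a.raised_at))
```

Calling filtered_list(alarms, min_priority=12) would narrow a long incoming list to the alarms an operator has to deal with first, while top_line_alarm(alarms) picks the one alarm for the dedicated line.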

The default HMI symbols (block icons) focus the operator's attention. When a new alarm occurs, the alarm status display within the HMI symbol changes color (typically yellow or red) and displays text indicating an alarm is present. This dual indication of alarm status change (color and text) follows HMI design best practices to ensure even color-blind operators, who make up a significant minority of the male population, can see the alarm state change.

The right DCS features a built-in alarm horn capability that allows the user to trigger an action when a new alarm condition occurs, such as playing a unique .wav file via the PC's sound card. You can configure the horn to create a unique sound based on any combination of specific alarm attributes, including message class, priority, tag name (source), or process area.
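
One way to picture horn configuration is as an ordered list of rules, each matching on some combination of alarm attributes. The sketch below is hypothetical throughout: HornRule, pick_sound, the dictionary keys, and the .wav file names are invented for illustration, and actual playback through the sound card is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HornRule:
    """A rule matches when every attribute it sets agrees with the alarm."""
    sound_file: str
    message_class: Optional[str] = None
    min_priority: Optional[int] = None
    tag: Optional[str] = None
    area: Optional[str] = None

    def matches(self, alarm: dict) -> bool:
        checks = [
            self.message_class is None or alarm["message_class"] == self.message_class,
            self.min_priority is None or alarm["priority"] >= self.min_priority,
            self.tag is None or alarm["tag"] == self.tag,
            self.area is None or alarm["area"] == self.area,
        ]
        return all(checks)


def pick_sound(rules, alarm: dict) -> Optional[str]:
    """Return the sound of the first matching rule; None means stay silent."""
    for rule in rules:
        if rule.matches(alarm):
            return rule.sound_file
    return None


rules = [
    HornRule("boiler_critical.wav", area="Boiler", min_priority=12),
    HornRule("warning.wav", message_class="Warning"),
]
```

Because the first match wins, the most specific rules belong at the top of the list.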

Alarm suppression

Another way to reduce the number of meaningless alarms is to use an alarm locking or suppression capability. You can suppress all alarms from an individual device, or from an entire process area, at the touch of a button (the lock button). As a result, you can remove alarms from deactivated equipment, or from process areas that are non-operational, from the view of the operator. Standard access security protects the lock button and requires the appropriate privileges to enable it.

When a device is locked or out of service, an X appears in the alarm status grid contained in its symbol (block icon) on the process graphic. An X also appears in the plant overview button bar to indicate at least one point in a particular area has been taken out of service (locked).

Alarm locking is a powerful capability for eliminating meaningless alarms that can impede operator effectiveness. However, if left unmonitored, important alarms could remain out of service after they become meaningful. To prevent this situation, the DCS displays a list of all devices currently suppressed (called the lock list). Operations personnel can get an up-to-date list of all the locked alarms in their system, no matter how old, to determine whether it is acceptable to start using the equipment.
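
The idea behind the lock list can be sketched as a small registry of suppressed devices and areas. The class and method names here are hypothetical, and the access-security check that protects the lock button is deliberately left out; this is a sketch of the bookkeeping, not of any vendor's implementation.

```python
class LockList:
    """Tracks suppressed (locked) devices and process areas."""

    def __init__(self):
        self._locked_at = {}  # (scope, name) -> time the lock was set

    def lock(self, scope, name, now):
        """Suppress one device or one whole area; an existing lock keeps its time."""
        if scope not in ("device", "area"):
            raise ValueError("scope must be 'device' or 'area'")
        self._locked_at.setdefault((scope, name), now)

    def unlock(self, scope, name):
        """Return a device or area to service; unknown locks are ignored."""
        self._locked_at.pop((scope, name), None)

    def is_suppressed(self, tag, area):
        """True when the alarm's device or its process area is locked."""
        return (("device", tag) in self._locked_at
                or ("area", area) in self._locked_at)

    def report(self, now):
        """Every current lock as (scope, name, age_s), oldest first."""
        rows = [(scope, name, now - since)
                for (scope, name), since in self._locked_at.items()]
        return sorted(rows, key=lambda row: -row[2])
```

Listing the oldest locks first puts the entries most likely to have been forgotten at the top of the review.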

Provide clear alarm messages

Clear and understandable information in alarm messages helps users conform to alarming best practices. A typical alarm message display will contain clear, easy-to-understand information, such as tag name, tag description, process area in which the alarm occurred, message type, alarm state, alarm priority, and corrective action information if it exists. In particular, the Event field is easily configurable, providing information in a format that makes sense to a plant operator.

Recommended corrective action

One of the most significant manufacturing facility trends is the attrition of experienced and knowledgeable plant personnel (including operators). The DCS was introduced in 1975 and widely adopted in manufacturing facilities throughout the late 1970s and early 1980s. In many plants, the best and most experienced operators have run the plant for the past 25 to 35 years. Consequently, when these people leave the workforce, significant process knowledge may be lost.

Many end users copy and paste their standard operating procedures directly into the DCS via the Info Text field. Other facilities are configuring these fields based on the recommended procedures of their most valued operators, ensuring everyone in the plant can benefit from their experiences and this process knowledge is not lost.

In addition, operators can add comments to individual alarms after they've acknowledged them. As a result, they can document the actions taken to correct a situation, record the root cause of the problem, or flag the alarm for maintenance personnel's follow-up attention. The display can then be sorted on this Comment field so maintenance can create an on-demand report of all the annotated alarms.

To give alarm management the continuous attention it needs, look for a DCS that offers several out-of-the-box tools for alarm key performance indicators, which engineers and operations personnel can use to continuously evaluate and improve the performance of the system.
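
As a rough illustration of what such indicators might look like, the sketch below computes three of them from a list of alarm records: the average alarm rate per 10 minutes, the most frequent alarm sources, and the mean time to acknowledge. The function name, the tuple layout, and the choice of indicators are assumptions made for this example rather than the output of any particular DCS tool.

```python
from collections import Counter


def alarm_kpis(events, period_s, top_n=3):
    """Summarize alarm load over a reporting period.

    events: iterable of (tag, raised_at, acked_at) tuples, with times in
            seconds and acked_at set to None if never acknowledged.
    period_s: length of the reporting period in seconds.
    """
    events = list(events)
    # Average number of new alarms per 10-minute interval.
    alarms_per_10_min = len(events) / (period_s / 600.0)
    # The tags that raise the most alarms are candidates for review.
    top_sources = Counter(tag for tag, _, _ in events).most_common(top_n)
    # Mean delay between an alarm being raised and acknowledged.
    delays = [acked - raised for _, raised, acked in events if acked is not None]
    mean_ack_s = sum(delays) / len(delays) if delays else None
    return {
        "alarms_per_10_min": alarms_per_10_min,
        "top_sources": top_sources,
        "mean_ack_s": mean_ack_s,
    }
```

Comparing alarms_per_10_min against the rate your site considers manageable shows whether operators are being overloaded, and top_sources points to the alarms worth rationalizing first.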

About the Authors

Gary Goble is controls design manager at Precision Systems Engineering, an international gas and chemical producer. Todd Stauffer (todd.stauffer@siemens.com) is PCS 7 product manager with Siemens Energy & Automation.

Fast Forward

  • Industry trends in alarm management used to include the more-is-better syndrome.
  • Best practices now include prioritizing based on relevance, uniqueness, and timeliness of alarms.
  • Learn indicators of poor alarm management and five top ways to improve it.



Sugar producer gains competitive edge

By Ellen Fussell Policastro

The humidity of the South Pacific makes it ideal for cultivating sucrose, or table sugar. That's why the 37,000-acre plantation and adjacent factory at Puunene, Maui, is home to sugarcane grower and processor, Hawaiian Commercial & Sugar Company (HC&S), which produces nearly 200,000 tons of raw sugar annually.

The company also produces a fibrous residue called bagasse, a byproduct of processed sugarcane, as fuel to generate steam and electricity.

Seeking to further optimize its operations, the plant turned to human-machine interface (HMI) software to integrate with the Puunene plant's DCS and electrical relays.

"With a large industrial plant, it is very easy when things are not integrated to make decisions that might not be the right answer for the entire operations," said John Rivera, a power management analyst at HC&S.

HMI is important to the plant's processes because "our operation is a totally integrated system, and nothing can operate without our HMIs," Rivera said. The most immediate benefit of the industrial application server was system changes, he said. The system started out with three separate HMI client applications (feeding the historian for sugar-plant operations, steam-plant operations, and power monitoring) with access to a combined 17,000 tags in a two-tier client/server topology.

Specific results from the new system included "the ability to view the entire operational data from a PC," Rivera said, which allowed electrical generation to improve production "since information was available to all to view." Steam production data became most useful in energy projects to save money. And sugar factory operations became more aware of the entire operation.

One of the challenges in the transfer to the new system was "convincing the entire operations divisions how important it is that everyone see the entire operations and how more eyes on the problem will resolve issues faster," Rivera said. So far, most employees appreciate the work that occurred. "Now the operators can see the data at their finger tips, and the system online data allows the operator to make good decisions on operational issues," he said.

Rivera's advice to any process plants thinking of an HMI changeover is to "make sure you have someone in your organization that will champion the conversion process and will continue to improve the system. This process never ends since change is always good and organizations are always changing. It's better to have someone in your organization who knows what it took to do the process and maintain it."

About the author

Ellen Fussell Policastro is the assistant editor of InTech. Her e-mail is efussellpolicastro@isa.org.