From managing to optimizing alarms
Should there be fewer alarms, more alarms, or just the right number? Look to ANSI/ISA-18.2
By Richard Slaugenhaupt
If you have been doing much research on alarm management lately, you will have found that most of the discussion centers on reducing the number of alarms control room operators must deal with in a normal day or during a process upset. This trend follows the conventional wisdom that too many alarms are enabled simply because it is so easy to do so in computer-driven process automation systems. It presupposes that many alarms are poorly chosen or too easily triggered.
As we are often reminded, adding an alarm in the days of pneumatics and panel boards involved some effort and cost, so there had to be a compelling reason. As a result, the selection process was largely self-regulating. With the advent of modern control systems, it was much easier, perhaps too easy, to make alarms for everything.
Soon, operators faced dozens of alarms during an upset, and were often unable to determine which were important and which were not (figure 1). The alarm management industry was born out of a need to bring this problem under control. Early efforts generally assumed there were too many alarms and they needed to be evaluated one by one, getting rid of those causing more trouble than they were worth.
With virtually every major automation supplier in the game, numerous applications have been developed to tame the alarm problem. These generally follow the same principal goal of reducing the quantity and rate of alarms to which operators must respond. Nonetheless, the focus on these topics only addresses part of the goal. The ultimate objective is not an optimal number of alarms, but instead to create an environment where the alarm system provides the most critical information the operators need at any given time and in any possible situation.
Figure 1. Control room operators are often faced with an overwhelming number of alarms, courtesy of modern
Help from ANSI/ISA-18.2
In 2003, ISA began developing a standard to give more direction to alarm management. After six years of work, ANSI/ISA-18.2-2009, Management of Alarm Systems for the Process Industries, was released, and that version has recently been updated. This standard departed from ISA's normal nuts-and-bolts approach and focused on a broader vision of work processes rather than mechanics. It recognized that alarm management has a significant human component. Although systems and mechanisms are involved, most of the processes are driven by people.
The standard builds on the prior work of the Abnormal Situation Management Consortium (ASM), the Engineering Equipment and Materials Users Association (EEMUA), and NAMUR, and the petrochemical industry has now been using it for nearly a decade. It has recently been getting tremendous traction in other industries as well, and should be required reading for all process automation professionals. The standard focuses on alarms found in modern process automation systems like distributed control systems, supervisory control and data acquisition systems, and programmable logic controllers. Its scope covers all varieties of process manufacturing methods (batch, continuous, and everything in between), so its application is universal.
Because ANSI/ISA-18.2 is a standard, and not just a guideline or suggestion, it carries the weight of being "recognized and generally accepted good engineering practice." This means that groups such as the Occupational Safety and Health Administration, the Chemical Safety Board, and the American Petroleum Institute use it as a yardstick when evaluating systems and investigating events.
Understanding the alarm life cycle
Traditional and ANSI/ISA-18.2 methods are similar in some respects, since both are about reducing the number of alarms. The former approach starts with many alarms and then reduces them to a manageable number, while the latter approach builds from the ground up with a heightened sense of selectivity. ANSI/ISA-18.2 changed prevailing thought by setting a higher standard for implementing an alarm, based on the notion that if more thought is given to selecting alarms in the first place, there should be fewer of them. It also clarified the alarm rationalization process as just one step of a holistic life-cycle approach, beginning with the idea of only creating alarms where the process and safety considerations call for them.
The 10-step life cycle has a few starting points, with one of the most common being establishing a foundational philosophy, and it continues through all the other steps needed to properly manage alarms, from identification to maintenance (figure 2). While other articles on the topic can provide broader detail and a more general description of ANSI/ISA-18.2, this article will focus on the practical side of alarm management: design and implementation.
Figure 2. ANSI/ISA-18.2 established a framework for developing and working with alarms using a systematic approach.
Too many, too few, or just right
The ultimate objective for a process facility or individual unit is not to have a specific, fixed number of alarms, but rather the right number for the current circumstances. The concept is simple, but implementation is more complex. The underlying concern about having a liberal number of alarms is that too many will activate at the same time and overwhelm operators. During this "alarm flood," they will not be able to discern important alarms from duplicate or irrelevant ones. As a result, situational awareness is compromised, and operators can make bad decisions or even escalate an incident instead of defusing it.
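To make the idea of an alarm flood concrete, here is a minimal Python sketch that flags when the rate of incoming alarms exceeds what an operator can reasonably handle. The function name, the 10-minute window, and the threshold of 10 alarms are illustrative assumptions, not values taken from this article; a site would take its own figures from its alarm philosophy.

```python
from collections import deque


def flood_flags(timestamps, window_s=600, threshold=10):
    """For each alarm, report whether the operator is in a flood when it arrives.

    timestamps: alarm annunciation times in seconds.
    window_s, threshold: assumed placeholder values, not figures from the article.
    """
    window = deque()
    flags = []
    for t in sorted(timestamps):
        window.append(t)
        # Discard alarms that have aged out of the sliding window.
        while window[0] <= t - window_s:
            window.popleft()
        # A flood is in progress when the window holds more than `threshold` alarms.
        flags.append(len(window) > threshold)
    return flags


# Twelve alarms in one minute, then a single alarm much later.
burst = list(range(0, 60, 5))
print(flood_flags(burst + [2000]))
```

In this example only the eleventh and twelfth alarms of the burst are flagged, and the late, isolated alarm is not.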
Running counter to the less-is-better idea is the proliferation of new tools that result in even more alarms. Many vendors provide code libraries that enhance (increase) alarm generation through built-in diagnostic features. Many of these alarms can be identified as the root cause for more generalized undesirable process events, and as such should not just be eliminated during rationalization. Instead, these valuable warnings can and should be added to the alarm pool.
These alarms can be early warnings of developing critical conditions, and therefore serve a valuable purpose. But their timing is also indiscriminate, because their trigger condition is generally narrow in scope, such as a sudden loss of signal reported by an analog input instruction used to monitor cooling flow. For this reason, proper application requires some type of context-based filtering to restrict their occurrence under wrong or unhelpful circumstances.
The use of diagnostic alarming functions has been controversial for some time. If a pressure transmitter is showing signs of degradation in its electronics, it can detect the problem and send a warning. But to whom? The control room operators? Maintenance personnel? Naturally, the answer is, "It depends."
One of the most basic qualifications of a legitimate alarm is that the operator must be able to do something to fix or mitigate the situation. So, if that pressure transmitter is providing a very critical reading, it probably merits a control room alarm, even though the operators are not likely to be the ones going out there to fix it. It might cause them, however, to implement a backup solution or compensate for the loss of information while the device is being replaced by maintenance.
Now, multiply this scenario by all the smart instruments and actuators in the plant or process unit. If every one of the potentially hundreds or thousands of devices that can cause an alarm is configured to do so, there could be major ramifications. The easy solution is to direct those diagnostic messages to maintenance, but this solution is shortsighted if something is truly critical. Returning to our example, say the pressure transmitter does fail. A process upset may result, which would trigger other alarms, but the situation would have advanced well past where it could have been avoided by earlier action.
To further complicate the situation, the degree of criticality of the pressure transmitter may vary depending on what is happening with the process. During a procedure or particularly critical time in a batch process, the threat of lost or questionable data from the device might be very important to the operators. At other times, they may not care as much.
This brings us to an important realization: many alarms are more useful or critical at some times than at others. The objective is making those critical alarms active at the times when they are most needed, and suppressed when they are not.
Here is another example: a compressor provides pressurized air to a variety of devices around a process unit, such as valve actuators (figure 3). Each of those air-dependent devices verifies that the air supply is adequate, and will trip if the pipe is clogged or someone inadvertently closes a valve. Now assume that the compressor motor fails. It has its own alarm to warn operators and maintenance personnel that the air supply has been lost. The operators may need to open a valve to borrow air from another part of the facility while maintenance troubleshoots the compressor.
Now with the root cause already addressed, does the control room need a dozen or more alarms from all those air-dependent devices, each reporting some sort of failure? Followed by yet another round of alarms from the systems using those devices? Clearly not. Once the common cause is announced to the operators, subsequent related alarms should be suppressed, so as not to distract the operator from the real problem.
Figure 3. A single event can become a common cause, launching a flood of individual alarms as the results ripple through various systems.
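The compressor example can be sketched in a few lines of Python. This is only an illustration of the principle, not how any particular control system implements it: the alarm tags and the cause map are hypothetical, and a real system would also need timing and acknowledgment logic.

```python
def annunciate(active, cause_of):
    """Split active alarms into those shown to the operator and those suppressed.

    active:   set of alarm tags currently in alarm.
    cause_of: maps a consequential alarm to the alarm that is its common cause.
    An alarm is suppressed only while its cause is itself active.
    """
    shown, suppressed = [], []
    for tag in sorted(active):
        cause = cause_of.get(tag)
        if cause is not None and cause in active:
            suppressed.append(tag)
        else:
            shown.append(tag)
    return shown, suppressed


# Hypothetical tags: two actuators depend on the compressor, one unit depends on an actuator.
CAUSE_OF = {
    "XV101_AIR_LOW": "COMP_TRIP",
    "XV102_AIR_LOW": "COMP_TRIP",
    "UNIT_A_VALVE_FAIL": "XV101_AIR_LOW",
}

print(annunciate({"COMP_TRIP", "XV101_AIR_LOW", "XV102_AIR_LOW", "UNIT_A_VALVE_FAIL"}, CAUSE_OF))
print(annunciate({"XV101_AIR_LOW"}, CAUSE_OF))
```

With the compressor tripped, only the compressor alarm reaches the operator. If a single actuator loses air while the compressor is healthy (a clogged pipe or closed valve), its alarm is shown as usual.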
Types of suppression
ANSI/ISA-18.2 defines three forms of alarm suppression:
- Shelving, where an operator manually suppresses an alarm temporarily.
- Designed suppression, where the process automation system suppresses an alarm based on a specific set of conditions.
- Out of service, where an alarm has been suppressed because a portion of the equipment is shut down for maintenance or some other reason.
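Shelving is the simplest of the three to picture in code. The sketch below assumes that "temporarily" means a fixed time limit after which the alarm returns automatically; the class name, method names, and the 30-minute duration are hypothetical choices for illustration.

```python
class AlarmShelf:
    """Tracks alarms an operator has shelved, each with an expiry time (assumed design)."""

    def __init__(self):
        self._expires_at = {}

    def shelve(self, tag, now, duration_s):
        # Record when the shelved alarm should come back automatically.
        self._expires_at[tag] = now + duration_s

    def is_shelved(self, tag, now):
        expiry = self._expires_at.get(tag)
        if expiry is None:
            return False
        if now >= expiry:
            # The shelving period is over, so the alarm is live again.
            del self._expires_at[tag]
            return False
        return True


shelf = AlarmShelf()
shelf.shelve("TI205_HIGH", now=0, duration_s=1800)  # shelve for 30 minutes
print(shelf.is_shelved("TI205_HIGH", now=600))   # still shelved
print(shelf.is_shelved("TI205_HIGH", now=1800))  # expired, alarm is live again
```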
Designed suppression is the most interesting and challenging, and it is typically divided into two categories: static and dynamic suppression. Static suppression is based on the state of the process and equipment. Specific alarms are enabled or suppressed during defined procedures or conditions. For example, some alarms may only be enabled during a unit startup. This simple technique is the more commonly implemented of the two suppression types.
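Static suppression amounts to a lookup from unit state to the alarms that are relevant in that state. The following Python sketch shows one way to express that; the alarm tags, state names, and the rule that unlisted alarms are always enabled are assumptions made for the example.

```python
# Hypothetical configuration: alarm tag -> unit states in which the alarm is enabled.
ENABLED_IN = {
    "IGNITION_FAIL": {"startup"},
    "FEED_FLOW_LOW": {"running"},
}


def is_enabled(tag, unit_state, enabled_in=ENABLED_IN):
    """Return True if the alarm should be annunciated in the current unit state.

    Alarms with no entry in the table are treated as enabled in every state.
    """
    states = enabled_in.get(tag)
    return states is None or unit_state in states


print(is_enabled("IGNITION_FAIL", "startup"))     # enabled during startup
print(is_enabled("IGNITION_FAIL", "running"))     # suppressed once running
print(is_enabled("TANK_LEVEL_HIGH", "shutdown"))  # not listed, so always enabled
```

Defaulting unlisted alarms to enabled is a deliberate choice in this sketch: a missing table entry then produces an extra alarm rather than a silent one.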
Dynamic suppression is designed to avoid alarm floods during upsets and other more complicated scenarios. It gives the system enough "intelligence" to determine the most important alarms under the circumstances, making sure they get to the operators, while suppressing unnecessary and irrelevant alarms.
Dynamic suppression is the more challenging of the two, because it requires creating the rules the system uses to determine what is important and what is not. And like a safety system, it must have the correct sensors to detect the conditions upon which those rules are based. In some respects, it can be even more complex than a safety function, because hundreds of alarms may be affected and many factors can enter into the decision-making process.
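The dependence on trustworthy sensors can be illustrated with a small rule evaluator in Python. In this sketch, a rule suppresses its alarms only when its triggering condition is positively confirmed; if the condition signal is missing or of bad quality, nothing is suppressed. The rule structure, the signal names, and that fallback behavior are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuppressionRule:
    condition: str         # name of the signal that triggers suppression
    suppresses: frozenset  # alarm tags hidden while the condition is confirmed


def suppressed_tags(rules, signals):
    """Return the set of alarm tags to suppress.

    signals maps a condition name to True or False, or to None when the
    underlying sensor reading cannot be trusted.
    """
    hidden = set()
    for rule in rules:
        # Suppress only on a confirmed True; None or a missing signal leaves alarms visible.
        if signals.get(rule.condition) is True:
            hidden |= rule.suppresses
    return hidden


rules = [SuppressionRule("compressor_tripped", frozenset({"XV101_AIR_LOW", "XV102_AIR_LOW"}))]
print(suppressed_tags(rules, {"compressor_tripped": True}))
print(suppressed_tags(rules, {"compressor_tripped": None}))
```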
Very complex undertaking
Like safety systems, alarm management requires knowledgeable specialists, particularly when moving into areas as complex as dynamic alarm suppression. New tools have emerged on the market to help with the design, implementation, and maintenance of such systems.
The waters can be muddied further by the subjective nature of the evaluation. Overwhelming an operator with too many alarms at one time is clearly detrimental, but the threshold for what constitutes "too many" depends on the urgency, importance, and complexity of the required response to the event. The skill level and experience of individual operators must also be considered.
Companies can invest in developing in-house alarm management personnel through training programs, such as those offered by ISA. Developing and maintaining in-house personnel is a long-term investment decision that needs to be consciously made. Alternatively, companies can use experienced consultants and system integrators to help create and maintain alarm management programs, providing guidance and assisting with implementations as the situation requires.