September 2008

Automation IT

Sergeant at alarms

Why operations-owned alarm management is best bet for plant safety


  • Operations should take ownership of alarm systems.
  • Processes, tools push success.
  • Results: Better response, overall system.
By Claire Hackney

Most alarms exist to help ensure the integrity and health of operations. Some are required by regulatory agencies, some are needed for system maintenance, and some can be addressed only by specialty groups. Yet the vast majority of alarms are designed for operations response.

Documenting and rationalizing alarms normally requires long discussions, and engineers usually are not interested in lengthy meetings. Operations can instead address these alarms itself and call meetings only for controversial issues, leaving minor changes to an automated, computer-based process that documents the approvals. Over time, any organization can arrive at complete alarm documentation this way.

Operational issues are normally reflected in the alarm system. That means identifying procedures for a business's response to alarms also addresses the mechanism for responding to operations issues. Engineers do not want to work on alarm management, but they do need to develop processes so operations can fill out forms and drive the management of alarms. The approvals should come from engineers, but the management should come from operations. In response to this need, we should take a closer look at operations-owned alarm management.

Operations should own alarms

Operations needs to take ownership of alarm systems. That is the main idea behind operations-owned alarm management (OOAM). They define the need for the valuable alarms and identify unnecessary alarms. They identify the alarms that need to be disabled when the equipment is out of service and the alarms that need to be kept at the same time. They troubleshoot the issue as much as possible using the resources at their disposal. They then either resolve the issue or define the group within the organization that can. The appropriate support personnel then review this identification process and process it for approved changes.

Another essential part of OOAM is the success of the program. Training and resolution throughout the organization should ensure that once operations has done its job, the organization responds appropriately. Maintenance will occur in a timely fashion according to priorities. When there is a discrepancy between the perceived priority and the available maintenance, discussions should occur with operations, and operations should be involved in the process all the way through to resolution. Maintenance should not define the technical completion of the issue; this should happen with the operator who initiated the work notification.

Maintenance is not the only part of the organization that needs training. Engineering also needs to resolve issues in a timely fashion. Engineering issues are not quick fixes, but engineering needs to prioritize them and keep a list so operations personnel are confident their work is not in vain and that they have support.

Training also needs to occur in operations management concerning their role in alarm management. Ignored alarms should be a thing of the past. Until changes can occur, we should respond to all alarms as if they are identified correctly. Otherwise, any response loses significance. That includes alarms limiting production and alarms at incorrect limits. If an alarm is currently configured to require a response, take it. This will drive the organization to make the appropriate changes to alleviate incorrectly defined alarms.

Principles for success

OOAM requires operations to deal with potentially thousands of configured alarms and their details. Thus, success depends on several principles. Any tools the operators use in documentation need to be easy and efficient. They must be able to capture essential information quickly and accurately. Simply seeding documentation forms with similar alarms is useful. Drop-down boxes help in ease of use, accuracy, and consistency.
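The idea of seeding a form from a similar alarm and restricting entries to drop-down choices can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the `PRIORITIES` list, and the `seed_from` helper are hypothetical and do not come from any particular documentation tool.

```python
from dataclasses import dataclass, replace

# Hypothetical drop-down choices; each site would define its own list.
PRIORITIES = ("high", "medium", "low")


@dataclass(frozen=True)
class AlarmForm:
    """One alarm documentation form (field names are assumptions)."""
    tag: str
    cause: str
    consequence: str
    corrective_action: str
    priority: str

    def __post_init__(self):
        # Mimic a drop-down box: only listed values are accepted.
        if self.priority not in PRIORITIES:
            raise ValueError(f"priority must be one of {PRIORITIES}")


def seed_from(similar: AlarmForm, new_tag: str) -> AlarmForm:
    """Start a draft for new_tag by copying a similar alarm's form."""
    return replace(similar, tag=new_tag)
```

An operator documenting a second pump's low-flow alarm could call `seed_from` on the first pump's form and then edit only the fields that differ, which keeps wording consistent across similar alarms.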

OOAM in action involves issue identification and prioritization, alarm valuation, out-of-service equipment, operator troubleshooting, and shelving programs.

Out of service

To maintain adequate alarm management, you will need to consider conditions when equipment is out of service for an extended period of time. It is best to create out-of-service lists per piece of equipment at installation. However, since such engineering is not the norm, you will probably have to create these lists as needed.

Alarms might be necessary even when a piece of equipment is out of service. For example, the high-level alarm on a boiler is still needed when the boiler is out of service if the equipment has not been blinded from the steam system, because the alarm helps avoid running condensate into the steam header. In this case, it is important to ensure appropriate people review the out-of-service list; this includes the responsible party for unit incidents.

When taking alarms out of service, you should provide for these alarms coming back into service when unit conditions indicate equipment is going back in service. You should provide a minimum number of triggers and place code in the DCS that allows for monitoring instrumentation to ensure alarms are fully functioning as needed. Provide for nuisance alarms during startup with some type of delay logic built into the program. This should be carefully engineered, however.

When dealing with alarm disabling and enabling, alarms need to become available in the most conservative conditions. If the code is not placed in a redundant location, a watchdog code should be in place in another application on the DCS to monitor the health of the primary system and reactivate the alarms should the health of the primary system be in question.
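The "most conservative conditions" rule and the watchdog described above can be illustrated with a small Python sketch. It is not DCS code, and every name in it (`UnitState`, `primary_is_healthy`, the 10-second timeout) is an assumption made for illustration. The point is the shape of the logic: alarms stay disabled only while every condition supports it, and any doubt puts them back in service.

```python
from dataclasses import dataclass


@dataclass
class UnitState:
    equipment_out_of_service: bool  # an out-of-service list is in effect
    return_trigger_active: bool     # hypothetical signal, e.g. feed flow detected
    primary_logic_healthy: bool     # verdict from the watchdog check below


def primary_is_healthy(last_heartbeat_s: float, now_s: float,
                       timeout_s: float = 10.0) -> bool:
    """Watchdog check: the primary logic must have reported in recently."""
    age = now_s - last_heartbeat_s
    return 0.0 <= age <= timeout_s


def alarms_disabled(state: UnitState) -> bool:
    """Keep alarms disabled only while every condition supports it.

    Any return-to-service trigger, or any doubt about the primary
    logic, puts the alarms back in service (the conservative choice).
    """
    return (state.equipment_out_of_service
            and not state.return_trigger_active
            and state.primary_logic_healthy)
```

In a real installation the equivalent logic would live in the control system itself, with the watchdog in a separate application as the text recommends. The sketch only shows why a stale heartbeat should reactivate the alarms rather than leave them off.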

Operator troubleshooting

Operator functionality varies from location to location. However, you should require a bare minimum of troubleshooting for instrumentation before operations can delegate an alarm issue to some other aspect of process support. It is crucial to the health of the site's support structure that operations thoroughly discern whether process conditions are causing the alarm and whether anything can be done to alleviate it, such as turning on a pump or draining a knock-out pot. You can normally group these troubleshooting techniques into flow, level, temperature, pressure, and analyzer indications.

Sites should define troubleshooting for each of these indications, but you should give special consideration to level indication troubleshooting. Level indication troubleshooting varies from site to site based not only on different roles and responsibilities, but also on the availability of sight glasses. In any unit where sight glasses are installed, include maintaining their functionality in the normal rounds of the outside personnel. It is unacceptable for sight glasses to be installed and unavailable when needed. Establish sight-glass maintenance procedures along with full explanations of any specialty equipment associated with these installations, such as blowout protection valves.

For any computer applications where operations is simply monitoring the health of some equipment, the response is to call out someone who is performing a non-operations role. An alarm with the highest priority triggers operations to call someone out immediately. An intermediate alarm priority indicates a corrective action that depends on when the personnel will be available during normal business hours. The response to this alarm might differ if it occurs on a Friday evening as opposed to a Sunday afternoon. An alarm with the lowest priority has the corrective action of informing the appropriate personnel that the alarm has occurred and should receive a response at some time in the future. It is unacceptable for parties outside operations to fail to respond, within some defined time frame, to alarms that are on the system. This response might be tracked with a key performance indicator for quality assurance.
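The three priority levels above map naturally onto a small decision function. The Python sketch below is one possible reading, not a prescribed method: the `Priority` names and the Monday-to-Friday, 08:00-to-17:00 business hours are assumptions a site would replace with its own definitions.

```python
from datetime import datetime, time, timedelta
from enum import Enum
from typing import Optional


class Priority(Enum):
    HIGH = 1
    MEDIUM = 2
    LOW = 3


# Assumed business hours: Monday to Friday, 08:00 to 17:00.
OPEN, CLOSE = time(8, 0), time(17, 0)


def next_business_start(now: datetime) -> datetime:
    """Return now if inside business hours, else the next opening time."""
    if now.weekday() < 5 and OPEN <= now.time() < CLOSE:
        return now
    day = now.date()
    # After closing, or on a weekend, start looking from tomorrow.
    if now.weekday() >= 5 or now.time() >= CLOSE:
        day += timedelta(days=1)
    while day.weekday() >= 5:  # skip Saturday and Sunday
        day += timedelta(days=1)
    return datetime.combine(day, OPEN)


def callout_due(priority: Priority, now: datetime) -> Optional[datetime]:
    """When the support group should be called out, if at all."""
    if priority is Priority.HIGH:
        return now                       # call someone out immediately
    if priority is Priority.MEDIUM:
        return next_business_start(now)  # wait for normal business hours
    return None                          # LOW: inform only, no callout time
```

Under these assumed hours, an intermediate-priority alarm on a Friday evening and one on a Sunday afternoon both resolve to Monday morning, but the Friday alarm waits far longer. That difference is the kind of thing the alarm documentation should make explicit.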

Shelving, delegation

Now that we have determined the alarm has a defined priority, is not a temporary issue (out of service), and cannot be alleviated by the console or outside operator, the path forward entails two aspects: shelving and delegation. Shelving programs allow operations to disable alarms while addressing an issue. The program, by procedure, should not be available to ignore issues, but should only be used for issues that have been through the entire operations-owned rationalization process.

Therefore, if 1) an alarm rationalization form exists for a piece of equipment, 2) the associated equipment is either in service or an out-of-service equipment form is undergoing process, and 3) operations has already troubleshot the issue and cannot alleviate the alarm, there should be a mechanism available for operations to disable the alarm while the delegated party is processing it.
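The three conditions above amount to a simple gate that a shelving tool could check before letting operations disable an alarm. A minimal Python sketch follows, with hypothetical field names standing in for whatever records a site actually keeps.

```python
from dataclasses import dataclass


@dataclass
class AlarmStatus:
    """Facts about one alarm (field names are assumptions)."""
    has_rationalization_form: bool      # condition 1
    equipment_in_service: bool          # condition 2, first option
    out_of_service_form_pending: bool   # condition 2, second option
    troubleshooting_done: bool          # condition 3: operations has tried
    alarm_alleviated: bool              # condition 3: did that clear it?


def may_shelve(status: AlarmStatus) -> bool:
    """True only when all three conditions from the text are met."""
    equipment_ok = (status.equipment_in_service
                    or status.out_of_service_form_pending)
    troubleshot = (status.troubleshooting_done
                   and not status.alarm_alleviated)
    return status.has_rationalization_form and equipment_ok and troubleshot
```

Making the gate explicit helps enforce the procedure: an alarm that has not been through rationalization and troubleshooting cannot be shelved simply because it is inconvenient.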

You should time the shelving in some fashion, and identify and study how issues that stay in the shelving program too long are being processed. Even though alarms are disabled, they still go to an alarm journal in some DCSs. So if the alarm is an extremely bad actor, shelving may not be the solution. You might want to consider alarm inhibition. For these cases, it is important to get DCS support involved.
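Timing the shelving can be as simple as recording when each alarm was shelved and periodically listing the ones that have exceeded a limit. The Python sketch below assumes a mapping from alarm tag to shelving time and a site-chosen limit; both the data structure and the tag names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def overdue_shelved(shelved_at: Dict[str, datetime],
                    now: datetime,
                    limit: timedelta) -> List[str]:
    """Return tags shelved for longer than limit, in alphabetical order."""
    return sorted(tag for tag, since in shelved_at.items()
                  if now - since > limit)


# Example with made-up tags and a seven-day limit (an assumed value).
now = datetime(2008, 9, 15, 8, 0)
shelved_at = {
    "LI-101": now - timedelta(days=10),
    "PI-202": now - timedelta(days=1),
}
too_long = overdue_shelved(shelved_at, now, timedelta(days=7))
```

The resulting list is a starting point for the study the text calls for: each overdue tag points to a delegated issue whose processing has stalled.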

Corporate obstacles to OOAM

Since change is the only constant in today's business environment, OOAM needs to be ingrained as everyone's responsibility throughout the organization. The alarm system should correctly reflect every change. The more front-end loading, the fewer symptoms of poor alarm system performance will be exhibited. There is no way to ensure any new process or change is not going to impact the alarm system. However, through proper engineering (operations involvement through engineered procedures), you can consider all changes before installation and attempt a first pass at properly designing the impact.

Consider alarm management before every change. Expense and capital projects need to address potential impact on the alarm system, and the process needs to be in place to ensure the alarms are appropriately valued and documented before changing the process.

Maintenance budgets can also be an obstacle to proper alarm management. That is why the alarm valuation forms need to put the issue in perspective by being closely associated to the risk matrix operations uses to determine any maintenance expenditure.

Cutting rates because of poorly designed alarms is a travesty. However, it is a self-inflicted issue and can be resolved in whatever time frame the organization determines is the correct priority. Emergency changes can follow appropriate procedures and ensure proper alarm management procedures.

Potential benefits

When an organization fully implements and supports OOAM, it can realize the following benefits:

  • A complete alarm database
  • OSHA-required documentation of causes, consequences, and corrective action for critical alarms
  • Proper prioritization of operational issue resolution by operations management, maintenance, and the engineering organization
  • Satisfied operations personnel since a mechanism is in place for them to address issues
  • Lower maintenance budgets since more maintenance work notifications are appropriate and truly require maintenance personnel because of better operational troubleshooting
  • Better response to a well-functioning alarm system
  • Retained experience: people who leave the organization leave behind a record of how they responded to issues

Claire Hackney is a process control consultant in Houston and a chemical engineer with 25 years of experience in the industry.

First ISA alarm standard moving ahead

By Ellen Fussell Policastro

The ISA18 committee on alarm management is forging ahead on its first standard for management of alarms for modern control systems. The committee is in the process of reviewing comments and said it hopes to publish the standard in late 2008 or early 2009.

Some of the key items are the introduction of the alarm management lifecycle, the definition of processes for suppressing alarms, and the requirements to monitor the performance of the alarm system.

"The field of alarm management has benefited from some good practices, developed by organizations like Engineered Equipment and Materials Users Association (EEMUA) and Abnormal Situation Management Consortium, but very few standards have been developed to help user companies with a set of requirements on the minimum alarm management practices," said ISA18 co-chair Nick Sands, process control engineer at DuPont Kevlar in Newark, Del. "This standard will provide the guidance for which many companies have been asking," he said.

"There is a misconception in industry that the EEMUA document is the definitive standard on alarm management," said Donald Dunn, ISA18 co-chair and consulting engineer at Aramco Services Company in Houston. "EEMUA is not a standard body, and the document is at best a best practice/book report on alarm management. Thus, this work by [ISA]18 is the first standard on this topic by one of the recognized industry standards bodies."

None of the internationally recognized standards bodies has created any standards on the topic of alarm management. Industry has been requesting guidance on this topic for many years. In the major petrochemical incidents of the past 15 years, alarms within the facilities have been among the key findings of the incident investigations.

Judging from the volunteers on the committee, those most interested in the standard include refining, chemical, pharmaceutical, power, food, and water companies, Sands said. "While our volunteer committee is a good judge of its reach, any industry that utilizes alarms will be able to find substantial benefit from utilizing this standard," Dunn said.

Most of the comments in the last cycle were on the alarm philosophy, specifically the alarm classification section, and the human machine interface design, specifically the displays section, Sands said. "These clauses specify the expectations for a site's alarm philosophy on alarms that require extra attention through the life cycle and how alarms are displayed on alarm summaries and some other displays."

In fact, the alarm management lifecycle is what defines ISA18, Dunn said. "The concept was not utilized in any of the best practices and is groundbreaking within alarm management."

The subcommittee has addressed all comments so far. "We review the comments on each clause in a subcommittee and then review any rejected comments or comments that need discussion, in the editors committee," Sands said.

Within ISA18, all major clauses have more than one clause editor working to resolve comments. "We then review all of those decisions in the editors committee conference calls where we seek consensus on the subgroups' decisions," Dunn said. "No one person or one company has undue influence on the efforts of [ISA]18; it is a consensus effort."