01 February 2004

Protecting your data flow in an IT-driven world

How configurable redundancy makes a hybrid control system the cost-effective choice for high-availability monitoring and control.

By Sal Provanzano

Everybody wants to be famous, but no control engineer wants to see his plant become the lead story on the six o'clock news. For a long time, preventing this kind of unwelcome publicity was the primary design consideration when installing a monitoring and control system.

Things have changed.

Whether it is on the oil platforms in the Caribbean or in a nuclear power plant in rural Georgia, the enterprise network has extended its grip from the information technology (IT) department down into the operational levels of the business. And as the network grows, plants rely more and more on the steady and uninterrupted flow of accurate information.

It used to be that the engineer responsible for monitoring and controlling a critical or dangerous process had two jobs. First, he had to make sure that operators always had a way to see what was happening in the field; otherwise they would be flying blind. Second, he had to provide a fault-tolerant system to control the process, no matter what upsets occurred. And, of course, he always had to deliver both in a cost-effective manner.

Now—despite massive reductions in personnel, demands for ever higher productivity, constraints on capital expenditure, and ever more complicated networking technology—management also demands an uninterrupted flow of data from every point in the enterprise.

We consider a system that delivers data in a secure and predictable fashion to be one that has high availability. One approach to ensuring high availability is simply to duplicate system components so that if one fails the other can take over, but specifying a system that is both reliable and cost effective poses a real challenge.

Historically, there have been several reasons for specifying a high-availability system, usually focused on monitoring a particular process. The most obvious is to protect process integrity at all times so that productivity and safety are maintained. Regulatory compliance can also require a high-availability system. In nuclear power plants, industry regulations specify redundant monitoring. Where proof of pollution abatement is required, gaps in environmental monitoring lead to expensive fines, so a high-availability system is mandatory. Harsh environments are another reason for installing a high-availability system, because loss of control in these circumstances can have disastrous effects on safety and capital investment.

[Figure: Hot standby software flow]

DEMANDING REDUNDANCY LEVELS

Terminology

Fault tolerant: System architecture capable of sustaining reliable operation in the presence of a fault by providing additional levels of redundancy

Redundancy: A configuration that provides two or more instances of a component or network connection, so that a backup is available if one fails

Hot standby processor: A backup processor that monitors operation of the primary processor. If it detects a failure in the primary, it automatically assumes control of all functions, including PID, logical operations, communications to SCADA, and directing I/O, and so circumvents the failure

Today, to ensure protection of the enterprise's data flow, project specifications increasingly demand redundancy at multiple levels, no matter what the job. When bidding on control systems for multiple oil drilling platforms in the Cantarell field, for instance, PEMEX required all bids to address redundancy at the level of the I/O, the controller, and even interplatform communications. In particular there were very tight limits on how long interplatform communications could take.

In the past the logical choice for a job like this would have been a distributed control system (DCS). These systems have a proprietary architecture, involve extensive engineering, and are tightly bound to one particular process, such as a single refinery. More recently, programmable logic controller–based systems linked together by supervisory control and data acquisition (SCADA) software appeared to be a potential replacement for the DCS, but the engineering costs and the difficulty of configuring them for fault-tolerant, redundant operation have made them a less-than-ideal choice for many mission-critical applications.

The solution that won the Cantarell project was a hybrid control system (HCS). Many engineers mistakenly regard the HCS as a recent arrival, but it is not new. Hybrid control systems have long been accepted for mission-critical applications and are now winning acceptance in the growing number of other monitoring and control applications that require rock-solid data collection.

NUKES PROVED HYBRID CONTROL

The nuclear industry knows hybrid control systems well; they are currently installed in hundreds of plants worldwide.

The HCS initially attracted nuclear users because it offered the same security as a comparably sized DCS with greater flexibility, often at about one-tenth the cost. Rebecca Miller, control engineer at the Hatch Nuclear Plant in Vidalia, Georgia, found other benefits to using an HCS: "In an out of the way place like Vidalia where the chief recreation is deer hunting, we tend to have a high turnover of operator staff," she explains. A new HCS on the Hatch boiling water reactor meant that professional staff could concentrate on important design issues rather than on training operators. An easy-to-learn, Windows-based user interface and the availability of training for commercial off-the-shelf (COTS) software mean that she spends less time on training and more on engineering. However, a principal reason that Hatch uses an HCS is reliability: the system availability is 99.997%.
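To put the 99.997% figure in perspective, the short calculation below converts an availability fraction into expected downtime per year. It is a sketch that assumes availability is averaged over a non-leap year of 8,760 hours; the function name is illustrative and not taken from any vendor's tools.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def annual_downtime_minutes(availability: float) -> float:
    """Expected minutes per year that a system with this availability is down."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR * 60


print(f"{annual_downtime_minutes(0.99997):.1f} minutes per year")
```

At 99.997%, that works out to roughly 16 minutes of downtime a year.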

COMBINES MODULAR HARDWARE

Mission critical

Ensuring that operational data is continuously collected and available becomes increasingly important as the plant becomes ever more tightly integrated with the enterprise. This treatment of the hybrid control system shows how its configurable redundancy can provide a highly cost-effective way to create a high-availability monitoring and control system. At each level, implementing redundancy strategies can protect mission-critical functions and ensure that operators never lose process visibility. Only those functions that are truly vital need backing up, yet an HCS can scale to any size without losing its ability to provide data and control free from interruption.

For those not familiar with hybrid control systems, it may be useful to describe the architecture of a generic HCS without reference to any particular model. The ideal HCS is chassis based, consisting of one or more chassis linked by a bus or local network. You can configure a chassis either as a target node, housing the controller processor and a limited number of I/O cards, or as a straight I/O chassis. The controller processor, which contains the run-time executive software and the downloaded control application, manages the I/O communication channels and data transfer; processes proportional, integral, derivative (PID) loops; performs I/O scans; solves logic; and communicates with the SCADA/human-machine interface (HMI) operator stations.
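The work the controller processor does each scan can be pictured as a loop: take the inputs, solve logic, run the PID loops, and produce outputs. The sketch below assumes a textbook positional PID algorithm, a single hypothetical high-pressure interlock, and illustrative tag names; a real HCS run-time executive is far more elaborate and vendor specific.

```python
from dataclasses import dataclass


@dataclass
class PIDLoop:
    """Textbook positional PID; no anti-windup, for illustration only."""
    kp: float
    ki: float
    kd: float
    dt: float               # scan period in seconds
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, setpoint: float, measurement: float) -> float:
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def scan(inputs: dict, loops: dict, setpoints: dict) -> dict:
    """One scan over inputs already read: solve logic, run PID, compute outputs."""
    # Solve logic: a hypothetical high-pressure interlock forces every output to 0%.
    tripped = inputs.get("pressure_high", False)
    outputs = {}
    for tag, loop in loops.items():
        demand = loop.update(setpoints[tag], inputs[tag])
        # Clamp to a 0-100% output range before it is written to an output card.
        outputs[f"{tag}_out"] = 0.0 if tripped else max(0.0, min(100.0, demand))
    return outputs


flow_loop = PIDLoop(kp=2.0, ki=0.5, kd=0.0, dt=0.1)
print(scan({"flow": 40.0, "pressure_high": False}, {"flow": flow_loop}, {"flow": 50.0}))
```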

Elements in each chassis can include a processor, interprocessor communications connection, provision for mounting digital and analog I/O cards on a backplane, dual I/O communications connections, and one or two power supplies.
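One way to reason about these chassis elements is to model each chassis and flag any element that has no backup. The sketch below is a hypothetical data model, not any vendor's configuration format; it treats a processor as protected if a hot standby partner exists, in the same chassis or a separate one.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    TARGET_NODE = "target node"   # controller processor plus a limited number of I/O cards
    IO_ONLY = "I/O chassis"


@dataclass
class Chassis:
    name: str
    role: Role
    standby_processor: bool = False   # hot standby partner, same or separate chassis
    power_supplies: int = 1           # one or two
    io_comm_links: int = 1            # 2 means dual I/O communications connections


def single_points_of_failure(ch: Chassis) -> list:
    """List the elements of this chassis that have no backup."""
    gaps = []
    if ch.role is Role.TARGET_NODE and not ch.standby_processor:
        gaps.append("processor")
    if ch.power_supplies < 2:
        gaps.append("power supply")
    if ch.io_comm_links < 2:
        gaps.append("I/O communications connection")
    return gaps


node = Chassis("node-1", Role.TARGET_NODE, standby_processor=True, io_comm_links=2)
print(single_points_of_failure(node))
```

Running the check across every chassis gives a quick list of the places where a single failure could interrupt the data flow, which is where redundancy would be added.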

Networked to the HCS proper will be servers that support the system's tag database, some provision for creating and hosting its control logic and PID control, an alarm system, an operator interface or HMI package, a historian or data archive, the system clock, and (optionally) an OPC/DDE server. Many of these are generic COTS solutions (some are even provided free by vendors), and most of them, except the tag database, have little potential impact on system reliability. The operator interface or data archive, for example, can be made failure-proof by the simple expedient of running a backup system.

An HCS, which can use white-box hardware and COTS software running on an open operating system such as Windows, has a major cost advantage over a DCS.

INHERENTLY MORE ROBUST

Because it is hardware based, an HCS can be inherently more robust than a software solution. Equally important, its modular and flexible design means that an HCS can provide exactly the amount of assured availability needed to protect mission-critical operations in a cost-effective manner. Its low cost and the widening demand for uninterrupted information now make the HCS attractive for many more applications than those that have traditionally used fault-tolerant solutions.

For deterministic monitoring and control, an HCS can be configured to provide exactly the desired level of fail-safe backup by the judicious combination of:

  • Redundant network connections
  • Hot standby processors
  • Redundant power supplies
  • Redundant I/O connections
  • Redundant I/O chassis and cards
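The value of combining these options can be estimated with standard reliability arithmetic: elements that must all work multiply their availabilities, and a duplicated element is down only if both copies are down, giving 1 - (1 - A)^2. The sketch assumes independent failures and instantaneous, perfect switchover, and the per-element availabilities are hypothetical numbers chosen only for illustration.

```python
from math import prod


def duplicated(a: float, copies: int = 2) -> float:
    """Availability of parallel copies when any one copy is enough."""
    return 1.0 - (1.0 - a) ** copies


def in_series(availabilities) -> float:
    """Every element in the chain must be up, so availabilities multiply."""
    return prod(availabilities)


# Hypothetical availabilities for one chain from network connection to I/O card.
elements = {
    "network connection": 0.999,
    "processor": 0.9995,
    "power supply": 0.999,
    "I/O connection": 0.9995,
    "I/O card": 0.9998,
}


def system_availability(redundant: set) -> float:
    """Chain availability when the named elements are duplicated."""
    return in_series(duplicated(a) if name in redundant else a for name, a in elements.items())


bare = system_availability(set())
selective = system_availability({"processor", "power supply"})
full = system_availability(set(elements))
print(f"no redundancy {bare:.6f}, processor+power {selective:.6f}, everything {full:.6f}")
```

Changing the set passed to `system_availability` is a quick way to see how much each added layer of redundancy buys under these assumptions.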

Scaling up typically requires nothing more than adding controller and I/O chassis, and the same modularity makes it simple to build redundancy into the system.

In theory, any element in a monitoring and control system can fail. On an HCS, that means any component from the field instrumentation connection all the way up to the enterprise network connection, including the I/O cabling and individual I/O cards, backplane, power supply, processor, and network connection. Any or all of these can be made redundant.

Individual I/O cards can be redundant within the same I/O chassis, or the entire module can be backed up for automatic switchover. A watchdog timer in the switch monitors the local network connection between the I/O chassis and the target node and automatically switches to the backup connection if it detects a failure. You can configure a target node with a backup power supply. Processors connected by an interprocessor link—either within the same chassis or in separate chassis—can operate in hot standby mode, constantly checking each other's operation, with the secondary ready to seamlessly assume control if the primary fails.
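The switchover performed by the watchdog in the switch can be sketched as a timer that is reset by traffic on the active link and, when it expires, moves the I/O path to the other link. The timeout value and link names are assumptions, and timestamps are passed in explicitly so the logic can be exercised without a real clock.

```python
class LinkWatchdog:
    """Moves the I/O path to the backup link if the active link goes quiet."""

    def __init__(self, timeout_s: float, links=("primary", "backup")) -> None:
        self.timeout_s = timeout_s      # illustrative value, not a vendor setting
        self.links = list(links)
        self.active = 0
        self.last_heard = 0.0

    def heard_from_active(self, now: float) -> None:
        # Any valid traffic on the active link resets the watchdog.
        self.last_heard = now

    def check(self, now: float) -> str:
        if now - self.last_heard > self.timeout_s:
            # Watchdog expired: fail over to the other link.
            self.active = (self.active + 1) % len(self.links)
            # Give the newly active link a full timeout window.
            self.last_heard = now
        return self.links[self.active]


wd = LinkWatchdog(timeout_s=0.5)
wd.heard_from_active(1.0)
print(wd.check(1.2), wd.check(1.6))
```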

HOT STANDBY BUMPLESS TRANSFER

Technical initialisms

COTS: commercial off-the-shelf

DCS: distributed control system

DDE: dynamic data exchange (protocol)

HCS: hybrid control system

HMI: human-machine interface

I/O: input and output

OPC: OLE for process control (software)

PID: proportional, integral, derivative (algorithm)

PLC: programmable logic controller

SCADA: supervisory control and data acquisition

An HCS with leading-edge technology provides hot standby protection against a failure of the primary controller. In a hot standby configuration, one processor operates as the "primary," executing the project program and controlling communications with the I/O and the host. The other processor operates as the "secondary," or backup, processor: it monitors the operation of the primary and synchronously executes the same project program.

A hot standby system is typically equipped with dual processors, power supplies, target node controllers, independent buses, I/O cards, and network interfaces. The two processors stay synchronized, through either a unique hardware design or a network link, so that control passes smoothly to the standby processor without a bump in the process. Four levels of watchdog timers continuously monitor the integrity of the entire system, from the I/O to the network communications. The backup processor continuously operates in hot standby mode, monitoring the operation of the primary processor. If the secondary detects a failure originating in the primary, it automatically assumes control of communications with the I/O system and host computer. By providing fail-safe redundancy, a hot standby configuration circumvents potential failures, minimizing process downtime and costly delays.
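The essential idea behind a bumpless transfer is that the secondary runs the same program on the same inputs every scan, so its internal state already matches the primary's when it takes over. The sketch below assumes a simple per-scan heartbeat checked at the rendezvous and uses a running total as a hypothetical stand-in for program state such as PID integrals.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    name: str
    state: dict = field(default_factory=dict)

    def execute(self, inputs: dict) -> dict:
        """Stand-in project program: accumulate each input, like an integral term."""
        for tag, value in inputs.items():
            self.state[tag] = self.state.get(tag, 0.0) + value
        return dict(self.state)


class HotStandbyPair:
    def __init__(self) -> None:
        self.primary = Processor("A")
        self.secondary = Processor("B")
        self.standby_ready = True

    def scan(self, inputs: dict, primary_heartbeat_ok: bool) -> dict:
        if self.standby_ready:
            # The secondary executes the same program on the same inputs in lockstep.
            standby_outputs = self.secondary.execute(inputs)
            if not primary_heartbeat_ok:
                # Rendezvous missed: the secondary assumes control with matching state.
                self.primary, self.secondary = self.secondary, self.primary
                self.standby_ready = False   # the failed unit awaits repair
                return standby_outputs
        return self.primary.execute(inputs)


pair = HotStandbyPair()
print(pair.scan({"x": 1.0}, primary_heartbeat_ok=True))
print(pair.scan({"x": 2.0}, primary_heartbeat_ok=False), "from", pair.primary.name)
```

Had the secondary started from an empty state, its first output after takeover would have been 2.0 instead of 3.0, which is the kind of bump the synchronization is meant to avoid.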

A fully redundant HCS needs a communications card with dual connections to the I/O, an important feature for ensuring high availability. If the system detects a failure in one of its I/O networks, it reroutes the I/O communications through the other I/O network switch. Whenever the primary processor accesses an I/O card, it verifies that the card is installed and powered up and resets the card's watchdog timer. If a card does not respond, the processor switches to whichever redundant I/O card is ready and able. The failed card's watchdog timer then times out, electrically disconnecting its outputs so that the redundant I/O card can take them over.
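The interplay between the processor and the output cards' watchdog timers can be sketched as follows. The class, card names, and timing values are hypothetical, the `healthy` flag exists only to simulate a card that stops responding, and time is passed in explicitly so the behavior can be tested.

```python
class OutputCard:
    """Output card with an on-board watchdog timer."""

    def __init__(self, name: str, timeout_s: float) -> None:
        self.name = name
        self.timeout_s = timeout_s
        self.last_access = 0.0
        self.outputs_enabled = False
        self.value = None
        self.healthy = True   # simulation only: False means the card stops responding

    def access(self, now: float, value: float) -> bool:
        """Processor writes to the card; a responding card resets its watchdog."""
        if not self.healthy:
            return False
        self.last_access = now
        self.value = value
        self.outputs_enabled = True
        return True

    def tick(self, now: float) -> None:
        """Card-side watchdog: disconnect outputs if the processor has gone quiet."""
        if now - self.last_access > self.timeout_s:
            self.outputs_enabled = False


def write_output(cards: list, now: float, value: float) -> str:
    """Processor side: use the first card in the redundant set that responds."""
    for card in cards:
        if card.access(now, value):
            return card.name
    raise RuntimeError("no output card in the redundant set responded")


a, b = OutputCard("slot-3", 0.2), OutputCard("slot-4", 0.2)
print(write_output([a, b], now=0.0, value=42.0))   # slot-3 responds
a.healthy = False
print(write_output([a, b], now=0.1, value=43.0))   # slot-3 silent, slot-4 takes over
a.tick(0.3)                                        # slot-3's watchdog expires
print(a.outputs_enabled, b.outputs_enabled)
```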

Diagnostics also play a large role in protecting the HCS against failure and making it cost effective. Rapid detection of failure enables the system to switch to its backup option immediately with no loss of data. There are dozens of types of specialized cards that handle I/O in an HCS; in general they can be categorized as digital input and output, analog input and output, and specialty function cards, e.g., Modbus communication cards.

A hot standby HCS uses four levels of watchdog timers to ensure system integrity:

I/O watches the CPU. There are watchdog timers in all output cards. If the control processor does not talk to an output card for the prescribed period of time, the output card shuts down.

CPU watches the I/O. At the next level, the primary CPU talks to an I/O card. If there is no response, it switches to the redundant backup I/O card.

Secondary CPU monitors the primary. The standby processor and the primary processor rendezvous at a prescribed time to exchange signals. If the secondary cannot detect the primary's signal, it assumes control.

Secondary CPU monitors the communications network. If the standby processor detects one or more communications that the primary processor does not respond to, it takes over control.
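The fourth level can be pictured as the secondary passively watching host traffic addressed to the primary and counting requests that go unanswered. The reply timeout and request identifiers below are assumptions; the default threshold of one unanswered request mirrors the "one or more communications" rule stated above.

```python
class CommsMonitor:
    """Standby-side check: has the primary stopped answering host requests?"""

    def __init__(self, reply_timeout_s: float, max_unanswered: int = 1) -> None:
        self.reply_timeout_s = reply_timeout_s   # illustrative value
        self.max_unanswered = max_unanswered
        self.pending = {}                        # request id -> time the request was seen

    def saw_request(self, req_id: int, now: float) -> None:
        self.pending[req_id] = now

    def saw_reply(self, req_id: int) -> None:
        self.pending.pop(req_id, None)

    def should_take_over(self, now: float) -> bool:
        overdue = [r for r, t in self.pending.items() if now - t > self.reply_timeout_s]
        return len(overdue) >= self.max_unanswered


mon = CommsMonitor(reply_timeout_s=0.05)
mon.saw_request(1, now=0.0)
mon.saw_reply(1)                   # primary answered request 1
mon.saw_request(2, now=0.01)       # request 2 is never answered
print(mon.should_take_over(0.03), mon.should_take_over(0.1))
```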

An HCS can also reach out to test the connected instrumentation and perform diagnostics on itself to ensure its accuracy. A typical example would be a thermocouple card that exercises the entire analog input path with its own integral voltage source and reports any failure it detects.
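A self-test of the kind described for the thermocouple card amounts to injecting a known reference into the front of the analog path and checking that the converted reading comes back within tolerance. In the sketch below, the 10 mV reference, the 0.05 mV tolerance, and the three simulated signal paths are hypothetical values chosen for illustration.

```python
from typing import Callable, Tuple


def self_test_analog_path(read_back: Callable[[float], float],
                          reference_mv: float = 10.0,
                          tolerance_mv: float = 0.05) -> Tuple[bool, float]:
    """Inject reference_mv through the analog input path; pass if the reading is close."""
    error = read_back(reference_mv) - reference_mv
    return abs(error) <= tolerance_mv, error


def healthy_path(mv: float) -> float:
    return mv * 1.001 + 0.01   # small gain and offset errors, within tolerance


def drifted_path(mv: float) -> float:
    return mv * 1.02           # 2% gain drift, e.g. a degraded input amplifier


def open_circuit(mv: float) -> float:
    return 0.0                 # broken path: nothing reaches the converter


for path in (healthy_path, drifted_path, open_circuit):
    ok, err = self_test_analog_path(path)
    print(f"{path.__name__}: {'pass' if ok else 'FAIL'} (error {err:+.3f} mV)")
```

A failing result is what the card would report upward as a detected fault.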

Modular, open-system design makes an HCS highly cost effective for creating fault-tolerant applications, because it can easily interact with solutions provided by other suppliers. For example, system-activated notification by e-mail, pager, or other wireless alarm makes it possible to get a technician on site immediately to replace a failed card or an entire chassis and ensure that the data flow continues without loss.

Behind the byline

Sal Provanzano has degrees in electrical engineering and business. He has more than thirty-five years of experience in the design and manufacture of PLCs, DCSs, and HCSs. He is the president of RTP Corporation. Write him at sal.provanzano@rtpcorp.com.

