1 October 2006
Industrial network integrity
New-era industrial network communications require fresh skills and tools
By Ian Verhappen and Eric Byres
If the reliability of the process is in question, there is big trouble.
With industrial communications networks playing a critical role in today’s control systems, it is vitally important these networks have the highest level of reliability possible.
The increasing use of fieldbus protocols for communication with field devices, and the extension of the control network to the final field elements (i.e., the transmitters and control valves), mean that if these networks do not achieve the expected degree of reliability, the reliability of the entire process is in question. That is big trouble indeed.
Reliability, regardless of where it is necessary, requires both high availability and integrity of the system—a system that fails regularly can be as bad for production as a system that sends the wrong data.
To meet this objective, it is necessary to have the right people with the right knowledge and tools to design and maintain a system. Look at where some of these resources reside.
Point-to-point on the outs
As long as there have been signals connecting one control device to another, there have been challenges associated with keeping the control system reliable.
For example, blocked pneumatic airlines, ground loops in 4-20 mA wiring, and noise on RS-232 circuits all reduce the reliability of the system.
Industry would like reliable to mean 100% availability, but that is never truly achievable. Even the lofty ideal of 99.9999% (“six nines”) availability still results in about 30 seconds of downtime per year (99.99999% availability results in about 3 seconds per year).
The expense associated with recovering from the unplanned 30-second interruption can be significant. For example, a 30-second outage on a fluid coker can result in a 30-40 day shutdown to clean the vessel, with the resultant loss of production.
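To put those figures in perspective, the downtime numbers follow directly from the availability percentage. The short Python sketch below is purely illustrative (not from any product or standard) and simply reproduces the arithmetic:

```python
# Convert an availability percentage into the downtime per year it implies.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds

def downtime_seconds_per_year(availability_percent: float) -> float:
    """Seconds of downtime per year implied by a given availability."""
    unavailability = 1.0 - availability_percent / 100.0
    return unavailability * SECONDS_PER_YEAR

for availability in (99.9999, 99.99999):
    print(f"{availability}% availability -> "
          f"{downtime_seconds_per_year(availability):.1f} s/year of downtime")

# Output:
# 99.9999% availability -> 31.5 s/year of downtime
# 99.99999% availability -> 3.2 s/year of downtime
```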
By looking at some of the common maladies that have afflicted control system signals over the years, one can see how each generation of industrial communications reduced the impact of earlier faults while introducing new concerns that needed attention.
The oldest (but still widely used) control networks are the pneumatic airlines connecting transmitters to valves with local controllers. In some cases, signal lines also went back to a central control room.
The most common problems faced with pneumatic systems are air leaks, dirt, and varying system pressure. Air leaks waste air, but they can also depress the control signal.
Dirt, which can include condensation and corrosion products, plugs the various small orifices within the system. Besides damaging the equipment, dirt also affects the operation of the device by effectively changing the size of the control orifice.
From the 1970s onward, pneumatic control gave way to analog (4-20 mA) current loops. This avoided all the issues associated with air, only to introduce other challenges, such as signal noise from other energy sources, ground loops, and calibration mismatches due to either configuration error or device drift over time.
To overcome some of these calibration limitations, smart instruments (typically with a HART interface) have now become the norm for analog control.
Smart instruments include device diagnostics to monitor the health of the device and two-way communications that make it possible to verify that the information in the control system and the field device match.
However, this generation of instrumentation is still analog at heart, with point-to-point wiring and the limitation of only being able to communicate with one device at a time.
The last 10 years have seen the emergence of fieldbuses as the replacement for point-to-point analog or hybrid control systems. These buses incorporate the field devices themselves as part of the control network, supporting true bidirectional communications with verification of all the parameters at both ends of the network.
Because of the environments in which field devices must reside, fieldbus systems typically operate at speeds comparable to those of a dial-up modem.
The next generation of field devices and networks, based on Ethernet technology, is now being deployed. Most experts agree Ethernet is the high-speed communications medium of the future, whether it runs over copper cable, fiber, or wireless media.
To maintain the expected high degree of reliability, Ethernet protocols intended for deployment in the industrial environment (outside the control room) are typically restricted to 10 or 100 Mbps, which improves reliability and eases installation in a typically harsh environment.
Number, diversity of devices
What is interesting about the above technologies, whether pneumatic or Ethernet, is that they all need proper design from the start to be reliable.
This requires good engineering and the selection of high-quality products designed to work in the environment in which they will be used. For example, expecting a home Ethernet switch to provide reliable service on a plant floor is like expecting your garden hose to work as a pneumatic airline—it just is not realistic.
All these systems also need maintenance on a regular basis through a proper preventative maintenance (PM) program. Simply waiting until there is a failure is an expensive way to do business.
In fact, this has been one of the big issues causing trouble for technologies like Industrial Ethernet. Many companies that have excellent PM programs for traditional technologies have failed to establish them for the new technologies. They then wonder why Ethernet (or Fieldbus, DeviceNet, etc.) is not as reliable as they had hoped.
A recent Worldwide Industrial Ethernet Survey by Network Vision Software indicated 57% of Industrial Ethernet users report infrequent (greater than one year between incidents) communication issues.
Conversely, 8% of respondents have issues at least weekly, and 11% reported unpredictable but rising issues, which may be a result of the number and diversity of devices being added to Industrial Ethernet networks these days.
This indicates there is a wide range of reliability experience with the same technology, pointing to local factors, such as design, environmental, or maintenance conditions, rather than to the technology itself.
Also of note is the phrasing of the Network Vision survey question: it asked about “communication issues,” not “network problems.”
The survey found that properly configured networks are not usually the cause of communication disruptions; rather, device-specific problems, such as a faulty connector or cable or a device lock-up, caused many of the disruptions. These types of problems develop over time, and one can typically find them through either visual inspection or field-testing.
IT has different culture
As each new generation of field control technology starts off, those wanting to take advantage of it must develop new skills to design and maintain the resulting systems or networks.
In the case of Industrial Ethernet, managers often say, “Ethernet is Ethernet, so let’s use the skills we already have in our Information Technology (IT) department.” This belief is partially correct: There can be significant synergies between IT and automation, but there is also a significant difference in mindset between network operations on the plant floor and in the office environment.
Beyond these differences in mindset are differing expectations of reliability.
IT has a culture different from the factory culture, along with different risk goals. This means that, in terms of reliability, it can afford to be more experimental. This is not surprising; offices can often tolerate short network outages that would destroy a plant floor’s production capabilities.
As a result, there is a price to pay in reliability unless automation professionals either take ownership of the Industrial Ethernet environments they deploy or make sure the IT department really understands the industrial requirements. Usually a cooperative partnership between IT and automation is the best solution.
Some of the changes required on the maintenance front include new tools, many of which are software based and require a laptop or similar device with specialized interface cards for the network. They also include new skills, such as an awareness of how changing the network by adding one or more devices (possibly including the very tool being used to troubleshoot the system) can impact the network.
Identifying the concerns, tools
As with all good engineering practices, the KISS (keep it simple, stupid) principle is still important.
If you keep the network simple, there will be fewer components to fail and fewer components to troubleshoot when something untoward happens.
The following statement summarizes the basis on which a reliable control system can be designed: “Proper design upfront trumps all—this is the plant floor, not an office environment.”
Communications system design must be as rigorous as the practices industry uses for electrical design and must follow established standards. Fortunately, standards such as TIA-568 TR42.9 are arriving to provide a basis for proper network design.
Of course, once the system goes online, it is important to know what “good” looks like.
This is why one must gather baseline statistics once the system has been commissioned and after every major system change.
Once this baseline information is in hand, it is just as important to monitor regularly for changes against the baseline. This need not be a complicated procedure; with today’s asset management, process historian, and similar systems, one can largely automate the task. This information is the basis for your preventative maintenance program.
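As a minimal sketch of what that automation might look like (the counter names and the 10% tolerance below are illustrative assumptions, not drawn from any particular product), a script can periodically compare live statistics against the stored baseline and flag anything that has drifted:

```python
# Minimal sketch: flag network statistics that have drifted beyond an
# allowed tolerance from the commissioning baseline.
# The counter names and the 10% tolerance are illustrative assumptions.

baseline = {"crc_errors_per_day": 2, "retransmissions_per_day": 15, "bus_load_percent": 18}

def check_against_baseline(current: dict, base: dict, tolerance: float = 0.10) -> list:
    """Return human-readable warnings for counters outside the tolerance band."""
    warnings = []
    for name, base_value in base.items():
        value = current.get(name)
        if value is None:
            warnings.append(f"{name}: no current reading available")
        elif value > base_value * (1.0 + tolerance):
            warnings.append(f"{name}: {value} exceeds baseline {base_value} by more than {tolerance:.0%}")
    return warnings

# Example reading, e.g., pulled from an asset management system or historian:
today = {"crc_errors_per_day": 9, "retransmissions_per_day": 16, "bus_load_percent": 19}
for warning in check_against_baseline(today, baseline):
    print(warning)  # prints: crc_errors_per_day: 9 exceeds baseline 2 by more than 10%
```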
Preventative maintenance is critical for reliability. Fortunately, the built-in statistics in most control systems are a great watchdog on the health of the system. Every modern industrial network, whether Foundation fieldbus, DeviceNet, Profibus, or any of the other numerous offerings, comes with the built-in ability to collect and display a number of key network statistics.
If you know how to interpret these statistics, you have an immensely powerful built-in troubleshooting tool that will cost nothing but will require a little pre-crisis planning and guidelines on how to interpret the results. Unfortunately, few technicians look at them until it is too late.
Most network-communications problems fall into three classes: noise problems, loading problems, and device/software incompatibilities. If your system has been commissioned and there have been no significant changes since, you can typically rule out the incompatibility issues.
Because of our background in the analog world, we often think noise problems come into the network from an external source.
The simplest way to tell whether the noise level on your network is changing is to monitor the protocol’s framing-error counters, which carry names such as CRC errors, FCS errors, received with error, bad frames, fragments, and aborts. All of these indicate the device has received a message it believes contains one or more corrupted bits. As a result, the system discards the message, and the device waits for the originating device to retransmit it.
Other noise-related problems can show up under names such as lost tokens, token failed, late collisions, and poll failed. These occurrences indicate the protocol that manages node transmission sequencing (known as the media-access-control protocol) is breaking down.
The cause is usually a device that cannot hear its neighbor devices, and depending on the protocol, either sends an interrupt request or misses its turn to transmit.
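As a simple illustration of how such counters can be interpreted in software (a sketch using assumed counter names; real protocol stacks expose them under the protocol-specific names listed above), one can poll the cumulative counters periodically and raise an alarm when the per-interval error rate climbs:

```python
# Sketch: poll cumulative frame counters at two points in time and compute
# the error rate over the interval. The counter names and the 0.1% alarm
# threshold are illustrative assumptions, not from any specific protocol stack.

def interval_error_rate(prev: dict, curr: dict) -> float:
    """Fraction of frames received with framing/CRC errors since the last poll."""
    frames = curr["frames_received"] - prev["frames_received"]
    errors = curr["crc_errors"] - prev["crc_errors"]
    return errors / frames if frames else 0.0

poll_1 = {"frames_received": 1_200_000, "crc_errors": 14}
poll_2 = {"frames_received": 1_450_000, "crc_errors": 390}

rate = interval_error_rate(poll_1, poll_2)
if rate > 0.001:  # alarm if more than 0.1% of frames arrived corrupted
    print(f"Rising noise suspected: {rate:.2%} of frames had errors this interval")
```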
Of course, once you have identified a problem, it may become necessary to use a variety of tools to determine the source of the problem so you can solve it.
The traditional multimeter still has a role to play, but now it mostly ensures sufficient power is available to the device. In the case of the fieldbus protocols, there are a number of tools available that are specific to each protocol.
For example, noise may arise from both internal sources (e.g. bad connectors) and sources external to the network (such as a motor starting). A physical-layer tool can quickly identify these by analyzing either the waveform or another physical characteristic of the network.
There are increasing numbers of these tools on the market for nearly any field network you might choose.
Be aware that to analyze, repair, or work on a fieldbus device, the device must be live on the network; otherwise, it will not be functioning and communicating in its typical fashion. For this reason, many facilities keep a small fieldbus network in their maintenance shop.
If the protocol analyzer does not provide the degree of information required, it becomes time to analyze the raw data, which in most cases requires a high degree of knowledge to interpret.
The tools likely to be used include an oscilloscope and a time-domain reflectometer to isolate the individual packets and the area of the network causing the symptoms. That knowledge is what converts the gathered data into a solution.
Because the lower-layer protocols are the same, troubleshooting of Industrial Ethernet networks uses many of the same tools that work in the IT space.
However, the user must be careful that the tools are not invasive and do not introduce additional load onto the network; if they do, they could exacerbate the problem and even cause the system to fail.
As well, some management solutions (especially those based on version one of the Simple Network Management Protocol) can become a security issue unless proper attention is paid to the design and implementation of the network and the maintenance procedures.
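To illustrate why version one of SNMP raises this concern (a hypothetical sketch; the bytes shown follow the standard encoding of the start of an SNMPv1 message, truncated for brevity), the community string that gates read or write access to a device travels in cleartext and is readable in any packet capture:

```python
# Hypothetical capture: SNMPv1 sends its community string (effectively a
# password) as a cleartext OCTET STRING near the start of every message.
# Prefix shown: SEQUENCE header, version INTEGER (0 = SNMPv1), then the
# OCTET STRING "public"; the rest of the message (the PDU) is omitted here.
captured_udp_payload = b"\x30\x2c\x02\x01\x00\x04\x06public"

if b"public" in captured_udp_payload:
    print("SNMPv1 community string is recoverable by any eavesdropper")
```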
Borrowed heavily from IT
If a system is not secure, how do you know it is reliable?
Security means ensuring confidentiality, integrity, and availability.
How do we make sure our system is secure? It certainly is not to ignore the whole cyber security problem and hope it goes away. The IT department is not the enemy; it is a very useful ally.
The answer lies in understanding that the industrial control world has already borrowed heavily from the IT world, making technologies such as Windows, TCP/IP, and Ethernet our own. Now we need to borrow IT’s security technologies and practices, but modify them and use them properly in our world. It also lies in clearly understanding how our assumptions and needs differ from those of the IT world.
ISA offers assistance in sorting out what works and what does not with its ISA-99 Manufacturing and Control System Security series of standards. There are four proposed standards in this series, each covering a specific aspect of the subject of Manufacturing and Control Systems Security:
ISA 99.00.01 – Scope, Concepts, Models and Terminology
ISA 99.00.02 – Establishing a Manufacturing and Control Systems Security Program
ISA 99.00.03 – Operating a Manufacturing and Control Systems Security Program
ISA 99.00.04 – Specific Security Requirements for Manufacturing and Control Systems
The first two will be out in fall 2006.
There are also two technical reports in the series: ANSI/ISA-TR99.00.01-2004 Security Technologies for Manufacturing and Control Systems and ANSI/ISA-TR99.00.02-2004 Integrating Electronic Security into the Manufacturing and Control Systems Environment.
These two are available today and can assist in the design of control systems to minimize the potential for security related events that negatively influence system reliability.
The new era of industrial network communications is an exciting time. We must augment this excitement with new skills and tools to make our technologies work as reliably as is required for factory automation.
Fortunately, many of these skills are complementary to the ones control engineers already have for selecting and designing control systems and field components. By learning the additional new skills, you will not only make yourself more valuable to your employer, you will also improve the reliability of your operation.
ABOUT THE AUTHORS
Ian Verhappen (firstname.lastname@example.org) P.E., is an ISA Fellow, ISA Certified Automation Professional, and director of industrial networks at MTL Instruments. Eric Byres (email@example.com) P.E., is an ISA member, director of industrial security at Wurldtech Research Inc., and chair of ISA-SP99 Working Group 1 Security Technologies for Manufacturing & Control Systems.
RESOURCES
Worlds in Collision—Ethernet and the Factory Floor www.isa.org/link/WCollisionpdf
Industrial Data Communications (TS05) www.isa.org/link/TS05
The Myths and Facts behind Cyber Security Risks for Industrial Control Systems www.isa.org/link/cyber_myth_fact
Worldwide Industrial Ethernet Survey http://www.intravue.net