01 December 2004
When Things Go Wrong
Safety standards, inside security crucial to avoid plant floor SNAFUs.
By Ellen Fussell
Keeping a plant productive and running smoothly is difficult enough in today's economy, but when things go wrong—whether by accident or sabotage—manufacturers have an added challenge. Following the development of new standards and being aware of what can happen are a couple of tactics to head off disaster before it strikes.
An equipment failure in the operating unit could propagate into a major loss of containment with impact on people or the environment. The purpose of the ISA-84 standard is to provide work processes and requirements for what manufacturers should do—appropriate redundancy, voting, diagnostics, operation, and maintenance. Voting refers to having multiple instruments in the field. "You could have one transmitter measuring pressure, but if it failed you'd have nothing," said Angela Summers, president of SIS-Tech in Houston. "So with potential catastrophic events, you'd have more than one transmitter, so you could vote one out of two or two out of three to shut down."
The purpose of voting is to get fault tolerance. "When you look at a safety instrumented system, it's a barrier. So you have a process upset occurring," Summers said. "Pressure is building, and if it continues you'd have a rupture. The sensors are measuring the pressure to take action on the process. How many shutdown actions do you need to prevent buildup of pressure? The worse the severity of the catastrophic event, the more devices you'll typically have in the field."
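The voting schemes Summers describes can be sketched in a few lines. The following is a minimal illustration of 2-out-of-3 (2oo3) voting on redundant pressure transmitters; the trip setpoint and readings are hypothetical examples, not values from the ISA-84 standard.

```python
# Illustrative 2oo3 voting sketch: trip the unit only when at least
# two of three redundant transmitters confirm high pressure, so a
# single failed instrument causes neither a missed trip nor a
# spurious shutdown.

TRIP_POINT_PSI = 300.0  # hypothetical high-pressure trip setpoint

def vote_2oo3(readings):
    """Return True (initiate shutdown) when at least 2 of the 3
    transmitter readings are at or above the trip point."""
    votes = sum(1 for r in readings if r >= TRIP_POINT_PSI)
    return votes >= 2

# One transmitter failing high does not trip the unit by itself...
print(vote_2oo3([310.0, 150.0, 148.0]))  # False: only one vote

# ...but two independent confirmations shut the process down.
print(vote_2oo3([310.0, 305.0, 148.0]))  # True: two votes, trip
```

This is why the severity of the potential event drives the device count: more redundant sensors buy fault tolerance against any single instrument failing in either direction.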
To help manufacturers update their systems, the ISA SP84 committee has released a new standard, which builds on the international standard IEC 61511. It's important because it doesn't just cover the design; it covers the operations and maintenance as well, Summers said. "Other industrial standards stop at design. But if you don't maintain the system, it doesn't matter how many devices are installed. To keep things in good working order, you need to do maintenance—just like with cars. If you don't change the oil, it'll break down."
A boiler explosion in Algeria this past year did extensive damage to one plant. On 20 January 2004, a defective high-pressure steam boiler ruptured, killing more than 20 workers and injuring more than 70. A foreman in a storage depot at the complex heard "strange noises and abnormal vibrations coming from a boiler and valves before the main explosion," said an Agence France-Presse article. The foreman said specialists had filed a report "more than a year ago" saying the boiler had defects, and the plant had completed only "superficial repairs."
Older boilers like this are "always an issue any time you have a new standard come out," Summers said. "When you have existing equipment, at what point do you look at a new standard and existing equipment and find deficiencies and work toward rectifying them?"
To prevent incidents caused by process upsets, Summers said the SP84 committee is documenting the good engineering practices used throughout industry to address the potential hazards of abnormal operation—the state a process can reach when an instrument or piece of equipment fails. "From what I read about the explosion, if it had been reviewed under the new ISA-84 standard, they would have found deficiencies in the design. And if those had been corrected, then the explosion would not have occurred," she said.
Other incidents result from plain old human error. "The ones we usually find out about are the big incidents, and it turns out that someone made a mistake," Summers said. "You can't write an industrial standard to prevent people from making mistakes. You can only write standards to reduce the potential for mistakes."
Differences in new safety standard
There's one key difference between the U.S. standard on safety instrumented systems and the international one: The committee has included a grandfather clause—a process of looking at existing equipment. "The grandfather clause doesn't say you have to update your existing equipment, but you do need to look at it and determine and document—based on your design, operation, and maintenance practices—that it's safe," Summers said. "If it is safe, you don't have to modify your systems to the level of the new standard. If it isn't, you need to bring it into compliance with the latest good engineering practices."
The premise behind the new standard is that it's performance- or goal-oriented rather than prescriptive, said Paul Gruhn, president of Houston-based L&M Engineering. Rather than dictating a particular technology or level of redundancy, the performance-oriented approach means "the greater your level of process risk, the better the safety systems you need," Gruhn said. The new three-part standard also covers a higher safety integrity level, SIL 4; the old standard only went up to SIL 3. (See a related story in November 2004 InTech.) "But just because you define something as SIL 4 doesn't mean you need SIL 4 in your facility," Gruhn said, "especially when before, you didn't even need SIL 3."
"What's out there now is the original SP84 committee development from 1996—the U.S. domestic standard," Summers said. "Other countries used it, but it was predominantly focused on the U.S. market. The new standard is global. So it doesn't matter if you're building a plant in Asia, Texas, or Europe; it'll be built to the same standard. And that's good because most work is done by a variety of parties," she said. "You might do engineering in the states, a European company might own the site, and the installation might be going into Africa."
In the past, Summers said, local regulations made it difficult to execute projects efficiently because everyone had different opinions about how to design it. "Now we have one base template about effective design of safety instrumented systems," she said.
What about security?
And just when they thought complying with safety standards was enough, plants now need to protect their systems against attacks. "But we need to apply protection in proportion to the risk and value," said Rich Ryan, vice president of business development, global manufacturing solutions, at Rockwell Automation in Milwaukee, Wisc. "Having too much security—if there is such a thing—can create unnecessary expenses and restrict access for authorized users. But the lack of security puts people, processes, and profits at risk. So companies need to evaluate and balance the level of exposure with the business criticality of what they're protecting," he said.
Simply giving an IP address to a plant-floor device makes it a potential target, Ryan said, "but that doesn't mean we shouldn't leverage available technologies to improve manufacturing productivity. It's possible to build systems that leverage contemporary IT technology, but to apply it blindly without understanding the consequences of the threats isn't a good business risk."
Culprits, processes, and solutions
One way to protect information inside the perimeter of the plant is to implement user authentication at the door—between the inner and outer areas—using role, location, and process-based authentication, Ryan said. "Think of it as the definition and enforcement of who can do what, and from where." (See "Who, what, where" story, page 25.)
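The "who can do what, and from where" idea can be sketched as a small authorization check. The roles, actions, and locations below are illustrative assumptions for the sketch, not Rockwell's actual access model.

```python
# Minimal sketch of role- and location-based authorization at the
# plant "door": an action is permitted only when the (role, action)
# pair is defined AND the request comes from an approved location.
# All names here are hypothetical examples.

ALLOWED = {
    ("engineer", "modify_plc_program"): {"control_room", "plant_floor"},
    ("technician", "force_output"): {"plant_floor"},
    ("operator", "view_process_data"): {"control_room", "plant_floor"},
}

def authorize(role, action, location):
    """Enforce who can do what, and from where. Anything not
    explicitly granted is denied by default."""
    return location in ALLOWED.get((role, action), set())

# An engineer in the control room may edit PLC logic...
print(authorize("engineer", "modify_plc_program", "control_room"))

# ...an HR manager in the front office may not.
print(authorize("hr_manager", "modify_plc_program", "office"))
```

The deny-by-default lookup is the important design choice: a role, action, or location nobody thought to list simply gets no access, rather than accidental access.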
The outer layer of the enterprise, normally IT's domain, is the protective shell of the plant floor. It uses firewalls, encryption, and patch management to protect the plant from hackers, crackers, and script kiddies, Ryan said. "Think of what happens when you buy a new PC, take it home, and plug it into your DSL [digital subscriber line] or cable modem line. You immediately find your system is vulnerable to the outside world," he said. "The same risks occur in the production system, especially when you open up your manufacturing system by connecting it to the corporate IT network and the Internet." When they work with IT, most manufacturing managers realize this shell has proper protection, but they also need to know how that protection works to secure the factory floor.
Inside the plant, "we're concerned about production schedules, production rates, customer information, process conditions, product specifications, recipes, operating procedures, and quality data," Ryan said. The plant needs an additional barrier to isolate and filter plant-floor network traffic from the rest of the enterprise. This blocks misdirected network traffic (e-mail, spam, denial-of-service attacks) and prevents it from causing harm to intellectual property and production assets.
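That additional barrier is, at its simplest, an allowlist filter: only the control protocols the plant floor actually needs get through, and everything else—e-mail, spam floods, misdirected enterprise traffic—is dropped. A minimal sketch, assuming common industrial protocol ports (Modbus/TCP on 502, EtherNet/IP explicit messaging on 44818); real barriers would also filter on source, direction, and deep protocol content.

```python
# Sketch of a plant-floor traffic barrier using a protocol allowlist.
# Port numbers reflect common industrial protocols; a production
# firewall would enforce far more than destination port.

PLANT_FLOOR_ALLOWED_PORTS = {
    502,    # Modbus/TCP
    44818,  # EtherNet/IP explicit messaging
}

def admit(packet):
    """Admit a packet to the plant-floor network only if it targets
    an allowlisted control-protocol port; drop everything else."""
    return packet["dst_port"] in PLANT_FLOOR_ALLOWED_PORTS

# Scheduling traffic speaking Modbus/TCP gets through...
print(admit({"src": "mes.example.local", "dst_port": 502}))

# ...stray SMTP (e-mail) traffic never reaches production assets.
print(admit({"src": "mail.example.local", "dst_port": 25}))
```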
This is where manufacturers need to protect themselves from the good guys—employees and partners—"not worrying as much about intentional attacks as the accidental ones," Ryan said. "This is where companies typically get complacent with security policies. Encryption may not be a critical need here, but capabilities like authentication and role-based authorization are valuable," he said. "The issue goes beyond network security to encompass data security, data integrity, and network loading as well."
Who, what, where of enterprise protection
By Rich Ryan
Would you want your human resources manager modifying a programmable logic controller (PLC) program or forcing an output? Depending on the roles established on the plant floor, engineers and technicians are probably the only ones who should touch the equipment, and these people are identifiable by name.
Control system threats
By Nancy Gill Shifflet
Control systems are in a constant state of flux with equipment installations and removals, subsystem upgrades, equipment reconfigurations, communication media changes, user account activity, and vendor changes. These operational activities can introduce vulnerabilities into a secure control system implementation. Security plans can give guidance in avoiding vulnerabilities, but they need to be operationally current to be effective.
Historically, control systems used proprietary hardware, software, and network protocols—limiting the potential for an attack to those with control system-specific skills. Control systems today face more risks because of their vulnerabilities, due to things like standardized technologies and network connectivity, restrictions on implementing existing security technologies and practices, unsecured remote connectivity, and publicly available control system-specific information.
Systems are vulnerable when the security plan doesn't include both a proactive and a reactive strategy. The proactive strategy includes a pre-attack examination of security policy vulnerabilities and steps to minimize them by developing contingency plans. The reactive strategy provides instructions for post-attack assessment and damage repair, following the contingency plan to get the process functioning again. In practice, security plans may not exist, personnel may not know where they are, the documents can be out of date, or the instructions can be difficult to follow.
The user community may not know how the security policy affects them and will unknowingly introduce vulnerabilities into the control system. Information technology and control system departments may not share information concerning current threats. Vulnerabilities lie in hardware, software, and data assets. A business environment, operational environment, and control system architectural design decisions could also be vulnerable.
Motives and methods
Control system cyber attack motives can include experimental curiosity, pride and power, commercial advantage, extortion and criminal gain, random protest, political protest, terrorism, and cyber warfare.
Methods include probes, scans, floods, authentication, bypass, spoof, read, copy, steal, modify, and delete. Tools include physical attack, information exchange, user command, script or program, autonomous agent, toolkit, distributed tool, and data tap. Social engineering has succeeded in past cyber attacks, yielding user logins, passwords, and system instructions. Attackers mounting a target-specific attack draw on control system skills: knowledge of the control system architecture, types of equipment installed, protocols, network architecture, configuration algorithms, specialized communication devices and passwords, ladder logic, and sequential function block programming.
Types of attacks include virus, worm, buffer overflow, denial of service, timing, password, desynchronization, resource exhaustion, logic bomb, Trojan horse, and trapdoor. Attack vehicles are application program interface, e-mail, protocol, and user activation.
After an attack, a control system could face service denial; elemental denial of service; addition, deletion, alteration, or delay of signals transmitted to edge devices; addition, deletion, alteration, or delay of signals transmitted to the host system; lost transmitted data; system transfer of control; elemental transfer of control; and knowledge discovery.
Behind the byline
Nancy Gill Shifflet is a dissertation candidate from Nova Southeastern University in Ft. Lauderdale, Fla. Her e-mail is Shifflet@computer.org.
Good engineering practices
OSHA has a list of industrial standards bodies documenting what's considered good engineering practices. "It's not just a string of words," said Angela Summers, president of Houston-based SIS-Tech. It has a regulatory meaning. "Every day industrial organizations are working to write new standards. OSHA can't say you must comply with XYZ standard, because ABC standard might come out," she said.
So manufacturers and users need to look at what industrial standards bodies are doing, what they're documenting, and what practices apply to the process. Then they need to use them. "Good engineering practice is a benchmark, not a list," Summers said. OSHA has a list of organizations like the American National Standards Institute (ANSI), the National Fluid Power Association (NFPA), the American Petroleum Institute (API), and the American Society of Mechanical Engineers (ASME)—all organizations manufacturers should look to for guidance and mandates on how to design their equipment. End users look to them for work processes and requirements. "OSHA does not require that you follow any particular standard, just that you maintain safe operation by applying the appropriate good engineering practices," Summers said.