01 September 2003
How safe is open?
Open systems work fine, but you increase security risks.
By John C. Grebe Jr.
There has been an almost universal adoption of open system architectures based on commercial operating systems and open networking components in new control systems developed over the past ten years. The trend started with cost-effective PC-based human-machine interface (HMI) and engineering stations with optional use of Ethernet, and it has recently extended to just about all layers of new control systems. The benefits of this approach are many, which has helped to drive this rapid adoption. The open technology, however, also brings new risks often not fully recognized by users. The older, more structured proprietary systems, while less flexible, naturally applied a layer of protection that is no longer present unless specifically added or preserved in new systems.
The topic of control system security is more complex because of the wide set of cross-disciplinary knowledge required for comprehensive review and planning. Knowledge of the chemicals involved in the process and their possible misuses needs to combine with knowledge of the process control system, computer networking, computer software, and security scenarios.
Old and new
A traditional distributed control system (DCS) segregates its processing components into multiple distinct layers, each separated by its own distinct layer of communications. The bottom communication layer connects proprietary I/O to the actual sensors and actuators that interface with the process. The next level provides a proprietary communications link between the I/O and the proprietary controllers. Above this is a proprietary peer-to-peer network, which can communicate between the controllers and data server(s). The HMI can either share the control bus or have its own proprietary bus connected to the data server(s), depending on scalability and distance requirements. At the top, servers on the control or HMI bus bridge information to and from the proprietary DCS buses and the open and typically Ethernet plant local area network (LAN). This layered and partitioned approach preserves performance and limits the impact of failures.
The first generation open control system (OCS) has essentially the same structure, but various parts of the diagram undergo replacement with standard hardware and software components based on open standards. These standard components fall into two groups: components developed exclusively for the process control industry, and standard commercial products used across multiple industries, with thin layers of selected customization added as necessary to support process control requirements.
Originally Ethernet began to replace the proprietary networks connecting the HMI and engineering stations, while other industry standards such as ControlNet and Profibus began replacing the proprietary networks used for communication between multiple controller engines. At the I/O level, DeviceNet and Highway Addressable Remote Transducer (HART) were early communication standards that allowed for more intelligence to be moved to the field points. The more powerful Profibus and Foundation fieldbus later joined these initial fieldbuses.
At the same time, standard PCs and industrially hardened PCs have made major inroads into the processing hardware including the use of standard PC operating systems. This enables the use of existing low cost products, both hardware and software, to lower costs while enhancing functionality.
OPC communications actually represent a set of process control layers on top of a combination of Ethernet communications and Microsoft Distributed Component Object Model (DCOM) technology.
Although not yet complete, another trend is rapidly advancing. Ethernet is making major inroads into the control bus level and is even advancing at the I/O bus level. This now results in the potential to dramatically alter the traditional DCS network structure, allowing off-the-shelf information technology (IT) products to merge these previously separate levels into one network while managing network bandwidth capacity in a more sophisticated and flexible manner. In addition to potential cost savings, this results in potential for significantly increased flexibility to add features that strengthen the integration of the manufacturing and financial information systems.
Where is the crack?
Three primary factors combine to create security issues:
- Use of commercial off-the-shelf (COTS) hardware, software, and networking components.
- Complexity of software, especially COTS software, which includes many features users do not need or even know are present, plus the corresponding weaknesses of the software and the availability of knowledge about the weaknesses.
- Degree of connectivity of DCS/PLC to business LANs, which typically connect to wide area networks (WAN) and the Internet.
Some early papers warned that open industry communication buses such as ControlNet would weaken security. This did not prove to be a significant issue at the time based on the still small community with this knowledge and the relatively small rewards for compromising such a system. There are now, however, two fundamental changes in the environment that alter the current situation: a shift to commercial-based products, and the resulting order of magnitude increase in people with sufficient internal knowledge of the systems to compromise them. This is especially true with the availability of easy-to-use hacking tools that reduce the level of knowledge required. Risk is further increased by the existence of groups that have strong potential motives other than financial gain, place a low value on human life, and do not fear capture.
While earlier control-market-specific open components evolved from industrial roots and contained only the functionality needed for this market, the new open system components are from the PC industry. These products just use a subset of the existing functionality, which is typically much greater than required.
The increased complexity and additional functionality provide opportunities for bugs, plus the unused functionality still provides additional opportunities to compromise the control system using these components.
Security violations range from access to nonpublic and restricted information (browsing) to destruction of valuable information (sabotage) and losses caused by inability of the systems to perform their intended services (denial of service). For a typical IT system, economic damage or potential public embarrassment is the worst case outcome of a security breach. Although the same results are possible from a security breach of a control system, additional, more catastrophic worst case scenarios are possible in many situations due to the potential to control and/or misuse the resources and materials available to the control system. This may create explosions or toxic releases or compromise public health with misuse of water or sewage treatment plants. These security breaches may also come in conjunction with other more physical forms of attack to increase the potential impact.
It is even possible that normal Internet hackers may inadvertently cause major damage to an unprotected system without even knowing they were targeting a control system. The potential impact for any particular facility or control system can be best evaluated by a formal risk assessment, which can be coordinated with or be an extension to the normal hazard and operability (HAZOP) study, also including the business/economic risks.
There are particular aspects of open control systems known to result in an increased level of risk. The sections below provide a quick look at some of the most important areas of concern.
Control system security
There are multiple industry and government groups actively involved in coming up with better methods for control system security.
Most focus on long-term standards and solutions and do not provide much practical guidance for the short term.
The following is a list of some industry and government groups actively involved in the area of control system security:
- National Institute of Standards and Technology (NIST) - Process Control Security Requirements Forum (PCSRF)
- Common Criteria for IT Security Requirements (ISO 15408)
- American Institute of Chemical Engineers (AIChE) - Center for Chemical Process Safety (CCPS)
- EPRI Enterprise Infrastructure Security program
- National Infrastructure Protection Center
- Office of Homeland Security
Standard operating systems
Standard operating systems include Unix as well as the various Microsoft products. They provide basic services such as communications and data storage. In addition to the functionality required by the control system, they have many additional features including the ability to run other programs at the same time. One major issue is the number of people looking for weaknesses in these products and the speed of the knowledge propagation once they find a weakness. This is especially an issue in control systems, which typically stay at one level of software for an extended period of time, and where installation of software upgrades themselves results in increased levels of risk. Another major issue is the number of unnecessary communication services typically left enabled in default configurations, resulting in an increased level of vulnerability. These points are actually relevant for all software and not limited to the operating system.
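One way to make the point about unnecessary communication services concrete is to compare what a station actually answers on against what the control application needs. The following is a minimal sketch in Python, not a vetted audit tool. The function names and the example port numbers (135 for the DCOM endpoint mapper, 21 for FTP, 80 for HTTP) are illustrative assumptions, and the list of required ports has to come from your own system documentation.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port is accepted."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


def unexpected_services(open_ports, required_ports):
    """Return the ports that are open but not on the required list, sorted."""
    return sorted(set(open_ports) - set(required_ports))


def audit(host, candidate_ports, required_ports):
    """Probe candidate_ports on host and report the ones that should be closed."""
    open_ports = [p for p in candidate_ports if is_listening(host, p)]
    return unexpected_services(open_ports, required_ports)


if __name__ == "__main__":
    # Hypothetical example: only the DCOM endpoint mapper is required here.
    print(unexpected_services(open_ports=[21, 80, 135], required_ports=[135]))
```

Probing ports can itself disturb fragile devices, so a check like this is better suited to a test station or a planned outage than to a running plant.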
OPC has extensive security capabilities built into the design that are based on Windows security. An important question is "How are your OPC components designed and installed in terms of security?" There are times when a user sacrifices security for easier development, installation, and maintenance.
ActiveX components have extensive access to sensitive resources within a computer, which can result in major damage via either inadvertent software bugs or intentional hidden capabilities that are not obvious to the user. Someone could easily exploit them to compromise security and system integrity.
Simple Network Management Protocol (SNMP), used to manage network infrastructure, has recently been tested by an independent organization. It found extensive security holes in typical implementation practices and in the products of the major vendors that support this protocol.
Exploitation of these weaknesses can easily disrupt Ethernet communication pathways.
Internet-enabled browsers create unlimited possibilities for security breaches. Even systems almost impervious to external attack can face a breach by a component that is "invited in" via the actions of someone browsing the Internet. Once inside, components can set up covert monitoring activities to learn and report additional weaknesses or launch attacks from the inside.
Calling home refers to a device attempting to contact a specific address over the Internet (typically the company who created the product) for various purposes, such as to check for available updates and monitor performance. The Microsoft auto update facility for its operating system is a well-known example.
Even if you trust the associated vendor, this raises questions on how this capability may affect a system in ways not intended by the developer, because your PC is essentially inviting the outsider in and giving them permission to make changes with potentially far-reaching and unknown effects.
Awareness of security issues is the first step. It enables you to ask the right types of questions to expose and eliminate many of the risks. Although the best outcome results from up-front planning prior to commissioning a system, in many cases significant improvements can occur with existing systems by identifying and closing multiple points of vulnerability.
Unfortunately users accustomed to the benefits of the open control systems will not want to go back to the "good old days." The list below provides some relatively simple rules that would provide security but not be acceptable to most users:
1. Do not use open DCS/PLC systems.
2. Do not connect DCS/PLC systems to corporate networks.
3. Do not use software with known security holes, and quickly patch newly discovered bugs.
Obviously better solutions need to take into account user needs and expectations balanced against the associated risks.
For an existing system, organizations must realize that risk reduction needs to be an iterative process of discovery and corrective actions for a system in operation. There are three important activities that initiate this process: estimating levels of risk involved, performing a preliminary security walk-through audit, and defining the goals and objectives realistically. Remember that in an iterative process, perfection is not the goal for the first pass. It is more important to get an overview and address the most important issues first and achieve progress within a reasonable amount of time.
Given the nature of the problem and the reality of limited resources, it is very important to plan and prioritize based on the risk of the consequences. Existing HAZOPs for the facility are a good starting point, but are not sufficient because they were performed within a set of expectations that someone may deliberately violate by causing an accident or simultaneously compromising multiple systems.
Depending on the physical plant layout, this could also involve crossing normal process boundaries to combine materials that are not normally in contact with each other, perhaps even by using an easily added temporary path. Although the type of DCS does not impact this risk, it should still be included in the risk analysis.
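Prioritizing by the risk of the consequences can be illustrated with a simple ranking. The sketch below assumes a common likelihood-times-consequence score on 1-to-5 scales. The scales, the scenario names, and the tie-breaking rule are assumptions made for illustration and are not taken from any particular HAZOP method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    likelihood: int   # assumed scale: 1 (rare) to 5 (frequent)
    consequence: int  # assumed scale: 1 (minor) to 5 (catastrophic)

    @property
    def risk(self) -> int:
        """Simple risk score: likelihood multiplied by consequence."""
        return self.likelihood * self.consequence


def prioritize(scenarios):
    """Order scenarios by risk score, highest first.

    Ties are broken in favor of the more severe consequence.
    """
    return sorted(scenarios, key=lambda s: (s.risk, s.consequence), reverse=True)


if __name__ == "__main__":
    # Hypothetical scenarios for a first-pass review.
    ranked = prioritize([
        Scenario("Unauthorized viewing of process data", 3, 1),
        Scenario("HMI network denial of service", 4, 2),
        Scenario("Deliberate toxic release", 2, 5),
    ])
    for s in ranked:
        print(f"{s.risk:2d}  {s.name}")
```

A ranking like this only orders the work for the first pass. The scores themselves still need review by people who know the process.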
A security walk-through is a simple step that can help find blatant issues and help put the other factors into perspective. The goal is to look at everything from the viewpoint of someone who wants to exploit weaknesses or other opportunities presented by the environment. This is a good way to observe physical security measures and work practices that may defeat physical and other measures. It is also a good time to look for undocumented communication connections, such as PC modems.
Preparation with a simple checklist of points to look for can help, but it is also important to just observe and discover unanticipated things.
Goals and objectives
An important question that needs an answer is "What needs protection?" The answer depends on the particular circumstances and risks. Is preventing catastrophic failure and damage the only goal? Is it also a goal at this time to be able to preserve availability and not be susceptible to denial of service attacks? If so, what needs to remain available? If the HMI communications undergo disruption, but the automated control continues, is that acceptable? Is the internal data sensitive, and does it need protection from unauthorized viewing?
Goals may need to change over time starting with the most pressing priorities, such as protecting the operation of any existing safety systems as a first priority and the ability to continue operation as a second priority. Trying to do too much at one time may result in the inability to make the most critical improvements in a timely manner.
There are multiple protective measures for new and existing control systems. All of the measures listed should undergo serious consideration to provide a sufficient level of risk reduction, dependent on the level of assessed risk.
Whenever possible maintain the traditional separation of communication layers in the distributed control system, even if the protocols are compatible. This helps to isolate any communication problems and their impact, whether caused by a security breach or a hardware/software failure. It is also easier to fully consider and properly handle the potential consequences in a more limited environment. Use of standard networking routers may not be sufficient unless you can uncover known weaknesses. When standard PCs are used to bridge the levels, it is very important to carefully control the software content and especially the communication services bound to the interfaces by the operating system.
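On a PC that bridges two levels, one practical check is whether each communication service is bound only to the interface that needs it. The sketch below works on a list of (bind address, port) pairs that you would collect yourself, for example from the operating system's own listing of listening sockets. The function name, the addresses, and the allowed-network table are hypothetical.

```python
import ipaddress


def misbound_listeners(listeners, allowed):
    """Flag listening services whose binding is broader than intended.

    listeners: iterable of (bind_address, port) pairs.
    allowed:   dict mapping port -> CIDR network the service may bind within.
    Returns a list of (bind_address, port, reason) findings.
    """
    findings = []
    for addr, port in listeners:
        ip = ipaddress.ip_address(addr)
        net = allowed.get(port)
        if net is None:
            findings.append((addr, port, "service not expected on this host"))
        elif ip.is_unspecified:
            # 0.0.0.0 means the service answers on every interface
            findings.append((addr, port, "bound to all interfaces"))
        elif ip not in ipaddress.ip_network(net):
            findings.append((addr, port, "bound to wrong network"))
    return findings


if __name__ == "__main__":
    # Hypothetical bridge PC: port 135 should answer only on the control side.
    allowed = {135: "10.0.1.0/24"}
    listeners = [("10.0.1.5", 135), ("0.0.0.0", 135), ("10.0.1.5", 80)]
    for finding in misbound_listeners(listeners, allowed):
        print(finding)
```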
If the HMI network is vulnerable, you may find it advisable to keep a dedicated and physically secure HMI station connected directly to the control network for backup monitoring and control activities if someone compromises the HMI network. It may also be useful to have an easy-to-disconnect point to break the connection from the HMI network if the control system comes under attack.
Because the business network typically connects to the Internet, it represents a significant level of risk. To use an analogy in functional safety terminology, the business firewall may only effectively provide safety integrity level (SIL) 1 level protection, while the control network may need SIL 3 risk reduction. In addition the business network itself is accessible to a much larger audience than the control system and most likely has lower levels of physical security associated with it. This does not mean that data from the control system is not accessible to the business network. Dedicated historians can still supply timely data, but the historian should support separate network connections for each network. Again, the software content and communication bindings need to be carefully controlled. It is also best if any interface to the control system contains a rate-limiting feature to prevent flooding of messages due to either a denial of service attack or a simple network component or software failure.
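The rate-limiting feature mentioned above can take many forms. A token bucket is one common technique, sketched here in Python under assumed parameters. The class name, the rate, and the burst size are illustrative and do not describe how any particular historian or gateway product works.

```python
import time


class TokenBucket:
    """Allow at most `rate` messages per second, with bursts up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)       # tokens added per second
        self.capacity = float(burst)  # maximum tokens that can accumulate
        self.tokens = float(burst)    # start with a full bucket
        self.clock = clock            # injectable time source, useful for testing
        self.last = clock()

    def allow(self):
        """Return True if one message may pass now, False if it should be dropped."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never beyond capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    # Hypothetical limit: 2 requests per second, bursts of up to 3.
    bucket = TokenBucket(rate=2, burst=3)
    print([bucket.allow() for _ in range(5)])
```

Because excess messages are simply refused, a flood from the business side, whether an attack or a faulty component, costs the control system only the work of rejecting them.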
In practice this separation also results in enforcing other good practices, such as not permitting Internet browsing or e-mail from the HMI terminals. Users should operate separate PCs for those functions in the control room without physical connections between them and the control system components.
It is best to keep OPC traffic off the business network, but if it must extend into this region be sure the exposed functionality is sufficiently restricted or the security sufficiently strong. Insufficiently protected OPC interfaces are easy to attach to with standard tools that do not require a high level of knowledge. You can download OPC-specific tools off the Internet to help the connection efforts.
Security for distributed control systems is not always a primary concern of users and perhaps taken for granted. Newer open control systems place more responsibility on system integrators and users to ensure security, while the same systems make this more difficult to understand and control effectively.
Although formal methods and standards are not yet available, users can assess and reduce significant aspects of current vulnerability.
Behind the byline
John C. Grebe Jr. is a partner at Sellersville, Pa.-based exida.com, L.L.C. His e-mail is email@example.com.