Cyber-related process hazard analysis
How Shell conducts cyber PHA assessments based on ISA-TR84.00.09
By Larry O’Brien and Mark Duck
The introduction of process safety system–specific malware into the manufacturing world in 2017 intensified the discussion around the convergence of safety and cybersecurity. If a cyberattack could compromise safety in the physical world, we must view cybersecurity in the context of safety. Similarly, approaches taken in the safety world to evaluate risk and to design safer systems must consider cybersecurity-related threats to the integrity of safety systems.
Key end user companies from the oil and gas, marine transportation, and offshore exploration and production industries discussed these and related issues in a session at the 2019 ARC Industry Forum in Orlando. In that session, Mark Duck, who is with the Shell Projects & Technology organization, talked about an approach Shell is exploring to integrate cybersecurity risk assessment into traditional process hazard analysis (PHA) methods.
For example, hazard and operability (HAZOP) studies could be used to determine the impact of cybersecurity threats and vulnerabilities on the safety of plant operations. Similarly, methods like HAZOP should be adapted to consider the cybersecurity risk to the integrity of the selected safety barriers for each specific hazard risk scenario.
These approaches are not limited to the oil and gas industry. They could be applied to an even wider range of industries, including those that are not the primary users of process safety systems.
The industry needs: (1) a clear way to think about how cybersecurity risk, if realized, could degrade the integrity of safety barriers, and (2) a standard methodology to assess this risk. In the first case, using a bow-tie model and the concept of escalation factors can help frame where cybersecurity threats and vulnerabilities can affect safety barriers. In the second case, the industry has already taken steps to address how cybersecurity risk assessment can be integrated into the functional safety life cycle.
Figure 1. Example risk assessment matrix.
Figure 2. Example bow-tie model.
Cyberrisk viewed as a safety concern
The International Electrotechnical Commission (IEC) 61511 Functional Safety standard now requires a safety instrumented system (SIS) security risk assessment. ISA has published a technical report (ISA-TR84.00.09-2017) that documents a SIS cybersecurity risk assessment procedure, called cybersecurity PHA or cyber PHA. The link to PHA is a step in the cybersecurity risk assessment process that: (1) reviews the output of the PHA to identify worst-case health, safety, security, and environment (HSSE) consequences for the asset, and (2) identifies any hazard scenarios where the initiating event and all control barriers are “hackable.”
NAMUR has also published a worksheet (NA 163) titled “Security Risk Assessment of SIS.” A cyber PHA methodology can be used to assess the risk associated with identified cybersecurity-related escalation factors and recommend mitigations to reduce the risk to an acceptable level. Linking concepts and tools used in the process hazard analysis world with cybersecurity risk assessment can help bring these two traditionally separate risk management processes together, with the goal of improving the robustness of our safety systems against cybersecurity attacks.
Most cybersecurity risk scenarios result only in consequential business loss, along with a potential impact on company reputation. PHAs, on the other hand, typically do not consider consequential business loss. But a cybersecurity risk assessment must include this consequence category, as shown in the example risk assessment matrix (figure 1).
This consequence category can be calibrated in terms of duration of production loss. The worst-case severity is calibrated by determining the maximum number of hours, days, weeks, etc., it would take to bring production back online in case of, for example, a ransomware attack that affects all servers and workstations in the industrial automation and control system (IACS).
The cost is the value of lost and deferred production plus the materials and labor required to respond to the incident. The list of systems required to bring production back online will likely be a subset of all systems in the IACS. From a cybersecurity risk scenario point of view, this list would be the critical systems, including all safety systems.
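The calibration described above can be sketched in a few lines of Python. This is only an illustration: the severity bands, the names `OutageEstimate`, `business_loss_cost`, and `severity_from_duration`, and the example figures are all hypothetical assumptions, and a real risk assessment matrix would use the company's own calibration.

```python
from dataclasses import dataclass

# Hypothetical severity bands: (upper bound on outage hours, severity level).
# These thresholds are illustrative assumptions, not values from any standard.
SEVERITY_BANDS = [
    (8, 1),        # up to one shift
    (24, 2),       # up to one day
    (24 * 7, 3),   # up to one week
    (24 * 30, 4),  # up to roughly one month
]
MAX_SEVERITY = 5   # anything longer than the last band


@dataclass
class OutageEstimate:
    hours_to_restore: float           # time to bring production back online
    production_value_per_hour: float  # value of lost and deferred production
    response_materials: float         # materials required to respond
    response_labor: float             # labor required to respond


def business_loss_cost(e: OutageEstimate) -> float:
    """Cost = lost and deferred production + response materials and labor."""
    return (e.hours_to_restore * e.production_value_per_hour
            + e.response_materials + e.response_labor)


def severity_from_duration(hours: float) -> int:
    """Map outage duration to a severity level using the assumed bands."""
    for upper_bound, level in SEVERITY_BANDS:
        if hours <= upper_bound:
            return level
    return MAX_SEVERITY


# Example: a ransomware scenario assumed to take 72 hours to recover from.
scenario = OutageEstimate(
    hours_to_restore=72,
    production_value_per_hour=10_000.0,
    response_materials=50_000.0,
    response_labor=30_000.0,
)
print(business_loss_cost(scenario))
print(severity_from_duration(scenario.hours_to_restore))
```

The recovery time used here should be the time needed to restore only the critical systems required to resume production, not every system in the IACS.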
Shell’s process safety pedigree
Shell is well known for its emphasis on safety. The company takes a comprehensive and multifaceted approach to managing process safety risk, including managing the HSSE risk associated with the asset process (a particular aspect of the chemical manufacturing process, for example), the integrity of safety barriers, the risk of production loss, and other factors.
Shell uses many methods to evaluate risk in process safety that are consistent with those outlined in the IEC/ISA 61511 process safety standards. These include the use of risk assessment matrices that consider likelihood; consequence of risks to people, assets, community, and environment; and severity of the consequence. The company also uses bow-tie models (figure 2) to visualize the various elements of risk scenarios, such as hazards, top events, and barriers, including escalation factors and escalation factor controls.
A “hazard” is an agent with potential to cause harm. A “top event” is an uncontrolled release of a hazard, such as hydrocarbons, toxic substances, energy, or objects at height. An “escalation factor” is any situation, condition, or circumstance that may lead to the partial or full failure of a barrier (e.g., independent protection layer).
An example of an escalation factor is an unauthorized change to the trip setting of a safety instrumented function. This escalation factor could be controlled by improving the logical and/or physical access controls for the safety instrumented system. Identification of escalation factors is part of the process of managing the integrity of independent protection layers. These tools, among others, can be used to start the journey of integrating the process safety and cybersecurity risk assessment processes.
Figure 3. Partial bow tie showing threat and barriers including SIF with SIL and top event.
Today’s challenge is to create an interface between process safety risk assessment methods and cybersecurity risk assessment methods. Historically, the HSSE risk assessment process has not considered sabotage (cybersecurity attacks are a form of sabotage). Given the level of sophistication seen in recent cybersecurity attacks on industrial control systems, the potential for simultaneous cybersecurity attacks on one or more independent layers of protection must be considered during the HSSE risk assessment process.
Safety instrumented systems and other control and recovery barriers have cybersecurity vulnerabilities that must be mitigated. These vulnerabilities represent “escalation factors” in the bow-tie model that must be mitigated with appropriate “escalation factor controls.”
One possible interface between the cybersecurity risk assessment and the HSSE risk assessment process is to focus on cybersecurity escalation factors associated with barriers that have cybersecurity vulnerabilities. One advantage to this approach is that it is already part of the existing HSSE risk assessment process.
It is common for one or more of the selected control or recovery barriers to be vulnerable to cybersecurity threats. Some examples are safety instrumented systems, PLC-controlled fire water pumps, and fire and gas systems. Essentially, this covers any barrier based upon microprocessors running firmware/software and, potentially, connected to a network.
A cybersecurity escalation factor for these types of controls is the combination of cybersecurity threats and vulnerabilities associated with the equipment used to implement the control. In this context, cybersecurity escalation factors are just one type of escalation factor among many other types of escalation factors that can degrade the integrity of a safety barrier.
During a PHA, there is the possibility that the initiating event and all the control barriers selected for a hazard scenario have cybersecurity escalation factors. Where this is a high-consequence scenario (e.g., potential fatality), an effort should be made to add at least one control barrier that does not have cybersecurity escalation factors, such as a pressure relief valve or nonprogrammable safety instrumented function (SIF). If this is not possible, the cybersecurity risk assessment team should consider that a cybersecurity attack on these specific control barriers has a higher likelihood of leading to the top event, and they should identify a robust set of cybersecurity countermeasures to manage this risk.
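The screening step described above could be expressed as a simple check over PHA output. The following Python sketch is one hedged way to do it. The data model (`Barrier`, `HazardScenario`), the function names, and the example scenarios are hypothetical and are not taken from ISA-TR84.00.09 or from Shell's tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Barrier:
    name: str
    programmable: bool  # True if the barrier has cybersecurity escalation factors


@dataclass
class HazardScenario:
    name: str
    initiating_event_hackable: bool
    high_consequence: bool  # e.g., potential fatality
    barriers: list[Barrier] = field(default_factory=list)


def fully_hackable(s: HazardScenario) -> bool:
    """True when the initiating event and every control barrier are hackable.

    A scenario with no barriers at all is also treated as fully hackable,
    because nothing non-programmable stands between the event and the top event.
    """
    return s.initiating_event_hackable and all(b.programmable for b in s.barriers)


def needs_non_programmable_barrier(s: HazardScenario) -> bool:
    """Flag high-consequence scenarios that lack any non-hackable barrier."""
    return s.high_consequence and fully_hackable(s)


# Hypothetical example scenarios.
scenarios = [
    HazardScenario(
        "Vessel overpressure", True, True,
        [Barrier("High-pressure SIF", True),
         Barrier("Pressure relief valve", False)],
    ),
    HazardScenario(
        "Tank overfill", True, True,
        [Barrier("High-level SIF", True),
         Barrier("BPCS high-level alarm", True)],
    ),
]

flagged = [s.name for s in scenarios if needs_non_programmable_barrier(s)]
print(flagged)
```

In this example only the tank overfill scenario is flagged, because the pressure relief valve gives the overpressure scenario one barrier with no cybersecurity escalation factors.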
Maintenance and cybersecurity
Industrial assets typically have a maintenance program to maintain the various components of the asset and must often prioritize work based upon some criteria. A common way to do this is to organize the assets in terms of system criticality. If a backlog of maintenance activities exists, then ensure the components with the highest criticality are taken care of first.
A common issue with cybersecurity controls in an industrial control system (ICS) environment is the related maintenance required to sustain them over time. Often, the maintenance associated with cybersecurity controls is a lower priority than instruments, valves, etc., because of the lower perceived value. One way to resolve this issue is to assign the cybersecurity controls used to manage cybersecurity escalation factors a criticality rating based on the barrier being protected, and then factor this into the overall maintenance strategy.
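As a minimal sketch of this idea, the Python below lets a cybersecurity maintenance task inherit the criticality of the barrier it protects and then sorts the backlog accordingly. The ratings, barrier names, and task descriptions are hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical criticality ratings for barriers (1 = highest criticality).
BARRIER_CRITICALITY = {
    "SIS": 1,
    "Fire and gas system": 1,
    "Fire water pump PLC": 2,
}
DEFAULT_CRITICALITY = 3  # assumed rating for tasks not tied to a barrier


@dataclass
class MaintenanceTask:
    description: str
    protects_barrier: Optional[str] = None  # barrier the task helps protect


def criticality(task: MaintenanceTask) -> int:
    """A task inherits the criticality of the barrier it protects."""
    return BARRIER_CRITICALITY.get(task.protects_barrier, DEFAULT_CRITICALITY)


backlog = [
    MaintenanceTask("Repaint pipe rack"),
    MaintenanceTask("Patch SIS engineering workstation", "SIS"),
    MaintenanceTask("Review firewall rules for fire water pump PLC",
                    "Fire water pump PLC"),
]

# Highest-criticality work first; sorted() keeps the original order for ties.
prioritized = sorted(backlog, key=criticality)
for task in prioritized:
    print(criticality(task), task.description)
```

With this scheme, patching the SIS engineering workstation competes for maintenance resources at the same level as other work on the SIS itself, rather than being ranked as generic IT upkeep.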
The need for doing cybersecurity risk assessments for process safety is called out in IEC 61511 Part 1 (2016). This requires that “a security risk assessment shall be carried out to identify the security vulnerabilities of the SIS.” The requirement further specifies additional details supporting the risk assessment.
Although this is a needed step, there are potentially many other safety systems, in addition to SISs, that are subject to cybersecurity vulnerabilities. The following are examples of other programmable safety systems subject to cybersecurity vulnerabilities:
- fire water pumps
- tanker loading systems
- ballast management systems (example: offshore semi-submersibles)
- mooring systems (example: offshore semi-submersibles)
- helicopter refueling systems
- hazardous area ventilation
- deluge systems
- sprinkler systems
- navigation aids
- collision avoidance systems
- communication systems
The trend to integrate these programmable safety systems with basic process control systems will likely continue and must be considered in the context of the IEC 61511 safety life cycle, which includes ensuring cybersecurity risks are adequately addressed. Considering these challenges, companies must make sure that cybersecurity risks to the availability of all barriers are understood, mitigated, and even “designed out,” where possible, during the PHA process.
Figure 4. Cybersecurity integrated with process safety.
Figure 5. Safety instrumented systems interface to HMI, engineering workstations, and instrument asset management systems.
Emergence of cyber PHA
This raises the question of how we develop a cybersecurity risk assessment that meets the requirements of the world of process safety. The concept of cybersecurity process hazard analysis (PHA) has emerged in the industry over the past several years and is finding increasing acceptance among end users. ISA and IEC cybersecurity standards have embraced this method. A methodology (and supporting information) for integrating process safety and cybersecurity risk assessment is documented in the following:
- ISA/IEC d62443-3-2 (draft) – Security Risk Assessment and System Design
- ISA-TR84.00.09-2017 – Cybersecurity Related to the Functional Safety Lifecycle
- NAMUR Worksheet NA 163 – Security Risk Assessment of SIS
Several service providers have emerged over the past few years that have developed their own methodologies for doing cyber PHA that are consistent with the recommendations outlined in the standards. These companies range from smaller software and engineering service providers to large, integrated process automation suppliers.
Shell’s lessons learned
Shell concluded its ARC Industry Forum presentation by sharing several lessons learned from conducting its own cyber PHA assessments (based on ISA-TR84.00.09). Although some are not strictly cybersecurity related, they nevertheless emerged in a cybersecurity risk assessment. Awareness of these issues would benefit the ICS cybersecurity community.
For example, not all safety instrumented systems have a hardware key switch to manage the various modes of operation, such as “run, program, remote” modes. In cases where a hardware-based key switch is not provided, consider using a “software” lock to enforce separation of duties for the SIS. For example, the supervisor in the plant should unlock the SIS to allow the engineer to make configuration changes. Another example is making sure that critical SIS parameters such as “trip limit” cannot be changed online. A download, using separation of duties, should be required to change trip limits and other critical parameters.
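The separation-of-duties logic behind such a software lock can be illustrated with a short Python sketch. It is a conceptual model only: the `SoftwareLock` class and its methods are hypothetical and do not represent any vendor's SIS interface, and a real implementation would also verify that the person unlocking actually holds the supervisor role.

```python
from typing import Optional


class SoftwareLock:
    """Hypothetical two-person lock guarding SIS configuration changes."""

    def __init__(self) -> None:
        self._unlocked_by: Optional[str] = None  # None means locked

    def unlock(self, supervisor: str) -> None:
        """The supervisor unlocks the SIS for configuration changes."""
        self._unlocked_by = supervisor

    def lock(self) -> None:
        """Return the SIS to its locked state."""
        self._unlocked_by = None

    def authorize_change(self, engineer: str) -> bool:
        """Allow a change only if a different person has unlocked the SIS."""
        return self._unlocked_by is not None and self._unlocked_by != engineer


sis_lock = SoftwareLock()
print(sis_lock.authorize_change("engineer_a"))    # locked: change refused

sis_lock.unlock("supervisor_b")
print(sis_lock.authorize_change("engineer_a"))    # unlocked by someone else
print(sis_lock.authorize_change("supervisor_b"))  # same person: refused

sis_lock.lock()
```

The key property is that no single account can both unlock the system and make the change, which is what the hardware key switch provides physically.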
End users should also pay attention to how the safety system engineering workstation user roles and associated privileges are set up. Companies should ensure that the “principle of least privilege” is enforced. In other words, minimize the number of people with privileged accounts and set up roles to support separation of duties.
Companies should also rigorously manage bypasses, and operators should have a real-time view of all active bypasses. A best practice is to use separation of duties to enable bypasses along with an administrative process that creates a record of the bypass. Rigorous management (or elimination) of remote access to safety system engineering workstations and instrument asset management systems (IAMS) should also be employed. Periodic audits of who has remote access should be implemented. Remote access by privileged users should be normally disabled, done under “permit to work,” and monitored locally.
Unauthorized changes to SIS instrument settings, such as sensor type, scale, or range, can render a SIF inoperable. This can potentially be done from the IAMS or through handheld devices used to interface with intelligent instrumentation. End users should ensure some form of “instrument lock” is in place, such as a hardware jumper or software lock. This will let them use separation of duties to make changes to instruments. Some end users have employed data diodes for this purpose as well.
Run periodic audit reports to detect unauthorized downloads to SIS or changes to instruments. Changes should match corresponding administrative controls such as “permit to work (PtW)” or “management of change (MOC)” records. If user accounts for the safety engineering workstation (S-EWS), human-machine interface (HMI), and IAMS are integrated with Microsoft Active Directory, risk assess the implementation to make sure these credentials are properly protected.
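One way to picture such an audit is as a reconciliation between logged change events and approved PtW or MOC records. The Python sketch below assumes both can be exported as simple records. The field names, record IDs, tags, and the matching rule (same target, timestamp inside the approved window) are hypothetical simplifications.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ChangeEvent:
    target: str          # SIS controller or instrument tag that was changed
    user: str            # account that made the download or change
    timestamp: datetime


@dataclass(frozen=True)
class Authorization:
    record_id: str       # PtW or MOC reference
    target: str
    valid_from: datetime
    valid_to: datetime


def is_authorized(event: ChangeEvent, authorizations: list) -> bool:
    """A change is authorized if a record covers its target and time."""
    return any(
        a.target == event.target
        and a.valid_from <= event.timestamp <= a.valid_to
        for a in authorizations
    )


def unauthorized_changes(events: list, authorizations: list) -> list:
    """Return the change events that no PtW/MOC record accounts for."""
    return [e for e in events if not is_authorized(e, authorizations)]


# Hypothetical audit data.
authorizations = [
    Authorization("MOC-001", "SIS-01",
                  datetime(2024, 3, 5, 8, 0), datetime(2024, 3, 5, 16, 0)),
]
events = [
    ChangeEvent("SIS-01", "engineer_a", datetime(2024, 3, 5, 10, 0)),
    ChangeEvent("PT-101", "engineer_b", datetime(2024, 3, 6, 2, 30)),
]

flagged = unauthorized_changes(events, authorizations)
for e in flagged:
    print(e.target, e.user, e.timestamp.isoformat())
```

Here the change to PT-101 is flagged because no record covers it, while the SIS-01 download falls inside the window approved under MOC-001.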
Good password policy should also be followed. Ensure that privileged account passwords are not leaked or shared. Use administrative policies that state this. If passwords are “stored,” do so in a secure manner. Do not pass around files with passwords.
Overall, companies should identify and address cybersecurity risks to safety barriers during design. Using concepts such as cybersecurity escalation factors and ensuring high-consequence hazard scenarios are mitigated using barriers without cybersecurity vulnerabilities can help achieve this goal. A cyber PHA should be included as part of any new project and as part of the contract when designing an automation system. Companies should perform a cyber PHA for existing assets when a change occurs that affects the safety system or when the cybersecurity threat landscape has changed.
There is still much to learn about the intersection of cybersecurity and safety. This requires collaboration between both communities for the continued protection of assets, people, and communities in today’s challenging cybersecurity environment.