Ukrainian power grids cyberattack
A forensic analysis based on ISA/IEC 62443
By Patrice Bock, with the participation of Jean-Pierre Hauet, Romain Françoise, and Robert Foley
Three power distribution companies sustained a cyberattack in western Ukraine on 23 December 2015. As the forensic information is extensive from a technical point of view, it is an opportunity to put ISA/IEC 62443-3-3, Security for industrial automation and control systems Part 3-3: System security requirements and security levels, to the test with a real-life example. The sources used for this purpose together provide unusually detailed information. This article:
- reviews the kinematics of the attack using the available reports and reasonable assumptions based on our experience of cyberattack scenarios and of typical operational technology (OT) systems and vulnerabilities
- introduces a methodology for assessing the achieved security level (SL-A) of one of the Ukrainian distributors (corresponding to the best documented case)
- applies this methodology; presents and discusses the estimated SL-A; reviews this SL-A per foundational requirement (FR); and derives conclusions and takeaways
- evaluates the target security level (SL-T) that should be aimed for to detect and prevent similar attacks
Kinematics of the cyberattack
Although the attack itself was triggered on 23 December 2015, it was carefully planned. Networks and systems were compromised as early as eight months before. Keeping this time frame in mind is essential for a proper understanding of the ways and means that should be used to detect, and eventually prevent, a similar attack.
Our analysis of the cyberattack is threefold:
- Initial intrusion of the information technology (IT) network using spear phishing
- Intelligence gathering on the IT and OT networks and systems using the flexible BlackEnergy malware: network scans, hopping from one system to another, identification of device vulnerabilities, design of the attack, and installation of further malware and backdoors
- Attack itself that lasted 10 minutes on 23 December
Step 1: Malware in the mail!
In spring 2015, a variant of the BlackEnergy malware was triggered as an employee of Prykarpattya Oblenergo opened the Excel attachment of an email. BlackEnergy is a malware "suite" that first hit the news in 2014, when it was used extensively to infiltrate energy utilities. Its aim was to gather intelligence about the infrastructure and networks and to help prepare for future cyberattacks.
The diagram in figure 1 is a simplified view of the network architectures (i.e., Internet, IT, OT) and will help depict each step of the cyberattack. The hacker is shown as the "black hat guy" at the top right side. The hacker used the utility's IT connection to the Internet as the channel to prepare and eventually trigger the cyberattack.
We can see that the company had proper firewalls set up, one between the IT network and the Internet and the second between the IT and OT (industrial) network. The OT network included a distribution management system (DMS) supervisory control and data acquisition (SCADA) system with servers and workstations, and a set of gateways used to send orders from the DMS to remote terminal units that controlled the breakers and other equipment in the electrical substations. Additional devices were connected to the network too (e.g., engineering workstations and historian servers) but are not relevant for the attack kinematics.
At this step, the hacker managed to compromise one office laptop thanks to the BlackEnergy email attachment. This is difficult to prevent as long as people open attachments of legitimate-looking emails.
Figure 1. Simplified diagram of the control system architecture
Figure 2. Step two of the attack
Step 2: Attack preparation, network scans, and advanced persistent threat (APT)
During several months in the summer of 2015, the BlackEnergy malware was remotely controlled to collect data, hop from one host to another, detect vulnerabilities, and even make its way onto the OT network and perform similar "reconnaissance" activities.
Forensic data about this phase is incomplete, because the hacker did some cleaning up and wiped out several disks during the actual attack. Nevertheless, prior analysis of BlackEnergy, together with reasonable assumptions about the standard process used for cyberattacks, makes the following reconstruction probable.
As displayed in figure 2, during step two, a large amount of network activity took place. The remote-controlled malware scanned the IT network, detected an open connection from an IT system to an OT supervision platform, performed OT network scans, collected OT component information, and eventually installed ready-to-trigger malware components on both the IT and OT systems.
This phase lasted weeks, maybe months, and allowed for custom exploit development. An exploit is a piece of software designed to take advantage of a specific vulnerability. It is embedded as a payload in malware that is configured to deliver it for execution on a target. In practice, the development effort was fairly limited: the only original piece of malware code was the one needed to disable the gateways as part of step three. Even that was not a significant "effort," as gateways have long been identified as vulnerable devices.
Step 3: Triggering the cyberattack
In the afternoon two days before Christmas, as stated by an operator, the mouse moved on the human-machine interface (HMI) and started switching off breakers remotely.
When the local operator attempted to regain control of the supervision interface, he was logged off and could not log in again, because the password had been changed (figure 3).
The whole attack only lasted for a couple of minutes. The hacker used the preinstalled malware to remotely take control of the HMI and switch off most of the switchgears of the grids. Additional malware, in particular the custom-developed exploit, was used to prevent the operator from regaining control of the network by wiping out many disks (using KillDisk) and overwriting the Ethernet-to-serial gateway firmware with random code, thus turning the devices into unrecoverable pieces of scrap.
Additional "bonus" activities included performing a distributed denial-of-service attack on the call center, preventing customers from contacting the distributor, and switching off the uninterruptible power supply to shut down the power on the control center itself (figure 4).
This step was obviously aimed at switching off the power for hundreds of thousands of western Ukrainian subscribers connected to the grid. However, most of the effort was spent making sure that the power could not be switched on again: all of the custom malware was developed with that objective. Once the attack was triggered, the only way for the operator to prevent this outcome was to stop the attack while it was in progress.
But the attack was too fast to allow any reaction. Moreover, in a critical infrastructure environment, operator actions may cause safety issues. Therefore, only predefined actions are allowed, and operators have to follow guidelines for taking any action. They are not trained to make decisions on the spot in an unforeseen operational situation, and this was exactly the situation in the Ukrainian case. "Obvious" actions could have stopped the attack (like pulling the cable connecting the OT to the IT network), but untrained operators cannot be expected to take such disruptive steps on their own initiative in a stressful situation where mistakes are quite possible.
Figure 3. Step three of the attack (1)
Figure 4. Step three of the attack (2)
In retrospect, once we know all the details about the cyberattack, it looks easy to detect, given quite significant network activities and the levels of activity taking place on numerous systems.
But it is actually a challenge to figure out exactly what is happening on a network, especially if you do not have a clue about what is "normal" network activity. Once connections to both the Internet and to the OT network are allowed, detecting signs of cyberattacks is difficult because of the volume of traffic. Continuous monitoring with the capability to identify the few suspect packets in the midst of all of the "good" packets is needed. Multiple proofs of concept of such detection using correlated IT and OT detection have been performed and were presented at the conferences GovWare 2016 in Singapore, Exera Cybersecurity days 2016 in Paris, and SEE Cybersecurity week 2016 in Rennes (France).
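As a rough illustration of what knowing "normal" activity could mean in practice, the sketch below learns a baseline of flows during an observation period assumed to be clean, then flags any later flow that is not part of it. The `Flow` tuple, the host names, and the ports are hypothetical and are not taken from the forensic reports; real monitoring would work on captured traffic and would need far more context than this.

```python
from typing import Iterable, List, NamedTuple, Set


class Flow(NamedTuple):
    src: str   # source host (hypothetical name)
    dst: str   # destination host (hypothetical name)
    port: int  # destination port


def learn_baseline(observed: Iterable[Flow]) -> Set[Flow]:
    """Record every flow seen during a period assumed to be free of attacks."""
    return set(observed)


def suspect_flows(baseline: Set[Flow], live: Iterable[Flow]) -> List[Flow]:
    """Return live flows absent from the baseline, in order, without duplicates."""
    reported: Set[Flow] = set()
    result: List[Flow] = []
    for flow in live:
        if flow not in baseline and flow not in reported:
            reported.add(flow)
            result.append(flow)
    return result


# Hypothetical baseline: one administrative SSH link and one DMS-to-gateway flow.
baseline = learn_baseline([
    Flow("it-admin-pc", "ot-jump-host", 22),
    Flow("dms-server", "gateway-1", 502),
])

# Hypothetical live traffic containing two flows never seen before.
live = [
    Flow("it-admin-pc", "ot-jump-host", 22),
    Flow("office-laptop", "ot-jump-host", 22),
    Flow("office-laptop", "ot-jump-host", 22),
    Flow("ot-jump-host", "gateway-1", 23),
]
alerts = suspect_flows(baseline, live)
```

The point of the design is that the few unknown flows stand out only because the expected ones were written down first; without a baseline, every packet looks equally plausible.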
Yet other means exist, and using IEC 62443-3-3 to scrutinize the Ukrainian distributor security helps to identify all the controls that were missing and that could have prevented the cyberattack.
Methodology to estimate the SL-A
ISA/IEC 62443-3-3 lists 51 system requirements (SRs) structured in seven foundational requirements (FRs). Each SR may be reinforced by one or more requirement enhancements (REs) that are selected based on the targeted security levels (SL-Ts). Evaluating the achieved security levels (SL-As) can therefore be performed:
- for each SR, checking whether the basic requirement and possible enhancements are met
- for each FR, the SL-A being the highest level achieved by all of its SRs (i.e., the lowest of the SR levels)
- with the overall SL-A evaluation being the highest level achieved by all FRs (i.e., the lowest of the FR levels)
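A minimal sketch of this roll-up follows, assuming that the SL-A of an FR is the highest level that every one of its SRs reaches (that is, the minimum across SRs), and likewise for the overall SL-A across FRs. The requirement names and the mapping of requirements to levels are hypothetical placeholders, not the content of the standard. Requirements with unknown status are not counted against the level, an assumption that leans optimistic.

```python
from typing import Dict, List

MET, MISSED, UNKNOWN = "met", "missed", "unknown"

# For one SR: the requirements (basic or RE) newly required at each security level.
Requirements = Dict[int, List[str]]


def sr_level(required_at: Requirements, status: Dict[str, str]) -> int:
    """Highest SL (0 to 4) reached before a requirement is found missed."""
    level = 0
    for sl in (1, 2, 3, 4):
        if any(status.get(req, UNKNOWN) == MISSED for req in required_at.get(sl, [])):
            break
        level = sl
    return level


def fr_level(srs: Dict[str, Requirements], status: Dict[str, str]) -> int:
    """SL-A of an FR: the lowest level among its SRs."""
    return min(sr_level(reqs, status) for reqs in srs.values())


def overall_level(frs: Dict[str, Dict[str, Requirements]], status: Dict[str, str]) -> int:
    """Overall SL-A: the lowest level among the FRs."""
    return min(fr_level(srs, status) for srs in frs.values())


# Hypothetical FRs and SRs, for illustration only.
frs = {
    "FR-A": {
        "SR A.1": {1: ["A.1"], 2: ["A.1 RE1"], 3: ["A.1 RE2"]},
        "SR A.2": {1: ["A.2"], 3: ["A.2 RE1"]},
    },
    "FR-B": {
        "SR B.1": {1: ["B.1"]},
    },
}
# "A.2 RE1" is deliberately absent: its status is unknown.
status = {"A.1": MET, "A.1 RE1": MET, "A.1 RE2": MISSED, "A.2": MET, "B.1": MISSED}

level_fr_a = fr_level(frs["FR-A"], status)   # capped at 2 by the missed "A.1 RE2"
level_fr_b = fr_level(frs["FR-B"], status)   # 0: the basic requirement is missed
level_overall = overall_level(frs, status)   # 0: the weakest FR sets the result
```

With this reading, a single missed basic requirement is enough to pull a whole FR, and the overall result, down to zero.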
Table 1 summarizes the result of the evaluation on an FR that has few SRs for the sake of illustration.
The table 1 matrix is directly extracted from the IEC 62443-3-3 appendix that summarizes the requirements. As for the Prykarpattya Oblenergo case and for each requirement (basic or RE), we have identified three possible cases:
- the available information is sufficient to consider the requirement met: ✔
- the available information is enough to figure out that the requirement was missed: ✘
- it is not possible to evaluate whether or not the requirement was met: ?
Table 1. Result of the evaluation of the SL-A for FR5
Once filled, table 1 corresponds to the actual evaluation of the FR5 for the case at hand (Ukraine), leading to an SL-A of 2. This means that network segmentation ("restrict data flow") was implemented for at least the basic requirements and for a few requirement enhancements.
Application to the Ukrainian case
This analysis was performed on all SRs, and two situations were identified:
- The SR may not be applicable (e.g., requirements about wireless communication in the absence of such media).
- We may not have direct evidence that the SR was met or missed, but deduction based on typical similar installations and other inputs allows a reasonable speculation about whether the requirement was met or missed.
For instance, we can consider "backup" missing, because disks could not be restored several weeks after the attack. Considering SR 5.2 RE(1), it is reasonable to consider that the secure shell (SSH) connection through the firewall was an exception and that all the other traffic was denied. The hacker would not have gone through the burden of capturing the password if more direct ways to reach the OT network existed.
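The inferred firewall policy can be pictured as a default-deny rule set with a single exception. The sketch below is only an illustration of that idea: the addresses, the `Rule` structure, and the `is_allowed` helper are hypothetical and do not describe the distributor's actual configuration.

```python
from ipaddress import ip_address, ip_network
from typing import List, NamedTuple


class Rule(NamedTuple):
    src: str   # permitted source network (CIDR notation)
    dst: str   # permitted destination network (CIDR notation)
    port: int  # permitted destination TCP port


# Hypothetical single exception: one IT host may reach one OT host over SSH.
ALLOW: List[Rule] = [Rule("10.1.0.25/32", "192.168.10.5/32", 22)]


def is_allowed(src: str, dst: str, port: int, rules: List[Rule] = ALLOW) -> bool:
    """Default deny: a connection passes only if an explicit rule matches it."""
    return any(
        ip_address(src) in ip_network(rule.src)
        and ip_address(dst) in ip_network(rule.dst)
        and port == rule.port
        for rule in rules
    )


ssh_ok = is_allowed("10.1.0.25", "192.168.10.5", 22)       # matches the exception
rdp_ok = is_allowed("10.1.0.25", "192.168.10.5", 3389)     # no rule for this port
other_ok = is_allowed("10.1.0.99", "192.168.10.5", 22)     # source not permitted
```

Such a rule constrains where a connection may come from and which port it uses, not who is behind it: a session opened with a captured password through the permitted path would still pass this check.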
Out of the 51 SRs, four were deemed "not applicable" (1.6, 1.8, 1.9, and 2.2), and 25 could not be determined ("?"). This is a large quantity, which means that only half of the SRs could actually be evaluated. This actually favors a higher SL-A, because only evaluated SRs are taken into account, and because by default we consider that the SR is potentially met.
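The proportion behind "only half" can be checked with a few lines of arithmetic, using only the counts given above:

```python
total_srs = 51       # SRs listed in ISA/IEC 62443-3-3
not_applicable = 4   # SRs 1.6, 1.8, 1.9, and 2.2
undetermined = 25    # SRs marked "?"

applicable = total_srs - not_applicable
evaluated = applicable - undetermined
share_evaluated = evaluated / applicable

print(f"{evaluated} of {applicable} applicable SRs evaluated ({share_evaluated:.0%})")
```

That is 22 of the 47 applicable SRs, slightly under half.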
Another decision was made in terms of data presentation. Instead of presenting the information with one requirement (basic and RE) per line, as in table 1, we decided to have one line per SR and list the increasing RE on the various columns. Table 2 illustrates the same FR5 evaluation using this mode of presentation.
Table 2. Estimation of the SL-A (FR5)
Table 3. Overall estimation of the seven FRs
Finally, a more condensed view was used without the RE text in order to present the overall picture for all FRs, which would span several pages otherwise. The overall estimated SLs are summarized in table 3.
The results depicted in table 3 are poor. Furthermore, half of the requirements could not be evaluated, and, therefore, this view is probably optimistic.
On the right side, the estimated SL-As are listed for the seven FRs. We can see that the SL-As are zero except for:
- FR5 (restricted data flow): mainly due to the IT-IACS firewall and strict flow control. To comply with this requirement means that traffic between zones on the OT network should be filtered. The Ukrainian attack example demonstrates that this requirement could be reviewed in future updates of the standard:
  - Complying with SR 5.2 does not require one to define zones. As in the Ukrainian case, all OT systems could interact with each other. Note that recommendations about zone definitions are available in ISA/IEC 62443-3-2 that should be used before applying ISA/IEC 62443-3-3.
  - The requirement about traffic filtering between zones is set for SL=1. The return on investment is questionable, as the cost and risk of traffic filtering are high, and the effectiveness is questionable, as demonstrated by the Ukrainian case. It may make more sense to require detection as soon as SL-T=1 is targeted, and require active filtering/preventing for higher SLs.
- FR6 (timely response to events): The very existence of detailed forensic information is the result of minimal logging being in place.
Table 4 shows a detailed analysis for some of the most significant SRs.
Table 4. Specific analysis for some of the most significant SRs
At first, looking at the reports about the various Ukrainian operator security controls, it looked like they had paid significant attention to cybersecurity issues. Indeed:
- nonobvious passwords were used
- a firewall with strict data flow restriction was in place
- significant logging was performed
But, as demonstrated in the SL-A evaluation, most FR security levels were null, because at least one of the SRs was not addressed at all. There is no point in setting up advanced security controls when some basic ones are missing. The weakest link drives the overall security effectiveness down. The fact that advanced security controls are useless if other basic security controls are missing is best illustrated by the configuration of the firewall with a single SSH link requiring a nonobvious password authentication. This is typically a painful operational constraint, as allowing direct remote desktop protocol (RDP) or virtual network computing (VNC) access to several systems would have been easier to use. Unfortunately, these additional constraints did not lead to increased security, because:
- The lack of IT network supervision allowed extensive network scans, vulnerability searches, and discovery of the allowed SSH link.
- The lack of strong authentication (two-factor) or local (OT) approval of remote connections made it possible to frequently connect from the IT to the OT network without detection over several months.
- The lack of OT network intrusion detection allowed extensive OT network scans, vulnerability detection, and the transfer of mobile code (malware, exploits).
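Network scans of the kind listed above tend to leave a simple signature: one source contacting many distinct hosts or ports. The following sketch counts distinct targets per source over one observation window. The `likely_scanners` helper, the threshold, and the host names are hypothetical, and the threshold in particular is an arbitrary value chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (source host, destination host, destination port)
Connection = Tuple[str, str, int]


def likely_scanners(connections: Iterable[Connection], max_targets: int = 20) -> List[str]:
    """Return sources that contacted more than max_targets distinct (host, port) pairs."""
    targets: Dict[str, Set[Tuple[str, int]]] = defaultdict(set)
    for src, dst, port in connections:
        targets[src].add((dst, port))
    return sorted(src for src, seen in targets.items() if len(seen) > max_targets)


# Hypothetical window: a laptop probing 50 hosts, a server talking to 3 gateways.
window: List[Connection] = [("office-laptop", f"10.0.0.{i}", 22) for i in range(1, 51)]
window += [("dms-server", f"gateway-{i}", 502) for i in range(1, 4)]

scanners = likely_scanners(window)
```

A slow scan spread over weeks would stay under any fixed per-window threshold, which is one reason this kind of counting is only a first layer and not a substitute for continuous monitoring.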
When deploying security controls, it is essential to apply requirements in a consistent way across all aspects of security: detection, prevention, and reaction. It is best to use a well-designed standard such as IEC 62443-3-3. Do not aim for SL-T=2 or 3 on some FRs if the SL-A is still zero on other FRs, as this would likely be useless.
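The consistency rule can be expressed as a small check: find the FRs that sit at the lowest achieved level, and raise those first. The sketch below uses illustrative SL-A values only; they are not the values of table 3, and the helper names are hypothetical.

```python
from typing import Dict, List


def lagging_frs(sl_a: Dict[str, int]) -> List[str]:
    """FRs whose achieved level is the lowest; these cap the overall SL-A."""
    floor = min(sl_a.values())
    return [fr for fr, level in sl_a.items() if level == floor]


def next_targets(sl_a: Dict[str, int]) -> Dict[str, int]:
    """Raise only the lagging FRs by one level (capped at 4); leave the others unchanged."""
    floor = min(sl_a.values())
    return {
        fr: (min(floor + 1, 4) if level == floor else level)
        for fr, level in sl_a.items()
    }


# Illustrative profile (not the article's results): uneven levels across four FRs.
sl_a = {"FR1": 0, "FR2": 1, "FR3": 0, "FR5": 2}

to_fix_first = lagging_frs(sl_a)   # the FRs still at the lowest level
targets = next_targets(sl_a)       # every FR at least one level above the old floor
```

Applied repeatedly, this brings all FRs to the same level before any of them is pushed higher, which is the order of work the paragraph above recommends.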
Which SL would have been required to prevent the attack?
Looking at the issues listed previously, it appears that raising the SL-A to level 2 would have allowed detection of the activity during step two, thus preventing the cyberattack. Plenty of time was available for the post-detection reaction. Additional controls, such as strong/local authentication, anti-malware, and SL 2 requirements would actually have prevented the specific attack kinematics.
The fact that setting the SL-T at level 2 would have been enough to detect and prevent the attack with several layers of defense may sound surprising to the reader, as this was (quite certainly) a state-sponsored cyberattack, which normally calls for SL-T=3 or even 4 to prevent.
In practice, it is likely that the hacker could have defeated SL-A=2 by developing more advanced exploits and using attack vectors other than the Internet, such as mobile media or mobile equipment introduced by rogue employees or third parties. Nevertheless, those additional steps are more complex and expensive, and, because they were not needed, less advanced means were used.
To summarize the takeaways of this cyberattack using IEC 62443-3-3 guidance:
- As a mandatory first step, power distribution utilities should aim for SL-T=2, ensuring at least minimal requirements about detection (SR 6.2) are met.
- To have several layers of defense, prevention, detection, and time for reactions in anticipation of the most sophisticated attacks, it is best to aim for SL-T=3.
- In any case, it is essential to set up security controls in a consistent way to ensure that all FRs have achieved the same SL-A before aiming for a higher SL-T. Otherwise the efforts are useless, as demonstrated by the example at hand.