1 June 2007
It's risky business as usual
Industry and government recognize the potential for digital technology to enhance safety and reliability in nuclear power plant operation. Uncertainty exists in licensing.
By Dave Blanchard and Ray Torok
When implementing a digital upgrade at a nuclear power plant, one area of technical and regulatory uncertainty is the application of risk-informed techniques.
All nuclear power plants now have plant-specific probabilistic risk assessments (PRAs) with which risk-informed analyses are routinely performed in support of the operation, maintenance, and licensing of the plants.
However, current regulatory guidance for evaluating digital upgrade designs remains largely deterministic and does not yet take advantage of the PRAs.
We will examine the following:
What technical issues are barriers to the use of PRA in the evaluation of the risks associated with a digital upgrade?
What kinds of risk insights useful in licensing a digital upgrade can be derived from a plant-specific PRA using existing techniques?
Evaluation of digital systems
Over the next few years, many nuclear stations will be initiating design changes to I&C systems using digital-based systems to replace obsolete, aging analog instrumentation. In replacing this I&C, utilities wish to take advantage of the added reliability and operational capabilities available using digital technologies.
When designing such a digital upgrade, licensees perform a “defense-in-depth and diversity” (D3) evaluation.
The purpose of a D3 evaluation is to consider the effects of postulated digital or software-related common mode failures that could be introduced by the digital upgrade.
These new common mode failures may have the potential to compromise the redundancy built into the mitigating systems into which the digital I&C is to be installed.
Nuclear Regulatory Commission (NRC) guidance with respect to D3 evaluations is contained in the Standard Review Plan. Developed prior to the completion of the plant-specific PRAs, this regulatory guidance is largely deterministic in nature and focuses strictly on evaluating design basis events. Given this focus, accident scenarios that we now know from the PRAs could dominate risk are not considered.
In addition, licensees expend much effort evaluating, and perhaps modifying, the design of the digital upgrade to address the effects of potential common mode failures on design basis events that the PRAs have shown to be of limited safety significance.
Finally, as the NRC reviews these evaluations, licensees are finding that acceptance criteria originally published as part of current regulatory guidance may no longer be acceptable. This introduces significant licensing uncertainty and has had a chilling effect on plans to upgrade I&C at several plants.
Here, we will introduce the use of PRA in the performance of D3 evaluations, both to limit their scope and to assure completeness in assessing whether the design of a digital system has adequate D3.
PRA has been applied successfully to a variety of other generic safety issues where the original regulatory guidance was incomplete and/or overly burdensome without corresponding safety benefits.
We will look at possible limitations of PRA in the evaluation of digital systems, and we will present existing techniques for generating insights from PRA in the performance of D3 evaluations.
Finally, we will compare the advantages and limitations of PRA approaches versus current regulatory approaches.
PRA addresses rules, issues
All nuclear power plants have a plant-specific PRA that has been performed to evaluate the potential for severe accident vulnerabilities. These PRAs have successfully identified risk insights and focused the scope of many regulatory programs such as in-service testing, in-service inspection, and quality assurance.
In addition, the PRAs serve extensively in support of technical specification changes and inspection and enforcement activities. The industry and the NRC have identified numerous benefits to using PRA in support of licensing actions, among them the evaluation of design modifications to the plant. A digital upgrade is one such design modification that may usefully be evaluated using the plant-specific PRA.
Subsequent to development of the PRAs, the NRC published the PRA policy statement, in which the Commissioners stated:
Because PRA considers the frequency of a broad spectrum of initiating events and combines them with an assessment of the reliability of mitigating systems, including the potential for multiple and common cause failures, it is an extension and enhancement of traditional regulation.
PRA techniques are most valuable when they serve to focus the traditional deterministic-based regulations and support the defense-in-depth philosophy.
The PRA approach supports the NRC’s defense in-depth philosophy by allowing quantification of the levels of protection, and by helping to identify and address weaknesses or overly conservative regulatory requirements applicable to the nuclear industry.
Consistent with these assertions, the Commissioners have instructed the NRC and encouraged the industry to increase the use of PRA in:
All regulatory matters, to the extent supported by the state-of-the-art PRA methods and data, and in a manner that complements the NRC’s deterministic approach and supports the NRC’s traditional defense-in-depth philosophy.
Regulatory matters, where practical within the bounds of the state-of-the-art, to reduce unnecessary conservatism associated with current regulatory requirements, regulatory guides, license commitments, and NRC practices.
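The policy statement's description of PRA, initiating-event frequencies combined with the reliability of mitigating systems, can be pictured with a deliberately flattened sketch. Every initiator name and number below is hypothetical, and a real PRA derives each conditional failure probability from event trees and fault trees rather than taking it as a single input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InitiatingEvent:
    name: str
    frequency_per_year: float  # hypothetical initiating-event frequency
    p_mitigation_fails: float  # hypothetical conditional probability that mitigation fails


def core_damage_frequency(events: list[InitiatingEvent]) -> float:
    """Sum over initiators of frequency times conditional failure probability of mitigation."""
    return sum(e.frequency_per_year * e.p_mitigation_fails for e in events)


# Illustrative values only; not taken from any plant-specific PRA.
EVENTS = [
    InitiatingEvent("loss of offsite power", 3e-2, 1e-4),
    InitiatingEvent("small LOCA", 5e-4, 2e-3),
    InitiatingEvent("general transient", 1.0, 5e-7),
]

for e in EVENTS:
    print(f"{e.name:22s} contributes {e.frequency_per_year * e.p_mitigation_fails:.1e} per year")
print(f"total: {core_damage_frequency(EVENTS):.2e} per year")
```

The point of the structure is that a frequent initiator with highly reliable mitigation can contribute less than a rarer one with weaker mitigation, which is the kind of ranking a purely design-basis review does not produce.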
Given that the industry currently performs D3 evaluations when designing a digital upgrade, and that NRC policy considers PRA to be an extension of the traditional defense-in-depth philosophy, an obvious application of the plant-specific PRAs would be in performing these D3 evaluations. The question is whether the current state of the art in PRA would support such an evaluation.
In attempting to model digital systems within a PRA, it is important that analysts recognize the differences between the manner in which digital equipment “fails” and the manner in which the analog systems it replaces fail. Among these differences are:
The software in digital equipment is not a physical entity, as would be the case for analog equipment, and is not subject to wear out or random failure.
The failure modes of digital equipment, should it fail, may not be well defined.
Given the same inputs, software will produce the same outputs every time. As a result, the “probability of failure” of digital equipment and its software relates to the potential that the equipment will encounter conditions for which designers did not plan and for which it will respond in a manner that is adverse to the function of the system it should be protecting.
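The last point, that a deterministic program's "probability of failure" is really the chance of meeting an input it mishandles, can be illustrated with a toy trip function. The setpoint, the flaw, and the distribution of demands are all invented for illustration; the sketch only shows that repeated identical inputs give identical answers, while an estimated failure probability emerges from how often plant conditions land in the unanticipated region.

```python
import random

PRESSURE_SETPOINT = 2385.0  # hypothetical trip setpoint, psig


def trip_logic(pressure: float, temperature: float) -> bool:
    """Hypothetical trip function containing a latent design flaw.

    Intended behavior: trip whenever pressure exceeds the setpoint.
    Assumed flaw: an input combination the designers did not anticipate
    (low temperature at high pressure) takes a branch that suppresses the trip.
    """
    if temperature < 200.0:  # unanticipated branch: the design flaw
        return False
    return pressure > PRESSURE_SETPOINT


def failure_probability(n_demands: int, seed: int = 1) -> float:
    """Fraction of genuine trip demands, drawn from an assumed profile, that are missed."""
    rng = random.Random(seed)
    missed = 0
    for _ in range(n_demands):
        pressure = rng.uniform(2390.0, 2500.0)  # every sampled demand is above the setpoint
        temperature = rng.gauss(550.0, 120.0)   # assumed distribution of coincident temperature
        if not trip_logic(pressure, temperature):
            missed += 1
    return missed / n_demands


# The same inputs always give the same output: nothing wears out or fails at random.
print(trip_logic(2400.0, 150.0), trip_logic(2400.0, 150.0))
print(f"estimated failure probability per demand: {failure_probability(100_000):.1e}")
```

Changing the assumed demand profile changes the estimate without touching the code, which is one reason software failure probabilities are hard to pin down.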
If digital equipment is to be included in the logic models of the PRA, methods are needed that differ from those used to model traditional analog systems, and such methods do not yet exist. NRC staff has expressed the view that methods for identifying failure modes, modeling their effects, and estimating failure probabilities of digital equipment have not yet been successfully developed. To address these issues, NRC research is reviewing a variety of methodologies (e.g., Markov modeling, dynamic fault trees) to model the dynamic effects of potential digital failure modes. This review is part of a multi-year research plan that is to culminate in regulatory guidance on how to incorporate digital equipment in PRA. The NRC concludes that, at this time, the modeling methods needed to support current risk-informed methods are not available.
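For readers unfamiliar with the Markov modeling mentioned above, here is a generic sketch of the idea, not the NRC's model. It assumes a single digital channel with three states (working, failed and detected by self-diagnostics, failed and undetected), hourly transition probabilities, and made-up rates. It tracks how probability flows between states over time, which a static fault tree cannot represent directly.

```python
OK, DETECTED, UNDETECTED = 0, 1, 2  # states of one digital channel (assumed model)


def transition_matrix(lam: float, coverage: float, mu: float) -> list[list[float]]:
    """Per-hour transition probabilities (discrete-time approximation).

    lam      : assumed per-hour failure probability of the channel
    coverage : assumed fraction of failures caught by self-diagnostics
    mu       : assumed per-hour probability that a detected failure is repaired
    Undetected failures persist until a periodic test, which is not modeled here.
    """
    return [
        [1.0 - lam, coverage * lam, (1.0 - coverage) * lam],
        [mu, 1.0 - mu, 0.0],
        [0.0, 0.0, 1.0],
    ]


def propagate(p: list[float], matrix: list[list[float]], hours: int) -> list[float]:
    """Advance the state-probability vector one hour at a time."""
    for _ in range(hours):
        p = [sum(p[i] * matrix[i][j] for i in range(3)) for j in range(3)]
    return p


start = [1.0, 0.0, 0.0]
with_diag = propagate(start, transition_matrix(1e-5, 0.9, 0.125), 720)
without_diag = propagate(start, transition_matrix(1e-5, 0.0, 0.125), 720)
print(f"P(undetected failure) after 720 h: {with_diag[UNDETECTED]:.2e} "
      f"with diagnostics, {without_diag[UNDETECTED]:.2e} without")
```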
Evaluation insights from PRA
It is clear that methods for identifying digital equipment failure modes, modeling their effects, and estimating their probabilities are still evolving. However, the results of recent Electric Power Research Institute (EPRI) investigations show that insights into the acceptability of a digital system design from a D3 perspective can still be derived from the plant-specific PRAs.
NRC research perspective vs. EPRI research perspective: In deriving risk insights using current techniques, one must examine the PRAs from a slightly different perspective than is being investigated by NRC research.
Aside from its consideration of dynamic techniques, the NRC’s approach to modeling digital equipment is similar to traditional approaches to performing risk evaluations for any plant modification.
It is effectively a “bottom up” approach in which the changes to the plant (in terms of the digital system and its failure modes) are incorporated into the PRA, and the consequences of these changes are objectively determined by regenerating the PRA results.
The EPRI analysis is more of a “top down” approach in which the objectives to be achieved are first defined (in terms of the mitigating system failure modes to be avoided), and then the design features that assure those objectives are met are determined. Both approaches have been used successfully in the past to implement risk-informed changes at nuclear power plants.
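The difference between the two directions can be shown on a one-sequence toy model. Everything here is an assumption for illustration: the mitigating function is taken to fail if either the I&C suffers a common cause failure or the mechanical trains fail, and the numbers, including the allowed risk increase, are hypothetical rather than regulatory limits. Bottom up, the analyst inserts an estimated digital failure probability and requantifies; top down, the analyst starts from the allowed increase and solves for the largest digital failure probability the design must stay below.

```python
def core_damage_frequency(f_ie: float, q_mech: float, q_ic_ccf: float) -> float:
    """Hypothetical single-sequence model: after the initiator, mitigation fails if the
    I&C suffers a common cause failure OR the mechanical trains fail."""
    return f_ie * (1.0 - (1.0 - q_ic_ccf) * (1.0 - q_mech))


def delta_cdf_bottom_up(f_ie: float, q_mech: float,
                        q_ic_analog: float, q_ic_digital: float) -> float:
    """Bottom up: put the estimated digital failure mode into the model and requantify."""
    return (core_damage_frequency(f_ie, q_mech, q_ic_digital)
            - core_damage_frequency(f_ie, q_mech, q_ic_analog))


def max_digital_ccf_top_down(f_ie: float, q_mech: float,
                             q_ic_analog: float, allowed_delta_cdf: float) -> float:
    """Top down: start from the allowed increase and back out the largest tolerable
    digital CCF probability. CDF is linear in q_ic_ccf with slope f_ie * (1 - q_mech)."""
    return q_ic_analog + allowed_delta_cdf / (f_ie * (1.0 - q_mech))


F_IE, Q_MECH, Q_ANALOG = 1.0, 1e-5, 1e-6  # hypothetical inputs
print(f"bottom up, digital CCF of 2e-6: dCDF = {delta_cdf_bottom_up(F_IE, Q_MECH, Q_ANALOG, 2e-6):.2e}")
print(f"top down, dCDF limit of 1e-7: digital CCF must stay below "
      f"{max_digital_ccf_top_down(F_IE, Q_MECH, Q_ANALOG, 1e-7):.3e}")
```

The top-down result is a design target rather than a prediction, which is why it can be produced before the digital failure modes are well characterized.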
The NRC approach will probably have the advantage of generating detailed knowledge of the important failure modes of various types of digital equipment, as well as producing more precision in terms of the likelihood of these failure modes.
The EPRI approach carries with it more uncertainty with respect to these failure modes and, hence, a higher likelihood of needing additional design features to deal with the uncertainties associated with these failure modes.
However, it still offers improvement over the current regulatory approach.
The EPRI approach has the advantages of being available now, and given the use of commonly applied and accepted risk techniques, it is capable of providing input to the design of near term digital upgrades prior to their installation.
If the NRC approach yields practical results, they will not come for several years, so they will have limited potential to influence those upgrades currently in the planning process.
EPRI research on the application of PRA to digital upgrades: The EPRI D3 project took a pragmatic approach, acknowledging precise quantification of software reliability is a very difficult problem.
The focus was instead on important engineering insights that one can glean from an understanding of the role of the software in the broader context of the plant system and the plant itself.
The current generation of PRAs is capable of generating a number of insights with respect to the design of digital systems.
In developing an approach for the performance of risk-informed D3 evaluations, EPRI recognized the need to exercise its guidance using actual PRAs. We obtained five plant-specific PRAs from participating utilities for these evaluations: those for three Westinghouse plants of various vintages, a Combustion Engineering unit, and a GE BWR. We used some of the PRAs to perform simple sensitivity studies on the effects of digital failures at the system level, while others helped to test the guidance on complete accident sequences. A general approach to modeling the effects of digital common cause failures evolved as part of these exercises. The approach considers three factors to be important in the performance of a D3 evaluation using the plant-specific PRAs:
Factor 1: Reliability of a division of digital I&C (or, conversely, its failure probability)
Factor 2: Potential for a redundant division of digital I&C to fail given the failure of the first division (or common cause beta factor)
Factor 3: D3 between the mechanical and electrical trains of equipment into which the I&C is to be installed
The reliability block diagram illustrates the incorporation of these three factors into the models of a PRA for the evaluation of the possible effects of digital and software related common cause failures.
The product of Factors 1 and 2 (the failure probability of a division and the common cause beta factor) represents the potential for common cause failure of the new digital I&C. The EPRI guidance has the analyst perform a review for susceptibilities to digital failures in estimating the probability of these two factors. This review identifies key design features of the digital upgrade that limit the potential for, or effect of, digital failure mechanisms. Such design features include fault tolerance, self-diagnostics, data validation, and the like.
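A minimal sketch of that product, using the standard beta-factor idea and hypothetical values for both factors. It also computes the chance that two divisions fail from separate, independent causes, to show why the common cause term, not independent failures, is what the defensive design features need to drive down.

```python
def digital_ccf_probability(q_division: float, beta: float) -> float:
    """Factor 1 x Factor 2: probability that one common cause fails every digital division."""
    if not (0.0 <= q_division <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("q_division and beta must lie in [0, 1]")
    return beta * q_division


def both_divisions_fail_independently(q_division: float, beta: float) -> float:
    """Two divisions failing from separate, independent causes (the non-CCF share of each)."""
    return ((1.0 - beta) * q_division) ** 2


q, beta = 1e-4, 0.05  # hypothetical Factor 1 and Factor 2 values
print(f"common cause: {digital_ccf_probability(q, beta):.1e}")
print(f"independent:  {both_divisions_fail_independently(q, beta):.1e}")
```

With these assumed numbers the common cause term exceeds the independent term by more than two orders of magnitude, so design features that lower beta (or the division failure probability itself) have the most leverage.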
Factor 3 recognizes the new digital equipment will be part of the mechanical and electrical mitigating systems that are available to respond to a spectrum of initiating events that may occur at the plant. These mechanical and electrical systems carry their own levels of D3, which are desirable to maintain following the installation of the new I&C.
The explicit consideration of defensive measures outlined by the EPRI guidance results in the identification of potential susceptibilities to digital failure, as well as the evaluation of the dominant causes of digital failure and the potential for digital common-cause failure. Considering the D3 in the existing mitigating systems that are to be controlled by the new I&C determines where D3 in the I&C itself is of value.
Together, these factors allow for an integrated look at the effects of potential digital common cause failures on the plant as a whole and not just within the functions performed by the digital I&C itself.
Several general conclusions came out of the EPRI evaluations using the plant-specific PRAs:
I&C, as modeled in these PRAs, does not typically dominate risk. This conclusion has limited impact on determination of the risks associated with the final digital upgrade. Its implications, however, are that there is little room for improvement of current risks associated with the plant I&C. If the NRC or the licensee concludes that additional diversity is needed in a digital upgrade beyond that of the existing I&C, this implies the new digital I&C is perceived to be less reliable than the analog system it is replacing.
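One common way to check whether I&C "dominates risk" in a PRA is the Fussell-Vesely importance: the fraction of the total result contributed by minimal cut sets that contain the events of interest. The cut sets, event names, and values below are invented for illustration and use the rare-event approximation; they are not drawn from the five PRAs EPRI examined.

```python
from math import prod

# Hypothetical minimal cut sets for one transient sequence (names are illustrative only).
CUT_SETS = [
    {"IE_TRANSIENT", "PUMP_A_FAILS", "PUMP_B_FAILS"},
    {"IE_TRANSIENT", "PUMPS_CCF"},
    {"IE_TRANSIENT", "IC_DIV_A_FAILS", "IC_DIV_B_FAILS"},
    {"IE_TRANSIENT", "IC_CCF"},
]
# Hypothetical values: the initiator is a frequency per year, the rest are probabilities per demand.
VALUES = {
    "IE_TRANSIENT": 1.0,
    "PUMP_A_FAILS": 3e-3, "PUMP_B_FAILS": 3e-3, "PUMPS_CCF": 1e-4,
    "IC_DIV_A_FAILS": 1e-4, "IC_DIV_B_FAILS": 1e-4, "IC_CCF": 5e-6,
}


def cut_set_value(cut_set: set[str]) -> float:
    return prod(VALUES[e] for e in cut_set)


def fussell_vesely(events: set[str]) -> float:
    """Share of the total (rare-event approximation) from cut sets containing any of `events`."""
    total = sum(cut_set_value(cs) for cs in CUT_SETS)
    part = sum(cut_set_value(cs) for cs in CUT_SETS if cs & events)
    return part / total


ic_events = {"IC_DIV_A_FAILS", "IC_DIV_B_FAILS", "IC_CCF"}
print(f"FV importance of I&C events: {fussell_vesely(ic_events):.3f}")
print(f"FV importance of pump CCF:   {fussell_vesely({'PUMPS_CCF'}):.3f}")
```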
The D3 of the mitigating systems dictates the level of D3 that is of value in the I&C. In determining where D3 is of value in the digital upgrade, keep in mind that this I&C does not itself mitigate an accident; it is installed in mechanical and electrical systems that provide the needed mitigating functions in response to specific initiating events. These mechanical and electrical mitigating systems have an inherent level of D3 that is modeled explicitly in the plant-specific PRAs. Where this existing D3 in the mechanical and electrical systems is important in keeping risk acceptably low, effort should be made not to introduce new common mode failures from the digital I&C that would compromise it. This suggests that the existing D3 found in the plant's mechanical and electrical systems should be an input to the design and licensing of the digital I&C, as it indicates where D3 is of value in the digital upgrade.
The reliability of a digital division of I&C need only be similar to that of a comparable analog division. Once it is recognized that the digital I&C should have D3 similar to that of the mechanical systems into which it is installed, it becomes clear that the digital divisions of I&C need only be as reliable as the analog divisions of equipment they are replacing. As noted earlier, the EPRI guidance provides a list of design features of digital equipment that help assure that the reliability of the equipment equals or exceeds that of similar analog equipment. These design features serve as defensive measures against the failure of digital systems and provide assurance that the digital equipment is at least as reliable as comparable analog trains.
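This conclusion can be made concrete with a small integrated model of the three factors. It is a hypothetical sketch, not the EPRI reliability block diagram: each of two trains is an I&C division in series with a mechanical/electrical train, the function fails only if both trains fail, and Factor 3 is represented (as an assumption) by the common cause beta factor between the mechanical trains. The code compares function failure with the digital common cause failure against failure with fully independent I&C divisions.

```python
def function_failure_probability(q_ic: float, beta_ic: float,
                                 q_mech: float, beta_mech: float) -> float:
    """Two-train mitigating function, rare-event approximation (hypothetical model).

    Each train is an I&C division in series with a mechanical/electrical train;
    the function fails only if both trains fail.
      q_ic, beta_ic     : Factor 1 and Factor 2 for the digital I&C
      q_mech, beta_mech : train failure probability and CCF beta factor; a small
                          beta_mech stands in for strong existing D3 (Factor 3)
    """
    ic_ccf = beta_ic * q_ic
    mech_ccf = beta_mech * q_mech
    q_train = (1.0 - beta_ic) * q_ic + (1.0 - beta_mech) * q_mech  # independent train failure
    return ic_ccf + mech_ccf + q_train ** 2


def value_of_ic_diversity(beta_mech: float, q_ic: float = 1e-4, beta_ic: float = 0.05,
                          q_mech: float = 3e-3) -> float:
    """Failure with the digital CCF divided by failure with fully independent I&C (beta_ic = 0)."""
    with_ccf = function_failure_probability(q_ic, beta_ic, q_mech, beta_mech)
    without_ccf = function_failure_probability(q_ic, 0.0, q_mech, beta_mech)
    return with_ccf / without_ccf


# Hypothetical plants: identical mechanical trains versus diverse mechanical trains.
print(f"identical trains: {value_of_ic_diversity(beta_mech=0.1):.2f}")
print(f"diverse trains:   {value_of_ic_diversity(beta_mech=0.001):.2f}")
```

With these assumed numbers the digital common cause failure raises the function failure probability by only about 2% where the mechanical trains already share a dominant common cause, but by roughly 40% where the mechanical trains are diverse. That is the pattern described above: I&C diversity pays off where the existing mechanical and electrical D3 is worth preserving.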
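The "at least as reliable as analog" idea can be expressed as a simple parity check on Factors 1 and 2. The criterion below (the digital design must match or beat the analog design both for a single division and for common cause failure of all divisions) is an assumed simplification for illustration, not the acceptance test in the EPRI guidance, and all values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Division:
    """One division of I&C described by Factor 1 (q) and Factor 2 (beta); values hypothetical."""
    label: str
    q: float     # failure probability per demand of a single division
    beta: float  # common cause beta factor across redundant divisions

    @property
    def q_ccf(self) -> float:
        return self.beta * self.q


def at_least_as_reliable(digital: Division, analog: Division) -> bool:
    """Assumed parity check: digital must match or beat analog for one division and for CCF."""
    return digital.q <= analog.q and digital.q_ccf <= analog.q_ccf


analog = Division("existing analog", q=2e-4, beta=0.03)
digital_with_measures = Division("digital, defensive measures credited", q=1e-4, beta=0.05)
digital_without = Division("digital, no defensive measures", q=1e-4, beta=0.1)

for d in (digital_with_measures, digital_without):
    print(f"{d.label}: {at_least_as_reliable(d, analog)}")
```

Framing the target as parity with the equipment being replaced, rather than as an absolute software reliability number, avoids the hard problem of quantifying software failure precisely.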
In response to obsolescence and increasing maintenance costs, nuclear plant operators are upgrading their existing instrumentation and control systems. Upgrade solutions often include digital technology due to its availability, operating flexibility, and potential for performance and reliability improvements. Technical and licensing issues associated with the implementation of a digital upgrade include the need to consider the potential for new behaviors and failure modes caused by software or other digital system design flaws. Current regulatory guidance directed at the evaluation of digital systems to assure adequate D3 against the occurrence of digital common cause failures is resource intensive, often results in added complexity to the plant that does not address safety, and is potentially incomplete in addressing risk significant accident sequences.
To address these potential shortcomings, EPRI has developed guidance for performing D3 evaluations that takes advantage of the existing plant-specific PRAs to assure that both the final design of the digital system and the effort spent performing the D3 evaluations focus on the areas most important to safety. The risk-informed framework provided by the EPRI guidance not only addresses potential adverse behaviors of digital equipment in risk-significant applications but is a significant improvement over the simple, but overly restrictive, assumptions made with respect to software common-cause failure in current regulatory guidance.
Both the industry and the NRC recognize the potential for digital technology to enhance safety and reliability in nuclear power plant operation. However, uncertainty in licensing of digital upgrades based on current regulatory approaches is resulting in delays in the implementation of these systems and increased costs. As the need to replace existing I&C systems becomes more acute, a consensus approach to treating digital technology-related issues is necessary to assure consistent and predictable licensing. Accordingly, it would be helpful to both utilities and regulators to consider the insights offered by plant-specific PRAs before implementing I&C upgrades and diverse backups that may not have the desired effect on safety. The EPRI guidance for the performance of D3 evaluations provides such an approach and is consistent with the risk-informed direction being taken in other areas of the operation and regulation of the nuclear power industry. EPRI is encouraging the NRC to endorse this guideline for widespread use by nuclear plant operators in the licensing of future digital upgrades.
About the Authors
Dave Blanchard is president of Applied Reliability Engineering. Ray Torok is an ISA member and a project manager at EPRI. Read their complete work including “Comparison of the current regulatory approach for performing D3 evaluation with risk-informed methods” at http://www.isa.org/intech/june07nuclear.
Probabilistic risk assessment (PRA), or probabilistic safety assessment/analysis, is a systematic and comprehensive methodology to evaluate risks associated with a complex engineered technological entity such as airliners or nuclear power plants.
Risk in a PRA is a possible detrimental outcome of an activity or action. In a PRA, risk has two characteristics: the magnitude (severity) of the possible adverse consequence(s), and the likelihood (probability) of occurrence of each consequence.
D3 (defense-in-depth and diversity) is a method, laid out in the NRC contractor report NUREG/CR-6303, for assessing digital systems in which redundant units running common software may be vulnerable to common-mode failure.
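A small illustration of why both characteristics are tracked: a rare but severe scenario can outweigh a frequent but minor one. Multiplying likelihood by consequence and summing is one common way to aggregate them, not the only one; the scenarios and numbers below are hypothetical, and consequences are in arbitrary severity units.

```python
# Hypothetical scenarios: (name, likelihood per year, consequence in arbitrary severity units)
SCENARIOS: list[tuple[str, float, float]] = [
    ("frequent, minor", 1e-2, 1.0),
    ("rare, severe", 1e-6, 5e4),
    ("moderate, moderate", 1e-4, 50.0),
]


def expected_consequence(scenarios: list[tuple[str, float, float]]) -> float:
    """Aggregate risk as the sum of likelihood x consequence over all scenarios."""
    return sum(p * c for _, p, c in scenarios)


# Rank scenarios by their contribution, keeping likelihood and severity visible separately.
for name, p, c in sorted(SCENARIOS, key=lambda s: s[1] * s[2], reverse=True):
    print(f"{name:20s} likelihood={p:.0e}/yr consequence={c:g} contribution={p * c:.1e}")
print(f"total: {expected_consequence(SCENARIOS):.3f}")
```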