January/February 2010

Web Exclusive

The human side of safety

Guidelines on how to reduce process safety incidents caused by human error

FAST FORWARD

  • Human error is considered to be equal to operator error, and the focus is placed on training and procedures. In this way, other opportunities for reducing human error are overlooked, such as designing the loop for the human.
  • With modern safety systems and layers of protection analysis, incidents are mostly limited to situations when several failures occur simultaneously: A valve is in the wrong position; key information is spread across multiple operator displays; an alarm is missed; a trip is bypassed; and mechanical containment is found to be inadequate. Therefore, no single solution will be adequate.
  • There are seven practice areas the ASM Consortium has identified as a solution framework. When applied in full, the available best practices and research will significantly reduce the likelihood of human error in the design, operate, and maintain lifecycle.
By Mischa Tolsma, Melvin Jones, Dal Vernon Reising, Peter T. Bullemer, Chris Stearns, and Peggy A. Hewitt

"Safety is our first priority." Most companies will not only agree with this statement, but will recognize it as a core value of their corporate culture. Indeed, a great deal of attention and effort has gone into process safety and occupational safety, but one can argue insufficient attention has been given to the human aspect of process safety, sometimes called the "human in the loop." Often, human error is considered to be equal to operator error, and the focus is placed on training and procedures. In this way, other opportunities for reducing human error are overlooked, such as designing the loop for the human.

Attention to the "human in the loop" has started to grow. For example, most companies will have implemented or heard of alarm management, with EEMUA 191 and ISA-18.2 explicitly addressing human limitations. Nevertheless, there seems to be limited awareness of the breadth of the problem and the depth of the research available. First, the limited awareness of the depth is probably because the research is spread over different disciplines, from human factors engineering (sometimes called cognitive ergonomics) to control engineering. Second, and more important, the lack of awareness of the breadth of the problem is because it covers the entire design, operate, and maintain lifecycle.

This pervasive industry shortcoming on the "human side" remains important and has been the focus of the Abnormal Situation Management (ASM) Consortium for the past 15 years. The mission of this consortium, a group of 13 leading universities and companies in the process control industry, is to empower operating teams to proactively manage their plants to maximize safety and minimize environmental impact while allowing the processes to be pushed to their optimal limits.

In addition to the ASM Consortium, there are several organizations and research groups that study the human in the loop in continuous process control systems, such as the U.S. Nuclear Regulatory Commission; the human factors group at the Brookhaven National Lab; the OECD Halden Reactor Project; and EEMUA, PRISM and NAMUR in Europe. Like the work of many of these groups, this article is intended to increase awareness of the breadth and depth of the challenge in dealing with the human in the loop and to promote cross-disciplinary and cross-company research and best practice sharing.

[Figure: Anatomy of a catastrophic incident]

The challenge

With modern safety systems and layers of protection analysis, incidents are mostly limited to situations when several failures occur simultaneously: A valve is in the wrong position; key information is spread across multiple operator displays; an alarm is missed; a trip is bypassed; and mechanical containment is found to be inadequate.

Therefore, no single solution will be adequate. Rather, manufacturers need to look at all failure pathways and their interaction: A confusing procedure that has not been followed; operator displays that are difficult to learn, do not support the procedures, and do not show all relevant information; a non-rationalized alarm system with many nuisance alarms; communication failure around the operational state of equipment, perhaps because of unstructured shift handover; and insufficient management support for safety-related activities.

These are all human errors. However, it should be clear the errors do not fall solely on the operator and no single discipline can solve this problem. The solution requires collaboration among a variety of academic fields: control engineering, chemical engineering, human factors engineering, and management science, to name just a few. Also, it requires a structured approach such as the solution framework the ASM Consortium developed to investigate the human factor in abnormal situations.
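The claim that incidents now require several coinciding failures can be illustrated with the basic arithmetic behind layers of protection analysis: the frequency of an initiating event is multiplied by the probability of failure on demand (PFD) of each independent protection layer. The sketch below is illustrative only. The function name, the layer names, and every number are assumptions chosen for the example, not figures from the ASM Consortium or from any plant, and the layers are assumed to fail independently.

```python
from math import prod


def mitigated_frequency(initiating_per_year, layer_pfds):
    """Events per year that get past every protection layer.

    Assumes the layers fail independently, so their PFDs multiply.
    """
    return initiating_per_year * prod(layer_pfds)


# Hypothetical values, for illustration only.
initiating = 0.1  # initiating event (e.g., valve left in wrong position), per year
layers = {
    "alarm_response": 0.1,   # operator fails to respond to the alarm
    "safety_trip": 0.01,     # automatic trip fails on demand
    "containment": 0.01,     # mechanical containment is inadequate
}

# As designed: all three layers are available.
designed = mitigated_frequency(initiating, layers.values())

# Degraded by human factors: the alarm is lost among nuisance alarms and
# the trip has been bypassed, so both layers are certain to fail (PFD = 1).
degraded_layers = {**layers, "alarm_response": 1.0, "safety_trip": 1.0}
degraded = mitigated_frequency(initiating, degraded_layers.values())

print(f"designed: {designed:.1e} per year")
print(f"degraded: {degraded:.1e} per year")
print(f"risk multiplier: {degraded / designed:.0f}x")
```

With these assumed numbers, losing two layers to human factors raises the expected incident frequency by about three orders of magnitude, which is why each failure pathway has to be addressed rather than any single one.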

[Figure: A model of operations team activities for managing abnormal situations]

The solution framework

Training and procedures are common areas that companies invest in to address the challenge of human errors. Arguably, training and procedures focus on operator action when the facility is running, and they may not correct for underlying errors made earlier in the lifecycle. Regardless, training and procedures, while important, are just two out of the seven practice areas the ASM Consortium has identified as a solution framework. The seven practice areas include:

  1. Understanding abnormal situations includes the broad scope of investigating the causes and impacts of abnormal situations. The goal is to prioritize future research and to efficiently and accurately inform continuous improvement programs that mitigate and reduce abnormal situations. One of the research projects in this area focused on the common causes for abnormal situations and has made specific recommendations on, amongst others, Root Cause Analysis investigations, common language and systemic error elimination.
  2. Organizational roles, responsibilities, and processes focus on determining the management systems, work practices, organizational structures, and continuous improvement culture that support the prevention and mitigation of abnormal situations. An interesting example of potential research is on effective first line leadership during abnormal situations: What are the skills required? What support is required? What tools could provide assistance?
  3. Knowledge and skill development looks at the development and maintenance of a competent workforce through training and the creation of a continuous learning environment so personnel can effectively respond and cope with abnormal situations. Research in this area has, amongst others, looked at the benefit of high-fidelity over low-fidelity simulation, and identified the need for training on effective usage strategies, e.g., how to use the alarm summary more effectively when faced with an alarm flood.
  4. Communications investigates ways to ensure successful communication to enable situation awareness under normal, abnormal, and emergency situations. Specifically, the practice area investigates how information media can be used effectively to ensure work continuity between operational and functional team members. Research is currently under way to determine the benefits of electronic shift handover over paper-based systems.
  5. Procedures investigates the different aspects that enable effective procedure use such as procedure development, deployment, accessibility, accuracy, analysis, automation, and lifecycle management. Also, it looks at ways of dealing with abnormal situations during a procedure and deviations from procedural intent. An example from this practice area is the development of improved operator display elements for procedural automation.
  6. Environment focuses on workplace design factors that enhance the situation awareness of personnel, such as controlled lighting, reduced noise, and improved operator console layout. For example, a dark control room can significantly reduce the alertness of an operator over a 12-hour shift; an ineffective operator console layout will cause unnecessary foot traffic, which increases the potential for distraction during abnormal situations.
  7. Process control and monitoring looks at the effective design, deployment, and maintenance of a comprehensive and user-centered set of applications and tools that enable a single point of access to the information required by the operations team for situation awareness and abnormal situation response. For example, the consortium has recently designed a novel alarm summary display that has been shown to improve the ability of operators to deal with alarm floods; the consortium has a guideline on Effective Operator Display Design, which has been shown to increase situational awareness and reduce variability between operators.

The aim is not only to increase understanding, but also to identify best practices and develop solutions in the form of guidelines and tools to assist the operations team and the operator in particular. The research in these seven practice areas goes beyond appropriate operator training and well designed procedures. When applied in full, the available best practices and research will significantly reduce the likelihood of human error in the design, operate, and maintain lifecycle.

A collaborative approach

So, how can manufacturers reduce the occurrence of process incidents caused by human error to zero?
Of course, there is no silver bullet to take away the risk of human error, because there are too many areas where error can take place. However, significant improvements can be achieved with the proper application of guidelines already available. Also, many research questions are known that, when solved, will further reduce the incident rate.
In summary, improvements will come from a collaborative approach that looks at the comprehensive framework and involves not just operations, but also design, budgeting, project management, maintenance, and, of course, leadership.

ABOUT THE AUTHORS

Chris Stearns (chris.stearns@honeywell.com) is a product manager for reliability and operational excellence solutions at Honeywell Process Solutions. Peggy Hewitt (peggy.hewitt@honeywell.com) is director of the ASM Consortium and director of Strategic Marketing at Honeywell Process Solutions. Dr. Peter T. Bullemer (bullemer@applyhcs.com) is a senior partner at Human Centered Solutions, LLP, USA, and specializes in the application of human factors principles and human-centered design methodologies to optimize the influence of culture, organizational structures, management systems, and use of technology on operator and plant performance. Dr. Dal Vernon Reising (dreising@applyhcs.com), a senior partner at Human Centered Solutions, LLP, holds a Ph.D. in Industrial Engineering and is a human-centered design specialist in the hydrocarbon processing industries. Dr.ir. Mischa Tolsma (mischa.tolsma@sasol.com) is a senior engineer for Sasol Synfuels Ltd., South Africa, responsible for human factors and control engineering optimization in Sasol Secunda. Melvin Jones (MSc, MBA) (melvin.jones@sasol.com) is an area manager at Sasol Synfuels Ltd., South Africa, responsible for instrumentation and control engineering in Sasol Secunda.
