01 July 2003
Avoid disasters for automation's future
Could this happen to you?
By Paul Gruhn
Unintentional disasters have plagued the industry so much in the past year—the Columbia shuttle, the pharmaceutical plant explosion in Kinston, N.C., and countless others—that it's no wonder safety is on the minds of automation engineers and plant operators. Yet growing control automation in the process industry has prompted facilities to cut staff, and increased automation has led to the dumbing down of plant operators, a concern for the future of automation. Take a look at some disasters of the past to learn how to avoid them in the future.
LUFTHANSA AIRBUS A320
In 1993, a Lufthansa Airbus A320 landed at Warsaw Airport during a thunderstorm. It overran the runway and collided with an earthen bank. Only one crewmember and one passenger were killed. The aircraft was destroyed in the subsequent fire.
Modern aircraft are heavily automated with redundant, diverse computer systems. In the Airbus, operators can use ground spoilers and engine thrust reversers only when both sets of landing gear are compressed. They can use wheel brakes only once the main landing gear wheels have spun up above a certain reference speed. The crew isn't able to override these interlocks and operate the ground spoilers and engine thrust reversers manually.
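In control terms, these interlocks amount to a pair of permissives. Here is a minimal sketch of the logic as just described; the signal names and the reference speed are illustrative assumptions, not Airbus's actual flight-control code:

    # Minimal sketch of the A320-style braking interlocks described above.
    # Signal names and the reference speed are illustrative only.

    WHEEL_SPIN_UP_REF_KTS = 72  # assumed reference speed, not the actual value

    def spoilers_and_reversers_allowed(left_gear_compressed, right_gear_compressed):
        # Ground spoilers and thrust reversers need BOTH main gear compressed.
        return left_gear_compressed and right_gear_compressed

    def wheel_brakes_allowed(wheel_speed_kts):
        # Wheel brakes need the wheels spun up above a reference speed.
        return wheel_speed_kts > WHEEL_SPIN_UP_REF_KTS

    # Warsaw scenario: one gear compressed, wheels hydroplaning (slow spin-up).
    print(spoilers_and_reversers_allowed(True, False))  # False: no spoilers/reversers
    print(wheel_brakes_allowed(40.0))                   # False: no wheel brakes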
Wind shear had been reported, so the crew landed the plane at a faster-than-normal speed, and it touched down beyond the normal touchdown point. The plane didn't fully settle onto both sets of the main landing gear, and the wheels hydroplaned on the wet runway, so the ground spoilers, thrust reversers, and wheel brakes didn't activate until many seconds later, when the plane had finally slowed enough to compress both sets of the main landing gear. By this time, it was too late to stop the plane before it collided with an earthen bank.
Past analyses have attributed a great many aircraft accidents to human error. Yet on closer inspection, quite a few trace back to design issues. The Airbus philosophy was to give the computer final authority whenever it disagreed with the pilot. Although there could be a good reason for this, we aren't at the point where we can build software to account for every possible condition. Trusting software above human intelligence and flexibility may be a mistake. At least three other Airbus accidents, resulting in hundreds of deaths, involved similar computer-versus-pilot control issues.
How confident are you that all operating conditions are accounted for in your automated control system? Because unknowns will always exist, we need to design our systems with this in mind. That's easy to say but difficult to actually do. Detailed hazard analyses would reveal most scenarios—we hope. One safeguard: make sure your automation systems don't operate beyond the ability of your operators to override them in an emergency. Humans are one of the safety layers in any process, and they should have the ability to override certain controls.
Lufthansa Airbus A320 overran the runway and collided with an earthen bank.
MARS POLAR LANDER
The Mars Polar Lander launched in January 1999 and was supposed to land on Mars in December of that year. Its legs were designed to deploy prior to landing, and sensors would detect touchdown and turn off the rocket motor. Deployment of the landing legs, however, generated spurious signals in the touchdown sensors. The software requirements did not specifically describe this behavior, and the software designers did not account for it. The motor turned off at too high an altitude, and the probe crashed into the planet at 50 miles per hour and was destroyed. Mission costs exceeded $120 million.
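The failure reads like a latch set at the wrong time. Here is a hedged sketch of that failure mode; the class, its names, and the arming sequence are illustrative assumptions, not the actual flight software:

    # Sketch of the failure mode described above: a transient during leg
    # deployment is latched, and the engine is cut the moment touchdown
    # monitoring is armed. All names and the sequence are illustrative.

    class LanderLogic:
        def __init__(self):
            self.touchdown_latched = False
            self.monitoring_armed = False
            self.engine_on = True

        def sensor_pulse(self, touchdown_signal):
            # The flaw: transients latch even before monitoring is armed.
            if touchdown_signal:
                self.touchdown_latched = True

        def arm_monitoring(self):
            # A safer requirement would clear the latch here first.
            self.monitoring_armed = True

        def control_step(self):
            if self.monitoring_armed and self.touchdown_latched:
                self.engine_on = False  # motor off while still at altitude

    lander = LanderLogic()
    lander.sensor_pulse(True)   # spurious signal as the legs deploy
    lander.arm_monitoring()     # monitoring armed near the surface
    lander.control_step()
    print(lander.engine_on)     # False: the probe is still falling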
Are all operating parameters documented and accounted for in your design? Might the safety requirements differ during different plant operating phases, such as start-up, operation, maintenance, and shutdown? Have you considered and reviewed factors like these during your hazard analyses? Make sure you account for the impact of spurious sensor signals in the rest of the system design and operations. Cell phones, radios, walkie-talkies, and large motor starters can create signals that disrupt equipment. Do you have policies on the use of such equipment? Do all your affected systems have the appropriate level of shielding?
Mars Polar Lander
TERRA INDUSTRIES
Terra Industries, Inc. operated an ammonium nitrate unit in Port Neal, Iowa. An explosion on 13 December 1994 released 5,700 tons of anhydrous ammonia and 25,000 gallons of nitric acid, killing four employees and hospitalizing 18 others. Officials evacuated residents from the surrounding area, and they detected ammonia plumes several miles away.
A probe-type analyzer monitored pH in the ammonium nitrate neutralization tank. It had been out of service for two weeks prior to the accident, yet operations continued. Operators were therefore unable to determine when unsafe acidic conditions developed in the tank, which contributed to the accident.
The plant had not performed a process hazards analysis. Interviews with Terra personnel revealed they were not aware of many of the hazards of ammonium nitrate. No single engineer had responsibility for overseeing operation of the ammonium nitrate plant or reviewing operating procedures.
Remember—always perform a hazard analysis. Accident records show we can prevent accidents if we conduct a proper hazard analysis first, and the cost of doing such an analysis is a fraction of the cost of an actual accident. Make sure your operators are aware of the process risks; studies have repeatedly found operators unaware of the risks posed by the process or their own actions. Proper training should take care of this.
Make sure your instrumentation and safety devices are operating properly. You should periodically test all safety-related devices. Are you maintaining records that auditors can easily access—even internally?
Establish policies for operating with bypassed instrumentation. All devices fail; it's just a matter of when. If you must continue operating with an item out of service, realize the safety impact. Recent safety-related standards clearly state you should have documentation and procedures in place to maintain the safety of the process if operators place something in bypass. Placing safety-related components in bypass has destroyed boilers and other process equipment—resulting in numerous fatalities and considerable financial loss.
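In skeleton form, such a policy can be as simple as a bypass register: nothing goes into bypass without an owner, a reason, a compensating measure, and a time limit. The field names and rules below are illustrative assumptions, not taken from any particular standard:

    # Illustrative bypass register: every bypass needs an owner, a reason,
    # a compensating measure, and a hard time limit. Names are assumptions.

    from dataclasses import dataclass
    from datetime import datetime, timedelta

    @dataclass
    class BypassRecord:
        tag: str                   # instrument tag, e.g. "PSH-101"
        owner: str                 # who authorized the bypass
        reason: str                # why the device is out of service
        compensating_measure: str  # how safety is maintained meanwhile
        expires: datetime          # when the bypass must be removed or renewed

    def authorize_bypass(tag, owner, reason, compensating_measure, hours):
        if not compensating_measure:
            raise ValueError(f"{tag}: no bypass without a compensating measure")
        return BypassRecord(tag, owner, reason, compensating_measure,
                            datetime.now() + timedelta(hours=hours))

    def overdue(record):
        # Flag bypasses that have outlived their authorization.
        return datetime.now() > record.expires

    # Hypothetical example in the spirit of the Terra pH probe:
    record = authorize_bypass("AT-1", "shift supervisor", "pH probe repair",
                              "hourly manual pH sample of the tank", 24)
    print(overdue(record))  # False until the 24-hour limit passes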
CHERNOBYL
One of the Soviet reactors at Chernobyl exploded in 1986. Two dozen operators and firefighters died within days. Thousands probably died downwind. At least 70,000 people across northern Europe were contaminated.
The accident was primarily due to reactor operators performing an undocumented test. They wanted to see whether, after they shut off steam to the turbogenerator, the still-rotating generator would supply enough power until they could get auxiliary motors online. They thought it was an electrical test, not a nuclear test. In fact, it was both.
They thought operating the reactor at low power would be inherently safe. They didn't understand the unstable behavior of that particular reactor design under such conditions. One analogy compared it to driving your car at one mile an hour—with the gas and brake pedals both floored. The reactor designers knew operation under such conditions was dangerous, so they installed automated safety systems. Unfortunately, the operators really wanted to do their test, so they intentionally disabled all automatic safety shutdown systems. They violated their own safety rules!
Make sure your operators understand the operation of their facility. Many accident reviews have found personnel didn't have adequate process knowledge or realize the potential hazards. Periodically perform audits to determine whether personnel are actually following safety policies. How else would you even know if they weren't? Do engineering and management really know what operations and maintenance are doing? Your engineering group may have written policies forbidding certain activities and practices, yet maintenance and operations may scoff at them as unrealistic and impractical without realizing the underlying hazards.
If possible, modify the process to make it inherently safe. Instrumentation and procedures are merely band-aid safety layers that add complexity, can fail, and can be overridden or bypassed; an inherently safe process doesn't depend on them.
BRENHAM GAS EXPLOSION
The Brenham gas explosion occurred northwest of Houston in 1992. An underground salt dome storing liquefied petroleum gas had a control room 90 miles away. No automated remote means existed to shut in the well. There was no breathing apparatus at the site and no flare system to burn off escaping gas. The operating company didn't accurately estimate the gas inventory, and a single pressure switch was all that could detect back flow out of the well.
The pressure switch had a rated operating range of 160 to 2,000 pounds per square inch (psi), yet the set point was 100 psi, below the bottom of the switch's rated range. Operators reported in their depositions that the switch didn't work half the time they tested it. And they didn't do anything about it. No one detected a back flow out of the well.
A car drove into the gas cloud and ignited it. The blast killed three, injured 23, and caused more than $6.5 million in damages. The explosion had an estimated force of a 3-kiloton bomb—people heard it from 100 miles away. A jury recently awarded $138 million in punitive damages and $5.4 million in compensatory damages.
What did we learn from this disaster? Make sure you have an accurate indication of your capacity and throughput, and check that it's within your design limits. Not having a clear indication is almost inviting disaster. Are construction materials still appropriate for the products you make? Are instruments able to measure the ranges you're currently running at? Are pressure vessels still adequate for their current usage, in terms of materials and wall thickness? Make sure all your safety devices are functioning and maintained properly. Your maintenance records must be adequate to indicate when they aren't.
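The instrument-range question is trivial to check mechanically: an alarm or trip set point has to sit inside the instrument's rated range. A minimal sketch using the Brenham figures from above (the function itself is illustrative):

    # A set point outside an instrument's rated range can never be trusted
    # to act. The check is one comparison; the function name is illustrative.

    def setpoint_in_rated_range(setpoint_psi, range_lo_psi, range_hi_psi):
        return range_lo_psi <= setpoint_psi <= range_hi_psi

    # Brenham's switch: rated 160-2,000 psi, set point at 100 psi.
    print(setpoint_in_rated_range(100.0, 160.0, 2000.0))  # False: invalid set point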
Adequate safety layers are a must. Not having a flare, a breathing apparatus, or a remote means of shutting in the well seems rather surprising. Putting all your eggs in one basket is never a good idea. Providing adequate independent safety layers means when one system fails, which it inevitably will, another will be able to prevent the hazardous event.
OCEAN RANGER
The Ocean Ranger was the largest floating offshore drilling rig when it was built in 1976. It sank off the eastern coast of Canada in 1982, killing all 84 aboard. The rig's support legs were 40 feet in diameter, and the crew used them for storage and working compartments. The designers located the ballast control room in one of the smaller middle legs, 27 feet above the water's surface.
The operators needed to observe the draft marks on the outer legs, so the ballast control room had four glass windows. These windows wouldn't open, but the thinner-than-specified glass would break under stress. Operators pushed buttons that energized electric solenoids; these directed compressed air down pipes to operate valves in the pontoons. The valves connected to pumps spaced along the pontoons, which moved seawater to control the trim of the rig.
The rig had a mechanical backup system installed as an afterthought during construction. It was designed to bypass the electrical ballast control in the event of an electrical failure. Nobody documented the system's operation. The operator wasn't formally trained on either system.
At the top of each larger outer leg was a huge chain locker used to store wire rope and anchor chains. The top of the lockers had holes 5 feet across, used to feed out the rope and chains. There were no means to close these holes and no indication if the lockers began to fill with water.
The rig was stationed more than 180 miles offshore in the North Atlantic with three (possibly four) working lifeboats, 10 life rafts, and a helipad. There were no full-immersion exposure suits. The crew had not trained for or attempted evacuation during a storm. The only way to survive would be to get everyone in the lifeboats and into the water safely—not an easy thing to do during a storm.
The rig crew did not ignore the approaching storm, and they worked to secure the rig. A few waves reached 50 feet, and one of them blew out a window in the ballast control room. The steel storm covers over the windows were not in place. Salt water shorted out the electrical control panel, and there was no way to dry it out. The mimic panel indicated valves were opening and closing on their own.
Power to the panel was shut off one hour after the event, forcing all valves to their closed position. (Power should have been shut off immediately.) Nobody knows why, but three hours later, the crew restored power to the panel.
The rig continued to list out of balance. Eventually, water started entering the chain locker at the top of one leg and the larger storage compartment below, with no indication this was happening. The rig listed even further.
The crew tried to evacuate. The combination of high winds and seas caused the fiberglass lifeboats to crash against the rig legs, cracking them open. The crew called their support vessel, just 5 miles away, to evacuate them. It took the support vessel an hour to reach them in the rough seas. Only one lifeboat was left floating. The eight occupants perished trying to climb up the high gunwales of the supply boat's aft deck in the rough seas. The supply boat was not a rescue craft, and its crew had no gear to save the men.
The world's largest, supposedly unsinkable rig was lost with all aboard, due to a small porthole.
The Ocean Ranger offshore drilling rig sank off the eastern coast of Canada, killing all 84 aboard.
Such a loss is a real tragedy that could have been prevented. Make sure there are no undocumented systems operating in your facility. How many devices and systems have been installed to correct problems without undergoing a proper management of change procedure? If a system isn't documented, how many people know about it? How many should? Make sure your personnel are formally trained on all systems they are responsible for, including systems added later. Operation manuals should list the hazards of incorrect operation.
Perform a hazard analysis, such as a hazard and operability (HAZOP) study. The Occupational Safety and Health Administration and the Environmental Protection Agency require hazard and risk studies for covered facilities (those falling within certain design criteria), even for existing facilities. IT
BEHIND THE BYLINE
Paul Gruhn is president of L&M Engineering in Houston.