Drowning in data 

By Richard Palluzi

Thirty years ago, engineers had to carefully evaluate every request for data. We looked at each thermocouple, flow, pressure, or level measurement for its value and utility to the overall process. We had to. Each point carried a relatively heavy cost to install, maintain, and read, as each required its own separate device to collect and display the data.

Today, with microprocessors and cheap front ends, data input and processing are cheaper than ever, and as a result, we tend to add data as if it were free. In fact, we often end up close to drowning in data.

It was common, for example, to manually measure one or two points in a sample system where one wanted to avoid condensation. After some testing, we would select the coolest or most representative points and monitor them once a shift, content simply to confirm we were maintaining a temperature above condensation.

Today the same system is likely to have a dozen thermocouples, all recording the temperatures and working up the averages and deviations, with high and low alarms. These are stored to file, printed out, and cursorily scanned, or often ignored.
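The kind of rollup described above is easy to picture in a few lines of code. This is a minimal sketch, not any particular vendor's system: the alarm limits, point names, and readings are all assumed values for illustration.

```python
# Sketch of the automatic rollup a modern data system performs on a
# bank of thermocouples. All limits and readings are assumed values.
import statistics

HIGH_ALARM_C = 120.0  # assumed high limit
LOW_ALARM_C = 60.0    # assumed low limit, e.g. stay above condensation

def summarize(readings_c):
    """Return the average, deviation, and alarm flags for one scan."""
    avg = statistics.mean(readings_c)
    dev = statistics.pstdev(readings_c)
    alarms = [i for i, t in enumerate(readings_c)
              if t > HIGH_ALARM_C or t < LOW_ALARM_C]
    return {"avg": avg, "stdev": dev, "alarm_points": alarms}

# One scan of a dozen simulated thermocouples; point 4 runs cold.
scan = [95.2, 97.1, 94.8, 96.5, 58.9, 95.7,
        96.0, 94.4, 97.3, 95.1, 96.8, 95.5]
print(summarize(scan))
```

The point of the article stands out even in the sketch: eleven of the twelve numbers exist only to be averaged, while one low reading carries all the information an operator actually needs.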

Similarly, a heated enclosure now has six or eight automatic temperature points rather than the formerly common one or two manual ones. A process reactor may have dozens of temperature monitors rather than a select three or four. 

Each component in a process may have a separate pressure transducer; flows, even for purge gases, may be scrutinized, measured, and archived. We tend to measure and monitor everything.

What do we do with all this data? 

Automation parses, stores, presents, and prints the data for us. It comes to us in online process diagrams, grouped in common displays, and available in any unit we desire. 

However, how much of it do we need to make the daily decisions about how to run the process?

I have seen numerous cases where techs scanned entire screens for only one key value or even tossed out whole printouts unread except for a few important points. 

Worse, I have seen less experienced personnel unable to sort quickly through voluminous data to find the few key points that reveal the cause of a process upset in time to take corrective action before the process shuts down.

They are in data overload.

All this extra data is indeed cheap. However, it's not free. A $25 thermocouple connected with $5 of wire to a $50 input point seems cheap, but it has to go on the P&ID and be specified, installed, programmed, and tested. The sum of all those costs easily exceeds $2,000 a point.
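The arithmetic behind that claim can be made explicit. The hardware figures below come from the article; the engineering line items and their dollar values are assumptions chosen only to illustrate how quickly the installed cost dwarfs the hardware cost.

```python
# Hypothetical cost rollup for one "cheap" measurement point.
# Hardware figures are from the article; the engineering line items
# and values are illustrative assumptions and vary widely by site.
hardware = {"thermocouple": 25, "wire": 5, "input point": 50}
engineering = {
    "P&ID update": 300,
    "specification": 400,
    "installation": 500,
    "programming": 450,
    "testing": 400,
}

total = sum(hardware.values()) + sum(engineering.values())
print(f"Installed cost: ${total}")  # far above the $80 of hardware
```

Under these assumed figures, $80 of hardware becomes a $2,130 installed point, and multiplying by dozens of "free" points makes the real cost of indiscriminate data collection clear.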

It's time to restore some discipline to our data selection. 

We need to ask ourselves a few simple questions. 

  • Do we need this point?
  • What will we do with the data aside from having it available? Will it help us run or troubleshoot the operation? 
  • Can we get the data another way without adding more points? 
  • Will the data tell me anything new? 
  • Do we need the data all the time? 
  • Should we make provisions to add the point later if necessary, or read the data locally, without recording or analysis, only when a problem arises? 
  • When envisioning multiple points, can we better serve our needs by selecting the most appropriate ones after startup and leaving the rest unconnected? 
  • If we have the data, will it suggest we do anything different from what we would do with the other data we already have on hand?

Answering these questions will, I think, often show that the additional data is not required. 


Richard Palluzi (richard.p.palluzi@exxonmobil.com) is a senior member of ISA and a registered PE. He is a Distinguished Engineering Associate at ExxonMobil Research and Engineering. He has two chemical engineering degrees.