The quest for the most magical algorithm
By Michael Risse
The challenges of machine learning and cognitive computing in process manufacturing may seem a poor subject for humor and irreverence. Then again, perhaps not: it often takes a sense of humor to describe them.
The tale begins with a request for proposal (RFP). The RFP states that vendors will receive a cleansed, structured dataset and are expected to use their algorithms to find insights in it. It is a beauty contest for data science, and whoever finds the most valuable insight wins.
So, what’s wrong with this picture?
Well, everything. It’s Monty Python’s King Arthur searching for the Holy Grail: insights, by way of the Holy Hand Grenade of Antioch, from algorithms and algorithms alone.
The starting point of this misadventure is the data. Data never starts out cleansed and organized. Someone has to do the work to make it so, and in doing so substantially modifies the data. Disconnecting the data from its source cuts it off from updates and new data. Fixing the dataset in advance forecloses the chance to add related sources, such as asset history or pricing data, that turn out to matter during the analysis. And cleansing the data, which consumes the majority of the effort in many data science projects, may delete data important to the analysis, because one person's noise is another person's critical data point.
A disconnected perspective
In short, creating a dataset separated from its context produces a disconnected perspective: a static view of a constantly changing process. A smarter approach is to preserve the freedom to explore all data from all sources.
Unfortunately, once the data has been defined and modified to the point where it no longer reflects reality, the next contributor to the realism gap is time. Yesterday is not today. Something, even if we do not know what, may have changed between the moment the data was captured and now, because operating environments change constantly.
Formulas, raw materials, regulations, ambient temperature, recipes, and best practices all change. Assets get maintenance, sometimes to their detriment; personnel change each shift; sensors drift; and market prices adjust. All of these undermine the sustainability of analytics efforts.
Constant change in plant assets, markets, and products is why so many algorithm-based optimization exercises end up upside down: instead of algorithms feeding insights to employees, employees spend their time feeding data to the model to keep up with the changes.
Then there are the participants in these efforts: data scientists. With all due respect for the education and expertise of data scientists, who are lucky enough to work in what the Harvard Business Review called the sexiest job of the 21st century, most data scientists do not know process engineering. They typically lack engineering degrees, front-line plant experience, and grounding in first principles. They are highly educated and trained in computer science, but not in the reality of plants and processes.
Electricity and pressure
One of the many strange remarks heard from data scientists performing analytics is "electricity consumption increases when pressure increases." This is true, but what the data actually showed was simply a pump turning on, not a breakthrough finding. Without the expertise and experience of employees who know the systems and processes, applying algorithms to plant issues is wasted effort.
Because of these challenges (the state of the data, constant change, and the expertise applied to these efforts), the results are often findings that are incomplete relative to the complexity and number of data sources in the plant. The "findings" will also include many insights that anyone with plant experience would readily dismiss as irrelevant or already known. They may be obvious correlations, or simply the reverse of reality; either way, anyone with process engineering expertise will spot them at once.
Fortunately, there is a better way. Rather than bringing data to data scientists, a successful data science adventure requires uniting the three components of advanced analytics success.
First is the data: specifically, providing access to all of it, or as much of it as possible. Second is empowering process engineers and their expertise. Third is consulting data scientists as needed to take advantage of their fluency with algorithms in solving specific problems. Data will still flow to the data scientists, but algorithms will also be accessible to the process engineers, chiefly in the form of software applications.
This approach to advanced analytics is as agile as the plant environment: analyses are continually updated with new data, which leads to actual improvements in production outcomes. There is no magic in this, or in the algorithms; there is only a recognition of the many challenges that data science projects face.