What’s next for big data in process manufacturing
Big data stored in process historians for decades has often been underutilized—new advanced analytics simplify access to solutions by accelerating time to insight
By Michael Risse
According to the big data analytics cover story in the November/December 2015 issue of InTech (www.isa.org/intech/20151201), "Manufacturing firms have the benefit and curse of technology refresh rates typically measured in decades, and technology selection processes typically measured in months or years. This is a benefit, because the technology we buy is built to last and provides a return on investment well beyond that of other industries. It is also a curse. Since we do not get new technology very often, when we get the chance we often bite off more than we can chew, are paralyzed by fear of the unknown, or seek out the new shiny toy."
Big data is no longer new, but as the quote points out, it takes more than availability for new technologies to improve process manufacturing organizations. Implementing new technologies requires innovative products, opportunity for adoption, and applied use by skilled personnel.
As a result, while big data may be entrenched and accepted in less constrained environments, it is only now leaving its introductory stage for most process manufacturing organizations. Early adopters have deployments and have achieved success, and many companies are evaluating or including new analytics projects on their road maps, but the single most popular term in articles on big data and the manufacturing industry is "opportunity." As in, the opportunity is out there, but there is still too much:
- data (mostly stored in process historians) and too little insight
- hard work instead of easy access to innovation
- unplanned downtime
- time spent cleansing and modeling data before analytics can even begin
Ask process engineers what their most commonly used analytics tool is, and for most the answer is a 30-year-old, single-user, heavyweight-client piece of software called a spreadsheet. Ask managers about their data environments, and they will answer in terms of process historians, asset management systems, and other silos.
So big data, for many companies, is still out there as a promise, waiting for the intersection of innovation and opportunity to bring new insights and improved production outcomes to plants and organizations. And no industry has more need to create value from big data. Manufacturing plants generate twice as much data as any other vertical market, according to McKinsey and Company research from the seminal report that launched big data awareness (and hype) back in 2011 (figure 1). With this much data comes a corresponding opportunity for improvement, to the tune of $50 billion in the upstream oil and gas industry alone (figure 2).
Help is on the way as big data offerings and expectations have changed to better fit process manufacturing requirements. The market has moved from a Model T, "any color so long as it is black" product to a variety of sizes and shapes to meet customers' needs. The interface or user experience with many big data applications, for example, is no longer an erector-set experience.
In fact, a new set of themes around big data is emerging just as more companies are open to and interested in new advanced analytics experiences. If plant evolution is measured in decades, and big data awareness and innovation is approaching a decade of investment, the time should be ripe for implementing new technologies.
This article will discuss four of the expectations associated with a modern big data experience in process manufacturing firms. Fulfilling these expectations will result in a more polished and higher-level experience by taking advantage of the data management, storage, and analytics capabilities now available to improve production and business outcomes.
Figure 1. No industry produces more big data than manufacturing, creating a huge opportunity for improvement through advanced analytics.
Figure 2. Better use of big data presents a $50 billion opportunity in upstream oil and gas facilities, with hundreds of billions of dollars in opportunity across other process industries.
Context across data sources
The three "Vs" of big data (velocity, variety, and volume) are well known and have been part of the big data definition for longer than the term big data itself. But one of the three is far more of an issue in process manufacturing than the other two.
The issue is not volume, because process historians and other sources have plenty of data stored and available for analysis. Similarly, velocity has a number of solutions with high capacity networks and faster ingest rates for historians. Variety, however, presents the biggest challenge to advanced analytics, and new big data solutions are working to address it.
The challenge with variety is that most existing plant sensors support only a limited data set of time, value, and perhaps state. Therefore, the most typical data type in manufacturing, time-series signals, is by definition separated from other data sources, which store the related context. So, before any investigation can take place, an engineer has to deal with the variety issue, in particular the integration of continuous analog signals with the relational or discrete data sets stored in other databases.
This integration, usually done by hand, is one of the biggest drivers of spreadsheet use within organizations. Even organizations with information models in enterprise manufacturing intelligence (EMI) solutions have to rely on spreadsheets for ad hoc analytics. If a data set is not integrated and modeled in the EMI, and it rarely is, then it is back to square one: interpolation, alignment, and time matching by hand.
There are many terms for the alignment and integration of unlike data types in the industry. Data blending, data harmonization, and data fusion are three examples, but for process manufacturing firms, the term typically used is contextualization, which is adding context or information about the data as attributes of a time range.
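As a rough illustration of the time matching described above, the sketch below interpolates a historian signal at the irregular times of lab samples. The signal values, sample IDs, and function name are hypothetical and simply stand in for what an engineer would otherwise line up by hand in a spreadsheet.

```python
from bisect import bisect_left

def interpolate_at(times, values, t):
    """Linearly interpolate a time-series signal at time t.

    times must be sorted ascending; returns None outside the recorded range.
    """
    if not times or t < times[0] or t > times[-1]:
        return None
    i = bisect_left(times, t)
    if times[i] == t:
        return values[i]
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

# Hypothetical historian signal: seconds since start, reactor temperature in C
signal_times = [0, 60, 120, 180]
signal_values = [150.0, 152.0, 156.0, 155.0]

# Hypothetical lab samples taken at irregular times: (seconds, sample id)
lab_samples = [(30, "S-001"), (150, "S-002"), (240, "S-003")]

# One row per lab sample, with the signal value matched to the sample time
aligned = [
    (sample_id, t, interpolate_at(signal_times, signal_values, t))
    for t, sample_id in lab_samples
]
```

The last sample falls after the final recorded reading, so it is left unmatched rather than extrapolated, which is one of the judgment calls an engineer makes when aligning data manually.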
This could be data stored in another source, for example, the periods of time defined by a batch stage or asset state in a manufacturing execution system (MES) or computerized maintenance management system. The context could be within the time series data itself, defined by when a reading is above or below a certain threshold. Or it could simply be time periods of interest, for example, when a signal "looks like this," with context created to define when a shape or pattern is present in a signal.
In each of these cases, context is added to identify the time periods of interest. Once identified, these time periods can be combined to create a new set of time periods describing an exact, multidimensional data set for analysis (figure 3). With new big data capabilities, there need not be any bounds to the depth or number of "stacked" layers required, up to 15 or more sequential layers of criteria in some cases. With most analytics efforts requiring integration of data from five to seven different sources, this is a critical advantage over current approaches.
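The idea of combining time periods can be sketched in a few lines. The example below is only an illustration under simple assumptions: the temperature readings, the threshold, and the batch stage periods are hypothetical, and the functions are not the implementation used by any particular product.

```python
def periods_above(samples, threshold):
    """Return (start, end) periods where the signal is above threshold.

    samples is a time-sorted list of (time, value); a period runs from the
    first sample above the threshold to the first sample back at or below it.
    """
    periods, start = [], None
    for t, value in samples:
        if value > threshold and start is None:
            start = t
        elif value <= threshold and start is not None:
            periods.append((start, t))
            start = None
    if start is not None:
        periods.append((start, samples[-1][0]))
    return periods

def intersect(a, b):
    """Intersect two sorted lists of non-overlapping (start, end) periods."""
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever period finishes first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result

# Hypothetical temperature signal: (minute, degrees C)
temperature = [(0, 80), (10, 95), (20, 102), (30, 104), (40, 99), (50, 103), (60, 90)]
# Hypothetical "reaction" stage periods from an MES, in minutes
reaction_stage = [(15, 35), (45, 70)]

hot = periods_above(temperature, 100)
hot_during_reaction = intersect(hot, reaction_stage)
```

Because the result of `intersect` is itself a list of periods, further layers of criteria can be stacked by intersecting it again with the next set of periods.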
With unlike data types, in particular time series and relational data sources, advanced analytics can get off to a slow start by requiring extensive manual mapping of data types, not to mention data cleansing and other aspects of data preparation. But with recent innovations, underlying big data technologies provide this type of data connectivity, alignment, and mapping to accelerate the definition and modeling of complex operations. What was once the month-long job of programmers and application programming interfaces (APIs) can now be features any process engineer can implement in minutes.
Figure 3. Using Seeq capsules, engineers can combine time periods to create a new set of time periods describing an exact, multidimensional data set for analysis.
Delivering self service
In the early stages, big data meant programmers writing code to map the analytics of a large data set to a cluster of compute nodes, and then to reduce the output from the nodes into a consolidated summary. The MapReduce programming model, which Google described in a paper published in 2004, became the basis for the open source Hadoop project, which was later commercialized by vendors such as Hortonworks.
At the same time, Google did not expose the MapReduce API to users as the interface to its search engine. Instead, it presented the algorithm's functionality in a simple web page where any customer could search for whatever they wanted by typing a query in plain English.
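To make the map and reduce steps concrete, here is a minimal single-machine sketch that computes the mean value per tag. The tag names and readings are hypothetical, and the chunks are processed in a plain loop; a real framework such as Hadoop would distribute them across a cluster and handle the grouping itself.

```python
from collections import defaultdict
from functools import reduce

def map_chunk(chunk):
    """Map step: emit (tag, (value, 1)) pairs for one chunk of readings."""
    return [(tag, (value, 1)) for tag, value in chunk]

def shuffle(mapped_chunks):
    """Group the emitted pairs by tag, as a framework would between steps."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for tag, partial in pairs:
            grouped[tag].append(partial)
    return grouped

def reduce_tag(partials):
    """Reduce step: combine partial (sum, count) pairs into a mean."""
    total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), partials)
    return total / count

# Hypothetical readings split into chunks, one chunk per compute node
chunks = [
    [("FLOW_101", 10.0), ("TEMP_201", 150.0)],
    [("FLOW_101", 14.0), ("TEMP_201", 154.0)],
    [("FLOW_101", 12.0)],
]

mapped = [map_chunk(chunk) for chunk in chunks]  # would run in parallel on a cluster
means = {tag: reduce_tag(partials) for tag, partials in shuffle(mapped).items()}
```

Even for a calculation this simple, the programmer has to think in terms of keys, partial results, and grouping, which is the kind of work the early big data tools left to the user.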
This approach to wrapping complex functionality in easy-to-use interfaces is a common experience in our lives as consumers, and these same approaches are now being adopted by analytics offerings for engineers in process manufacturing.
For example, the ability to "search like Google" across all the tags in a historian or other big data storage system is now available in some advanced analytics software. Other capabilities that make big data innovations more easily accessible are similarly delivered. This enables engineers to work at an application level with productivity, empowerment, interaction, and ease-of-use benefits.
The ability to transform complex data science programming to features easily used by engineers is a critical capability of the advanced analytics offerings. Although there has been much excitement about data scientists and their role in improving production outcomes, such as the Harvard Business Review's "Sexiest Job of the 21st Century" article back in 2012, more recent articles and anecdotes from end users tell a different story.
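A keyword search across tags can be sketched very simply. The example below is an assumption-laden illustration, not a description of how any product implements search: the tag names and descriptions are hypothetical, and ranking is by a plain count of matching query words.

```python
def search_tags(tags, query):
    """Rank tags by how many query words appear in their name or description."""
    words = query.lower().split()
    scored = []
    for name, description in tags.items():
        text = f"{name} {description}".lower()
        score = sum(1 for word in words if word in text)
        if score:
            scored.append((score, name))
    # Highest score first; ties broken alphabetically by tag name
    return [name for score, name in sorted(scored, key=lambda s: (-s[0], s[1]))]

# Hypothetical historian tags: name -> description
tags = {
    "TI-2041.PV": "Reactor 2 outlet temperature",
    "FI-1007.PV": "Feed pump discharge flow",
    "TI-1010.PV": "Feed preheater temperature",
    "PI-3302.PV": "Compressor suction pressure",
}

results = search_tags(tags, "feed temperature")
```

The engineer types what they are looking for in plain words and gets the most relevant tags first, without needing to know the naming convention behind the tag IDs.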
The issue is that while data scientists know their algorithms, they do not know plant processes and context. There has been a more recent spate of articles on the need for data translators or data liaisons between data science and engineering teams. But all of this can be avoided if vendors simply close the gap and bring data science innovation to engineers by creating features that enable self-service, advanced analytics for engineers and other subject-matter experts (figure 4).
The strategy cannot end with engineers, however, because self-service is what engineers have been doing for 30 years with spreadsheets. Therefore, the new generation of advanced analytics for big data must empower teams and networks of employees that rely on production and operations insights within the organization. If that sounds like fancy language for dashboards and reports, there is a critical difference.
The key change is maintaining a connection between the analysis that is created and the underlying data set, so users can click through and get to the underlying data. These advanced analytics offerings produce not just pictures of data in visualizations but also access to the analytics and sources that generated the outputs. Engineers, teams, managers, and organizations can therefore use these new capabilities to enable the distribution of benefits throughout a plant and a company.
Figure 4. Advanced analytics software provides self-service capabilities for engineers to create various views of data.
Revolution in deployment
"We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run," observed Roy Amara, past president of The Institute for the Future. If big data is not new, then certainly the cloud is not either. Some popular picks for the start of cloud computing include the introduction of the first big "SaaS" application (Salesforce) in 1999; the introduction of AWS by Amazon in 2002, and then S3 and EC2 in 2006; and when cloud computing competition got interesting with Microsoft's and Google's cloud platform introductions in 2008. So conservatively, like big data, there is a decade of history and innovation to leverage for advanced analytics.
To be clear, the cloud is not a requirement for big data implementations. If someone says a cloud deployment is required for advanced analytics with big data, he or she is likely a cloud salesperson seeking quota fulfillment. In our experience, the data is just fine, wherever it is; it is the analytics that needs attention.
That said, there are many good reasons to leverage cloud computing, and it certainly has momentum in its favor, although it is impossible to generalize proposed benefits versus specific costs for every organization. Costs can include security, data governance, time to gain approval, and actual cost of deployment. If one couples the cloud, or not, with innovations in both open source data management and Industrial Internet of Things (IIoT) cloud platform investments, the result is a host of tempting elements for deployment of big data solutions. Consider that in just 90 days in late 2017 and early 2018, eight companies received more than $250 million in investment capital for open source data storage, IIoT cloud platform, and IIoT analytics, and one gets a sense of the interest in advanced analytics.
What this means is that the current model for big data storage in process manufacturing, which is on-premise, historian-based, and proprietary, is undergoing a transition, enabling new alternatives for how and where advanced analytics are run. The new model might be a data lake for data aggregation, on-premise or in the cloud, or a comprehensive IIoT solution, like a next-generation data storage platform. At a minimum, current process historian vendors need to introduce road maps with safe passage for data from on-premise offerings to the cloud.
As a vendor of advanced analytics solutions, we see what this means for the end users we work with daily. Three years ago, we had customer requests for sales engineers to visit them on site to work with their on-premise and air-gapped data sets. Today, in contrast, we have customers sharing five-year road maps that integrate cloud-based offerings, and specifically asking for context on some of the open source offerings, such as Hortonworks and InfluxData. The assumption that data can never, or will never, move to the cloud is increasingly uncommon, and has changed quite quickly in process manufacturing over the past few years.
Not only will the services and deployment models change, but new vendors will enter the market for data management and analytics. In particular, Microsoft, Google, and Amazon all have cloud platforms and time-series data storage services (Cosmos DB, Bigtable, and DynamoDB, respectively). All three have acquired IIoT platform companies (Solair, Xively, and 2lemetry, respectively) to build out their manufacturing solutions.
GE with Predix, Siemens with MindSphere, and PTC with ThingWorx may have more industrial domain knowledge, and OSIsoft starts out with the best customer base and richest on-premise offering, but the deployment revolution offers flexibility in deployment and service levels for how companies license and run advanced analytics solutions.
The manufacturing industry would be well served by a marketing dictionary to define the large number of buzzwords, technology eras, and "marketectures" (marketing architectures that run on PowerPoint). In this dictionary of terms, big data would of course be included under "B," but it would be preceded by "analytics." Analytics: descriptive, predictive, diagnostic, interactive, prescriptive, basic, real-time, historical, root cause, and so forth. Analytics is now so overused that the word has lost specific meaning in a 30-year history of spreadsheets and in a 20-year role with the term marketed for "actionable insights."
But now, the role of analytics has to change to address the challenges and opportunity associated with massive data volumes and variety. To the rescue comes a new entry to the dictionary, "advanced analytics." Just as adding "smart" to a noun denotes a thing with sensors for telemetry and remote monitoring services (e.g., smart refrigerator, smart parking lot), adding "advanced" to "analytics" brings analytics into a modern framework for today's challenges.
Specifically, advanced analytics speaks to the inclusion of cognitive computing technologies into the visualization and calculation offerings that have been used for years to accelerate insights for end users. As McKinsey and Company defines advanced analytics solutions: "These [advanced analytics solutions], which provide easier access to data from multiple data sources, along with advanced modeling algorithms and easy-to-use visualization approaches, could finally give manufacturers new ways to control and optimize all processes throughout their entire operations."
What has happened is that vendors have recognized there is too much data from too many sensors, and potentially of too many types, for one person to simply solve problems manually with a spreadsheet. Therefore, through the introduction of machine learning or other analytics techniques, an engineer's efforts must be accelerated when seeking correlations, clustering, or any needle within the haystack of process data. With these features built on multidimensional models and enabled by assembling data from different sources, engineers gain an order of magnitude in analytics capabilities, akin to moving from pen and paper to the spreadsheet.
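As one small example of how software can speed up the search for correlations, the sketch below ranks candidate signals by the strength of their Pearson correlation with a target signal. The signal names and values are hypothetical, the samples are assumed to be already time-aligned, and each signal is assumed to vary (a constant signal would cause a division by zero).

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def rank_by_correlation(target, candidates):
    """Sort candidate signals by absolute correlation with the target."""
    scores = {name: pearson(target, values) for name, values in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical, already time-aligned samples
product_quality = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "feed_rate": [2.0, 4.0, 6.0, 8.0, 10.0],    # moves with quality
    "ambient_temp": [3.0, 1.0, 4.0, 1.0, 5.0],  # mostly unrelated
    "coolant_flow": [9.0, 7.0, 6.0, 4.0, 1.0],  # moves against quality
}

ranking = rank_by_correlation(product_quality, candidates)
```

Ranking by absolute value keeps strong negative relationships near the top of the list. The engineer still decides which of the highlighted signals make sense from a process standpoint; the calculation only narrows down where to look.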
These advanced analytics innovations are not a black box replacement for the expertise of the engineers, but a complement and accelerator to their expertise, with transparency to the underlying algorithms to support a first principles approach to investigations. In this way, it is a natural next step in the history of statistical and control processes, rather than a data science approach to investigations. At the same time, advanced analytics recognizes the path to quicker insights must leverage innovations in adjacent areas to address the scope of data available for investigation.
Same last mile
As process manufacturers find an opportunity when their plant transitions or capital investments enable the introduction of new advanced analytics capabilities, they will find a new set of features and experiences, far removed from the early days of the big data era. Applying these advanced analytics solutions to big data will improve the user experience by accelerating the path to implementation.
Contextualization, self-service for organizations, new platform options, and advanced analytics capabilities benefit from years of vendor investment and early adopter feedback. The one thing they do not guarantee, however, is success in the last mile of any analytics project, big or small, which is the landing or adoption of new insights into a conservative and questioning culture. That, always, is the largest obstacle for any analytics project, which no amount of technology innovation can paper over.
But by embracing an analytics culture and the innovation now available, organizations can seize the opportunities to create value from their big data, allowing them to remain competitive.