• By J. Patrick Kennedy, PhD
  • Automation IT
Scale and scope
The driving force of Industry 4.0

A large steel manufacturer wanted to use its operational data to improve its competitive position. By concentrating on condition-based maintenance and product quality, the manufacturer increased equipment availability by 13 percent and boosted the percentage of "prime" product from 76 to 91 percent, while slashing the percentage of maintenance conducted in the more costly reactive mode from 80 percent of the total to 20 percent.

On paper, the company looks like a success story for Industry 4.0 or a similar Internet of Things (IoT) initiative, but this story is older than either of those buzzwords. The above example comes from a presentation by Dofasco (now part of ArcelorMittal) in 2002. The phrase "Industrie 4.0" would not be coined at Hannover Messe for another nine years.

The automation industry has always liked buzzwords and standards. People debate the definition of concepts like big data, IoT, machine learning, Industry 4.0, digital transformation, and digital twins, but they are all part of a single effort to drive change by increasing two factors: scope and scale. By increasing scope (the number of data sources) and scale (the amount of data collected), companies broaden and deepen the information they tap into for a more accurate, actionable picture of reality. The greater the scale and scope, the greater the opportunity to increase the productivity, efficiency, safety, and ultimately the health of your business.

What is different about what is occurring today versus 2002, or even 1992? From one perspective, it is business as usual. Good engineers are never happy with just the data they have on hand. They should always be (and usually are) seeking out new ways to do things better and looking for numbers that can either support their hypotheses or prove them wrong.

On the other hand, everything is different. Moore's Law, artificial intelligence, increasing global competition, and a changing economic environment have created a new business landscape. Producers now have to be more sensitive to the demands of suppliers, regulatory bodies, and their customers, and there is a new premium on intelligence, efficiency, and agility. Companies can no longer simply be manufacturers. They have to be software companies and manufacturers. The scope and scale of your data have fundamentally changed how industry operates.

Industry 4.0 is well documented. In this article, I will use it as a model to describe the real-world requirements of businesses and how data architectures need to be designed to meet those goals.

Quick history of historians

The new demands being wrought by increases in scope and scale have sparked an evolution in historians and data infrastructures. Early data systems collected information from individual assets. Later, those systems were expanded and enhanced to collect data across plants and entire enterprises. And right now, data architectures are being implemented that will allow enterprises to seamlessly exchange information with each other to improve supply chains. Seven years ago, many large industrial companies would have blanched at the idea of sharing their data or putting it on the cloud. Now, most are studying ways to build digital communities.

Likewise, the community of users of this information is expanding, moving from engineers and operators to CFOs, capital planning analysts, buyers, and others. We are also seeing old assets from the '70s and '80s being wired into the main corporate networks for the first time thanks to more robust, less expensive wireless technologies and data management strategies.


Figure 1. Welcome to my time machine: Dofasco main site, 2002

Golden rule for growth

Information gets more valuable the more people consume it, so we should always look beyond the benefits of a single project. Building too quickly, however, can also cause projects to collapse. The primary rule for balancing ambitious goals with real-world constraints on budget and time is this: Control your expenditures by rate of implementation, not your goals.

Most of the projects I have seen fail to return large benefits fall into one of two categories: either the projects were more complex than anticipated, requiring extensive on-site remedial work, or the scope of the projects was too small, requiring outsized amounts of time, energy, and money relative to the benefit. Both are failures that must be addressed at the architectural stage, not during implementation.

In the first case, the job was not adequately scoped (it needed more than data alone could accomplish) and required expensive customization to meet user expectations. The second case was a failure to follow the rule above by attacking a problem that was too small to carry the proper benefits. There is a mistaken belief that a smaller goal involves less risk. That is incorrect. Small projects can contain as much technological risk as larger ones. With the proper goals, average engineering procedures will eventually succeed, but with the wrong goals, no amount of brilliance will succeed. The ideal situation is to create a system that can meet immediate needs while leaving headroom for the future.

To build a system that can scale, first look at the underlying models. In industrial facilities, there are three types of asset models: the physical model, the process model, and the product model. Historically, scalable systems have all been based on a physical model: here is a sensor, record its output with as high a fidelity as you can, and keep the data for all time. (This is the basic design principle of supervisory control and data acquisition, programmable logic controllers, distributed control systems, and other automation equipment. The applications that use today's information infrastructure are nearly 100 percent software based, but this does not mean that they will be of small value. After all, the Apple iPhone is mostly software, as are Uber, Airbnb, Lime, and others.)

By contrast, manufacturing execution systems are based on the process model and, while quite valuable in some cases, have inherent issues of scale and customization when you are trying to integrate metadata into the overall system. The same is true of product life-cycle management systems, which are based on the product model. Both of these latter cases can provide immense value.

Once the architecture for scaling has been decided, there are other metadata models that need to be defined, such as digital twins, to arrange, aggregate, cleanse, and view or analyze the data in a way that makes sense to people. A reliability application will pull from the same overall source as process views or other applications, but the calculations, individual data sources, and cleansing techniques can and will be different. The solution is to separate the process of data management from the applications that use the data and develop a clean, open interface between them. Once you have reliably built a data management infrastructure that can truly scale and support data shaping, reliability, security, and privacy, you have a massive data repository.
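To make the separation concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `DataStore` interface, the `InMemoryStore` class, the stream name, and the rule that a reading of zero or less is a sensor dropout are all hypothetical. The point is only that two applications read the same raw data through one interface and cleanse it differently.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Protocol


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since an arbitrary start
    value: float


class DataStore(Protocol):
    """Hypothetical open interface between data management and applications."""

    def read(self, stream: str, start: float, end: float) -> list[Event]: ...


class InMemoryStore:
    """Toy data management layer: keeps raw events and applies no application logic."""

    def __init__(self) -> None:
        self._streams: dict[str, list[Event]] = {}

    def write(self, stream: str, event: Event) -> None:
        self._streams.setdefault(stream, []).append(event)

    def read(self, stream: str, start: float, end: float) -> list[Event]:
        events = self._streams.get(stream, [])
        return [e for e in events if start <= e.timestamp < end]


def process_view_average(store: DataStore, stream: str, start: float, end: float) -> float:
    """Process application: treats readings <= 0 as dropouts and ignores them."""
    good = [e.value for e in store.read(stream, start, end) if e.value > 0]
    return mean(good)


def reliability_dropout_count(store: DataStore, stream: str, start: float, end: float) -> int:
    """Reliability application: the dropouts are the signal, so it counts them."""
    return sum(1 for e in store.read(stream, start, end) if e.value <= 0)


store = InMemoryStore()
for t, v in [(0, 100.0), (1, 0.0), (2, 102.0), (3, 0.0), (4, 98.0)]:
    store.write("furnace.temp", Event(t, v))

print(process_view_average(store, "furnace.temp", 0, 5))       # average of the good readings
print(reliability_dropout_count(store, "furnace.temp", 0, 5))  # number of dropouts
```

Because neither application's cleansing rule lives in the store, a third application could be added later without touching the data management layer.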

As users expand to include other applications, such as supply chain and enterprise optimization, the scope of data projects often extends to include suppliers, vendors, and customers. Resistance to upgrades will frequently emerge. It is often rooted in the perceived reliability (or unreliability) of large software systems, which in turn creates a perception that there will be increased costs or added security complexities. Overcoming that resistance sounds like an impossible task until you go back to the original premise: to create an architecture that will scale, you need to control your expenditures by the rate of implementation, not scope. Done properly, addressing proper scope should not cost significantly more money.

How does this architecture and infrastructure approach map to the four requirements of Industry 4.0?


At the lowest layer is streaming data management designed for the scope of the project. Streaming is characterized by extremely high data rates and new information coming in unsolicited. Currently, there are systems that operate in the millions of new events per second, and these rates will only increase as we dig deeper into the fidelity of the data and accommodate additional smart sensors and equipment. As noted above, take care to design information collection without regard to its use. For example, designing around a comment from an operations user that all he needs is 15-minute averages would produce a data management system that is not useful to automation or maintenance.
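The 15-minute-average trap is easy to show with a few lines of Python. This is a sketch with made-up numbers: the vibration readings, the one-per-minute sampling rate, and the function names are all hypothetical. It illustrates that averages can always be derived from raw events, while a short excursion that matters to maintenance cannot be recovered from the averages.

```python
from collections import defaultdict
from statistics import mean

WINDOW_SECONDS = 15 * 60


def window_averages(events):
    """Average (timestamp_seconds, value) pairs within each 15-minute window."""
    buckets = defaultdict(list)
    for ts, value in events:
        buckets[int(ts // WINDOW_SECONDS)].append(value)
    return {window: mean(values) for window, values in sorted(buckets.items())}


def peak(events):
    """Highest raw value, the kind of number a maintenance engineer may need."""
    return max(value for _, value in events)


# Hypothetical vibration readings: one per minute for 30 minutes,
# steady at 2.0 except for a single spike at minute 7.
raw = [(minute * 60, 2.0) for minute in range(30)]
raw[7] = (7 * 60, 17.0)

averages = window_averages(raw)
print(averages)                # the spike is diluted into the first window's average
print(peak(raw))               # visible only because the raw events were kept
print(max(averages.values()))  # the most the averaged data can ever show
```

Storing the raw events serves the operations user and the maintenance user alike; storing only the averages serves one of them.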

Processing stream data, especially at these rates, is a high load on real-time systems. One of the design objectives should be to remove any unneeded calculation and processing from this layer. A comment that includes the phrase "computers are fast, and memory is cheap" is a giant red flag that there are problems in the architecture.

The next step is to design simple, discoverable microservices to manage exchange of data between modules. Discoverability is important, because it allows apps to automatically attach to the data and its history to support a "plug and play" design. This is an essential requirement in large systems, where building with the expectation of manual configuration limits scale.
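As a rough illustration of discoverability, the sketch below uses a small in-process registry as a stand-in for a discovery service. The `Registry` class, the `StreamInfo` fields, and the stream names are hypothetical and do not describe any particular product. Data sources announce their streams with metadata, and an application finds what it needs by querying that metadata rather than by being configured with a list of names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamInfo:
    name: str
    unit: str
    asset: str


class Registry:
    """Hypothetical in-process stand-in for a service discovery endpoint."""

    def __init__(self) -> None:
        self._streams: dict[str, StreamInfo] = {}

    def register(self, info: StreamInfo) -> None:
        # Called by a data source when it comes online.
        self._streams[info.name] = info

    def discover(self, **criteria: str) -> list[StreamInfo]:
        # Return every stream whose metadata matches all given criteria.
        matches = (
            s for s in self._streams.values()
            if all(getattr(s, key) == value for key, value in criteria.items())
        )
        return sorted(matches, key=lambda s: s.name)


registry = Registry()
registry.register(StreamInfo("mill1.motor.temp", "degC", "mill1"))
registry.register(StreamInfo("mill1.motor.current", "A", "mill1"))
registry.register(StreamInfo("mill2.motor.temp", "degC", "mill2"))

# A temperature-monitoring app attaches to every temperature stream it can find.
temperature_streams = [s.name for s in registry.discover(unit="degC")]
print(temperature_streams)
```

When a third mill is wired in and registers its streams, the same application picks them up on its next query, with no manual configuration step.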


There can be quite a lot of information generated from sensors and other computers (e.g., NOAA Weather for a power grid). Often, a curated view is provided to allow better interpretation of the information. The Industry 4.0 standard specifically addresses the need to have models to provide this added context. While the complete function is an application, managing the shaping or metadata is an essential part of the infrastructure, so that different applications can present similar views. Although part of the digital twin definition, shaping is also used for reporting, viewing, and production calculations, as well as models. When implemented online and used for operations, the digital twin requires the same level of technical support as the streaming data infrastructure.
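One way to picture shaping is as a layer of metadata that maps raw tag names onto a common template. The sketch below assumes hypothetical tag names, asset names, and values. Two pumps at different plants carry unrelated tag names, yet every application that goes through the shared model sees the same attributes for both.

```python
# Hypothetical raw tag names and latest values, as a control system might expose them.
LATEST_VALUES = {
    "PA.FI101.PV": 42.0,
    "PA.PI101.PV": 6.1,
    "PB.FIC-7.OUT": 39.5,
    "PB.PT-7A": 5.8,
}

# Shaping metadata: one shared template, mapped per asset to its raw tags.
PUMP_TEMPLATE = ("flow_m3h", "discharge_pressure_bar")
ASSET_MODEL = {
    "Plant A / Pump 101": {
        "flow_m3h": "PA.FI101.PV",
        "discharge_pressure_bar": "PA.PI101.PV",
    },
    "Plant B / Pump 7": {
        "flow_m3h": "PB.FIC-7.OUT",
        "discharge_pressure_bar": "PB.PT-7A",
    },
}


def shaped_view(asset):
    """Return the asset's current values keyed by template attribute, not raw tag."""
    mapping = ASSET_MODEL[asset]
    return {attribute: LATEST_VALUES[mapping[attribute]] for attribute in PUMP_TEMPLATE}


def fleet_report():
    """Any application can present the same view of every pump."""
    return {asset: shaped_view(asset) for asset in ASSET_MODEL}


for asset, view in fleet_report().items():
    print(asset, view)
```

Keeping the mapping in the infrastructure, rather than inside each application, is what lets a report, a display, and a model agree on what "flow" means for a given pump.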

Technical support

Unsupported software quickly becomes unusable. Technical support includes bug and error fixes, new logic, security updates, and a "help" function. Layering the parts of Industry 4.0 and defining the service architecture provides the base requirements for serviceable software and addresses the need for both interoperability and an open infrastructure. Finally, the system needs to be designed to be resilient in the face of abnormal events, whether caused externally (e.g., an exploit from a hacker), internally (e.g., fault or error in logic), or by a standard procedure (e.g., a system update). This often results in the use of high availability or redundant systems. In addition, the infrastructure needs to be monitored for detectable faults.
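One simple example of a detectable fault is a stream that has stopped updating. The check below is a sketch only: the stream names, timestamps, and 30-second threshold are hypothetical, and a real system would choose thresholds per stream.

```python
def stale_streams(last_seen, now, max_age_seconds):
    """Return names of streams whose newest event is older than max_age_seconds."""
    return sorted(
        name for name, timestamp in last_seen.items()
        if now - timestamp > max_age_seconds
    )


# Hypothetical time of the most recent event on each stream, in seconds.
last_seen = {
    "boiler.pressure": 995.0,
    "boiler.level": 940.0,
    "feed.flow": 999.0,
}

stale = stale_streams(last_seen, now=1000.0, max_age_seconds=30.0)
print(stale)  # streams that have gone quiet and deserve a look
```

A check like this reports on the health of the infrastructure itself and takes no action on the process.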

Decentralized decision making

The final requirement of Industry 4.0 places two demands on the architecture. I would like to emphasize caution here: Automation should be done by a control system designed for the task. Most articles that extol the value of having IoT start your car, open your door, or take other actions have not sufficiently addressed the potential for abnormal events that could bring harm to people or equipment. The same is true of alerts and alarms. At one of the American Petroleum Institute committees on alarms, a user noted that there are multiple cases in a refinery where it is safer to burn down a heater than shut it down. The procedure for handling alarms is thus more complex than it appears and requires knowing the larger context of the action.

In another example, one cause of the Northeast blackout of 2003 was the transient caused by the protective equipment. From an architecture perspective, the best protection is to provide all the information needed, a robust system, and tools for implementing automation. My belief is that control belongs in the on-premises equipment designed for this task.

Many of the new efforts to use computers to improve the management of industrial processes, variously called big data, Internet of Things, artificial intelligence/machine learning logic, Industry 4.0, or digital transformation, are merging into an approach that Professor Michael Porter called the "system of systems."

No individual system or piece of equipment has exclusivity on the use of digital computation, from lowest (e.g., the hardening of a sensor or smart meter processor) to highest (e.g., better management of the power balance of a country). The only difference between a small system and a large one is scope, and to have the proper scope, the system of systems must scale.

Reader Feedback

We want to hear from you! Please send us your comments and questions about this topic to InTechmagazine@isa.org.

About the Author

J. Patrick Kennedy, PhD, the founder and CEO of OSIsoft, has been at the forefront of bringing digital technology to the energy industry for more than three decades. Contact Michael Kanellos with questions or comments.