1 March 2006

Information Central

Pharmaceutical industry pilots a new information centric approach to process information

By Chunhua Zhao, Girish Joglekar, Ankur Jain, Venkat Venkatasubramanian, Leaelaf Hailemariam, and G. V. Reklaitis

The compelling drivers for the pharmaceutical industry are speed to market and getting the manufacturing process right the first time. To better support the development and manufacturing processes, a new systematic approach for pharmaceutical process information management would be just what the doctor ordered. 

As a foundation for this new approach, process information should be separate from the collection of tools that use the information. Ontologies, based largely on standards, including ISA-S88 and S95, describe the concepts and their relations. 

The pharmaceutical industry will not be the only market area that benefits. Process and batch industry automation will gain from the development of this interface between manufacturing control functions and other enterprise functions. The goal is to reduce the risk, cost, and errors associated with implementing these interfaces.

Because ontologies are based on description logic, rules and inference engines can be used to ensure the consistency and validity of process information. The proposed framework adopts principles from the Semantic Web and provides the foundation for systematically managing process information.

“Ontologies can be considered as playing a key part in the Semantic Web since they provide the vocabulary needed for semantic mark-up,” said Christine Golbreich, an information and integration expert and a professor in computer science in Rennes, France. 

In this approach, the tools that use the process information become software agents. The framework acts as a medium with rich semantics, thereby facilitating smoother and easier information exchange among tools.

A Web portal enables users to access and modify process information from within a Web browser. Furthermore, the infrastructure facilitates other information management tasks, such as intelligent search, version control, and collaborative process development.

Current industrial response
Development of an active pharmaceutical ingredient and its dosage form evolves in stages, requires participation by several groups, and uses a wide range of application tools. The tools share substantial amounts of information within each stage, and information and knowledge are exchanged between stages at appropriate times. Often, some of the work in earlier stages is repeated if the overall direction changes or more knowledge becomes available.

Therefore, proper support for these groups across the various tools is crucial if they are to function efficiently, achieve speed to market, and get the process right the first time.

Subsequent to process development, technical specifications and an information base must be created to satisfy regulatory requirements.

The industrial response to all these challenges has been sub-optimal. Even with rapid progress in information integration and sharing in business functions (such as ERP systems) and on the plant floor (such as MES systems), the intermediate area of process development has been largely neglected. Although many individual islands of automation exist, no comprehensive, integrated environment links them.

Fast Forward
  • Even with rapid progress in information integration, process development remains neglected, and automation exists only as individual islands.
  • Developments in the field of ontology have created new software capabilities that will facilitate the implementation of the pharmaceutical informatics infrastructure.
  • The present challenge for the Web is to evolve toward a Semantic Web that can provide readable information for humans and machines and easier access to heterogeneous data distributed across multiple sites.

Therefore, during process development, practitioners must make do with limited computer-based assistance to acquire, manage, analyze, and interpret complex product and process information with enormous amounts of human intervention. 

This increases the inefficiencies, uncertainties, costs, delays, and product quality concerns throughout product development. It also hinders the interactions between process development and business or manufacturing functions.

 

The industry needs a new paradigm for pharmaceutical process development that can effectively address these challenges. In this paradigm, the data, information, and knowledge, as well as their flows throughout process development, are modeled. An informatics infrastructure supports the key decisions spanning the entire process, including product portfolio selection, capacity allocation decisions, pilot plant operation, process simulation, production planning and scheduling, process safety analysis, and supply chain management.

Let’s look at the current status and the problems of current practice, a new way of thinking about the role of information, the creation of ontologies from existing standards, and the issues in implementing the infrastructure.

 

Application centric view
At present, process information is specified according to the requirements of the application that uses it. For example, consider three tools used in one pilot plant operation: BatchPlus (AspenTech) for recipe development, Batches (BPT) for simulation of batch processes, and PHASuite (IPS) for process safety analysis.

Even though the information required for these tools is very similar, each has its own syntax and semantics to describe it. To store information, BatchPlus uses a relational database, PHASuite uses a database and object binary serialization, and Batches uses plain text files. Sharing information among these tools requires a three-step process:

1. Access and extract information, based on the understanding of syntax and semantics of the information source.

2. Translate the acquired information into semantics understood by the information destination.

3. Store the information based on the syntax defined in the destination.

Clearly, this scheme for sharing information is error prone, requires expertise in both the source and destination tools, and is very time consuming. Also, with different versions of the same information, managing changes to it becomes very challenging. In this environment, developing new applications also becomes problematic because each one requires a new syntax for the process information.
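To make the cost of the three-step exchange concrete, here is a minimal Python sketch. The plain-text source format, the JSON destination format, and every field name are invented for illustration; they are not the actual formats of BatchPlus, Batches, or PHASuite.

```python
import json

# Hypothetical plain-text export from a source tool (format invented for illustration).
SOURCE_TEXT = """\
STEP Charge; UNIT R-101; MATERIAL Toluene; AMOUNT_KG 120
STEP Heat; UNIT R-101; TARGET_C 80
"""

# Hypothetical mapping from the source tool's vocabulary to the destination's.
TERM_MAP = {
    "STEP": "operation",
    "UNIT": "equipment",
    "MATERIAL": "material",
    "AMOUNT_KG": "amount_kg",
    "TARGET_C": "target_temp_c",
}


def extract(text):
    """Step 1: parse the source syntax into key/value records."""
    records = []
    for line in text.strip().splitlines():
        fields = (part.strip().split(" ", 1) for part in line.split(";"))
        records.append({key: value for key, value in fields})
    return records


def translate(records):
    """Step 2: rename source terms to the terms the destination understands."""
    return [{TERM_MAP[key]: value for key, value in rec.items()} for rec in records]


def store(records):
    """Step 3: serialize in the destination's syntax (JSON in this sketch)."""
    return json.dumps(records, indent=2)


if __name__ == "__main__":
    print(store(translate(extract(SOURCE_TEXT))))
```

Every pair of tools needs its own parser, term map, and writer, and an unmapped term fails at the translation step, which is where the expertise and maintenance burden comes from.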

In the application centric view, the scope and representation of a process are limited to the requirements of a specific stage, so a complete representation of a process often is not available. For example, a simulation tool does not need to have material safety related data for its execution, but this kind of information is crucial to process safety analysis.

 Application Centric View of Process Information

Information centric view
The lack of a coherent and unified process view results in islands of information with virtually no provisions for the computer-assisted exchange of knowledge among tools. So in order to create a systematic method for process information management, we must adopt an information centric view of the process information. In this new paradigm, the underlying process information is separated from the tools that use it.

Instead of encoding process specifications in objects specific to a programming language or tool, the process information is explicitly described using universally accepted concepts. 

A repository of process information will be at the core of the informatics infrastructure. It will include a wide array of information blocks such as recipes, equipment configurations, experimental data, and plant operation data. To describe the information explicitly, the syntax as well as the semantics of the information must be defined. The explicit description of domain concepts and the relationships between these concepts is called an ontology.

Developments in the field of ontology have created new software capabilities that will facilitate the implementation of the Pharmaceutical Informatics infrastructure. An ontology captures a shared understanding of a domain, and that shared understanding is the basis for a formal encoding of the important entities, attributes, processes, and their inter-relationships in the domain of interest.

Ontologies can describe the semantics of the information sources and make the contents explicit, thereby enabling integration of existing information repositories, either by standardizing terminology among the different users of the repositories, or by providing the semantic foundations for translators. 

Whereas a database schema aims at physical data independence and an XML schema aims at document structure, an ontology is based on agreed and explicit semantics of information. As a result, while the functionalities of this infrastructure can work in a traditional client-server framework, the main benefits of this ontology-driven architecture are its openness and semantic richness.

Web Ontology Language (OWL), a World Wide Web Consortium (W3C) recommendation based on the Resource Description Framework (RDF), is used to define the ontologies. OWL uses XML syntax to express semantics. OWL can formalize a domain by defining classes, properties of these classes, and relations between them.

OWL can also define individuals and assert properties about them, and furthermore reason about these classes and individuals to the degree permitted by the formal semantics of the OWL language. Using OWL, we are creating the Purdue Ontology for Pharmaceutical Engineering (POPE), which is the first attempt to create a unifying ontology to address the informatics and modeling needs and challenges faced in pharmaceutical product development and manufacture. 
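The idea of classes, individuals, asserted properties, and reasoning can be illustrated with a toy Python sketch. It is not OWL and not a description-logic reasoner: it stores statements as plain (subject, predicate, object) tuples and follows only subclass links. The class names and the individual `R-101` are hypothetical and are not taken from POPE.

```python
# Toy statements as (subject, predicate, object); all names are hypothetical.
TRIPLES = {
    ("Reactor", "subClassOf", "Vessel"),
    ("Vessel", "subClassOf", "Equipment"),
    ("R-101", "type", "Reactor"),
    ("R-101", "hasVolumeLiters", "500"),
}


def superclasses(cls, triples):
    """Return every class reachable from cls through subClassOf links."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for subj, pred, obj in triples:
            if subj == current and pred == "subClassOf" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def inferred_types(individual, triples):
    """Asserted types of an individual plus the types implied by subclassing."""
    asserted = {o for s, p, o in triples if s == individual and p == "type"}
    inferred = set(asserted)
    for cls in asserted:
        inferred |= superclasses(cls, triples)
    return inferred


if __name__ == "__main__":
    print(sorted(inferred_types("R-101", TRIPLES)))
```

Only "R-101 is a Reactor" is asserted; that it is also a Vessel and a piece of Equipment is derived from the class hierarchy, which is the kind of conclusion a real OWL reasoner draws with far richer semantics.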

 Pharmaceutical Informatics Infrastructure

The proposed POPE framework has a strong relationship to the Semantic Web and is an implementation of the same concepts in the domain of pharmaceutical product development and manufacture.

The goals of the first stage of the current project are to identify the key components of the infrastructure and technologies to implement these components, to integrate the components into an infrastructure, and to demonstrate the feasibility of the proposed framework using a pilot plant operation.

 

Process ontologies

We start off by creating ontologies for the domain of process recipe information. As the first step, we reviewed the existing standards related to process information and the literature on process ontologies, because they summarize a common view of process information among industrial practitioners.

The standards fall into three categories: equipment specifications (AP231, FIATECH, and STEP), process engineering computations (CAPE-OPEN), and batch recipe specifications (ISA-S88, S95). Although no single industry-accepted standard can adequately describe a pharmaceutical process, the review provided a good basis for developing the required ontologies. Wherever appropriate, the same terminology and keywords were reused in the new ontologies.

The implementation of this infrastructure breaks out into three main tasks. The first task is to develop the process ontologies, with logic embedded in rules to ensure the completeness and validity of the process information.
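As a rough illustration of rules that check completeness and validity, the Python sketch below applies a few predicates to recipe operations. The operations, field names, and rules are all hypothetical; in the framework described here such logic would be expressed against the ontologies (for example in a rule language such as SWRL) rather than as Python functions.

```python
# Hypothetical recipe operations; field names are invented for illustration.
OPERATIONS = [
    {"name": "Charge", "equipment": "R-101", "material": "Toluene", "amount_kg": 120},
    {"name": "Heat", "equipment": "R-101", "target_temp_c": 80},
    {"name": "Charge", "equipment": "R-101", "amount_kg": -5},
]

# Each rule is (description, predicate returning True when the operation passes).
RULES = [
    ("every operation names its equipment",
     lambda op: bool(op.get("equipment"))),
    ("a Charge operation specifies a material",
     lambda op: op["name"] != "Charge" or "material" in op),
    ("amounts are positive",
     lambda op: op.get("amount_kg", 1) > 0),
]


def validate(operations, rules):
    """Return (operation index, rule description) for every violated rule."""
    return [
        (index, description)
        for index, op in enumerate(operations)
        for description, passes in rules
        if not passes(op)
    ]


if __name__ == "__main__":
    for index, description in validate(OPERATIONS, RULES):
        print(f"operation {index}: violates '{description}'")
```

Keeping the rules as data, separate from the code that applies them, mirrors the separation of process information from the tools that use it.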

Based on the ontologies, instances of the concepts and relations are created for a particular process using the Web-based information management facility. The second task is to create an interface to the repository so users can access, view, and modify the information.

A Web-based interface for information management was a logical step because the Web is the natural environment for an infrastructure with such a wide scope. Additionally, development of thin-client applications in a Web environment has become feasible due to recent advances in Web technologies.

The third task is to provide an application programming interface (API) through which various tools can access the information repository. The process information repository is stored in OWL format or in related databases.

Given the diversity of the tools that will access the process information, a middle layer consisting of a controller, adaptor, or translator is imperative for the tools as well as the Web interface to access the repository.
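One way to picture such a middle layer is an adapter per tool, each translating the shared repository content into the form that tool expects. The Python sketch below is only an illustration under assumed names: the in-memory `REPOSITORY`, the `ToolAdapter` interface, and both output formats are hypothetical and do not describe the actual POPE implementation.

```python
from abc import ABC, abstractmethod

# Hypothetical in-memory stand-in for the process information repository.
REPOSITORY = {
    "operations": [
        {"name": "Charge", "equipment": "R-101", "material": "Toluene"},
        {"name": "Heat", "equipment": "R-101", "target_temp_c": 80},
    ]
}


class ToolAdapter(ABC):
    """Translates repository content into the form one tool expects."""

    @abstractmethod
    def export(self, repository):
        ...


class PlainTextAdapter(ToolAdapter):
    """For a tool that reads one operation per line of plain text."""

    def export(self, repository):
        lines = []
        for op in repository["operations"]:
            details = ", ".join(f"{k}={v}" for k, v in op.items() if k != "name")
            lines.append(f"{op['name']}: {details}")
        return "\n".join(lines)


class TableAdapter(ToolAdapter):
    """For a tool that imports rows, as a database loader might."""

    def export(self, repository):
        return [
            (number, op["name"], op["equipment"])
            for number, op in enumerate(repository["operations"], start=1)
        ]


if __name__ == "__main__":
    print(PlainTextAdapter().export(REPOSITORY))
    print(TableAdapter().export(REPOSITORY))
```

Each tool then needs one adapter to the shared repository instead of a separate translator for every other tool.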

Given the rich semantics encoded in the process information, it is feasible and desirable to generate a graphical interface for the user or programming interface for software applications to access the information.

In this work, we investigated and assessed various technologies in order to make this infrastructure work, including the link between the Web interface and the information repository and the different scenarios for tools to access the information.

A Web-based process-information management prototype using POPE is ready. The current effort is to use the interface to create a specific process based on the ontologies. The design guideline is to minimize the code that must be written to create the presentation. Thus, XSLT (Extensible Stylesheet Language Transformations, a language for transforming one XML document into another) is used to link the OWL files to the presentation, and XForms is used to define forms for specifying process information.

Scalable Vector Graphics (SVG), which defines vector-based graphics in XML format, is used to generate recipe networks or process flow diagrams (PFDs) from process information. A portal now exists to manage the process information.
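Because SVG is plain XML, a diagram can be generated directly from process information. The Python sketch below draws a hypothetical linear recipe as a row of labeled boxes using only the standard library. The prototype described here generates its presentation with XSLT; this sketch swaps in Python purely for illustration, and the function name and layout values are assumptions.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def recipe_to_svg(step_names, box_w=120, box_h=40, gap=30):
    """Draw each recipe step as a box, left to right, joined by lines."""
    ET.register_namespace("", SVG_NS)  # serialize SVG as the default namespace
    width = len(step_names) * (box_w + gap) + gap
    svg = ET.Element(f"{{{SVG_NS}}}svg", width=str(width), height=str(box_h + 2 * gap))
    mid_y = gap + box_h // 2
    for i, name in enumerate(step_names):
        x = gap + i * (box_w + gap)
        ET.SubElement(svg, f"{{{SVG_NS}}}rect", x=str(x), y=str(gap),
                      width=str(box_w), height=str(box_h),
                      fill="none", stroke="black")
        label = ET.SubElement(svg, f"{{{SVG_NS}}}text", x=str(x + 10), y=str(mid_y))
        label.text = name
        if i > 0:  # connect this box to the previous one
            ET.SubElement(svg, f"{{{SVG_NS}}}line",
                          x1=str(x - gap), y1=str(mid_y),
                          x2=str(x), y2=str(mid_y), stroke="black")
    return ET.tostring(svg, encoding="unicode")


if __name__ == "__main__":
    print(recipe_to_svg(["Charge", "Heat", "Filter"]))
```

Saving the returned string to a `.svg` file and opening it in a browser shows the three steps as connected boxes.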

The software tools that will use the process information defined in POPE break out into the following three types:

1. Tools that have a native interface for the ontologies, that is, tools developed based on the ontologies defined in this work

2. Tools that have the ability to read or import process information from databases or XML

3. Tools that use proprietary input and output formats

Proposal to transfer data
The main concept behind the proposed infrastructure is the separation of process information from the tools that use it, together with the explicit description of that information.

A prototype of the informatics infrastructure is ready for process information in the pilot plant stage, to demonstrate the feasibility of the framework. 

Work is in progress to further expand and implement the informatics infrastructure to incorporate more information management functionalities, including intelligent search, version control, and collaborative process development.

ABOUT THE AUTHORS

Chunhua Zhao (chunhua@ecn.purdue.edu), Girish Joglekar (gjogleka@ecn.purdue.edu), Ankur Jain (jain18@ecn.purdue.edu), Leaelaf Hailemariam, Venkat Venkatasubramanian (venkat@ecn.purdue.edu), and G. V. Reklaitis (reklaiti@ecn.purdue.edu) work at the Institute of Advanced Pharmaceutical Technology at Purdue University. 

Terminology

Ontology is the representation of the hierarchies of concepts and their relationships; the term is also applied to controlled syntax, database schemas, semantic networks, and thesauri.

Semantic Web is an extension of the current Web that will allow one to find, share, and combine information more easily. It relies on machine-readable information and metadata expressed in ontology languages.

RDF (Resource Description Framework) is a general framework for describing metadata, or the information about the information. It provides interoperability between applications that exchange machine-understandable information on the Web. For example, as metadata for a Web site, RDF can record details such as the site's sitemap, the dates when updates took place, keywords that search engines look for, and the Web page's intellectual property rights.

Metadata is data that describes other data. Examples of metadata include schema, table, index, view, and column definitions.

SWRL—Semantic Web Rule Language

OWL—Web Ontology Language

RESOURCES

Combining Rule and Ontology Reasoners for the Semantic Web, by Christine Golbreich
www.med.univ-rennes1.fr/lim/doc_101.pdf

Rules and Rule Markup Languages for the Semantic Web: Third International Workshop, RuleML 2004, Hiroshima, Japan, November 8, 2004, Proceedings (Lecture Notes in Computer Science) by Grigoris Antoniou (Editor) and Harold Boley (Editor) www.amazon.com/gp/product/3540238425/qid=1137515366/sr=1-1/ref=sr_1_1/102-5140245-3702507?s=books&v=glance&n=283155

Jena – A Semantic Web Framework for Java  http://jena.sourceforge.net/

SemWebCentral: Open source tools for the Semantic Web http://projects.semwebcentral.org/projects/kazuki/

Protégé: A free, open source ontology editor and knowledge-based framework http://protege.stanford.edu/

OWL Web Ontology Language Overview www.w3.org/TR/owl-features/

SWRL: A Semantic Web Rule Language Combining OWL and RuleML
www.w3.org/Submission/2004/SUBM-SWRL-20040521/

