01 February 2005
Under the hood of XSLT
Doesn't apply to you? Automation engineers should think again.
By John T. Sever
Outside the world of industrial automation, eXtensible Markup Language (XML) is fast becoming the world's most popular means of data interchange. A significant contributor to the success of XML has been eXtensible Stylesheet Language Transformations (XSLT) because of its simplicity and its ability to transform XML data from one structure to another. The popularity of XML is driving automation system vendors to add XML as an optional format for control code import/export functionality.
To take advantage of this technology, an automation system must be capable of basic control code import/export to an XML data structure. While not all systems fully support this functionality, vendors continue to add XML support to their systems. Much of their support provides an interface between the plant automation system and ERP, MES, or other business systems. In fact, it is difficult to find any information about utilizing XML other than for web applications or business system integration.
While these are certainly excellent applications for XML and XSLT technology, its value is not limited to any particular application domain. Many automation engineers remain unfamiliar with XML and XSLT and do not consider the technology pertinent to delivering or maintaining applications. With little experience of either, they might wonder whether the technology has any value on automation projects. Think again.
Time is money
While learning a new language is a formidable undertaking that most engineers avoid if they can accomplish the task by other means, learning XML and XSLT is much easier than one might expect. An automation engineer today must know some combination of automation concepts including programming in a PLC or DCS controller, programming HMI code, advanced continuous control, or batch recipe management—all of which are significantly more complex than learning XML and XSLT. The time saved applying this technology on automation projects far outweighs the time spent learning the technology.
Here are a few examples of how to save countless man-years utilizing XML and XSLT on automation projects:
- Auto-document existing control configuration in HTML, Word, or PDF.
- Execute bulk control code edits across an entire control application. The bulk edit modifies only code constructs that meet specific selection criteria. For example, modify a module parameter name and all references to the parameter throughout the code.
- Import specific control parameter values into MS Excel or MS Access for review and modification by process personnel, followed by download back to the control system.
- Auto-document batch recipes complete with parameter descriptions, values, default values, and deferral chains across the recipe hierarchy.
- Verify code or parameters for consistency, and report exceptions to specific search criteria. A control module configured for interlocking logic may require that parameter values be set properly for interlocking logic to function properly. This verification can generate results with properly configured parameter values for import and system download.
- Use module and parameter definitions developed in a relational database to auto-generate controller code and documentation, ensuring the documentation accurately represents the actual control code.
XML is a text-based hierarchical data structure that looks very similar to HTML. HTML and XML claim common ancestry in the Standard Generalized Markup Language (SGML) and are specifications maintained by the World Wide Web Consortium (W3C). While the HTML specification defines a set of elements and attributes that describe data and its presentation, the XML specification is much more general and defines only basic structures and markup without specifying element or attribute names. For example, the <br> element acts as a line break in an HTML document, whereas XML includes no specific elements and allows users to define any set of elements and attributes to best describe their data.
Conceptually, XML data is in a document. XML data may be a text file on disk, it may be text streamed from a server, or it may be hard coded text that is in an HMI VBA application. Although the data may have many different sources, the document metaphor still applies as long as the document contains a single top-level element. In most cases, the document will exist as a single file.
A tree metaphor can describe the data contained in an XML document. Under the tree model, elements, attributes, and text are all tree nodes where each has a different node type. Throughout this article, the term node will refer to a document element but will not refer to attributes or text.
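As a small illustration of the three node types, consider the fragment below. The names Card, Slot, and Type are hypothetical, chosen only for illustration, and are not taken from any vendor's export format.

```xml
<!-- Hypothetical fragment; names are illustrative only -->
<Card Slot="3">               <!-- Card is an element node; Slot is an attribute node -->
  <Type>Analog Input</Type>   <!-- Type is an element node; "Analog Input" is a text node -->
</Card>
```

In the tree model, Type is a child of Card, and the text "Analog Input" is a child of Type.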
By itself, information stored as XML data is not much more valuable than information stored in another text format. To promote the adoption of XML as a commonly used data structure, a simple text-based and platform-independent method for converting (or transforming) XML data into other structures is necessary. The value of any data structure lies in the ability of users and applications to manipulate, view, modify, or analyze the data for purposes defined by the end user. W3C designed the XSLT specification to meet these needs.
The XSLT specification defines the syntax and semantics of a language for defining transformation directives performed by an XSLT processor. Unlike XML, XSLT is actually a language—a declarative language written as an XML document. Automation engineers will likely find this language to be significantly different from most others they have used. Although it may take some adjustment to become comfortable with a declarative language, XSLT is quite simple to learn, yet remarkably powerful.
The XSLT specification defines a set of language keywords and functions and their interpretation by an XSLT processor. Compared to other languages, there are very few keywords and functions in the XSLT language.
A declarative language is one in which the programmer defines (or declares) relationships between data and variables. An external engine or interpreter applies a set of fixed algorithms based upon the declared relationships and generates a result set. A procedural language such as Visual Basic is one with which the programmer defines an explicit sequence of steps executed by a computer processor.
An XSLT processor parses and interprets an XSLT document as instructions to execute against another XML document (the input document) and generates an output document. The input document must be well-formed XML and, if a schema applies, should conform to the schema definition (valid XML). The output can be any text-based format such as XML, raw text, CSV, or HTML. PDF is normally produced indirectly, by generating XSL-FO and passing it to a separate formatter.
The central component in the transformation process is the XSLT processor. While many XSLT processors are available, Windows 2000 and Windows XP include a component called MSXML that provides both an XML parser and an XSLT processor. The transformation stylesheets in this article target MSXML version 4.
An XSLT document contains a single transformation or stylesheet and one or more templates that define a set of rules. An XSLT processor applies the XSLT template(s) against an input XML document to create an output document. The template structure allows transformation rules to go into separate modules used separately or chained together.
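A minimal sketch of a stylesheet with two templates may help show how rules are declared rather than sequenced. The element name Card is an assumption made for illustration. The processor, not the author, decides when each rule fires.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Rule 1: start at the document root and hand off to whichever rules match -->
  <xsl:template match="/">
    <xsl:apply-templates select="//Card"/>
  </xsl:template>

  <!-- Rule 2: applied once for every Card element selected above (hypothetical name) -->
  <xsl:template match="Card">
    <xsl:text>Card found&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The order in which the two templates appear in the file does not matter. Each one simply states what to emit when its match pattern applies.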
Hello World XSLT style
No language introduction would be complete without a "Hello World" example. This example creates an HTML Hello World page.
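The following is a minimal sketch of such a stylesheet, reconstructed from the description that follows. The exact heading text and page structure are assumptions.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Tell the processor to write HTML, with no XML declaration -->
  <xsl:output method="html"/>

  <!-- Match the document root, then emit the literal HTML below -->
  <xsl:template match="/">
    <html>
      <body>
        <h1>Hello World</h1>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```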
The top-level element xsl:stylesheet is part of the namespace http://www.w3.org/1999/XSL/Transform. This namespace must be exactly as shown to identify the document as an XSLT document to the parser/processor. Elements in this namespace (identified by the xsl prefix) are interpreted by the XSLT processor as transformation instructions and must adhere to the language syntax and semantics of the XSLT specification. The version attribute identifies which version of the specification the transform is based upon. The W3C has a working draft for the 2.0 specification, which adds considerable functionality. Until the 2.0 specification is finalized, it is advisable to use 1.0 for this attribute.
The output element supports 10 optional attributes for instructing the processor regarding output format. The method attribute in this example instructs the processor that the output is an HTML document and therefore an XML declaration will not be included in the output document. Valid values for the method attribute are xml, html, and text. Specifying xml generates the XML declaration in the output, while the other two methods omit it.
The template element instructs the processor to find (or match) the XML document root (the / character in the match attribute translates to the document root). The document root does not correspond to a specific part of the document but may be regarded as the entire document. Do not confuse the document root with the top-level element, which is often called the root element.
When the processor finds the document root, it generates the output specified by the contents of the template element. The content within the template element is not part of the xsl namespace and is therefore not interpreted as XSLT instructions. The processor copies anything outside the xsl namespace directly to the output. The resulting output for this example is an HTML page with a single line of text formatted using the H1 (Heading 1) element.
This transform will successfully execute against any XML file because the transform does not reference specific XML content. Although the transformation does not require a specific input XML document structure, an XML document—any document—is a requirement for the processor to perform the transformation.
The examples that follow assume an XML document created by exporting control code from the control system development software. Each example transforms the exported code into a different document format.
Each XSLT transform is based upon the following XML document, which expands upon the previous I/O samples by adding more cards enclosed in a processor top-level element.
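As a stand-in for such an export, here is a hypothetical input document. Only the Processor/Cards/Card/Point path, the Channel attribute, and the Tag, Type, and Model elements are taken from the transforms in this article. The Name and Slot attributes and all values are invented for illustration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical export; attribute names Name and Slot and all values are assumptions -->
<Processor Name="PLC01">
  <Cards>
    <Card Slot="1">
      <Type>Analog Input</Type>
      <Model>AI-8</Model>
      <Point Channel="1"><Tag>TT-101</Tag></Point>
      <Point Channel="2"><Tag>FT-102</Tag></Point>
    </Card>
    <Card Slot="2">
      <Type>Discrete Output</Type>
      <Model>DO-16</Model>
      <Point Channel="1"><Tag>XV-201</Tag></Point>
    </Card>
  </Cards>
</Processor>
```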
Transforming XML to CSV
This transform generates a text file in comma-separated values (CSV) format that can be opened in a spreadsheet application or any other program that supports CSV files. The first row of the output contains heading text for each CSV column. Notice that the hierarchical input is transformed into a flat table, which demonstrates the ability not only to transform XML to CSV but also to generate a fundamentally different data structure.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:text>"Tag","Card","Point","Type","Model"&#13;</xsl:text>
<xsl:for-each select="Processor/Cards/Card/Point">
<xsl:value-of select="@Channel"/>
<xsl:value-of select="ancestor::Card/Type"/>
<xsl:value-of select="ancestor::Card/Model"/>
CSV transformation output:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
The first line is the standard stylesheet element, which includes the required namespace that identifies the file as an XSLT transform.
<xsl:output method="text"/>
The output element tells the processor that the output is plain text and that the XML declaration should be omitted.
<xsl:template match="/">
This line defines the only template in the stylesheet. The template tells the processor to find the document root and then process the instructions contained inside the template.
<xsl:text>"Tag","Card","Point","Type","Model"&#13;</xsl:text>
This is the first line that defines content for the output. The xsl:text element instructs the processor to output the text between the opening and closing tags exactly as typed, including all white space (spaces, tabs, and new line characters). The Hello World example did not need xsl:text for its literal text because HTML browsers generally ignore white space. When generating literal text, the processor must distinguish between white space intended for output and white space used only to make the XSLT file more readable. Controlling white space in literal output can be confusing at first, and at times the desired result may be impossible to achieve with literal text alone. In such cases, the xsl:text element provides full control over the output text and its white space.
The &#13; sequence is a character reference, used for characters that are non-printable or cannot be entered directly from the keyboard. A character reference begins with & and ends with ;. The # character specifies that the number that follows is the numeric code of a Unicode character. In this example, 13 identifies a carriage return.
<xsl:for-each select="Processor/Cards/Card/Point">
The xsl:for-each element instructs the processor to repeat the contained instructions for each item in the source XML document that matches the select attribute. The string value of the select attribute is interpreted by the processor as a path to an element in the input document.
As the processor moves through a template, it maintains a pointer to a context node in the XML document. Various XSLT instructions, including xsl:for-each, automatically change the context node. In this case, the context node changes to each of the Point elements as the processor loops through this instruction. Understanding how the context node changes in an XSLT transformation is critical.
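The sketch below marks where the context node changes, using the paths from this article's CSV transform. The literal wording of the output text is an assumption added only to make the example readable.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Here the context node is the document root -->
    <xsl:for-each select="Processor/Cards/Card/Point">
      <!-- Inside the loop the context node is the current Point element -->
      <xsl:value-of select="Tag"/>                  <!-- child of the context node -->
      <xsl:text> on channel </xsl:text>
      <xsl:value-of select="@Channel"/>             <!-- attribute of the context node -->
      <xsl:text> of card type </xsl:text>
      <xsl:value-of select="ancestor::Card/Type"/>  <!-- found by walking up to the enclosing Card -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
    <!-- After the loop the context node is the document root again -->
  </xsl:template>
</xsl:stylesheet>
```

Every select expression inside the loop is relative to the current Point, which is why none of them repeats the full path from the top of the document.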
Remember that XML and XSLT are case sensitive: the element names in the select expression must match the names in the input document exactly, including capitalization.
<xsl:sort select="Tag"/>
The xsl:sort element sorts the output generated inside the xsl:for-each instruction according to the criteria specified in the select attribute. In this case, the output is sorted by the Tag element. The full path to the Tag element (Processor/Cards/Card/Point/Tag) is not needed because Tag is a child of the context node, Point. Specifying Tag as the sort criterion means sorting on the value of the Tag element that is a child of the current context node.
<xsl:value-of select="Tag"/>
The xsl:value-of element outputs the value of the element identified in the select attribute, in this case the text contained in the Tag element.
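Putting the pieces of this walkthrough together, a complete CSV stylesheet might look like the sketch below. It is not the original listing. The quoting and comma placement are assumptions, as is the Slot attribute used for the Card column, and each row ends with a carriage return plus line feed rather than a carriage return alone.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Header row, ended with carriage return and line feed -->
    <xsl:text>"Tag","Card","Point","Type","Model"&#13;&#10;</xsl:text>

    <xsl:for-each select="Processor/Cards/Card/Point">
      <xsl:sort select="Tag"/>
      <!-- One quoted field per column, separated by commas -->
      <xsl:text>"</xsl:text><xsl:value-of select="Tag"/>
      <!-- ASSUMPTION: the card is identified by a hypothetical Slot attribute -->
      <xsl:text>","</xsl:text><xsl:value-of select="ancestor::Card/@Slot"/>
      <xsl:text>","</xsl:text><xsl:value-of select="@Channel"/>
      <xsl:text>","</xsl:text><xsl:value-of select="ancestor::Card/Type"/>
      <xsl:text>","</xsl:text><xsl:value-of select="ancestor::Card/Model"/>
      <xsl:text>"&#13;&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

This sketch does not escape quotation marks that appear inside a value, so a tag or model name containing a double quote would produce a malformed row.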
Behind the byline
John T. Sever, firstname.lastname@example.org, has a Bachelor of Science in chemical engineering and 20 years of experience as a process and automation engineer. As president of Cascade Controls, Inc. in Tinley Park, Ill., he serves as Cascade's primary software architect, enhancing automation system usability with applications of software technology.