1 July 2000
Exchanging Data with XML—Part II
How is XML structured, and what can you really do with it?
By Al Chisholm and Charlie Gifford
In the first part of our two-part series on XML, or extensible markup language, we learned about this hot new universal Web language's power, interoperability, and broad applicability. But how is XML structured, and what can you really do with it?
Opening the Package
We previously stated that an unlimited set of tags may be defined in XML. While HTML tags specify how to display data within a browser, such as in bold or italics, XML defines data content, such as street address and city.
In addition to that difference, XML has a simpler set of parsing rules than HTML, and its tags are self-describing. This allows a parser to break XML data down into recognized pieces without external descriptions: any text inside an XML document is delimited by tags that tell you what it is. In many ways, this makes XML easier to understand and more robust than the comma-separated values (CSV) format. For many, it's useful to think of XML as the latest version of CSV.
As with CSV, XML is intended as a universal way to package data so that it can be exported to some other loosely coupled application. The format of the data needs to be reasonably simple, flexible, easy for the originator to generate, and easy for the receiver to parse.
An XML document is easy to create; anyone familiar with HTML can write one. Here, XML is used to describe a simple customer profile.
XML's self-describing quality is a significant advantage over CSV. For example, if you leave out a field in CSV, the whole document shifts—and there's no way to know this has happened. If you leave out a field in an XML document, the absence might be acceptable—but if it isn't, you'll see it immediately because of the missing tag.
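The contrast can be sketched with Python's standard library. This is a minimal illustration, not the authors' example: the customer record, its field names, and its values are all hypothetical, since the original customer profile figure is not reproduced here.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical customer record; the field names are illustrative assumptions.
FIELDS = ["name", "street", "city", "state"]

# The same record in both formats, with the "street" field accidentally left out.
csv_row = "Pat Jones,Springfield,MA\n"
xml_doc = """<customer>
  <name>Pat Jones</name>
  <city>Springfield</city>
  <state>MA</state>
</customer>"""

# CSV: values are matched to fields by position, so everything after the
# gap shifts left and the city is silently read as the street.
values = next(csv.reader(io.StringIO(csv_row)))
csv_record = dict(zip(FIELDS, values))

# XML: each value carries its own tag, so the remaining fields stay correct
# and the missing one can be detected by name.
root = ET.fromstring(xml_doc)
xml_record = {child.tag: child.text for child in root}
missing = [field for field in FIELDS if root.find(field) is None]

print(csv_record)
print(xml_record)
print(missing)
```

In the CSV case the receiver ends up with "Springfield" stored as the street and has no signal that anything went wrong. In the XML case the city is still the city, and the receiver can decide whether a missing street is acceptable.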
Although at first glance XML appears more complicated and less compact than CSV, it is much more flexible and expandable, makes optional data easier to add, and is less prone to error.
To write well-formed XML documents, six rules need to be followed:
- Start tags and end tags must match.
- Elements cannot overlap.
- XML tags are case sensitive (for example, <customer>, <Customer>, and <CUSTOMER> would be three different elements).
- Empty elements should be denoted. XML has a shorthand for empty elements: A sole tag ending with a /> signals that the element has no contents.
- XML has several reserved characters; for these characters a special character sequence, which XML calls an entity, must be substituted:
substitute < with &lt;
substitute & with &amp;
substitute > with &gt;
substitute " with &quot;
substitute ' with &apos;
- Each XML document must have a unique root element. In the customer profile example, the outermost element, which encloses all the others, is the unique root element of the XML document.
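The rules above can be exercised with any XML parser. The following is a minimal sketch using Python's standard library, with hypothetical element names and sample fragments chosen only for illustration; a parser rejects a document as soon as it breaks a well-formedness rule.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments, each exercising one of the rules.
samples = {
    "matching tags": "<customer><name>Pat</name></customer>",
    "mismatched end tag": "<customer><name>Pat</Name></customer>",  # case differs
    "overlapping": "<customer><b><i>Pat</b></i></customer>",
    "empty shorthand": "<customer><fax/></customer>",
    "raw ampersand": "<customer><name>Jones & Co</name></customer>",
    "escaped ampersand": "<customer><name>Jones &amp; Co</name></customer>",
    "two roots": "<customer/><customer/>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")

# escape() substitutes &, < and > by default; quotes need an extra mapping.
escaped = escape('Jones & "Sons" <Ltd>', {'"': "&quot;", "'": "&apos;"})
print(escaped)
```

Note that the mismatched end tag fails purely because of case: the parser treats name and Name as different elements, so the start tag is never closed.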
With XML, document type definitions (DTDs) can accompany a document to define its rules, which could include items such as the elements present and the structural relationship among the elements. DTDs are an optional component to XML that helps validate the data when the receiving application doesn't have a built-in description of the incoming data.
Data that is sent along with a DTD and conforms to it is known as valid XML. With valid XML, an XML parser can check incoming data against the rules defined in the DTD to verify that the data is structured correctly. Data sent without a DTD (as in the customer profile example above) need only follow the rules listed earlier and is known as well-formed XML. Here the document is used to implicitly describe itself.
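The difference between the two checks can be sketched in Python. The standard library parser tests well-formedness only and does not validate against a DTD, so this sketch swaps in a hand-written table of required child elements as a stand-in for the DTD. The table, the element names, and the check function are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Stand-in for a DTD (an assumption for illustration): each element name maps
# to the child elements it must contain, in order. A real DTD can also express
# optional and repeated elements.
RULES = {
    "customer": ["name", "address"],
    "address": ["street", "city"],
}


def check(element, rules=RULES):
    """Return a list of structural errors found at or below `element`."""
    errors = []
    expected = rules.get(element.tag)
    actual = [child.tag for child in element]
    if expected is not None and actual != expected:
        errors.append(f"<{element.tag}>: expected {expected}, found {actual}")
    for child in element:
        errors.extend(check(child, rules))
    return errors


good = ET.fromstring(
    "<customer><name>Pat</name>"
    "<address><street>1 Main St</street><city>Springfield</city></address>"
    "</customer>"
)
# Well-formed, but the city is missing, so it fails the structural check.
bad = ET.fromstring(
    "<customer><name>Pat</name>"
    "<address><street>1 Main St</street></address>"
    "</customer>"
)

good_errors = check(good)
bad_errors = check(bad)
print(good_errors)
print(bad_errors)
```

Both documents parse without error, which is all that well-formedness guarantees. Only the comparison against an external set of rules reveals that the second one is structurally incomplete.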
In the Whole Schema of Things
In both valid XML and well-formed XML, data is self-describing because descriptive tags are intermingled with the data. This open, flexible format allows XML to be used anywhere the exchange and transfer of information is needed. This makes it an exceptionally powerful tool.
XML can be used to describe information about HTML pages or it can be used to describe data contained in a manufacturing process. Because XML is separate from HTML, it can also be added within HTML documents.
The World Wide Web Consortium has defined a format by which XML-based data, called XML data islands, can be encapsulated in these pages. By embedding XML data inside an HTML page, delivered data can be viewed in multiple formats, using the semantic information contained in XML.
An XML schema is a formal specification of the rules of an XML document. In other words, it contains the element names that indicate which elements are allowed in a document and in what combinations. Using a schema, an author can define precisely which element names are permitted in a document and, within each element, which subelements, attributes, and relations are allowed. An author can import fragments from other XML schemata and extend types through inheritance. This permits complex relationships among elements while keeping the simple lexical tree structure.
XML on the Move
We learned in the last Tech Talk that XML would likely have a profound effect on several areas of process control and automation, from the enterprise level to the factory floor.
Such development is beginning to happen rapidly. The OPC Foundation, for example, plans to publish XML schemas compatible with Microsoft's BizTalk framework for improved business-to-business and business-to-consumer computing. This initiative is testimony to the effort now under way to define how manufacturing concerns can better leverage the Internet through the application of XML. The goal in this case is to provide OPC-compliant solutions with enhanced e-commerce capabilities and functionality.
Similarly, Open Applications Group Inc. has put a pilot project in place to move its XML definitions to a BizTalk framework specification. The project is designed to lead to a true component-based framework for business applications, whereby a vendor can use the specifications to build a BizTalk-compatible framework and populate it with XML-compatible components. The advantage is that users can implement components separately as their business needs dictate. This will speed the implementation of powerful new technology and reduce the capital requirements necessary to improve functionality.
Microsoft has implemented support for the XML standard within Internet Explorer 4.0 and will use XML in its next release of Microsoft Office and other products. This commitment to XML is important because corporations are increasingly moving from classic client/server two-tier application models to three-tier models, in which a browser front end interacts with a middle-tier Web server, which in turn communicates with a back-end database server for storage. This three-tier architecture has several benefits over client/server models, including easier scalability and better security. And XML will make possible richer implementation of such models through structured data exchanged over HTTP.
Separation Is Power
The power and beauty of XML is that it maintains the separation of the user interface from structured data, allowing the seamless integration of data from diverse sources. Customer information, purchase orders, research results, bill payments, medical records, catalog data, and other information can be converted to XML on the middle tier, allowing data to be exchanged online as easily as HTML pages display data today. Data encoded in XML can then be delivered over the Web to the desktop. No retrofitting is necessary for legacy information stored in mainframe databases or documents, and because HTTP delivers XML over the wire, no changes are required for this function.
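As a rough sketch of the middle-tier conversion step, the following Python turns one record into XML. The record, its keys, and the row_to_xml helper are hypothetical; a real middle tier would read rows from its own back-end database.

```python
import xml.etree.ElementTree as ET

# Hypothetical row as it might come back from a back-end database query.
row = {"id": 1001, "customer": "Zenith & Sons", "total": 75.5}


def row_to_xml(record, tag="order"):
    """Build an element whose children are named after the record's keys."""
    element = ET.Element(tag)
    for key, value in record.items():
        ET.SubElement(element, key).text = str(value)
    return element


# Serialize to a string that could be sent as the body of an HTTP response.
payload = ET.tostring(row_to_xml(row), encoding="unicode")
print(payload)

# The receiver recovers the original value, reserved characters included.
received = ET.fromstring(payload).findtext("customer")
print(received)
```

The serializer substitutes the entity for the ampersand automatically, so the stored data itself does not have to be altered before it is exported.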
Once the data is on the client desktop, it can be manipulated, edited, and presented in multiple views without return trips to the server. Servers now become more scalable, due to lower computational and bandwidth loads. Also, because data is exchanged in the XML format, it can be easily merged from different sources.
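A small Python sketch of this client-side handling follows. The two feeds, their element names, and their values are invented for illustration; the point is only that data sharing the same tags can be merged and re-sorted locally.

```python
import xml.etree.ElementTree as ET

# Two hypothetical feeds describing orders; names and values are illustrative.
plant_feed = """<orders source="plant">
  <order id="1002"><customer>Acme</customer><total>250.00</total></order>
  <order id="1001"><customer>Zenith</customer><total>75.50</total></order>
</orders>"""
web_feed = """<orders source="web">
  <order id="1003"><customer>Baker</customer><total>120.00</total></order>
</orders>"""

# Merge: because both feeds use the same tags, their <order> elements can be
# collected under one new root without any format conversion.
merged = ET.Element("orders")
for feed in (plant_feed, web_feed):
    merged.extend(ET.fromstring(feed).findall("order"))

# Two views built locally from the same merged data, with no further requests.
by_id = sorted(order.get("id") for order in merged)
by_total = sorted(merged, key=lambda o: float(o.findtext("total")), reverse=True)
largest_first = [order.findtext("customer") for order in by_total]

print(by_id)
print(largest_first)
```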
By providing interoperability through a flexible, open, vendor-neutral standard, XML is enabling new and powerful ways of accessing data and delivering it to Web clients.
Al Chisholm has more than 25 years of experience in creating factory automation and process control software. As chief technical officer and co-founder of Intellution Inc., he also serves as technical director and is a member of the board of directors of the OPC Foundation. Chisholm is a graduate of Brown University and holds an M.S. in computer and information science from the University of Massachusetts. He has written numerous articles for trade publications and conferences and gives frequent lectures on OPC and related topics all over the world.
Charlie Gifford is an industrial IT specialist with more than 14 years of experience in facility and system analysis and design. Currently, he is a manager for the Interprise Supply Chain Solutions Group at AnswerThink Consulting Group (Miami). His group directly manufactures in an e-business environment by connecting the customer, suppliers, corporate, and the plant floor through MES, ASP, flow manufacturing, warehouse/distribution management, and global supply chain partnering. Gifford received two B.A.s in chemical and material engineering and an M.S. in electronic materials processing from the University of Maryland.