31 May 2001
A question of architectural advantages and limitations
by Ramnivas Laddad, Gerardo Pardo-Castellote, Stan Schneider
Smart interactive devices leverage the publish-subscribe model.
The publish-subscribe model is the premier method for disseminating information in complex distributed applications. Publish-subscribe middleware takes care of all the network programming and message-passing chores, dramatically simplifying application development for enterprise and Internet applications.
Smart, interactive devices in distributed, real-time applications also benefit from the simple publish-subscribe programming model. However, most commercial middleware targets desktop applications and thus does not meet real-time application requirements for determinism, fault tolerance, robustness, and data delivery control.
Data flow semantics
Developing distributed applications starts with an understanding of the data flow among the smart interactive devices and other, more functional network nodes. The flow can be quite simple, such as when a sensor sends the current temperature regularly to a remote monitor.
It can also be quite complex, such as when hundreds of real-time network nodes exchange sensor and actuator information with controllers to synchronize operation and ensure safety.
When analyzing the flow, the developer must understand the different types of flow and delivery patterns. Important data flow characteristics include timing: how quickly the data must be delivered; reliability: whether or not the data is guaranteed to reach its destination; and bandwidth: how much data must be transferred.
Real-time applications must handle many types of data flow.
Signals: Many real-time applications require measurements from some gauge or parameter (for example, temperature or position). Signals usually change or update continuously. Signal data flow is typically the following:
- Time critical, where updates are useless if old
- Idempotent, where repeated updates are acceptable
- Last is best, where the latest information is more important than retrying missed samples
Signals can require high bandwidth. They often have a short useful life. For most applications, it is more important to get the most recent value quickly than to get every value ever produced.
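The "last is best" property above can be made concrete with a small sketch: a subscriber-side cache that overwrites rather than queues, so a reader always sees the freshest sample. The class and method names here are hypothetical, not any particular middleware's API.

```python
import threading

class LatestValueCache:
    """Keep only the most recent issue per signal topic ("last is best").

    Old samples are overwritten rather than queued, so a slow reader is
    never handed stale data and missed samples are never retried.
    """
    def __init__(self):
        self._lock = threading.Lock()
        self._latest = {}

    def on_issue(self, topic, value):
        # Idempotent overwrite: a repeated or missed intermediate
        # sample costs nothing.
        with self._lock:
            self._latest[topic] = value

    def read(self, topic):
        with self._lock:
            return self._latest.get(topic)

cache = LatestValueCache()
cache.on_issue("temperature", 21.4)
cache.on_issue("temperature", 21.9)   # the older 21.4 is simply replaced
print(cache.read("temperature"))      # → 21.9
```

A logging subscriber, by contrast, would need a deeper history buffer; the depth of this cache is exactly the trade-off between signal and command semantics.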
Commands: Many real-time systems must process sequential instructions, or commands. Command data flow requires that each instruction in the sequence be delivered reliably, exactly once. The application cannot miss any intermediate instructions or execute an instruction twice. Commands are often not time critical.
Status: Status indicates the current state or a goal. Status data flow requirements are typically less rigorous than those of commands or signals. Status usually persists for some time. It is not usually critical that it be delivered exactly once, and it may or may not be time critical or require reliable delivery.
Events: Events synchronize execution between concurrent tasks and external operations. For example, a pump control task may not be able to run until the float indicates the level has fallen below a set value. The float sensor will send an event to the pump task to begin operation. Events may or may not have associated data. Event data flow can have both critical timing and reliability requirements.
Requests: Real-time applications occasionally need to issue specific requests for data. Requests imply a two-part communication: A client sends the request to a server, and the server returns the response. In real-time applications, requests may be blocking or asynchronous. Programs issuing requests must specify time-out actions so that unfulfilled requests do not block or repeat indefinitely.
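The time-out requirement above can be sketched as follows. This is a minimal stand-in, assuming a hypothetical `send` function and reply queue; real middleware would also correlate request and reply identifiers.

```python
import queue

def request_with_timeout(send, replies, timeout=0.5):
    """Issue a request and wait a bounded time for the reply.

    `send` posts the request to the server; `replies` is the queue the
    server answers on. Returns None on time-out instead of retrying
    forever, as a real-time client must.
    """
    send("read-setpoint")
    try:
        return replies.get(timeout=timeout)   # block at most `timeout` seconds
    except queue.Empty:
        return None                           # time-out action: give up cleanly

replies = queue.Queue()
def fake_server(msg):                         # stand-in for a remote server
    replies.put({"setpoint": 72.0})

print(request_with_timeout(fake_server, replies))  # → {'setpoint': 72.0}
```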
Once the information properties are clear, attention turns to the distribution patterns.
There are three key questions:
- From where is the information generated?
- Where does it have to go?
- When does it have to get there?
The answers to these questions have profound implications for the application's underlying communication requirements. For example, a client/server system with a centralized shared database is a vastly different system than a distributed control system passing time-critical data directly among distributed nodes.
Data flow semantics are critical to the communication architecture. For example, TCP ensures reliability by retrying dropped packets. However, the number and frequency of retries are global system parameters.
The application thus has little control over the time of delivery. As timing is often a critical parameter, TCP is inappropriate for most real-time applications.
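The contrast with TCP's global retry behavior can be illustrated with an application-level freshness check: rather than retrying indefinitely, the receiver discards an issue whose useful life has expired. The validity window is chosen per publication, not per connection. A hypothetical sketch:

```python
import time

def deliver_if_fresh(issue, sent_at, max_age, now=None):
    """Drop an issue whose useful life has expired instead of retrying it.

    `max_age` is a per-publication validity window in seconds, unlike
    TCP's global, system-wide retry parameters.
    """
    now = time.monotonic() if now is None else now
    if now - sent_at > max_age:
        return None           # stale: a late sensor reading is useless
    return issue

# A 0.5-second-old temperature sample is still valid within a 1 s window:
print(deliver_if_fresh(21.9, sent_at=0.0, max_age=1.0, now=0.5))  # → 21.9
```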
Publish subscribe reduces effort
Publish-subscribe data distribution is gaining popularity in many distributed applications, such as financial communications, Web-based push technologies, and command and control systems.
Its popularity is justified. Publish subscribe substantially reduces development, deployment, and maintenance effort while delivering better performance for applications with complex data flow. For example, rather than keeping track of each subscriber and sending data to each individually, publish subscribe lets a publisher send each piece of data once anonymously.
Several features characterize publish-subscribe architectures.
Distinct declaration and delivery: Communications occur in three steps:
- Publishers declare intent to publish a publication.
- Subscribers declare interest in a publication.
- Publishers send a publication issue.
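The three steps above can be sketched with a minimal in-process broker. The API here is hypothetical and everything runs in one process; real middleware performs the same fan-out over the network.

```python
class Broker:
    """Minimal publish-subscribe sketch of the three steps:
    declare a publication, declare interest, send an issue."""
    def __init__(self):
        self._subs = {}       # topic -> list of subscriber callbacks

    def declare(self, topic):
        self._subs.setdefault(topic, [])

    def subscribe(self, topic, callback):
        self._subs.setdefault(topic, []).append(callback)

    def publish(self, topic, issue):
        # The publisher sends once, anonymously; the middleware
        # fans the issue out to every current subscriber.
        for cb in self._subs.get(topic, []):
            cb(issue)

broker = Broker()
broker.declare("pressure")                     # step 1: intent to publish
received = []
broker.subscribe("pressure", received.append)  # step 2: declare interest
broker.publish("pressure", 101.3)              # step 3: send an issue
print(received)                                # → [101.3]
```

Note that the publisher never names its subscribers; adding a second subscriber requires no change on the publishing side.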
Named publications: Publish-subscribe applications distribute data using named publications. Each publication has a topic and a type. The topic is a name used by publishers and subscribers to create a logical data channel. The type describes the data format. Each incremental value of a publication is an issue.
Most publish-subscribe architectures support arbitrary, user-defined types with automatic type conversion among computer architectures.
Many-to-many communications support: Publish subscribe distributes each issue simultaneously from one publisher to many subscribers. The model's flexibility also helps developers implement complex, many-to-many distribution schemes quite easily. For example, different publishers can declare the same topic, so that subscribers to that topic receive issues from multiple sources.
Event-driven transfer: Publish-subscribe communication is naturally event driven. Publishers send each issue when it is ready. When the issue arrives, the subscribers receive notification.
Middleware: Publish-subscribe services are typically made available through middleware that sits on top of the operating system's network interface and presents an application programming interface. The middleware handles three basic programming chores:
- Maintains the database that maps publishers to subscribers. The result is logical data channels for each publication between publishers and subscribers.
- Serializes and deserializes the data on its way to and from the network to reconcile publisher and subscriber platform differences.
- Delivers the published data.
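The second chore, serialization across platform differences, amounts to agreeing on a canonical wire format for each publication type. A sketch using a fixed binary layout (the format itself is an assumption for illustration):

```python
import struct

# One publication type: a fixed binary layout in a canonical byte order
# (here big-endian), so publisher and subscriber architectures need not
# match. ">If" = uint32 sequence number followed by a float32 value.
ISSUE_FORMAT = ">If"

def serialize(seq, value):
    """Publisher side: native values to canonical network bytes."""
    return struct.pack(ISSUE_FORMAT, seq, value)

def deserialize(data):
    """Subscriber side: network bytes back to native values."""
    return struct.unpack(ISSUE_FORMAT, data)

wire = serialize(7, 21.5)
seq, value = deserialize(wire)
print(seq, value)   # → 7 21.5
```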
Publish subscribe offers some clear advantages for real-time applications:
- Because its direct peer-to-peer transport is very efficient in both bandwidth and latency, publish subscribe offers the best transport for distributing data quickly.
- Because it provides many-to-many connectivity, publish subscribe is ideal for complex distributed applications.
- Because it requires no configuration, publish subscribe can handle applications that add and remove nodes and data streams dynamically.
This publish-subscribe model works well for desktop and Internet applications. However, real-time applications often require more functionality:
Delivery timing control: Distributed real-time applications must control timing. For example, many subscribers need to receive issues within specific time frames, and publishers need to specify how long an issue remains valid.
Reliability control: Real-time applications must be able to trade off reliable delivery against their own determinism requirements. Moreover, different data channels can have different needs. Within a single application, for example, different subscribers can require different reliability characteristics.
Request-reply semantics: Complex real-time applications often have onetime requests for actions or data. These do not fit well into the publish-subscribe semantics.
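The timing and reliability controls above are naturally expressed as per-subscription quality-of-service settings. The field names below are hypothetical, modeled loosely on the style of real-time middleware QoS; the point is that each data channel makes its own trade-off.

```python
from dataclasses import dataclass

@dataclass
class SubscriptionQos:
    """Per-subscription delivery settings: each logical data channel
    trades reliability against determinism independently."""
    reliable: bool = False    # retry missed issues, or best-effort
    deadline: float = 0.1     # max seconds between issues before an alarm
    history_depth: int = 1    # 1 = last-is-best; deeper queues for logging

# The same publication, subscribed with different needs:
logger_qos  = SubscriptionQos(reliable=True,  history_depth=100)
display_qos = SubscriptionQos(reliable=False, history_depth=1)
```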
Limitations spur answers
Publish subscribe does have some limitations, however.
Flexible delivery bandwidth: Each subscriber's bandwidth requirements — even for the same publication — can be different. For example, logging subscribers may require all signal data issues, while an end-user workstation can get by with far fewer.
Fault tolerance: Real-time applications often require hot standby publishers and servers.
Thread priority awareness: Real-time communications must often work without affecting publisher or subscriber threads.
Selective degradation: Each real-time logical data channel must be protected from the others. That is, the slowdown or failure of one publisher due to dropouts, network congestion, or CPU overload should not affect a subscriber's receipt of publications from other publishers.
Robustness: The communications layer should not introduce any single-node points of failure to the application.
Dynamic scalability: The lifetime of a real-time distributed application often exceeds the individual lifetime of any one publisher or subscriber. Publishers and subscribers need to be able to join and leave the application at any time.
Efficiency: Real-time systems require efficient data collection and delivery. Only minimal delays should be introduced into the critical data-transfer path.
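Selective degradation, in particular, can be sketched as one bounded queue per logical data channel: overflow from a slow or failed publisher drops only that channel's oldest issues and never backs up the others. A hypothetical receive-side sketch:

```python
import collections

class PerChannelQueues:
    """One bounded queue per logical data channel, so a slow or failed
    publisher cannot delay issues arriving from other publishers."""
    def __init__(self, depth=8):
        self._depth = depth
        self._queues = {}

    def enqueue(self, channel, issue):
        q = self._queues.setdefault(
            channel, collections.deque(maxlen=self._depth))
        q.append(issue)   # overflow drops only this channel's oldest issue

    def drain(self, channel):
        q = self._queues.pop(channel, None)
        return list(q) if q else []

rx = PerChannelQueues(depth=8)
for i in range(20):
    rx.enqueue("flooding-sensor", i)   # overflows its own queue only
rx.enqueue("heartbeat", "ok")
print(rx.drain("heartbeat"))           # → ['ok'], unaffected by the flood
```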
Industry is working on and successfully addressing these limitations.
Ramnivas Laddad is principal engineer, Gerardo Pardo-Castellote is chief technology officer, and Stan Schneider is president at Real-Time Innovations, Inc. in Sunnyvale, Calif.