01 February 2005
Certainly it's not human
The machine vision process consists of four byte-intensive steps.
For most of us, vision is essential to many of the activities we carry out every day. Imagine for a moment how many things you would have to do differently, or could not do at all, if you were blind. There is little wonder, therefore, that many efforts are afoot to add vision capabilities to robots and other production machinery.
Until recently, these efforts stalled because of the lack of computing power to process the enormous amounts of data needed to represent a visual image. An image with 256 × 256 pixels (or picture cells, the smallest unit of visual information in an image) and 16 shades of gray needs 4 bits for each of its 65,536 pixels, or 32,768 bytes in all. At a typical rate of one image every 16 milliseconds, a camera delivers about two million bytes every second.
By contrast, the entire written content of this article needs only 70,000 bytes to represent it.
Although we now have more computing power, we are still far from understanding how human beings process visual images. After all, the human brain uses biological processing elements that are 10 times slower than digital chips, yet it processes visual data in real time. We have evidence that the human brain makes some use of parallel processing and a great deal of use of hierarchical abstraction, but we still cannot explain how the brain can process visual data so fast, let alone build a computer to match its performance.
It is important to emphasize that the problem of building a machine vision system is larger and more complex than that of building an artificial eye. The human vision system consists of the eye—the vision sensor—working integrally with the brain—the vision interpreter. Both must be, to some extent, understood and replicated to provide machines with a vision capability. Fortunately, most industrial applications do not require the full range or sensitivity of human vision. Merely identifying the shape of an object, or the presence or absence of gross manufacturing defects, may be sufficient.
The machine vision process consists of four main steps:
1. Image formation: A camera senses an illuminated image of, say, a manufactured part and converts it into a series of voltage signals.
2. Image preprocessing: The analog camera image is converted to digital form. Additional processing, such as windowing or image restoration, may be applied to make the resulting digital image easier to process.
3. Image analysis: The system identifies important features of the image such as object position and geometric attributes.
4. Image interpretation: The image description is matched against stored image models, the part is identified, and control or classification decisions follow.
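The four steps above can be sketched as a chain of functions. This is only an illustration of how the stages hand data to one another, not a production design: the function names, the synthetic "camera" voltages, the 0.5 V cutoff, and the two stored part models are all assumptions made for the example.

```python
# Hypothetical four-stage pipeline; every name and number here is illustrative.

def form_image():
    """Step 1: stand-in for a camera, returning analog voltages (0.0 to 1.0 V)."""
    return [
        [0.1, 0.1, 0.1, 0.1, 0.1],
        [0.1, 0.9, 0.8, 0.9, 0.1],
        [0.1, 0.8, 0.9, 0.8, 0.1],
        [0.1, 0.1, 0.1, 0.1, 0.1],
    ]


def preprocess(voltages, cutoff=0.5):
    """Step 2: digitize to a binary image (1 = brighter than the cutoff)."""
    return [[1 if v > cutoff else 0 for v in row] for row in voltages]


def analyze(binary):
    """Step 3: extract simple geometric features of the bright object."""
    points = [(r, c) for r, row in enumerate(binary)
              for c, px in enumerate(row) if px]
    if not points:
        raise ValueError("no object found in image")
    rows = [r for r, _ in points]
    cols = [c for _, c in points]
    return {
        "area": len(points),
        "height": max(rows) - min(rows) + 1,
        "width": max(cols) - min(cols) + 1,
    }


def interpret(features, models):
    """Step 4: choose the stored model whose features are closest."""
    def distance(model):
        return sum(abs(features[key] - model[key]) for key in features)
    return min(models, key=lambda name: distance(models[name]))


# Assumed stored models, measured in pixels.
MODELS = {
    "small plate": {"area": 6, "height": 2, "width": 3},
    "large plate": {"area": 12, "height": 3, "width": 4},
}

features = analyze(preprocess(form_image()))
part = interpret(features, MODELS)
print(features, part)
```

In a real system the decision returned by the last stage would drive a control action, such as accepting or rejecting the part.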
Reduce data processing
The process starts with the sensing of an image using a camera and a light source. The nature of the illumination used is very important and depends on the nature of the task. For simple shape detection, backlighting usually works; if surface features must be identified, front or side illumination is more appropriate.
Early vision systems, and less expensive ones today, use a vidicon camera similar to that in a home video recorder. The output of a vidicon camera is a series of horizontal scan lines, each represented as a continuous electrical signal, with variations in amplitude corresponding to variations in light intensity. More recent systems use solid-state cameras with an array of vision sensors—typically 128 × 128 or 256 × 256. Each sensor converts the light falling on it into an analog electrical signal that is proportional to light intensity. The images provided by these cameras are of higher quality and are less vulnerable to distortion than those provided by vidicon cameras.
Each camera image, typically one every 16 milliseconds, is passed to an image processor that transforms each analog voltage value into a corresponding digital value. Depending on the requirements of the system, the image may be interpreted in "black and white" (pixels with voltages higher than a cutoff value are white, all others are black) or in a gray scale (typically 16 values), which provides greater refinement and reduced sensitivity to imperfect lighting at the cost of greatly increased storage requirements.
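The two digitizing options can be compared in a short sketch. It assumes voltages in the range 0 to `v_max`, a uniform quantizer, and pixels packed into the fewest whole bits each; the function names and sample voltages are illustrative and do not describe any particular image processor.

```python
def to_binary(voltages, cutoff):
    """'Black and white': 1 (white) above the cutoff voltage, else 0 (black)."""
    return [[1 if v > cutoff else 0 for v in row] for row in voltages]


def to_gray(voltages, v_max, levels=16):
    """Uniformly quantize voltages in [0, v_max] to integers 0..levels-1."""
    return [[min(levels - 1, int(v / v_max * levels)) for v in row]
            for row in voltages]


def storage_bits(width, height, levels):
    """Bits for one image if each pixel uses the fewest whole bits."""
    bits_per_pixel = max(1, (levels - 1).bit_length())
    return width * height * bits_per_pixel


scan_line = [[0.0, 0.3, 0.6, 1.0]]           # assumed sample voltages
binary_line = to_binary(scan_line, cutoff=0.5)
gray_line = to_gray(scan_line, v_max=1.0)

binary_bits = storage_bits(256, 256, levels=2)
gray_bits = storage_bits(256, 256, levels=16)
print(binary_line, gray_line)
print(binary_bits, gray_bits, gray_bits // 8)
```

Under these assumptions a 16-level image takes four times the storage of a binary one, which is the trade-off the paragraph above describes.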
Since a production vision system is normally required to operate in real time, it is crucial to reduce data processing for each image to a minimum. One approach frequently taken is windowing, which involves placing an electronic mask over the scanned image so that only the region of greatest interest is examined. Image restoration techniques may also be used to make subsequent processing easier by compensating for deficiencies in the camera image, such as blurred lines and poor contrast. For example, image stretching increases the relative contrast between high and low intensity pixels. A more sophisticated technique, Fourier domain processing, decomposes the changes in brightness in the image into a series of sine and cosine waves, which can then be filtered to provide better edge definition or a reduction in background noise. Techniques of this nature have been used to make features such as small moons and meteor craters visible on pictures sent back from interplanetary space probes.
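Windowing and image stretching are simple enough to sketch directly. The version below assumes a rectangular window and a linear min-max stretch over 16 gray levels, which is one common form of contrast stretching and not necessarily the one a given system uses; the function names and the sample image are illustrative.

```python
def window(image, top, left, height, width):
    """Keep only the rectangular region of interest."""
    return [row[left:left + width] for row in image[top:top + height]]


def stretch_contrast(image, levels=16):
    """Linearly rescale intensities so they span 0..levels-1."""
    flat = [px for row in image for px in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [row[:] for row in image]  # uniform image: nothing to stretch
    scale = (levels - 1) / (hi - lo)
    return [[round((px - lo) * scale) for px in row] for row in image]


# Assumed 4 x 4 gray-scale image with a low-contrast object in the middle.
image = [
    [0, 0, 0, 0],
    [0, 6, 7, 0],
    [0, 8, 9, 0],
    [0, 0, 0, 0],
]

roi = window(image, top=1, left=1, height=2, width=2)
stretched = stretch_contrast(roi)
print(roi, stretched)
```

Windowing first means the stretch is computed from the region of interest alone, so the dark background does not limit the contrast gained, and later stages handle 4 pixels instead of 16.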
Once a good-quality image of the region of interest has been obtained, the important features of that image, such as position, distance, and orientation, must be extracted. Position is usually no problem; distance can be measured by stadimetry (using the apparent size of an object in the camera's field of view), triangulation, or binocular vision (using two cameras). Orientation (in two or three dimensions, as required) can be determined using geometric techniques (fitting an ellipse to the image or noting the relative positions of three noncollinear points), interferometry, or light intensity distribution. A trickier problem, image segmentation, involves identifying the component regions of a complex object. Machine vision systems find true region boundaries difficult to distinguish from shadow edges or image imperfections.
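As an illustration of feature extraction, the sketch below computes position as the centroid of a binary object and two-dimensional orientation from second-order central moments, which gives the major-axis angle of the best-fit ellipse. It also shows a stadimetric range estimate under an assumed pinhole camera model. The function names, the focal length expressed in pixels, and the sample values are assumptions for the example.

```python
import math


def centroid_and_orientation(binary):
    """Return ((cx, cy), angle in degrees) for the object pixels (value 1).

    x runs to the right and y runs downward, as in image coordinates.
    """
    points = [(x, y) for y, row in enumerate(binary)
              for x, px in enumerate(row) if px]
    n = len(points)
    if n == 0:
        raise ValueError("no object pixels in image")
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Second-order central moments of the pixel distribution.
    mu20 = sum((x - cx) ** 2 for x, _ in points) / n
    mu02 = sum((y - cy) ** 2 for _, y in points) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in points) / n
    # Major-axis angle of the ellipse with the same second moments.
    angle = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    return (cx, cy), math.degrees(angle)


def stadimetric_distance(real_size, apparent_size_px, focal_length_px):
    """Pinhole model: distance = focal length * real size / apparent size."""
    return focal_length_px * real_size / apparent_size_px


# Assumed test object: a one-pixel-wide diagonal bar.
bar = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

position, angle = centroid_and_orientation(bar)
distance_m = stadimetric_distance(real_size=0.1, apparent_size_px=50,
                                  focal_length_px=500)
print(position, angle, distance_m)
```

The moment-based angle is ambiguous by 180 degrees and is undefined for a perfectly symmetric object such as a disc, which is one reason practical systems combine several of the techniques listed above.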
Image interpretation consists of matching the processed image against a set of stored images to identify it. Template matching involves superimposing the processed image on the stored image and measuring, for example, the percentage of pixels that do not correspond. Feature matching, a more sophisticated approach, involves calculating a weighted function of a number of features of the processed image and comparing it with the same function calculated for the stored image. Once the image has been identified or classified, further action may be taken, whether updating a database or directing a robot arm to move.
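Both matching approaches can be sketched in a few lines. The example assumes binary images of equal size for template matching and a simple weighted sum as the "weighted function" for feature matching. The feature names, weights, and stored part models are hypothetical values chosen for illustration.

```python
def mismatch_percent(image, template):
    """Template matching: percentage of pixels that do not correspond."""
    if len(image) != len(template) or any(
            len(a) != len(b) for a, b in zip(image, template)):
        raise ValueError("image and template must have the same shape")
    total = sum(len(row) for row in image)
    differing = sum(a != b
                    for row_a, row_b in zip(image, template)
                    for a, b in zip(row_a, row_b))
    return 100.0 * differing / total


def feature_score(features, weights):
    """Feature matching: weighted sum of the feature values."""
    return sum(weights[name] * features[name] for name in weights)


def best_match(features, stored, weights):
    """Return the stored model whose score is closest to the image's score."""
    target = feature_score(features, weights)
    return min(stored,
               key=lambda name: abs(feature_score(stored[name], weights) - target))


image = [[1, 1, 0, 0], [1, 0, 0, 0]]
template = [[1, 1, 0, 0], [1, 1, 0, 0]]
mismatch = mismatch_percent(image, template)

# Hypothetical features (pixel units) and weights.
weights = {"area": 1.0, "perimeter": 0.5, "holes": 10.0}
measured = {"area": 40, "perimeter": 26, "holes": 1}
stored = {
    "washer": {"area": 42, "perimeter": 26, "holes": 1},
    "disc": {"area": 60, "perimeter": 25, "holes": 0},
}
identified = best_match(measured, stored, weights)
print(mismatch, identified)
```

Collapsing all features into one number keeps the comparison cheap, but two different parts can produce similar scores, so the choice of features and weights matters a great deal.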
Figure: Each stage of the machine vision process with the hardware needed to implement it.
Performance of vision systems
The performance of a machine vision system depends on several factors:
Resolution: described earlier
Processing speed: not only the bit-level image processing speed but also how fast the system can examine each item
Discrimination: how many gray levels the system can distinguish and how easily it can detect edges
Accuracy: the probability of correct interpretation of images
A typical system in use today could process two to five simple parts per second with 90% accuracy. Higher speeds will come with faster processors, better recognition algorithms, and the use of parallel processing.
Current machine vision systems can be successful only in highly structured and controlled environments. A system that works in the laboratory to recognize clean parts may fail in a plant where the parts are oily and dirty.
Many vision tasks that human beings perform easily, such as picking jumbled parts out of a bin, are far beyond the capabilities of today's systems. Unfortunately, the economic justification of many such systems depends on reducing the need to structure the environment, for example, by eliminating jigs and fixtures or the precise placement of parts on a conveyor belt.
Nicholas Sheble (nsheble@isa.org) edits the Control Fundamentals department. The source for this critique is Fundamentals of Industrial Control, ISA Press 2005, D.A. Coggan, Editor.
That's knot quality
Lumber is classified and graded on the basis of characteristics that directly relate to appearance and soundness, mainly the number and size of knots per board and the presence or absence of flaws such as resin inclusions and rotten spots. The cut boards travel sideways on a moving belt, where, typically, human inspectors evaluate each board visually and either mark it or push it off onto the appropriate belt or bin. The work is boring, and lapses of attention can lead to mistakes in grading and customer complaints. A machine vision system can scan the board and analyze the image to determine the number and size of dark spots that indicate defects.
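Counting and sizing dark spots can be sketched as a threshold followed by connected-region labeling. The example assumes a 16-level gray-scale image, a darkness cutoff of 6, and 4-connected regions. The grading rule and its limits are hypothetical and do not reflect any real lumber standard.

```python
def find_dark_spots(gray, dark_below):
    """Return the pixel area of each 4-connected region darker than the cutoff."""
    rows, cols = len(gray), len(gray[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or gray[r][c] >= dark_below:
                continue
            # Flood fill outward from this unvisited dark pixel.
            stack = [(r, c)]
            seen[r][c] = True
            area = 0
            while stack:
                y, x = stack.pop()
                area += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and gray[ny][nx] < dark_below):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            areas.append(area)
    return areas


def grade_board(areas, max_spots, max_spot_area):
    """Hypothetical rule: reject if there are too many spots or any is too large."""
    if len(areas) > max_spots or any(a > max_spot_area for a in areas):
        return "reject"
    return "accept"


# Assumed scan of a board: 12 = clear wood, low values = dark defects.
board = [
    [12, 12, 12, 12, 12, 12],
    [12,  3,  2, 12, 12, 12],
    [12,  2,  3, 12, 12,  4],
    [12, 12, 12, 12, 12, 12],
]

spot_areas = find_dark_spots(board, dark_below=6)
strict = grade_board(spot_areas, max_spots=3, max_spot_area=3)
lenient = grade_board(spot_areas, max_spots=3, max_spot_area=5)
print(spot_areas, strict, lenient)
```

A dark spot found this way could also be a shadow or a stain, so a practical system would need controlled lighting and further checks before treating every spot as a knot.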