
A Guide to Visual Techniques

by Dominic Ford

 


Section I: Introduction

1.1 Sensors

For a robot to show intelligent behaviour, it must be able to monitor its surroundings and respond to changes. The development of a source of this input is often one of the most difficult areas of a project. Sensors almost invariably produce analogue output, which must be converted by appropriate hardware to a digital form before it can be accessed by software. Usually the format of the raw data then requires the software to perform extensive decoding before any analysis can begin.

By far the most common sensors used by robotics engineers are ultrasonic rangers, or sonar. These have the advantage of being cheap and easy to use - they are often available as complete packaged kits. The software required to analyse data from such sensors is fairly simple to write.

But ultrasonic ranging has its limitations:

Visual detection is a more complicated but often more effective alternative. The hardware requirements are greater - the raw data arrives at a much higher rate, which makes software sampling less feasible. Unlike sonar, video data has two dimensions - the x and y axes of the image - and so the synchronisation process is much more complex. The sampler must form a matrix of data, not just a single stream. Commonly this is achieved by a video digitiser, but for many tasks there are alternatives. This needn't be as daunting as it first appears, and the rewards can be substantial:

A suitable camera for a robotics project is usually fairly easy to find. If the camera is not to be mounted onto the robot itself, it is possible to use a standard video camera. If the camera does need to be mounted on a compact robot, then there are plenty of small camera modules available. Colour adds a lot to the price (often a factor of four), and colour video signals are much more difficult both to digitise and to analyse, so monochrome is often preferable - and it is often all that is required. These modules can cost as little as 50 dollars if you look in the right place.

Section II: Digitising the Raw Video Signal

2.1 The Analogue signal

Cameras generally produce a variety of outputs depending on what they are designed for. The simplest signal to decode is the "Composite" signal, sometimes labelled "CVBS". Ignore any UHF outputs - they are modulated to radio frequencies, and need a radio receiver circuit to demodulate.

I shall assume at this point that you are using a monochrome camera. Much of this information also applies to colour cameras, but if you need colour output you have no option but to use a packaged video digitiser card as the colour data is sent at a rate beyond the sampling power of any normal software system.

The composite signal varies between 0V and +1V, where 0V indicates maximum darkness and +1V is maximum brightness. The figure of +1V is typical - cameras vary from 0.5Vpp to 2Vpp, but most use 1Vpp. The signal is broken by pulses where the composite output is clamped to 0V. These indicate horizontal sync pulses and vertical sync pulses. Vertical sync pulses last in excess of 10us and indicate the start of a new frame. The end of a vertical sync pulse indicates the start of transmission of the first line of a frame. Horizontal sync pulses last less than 10us and indicate that a new line is being transmitted. Below is a sample oscilloscope view of such a video signal:

There is some variation in the exact timings of the signal, but the following is a rough guide:

                                    NTSC (USA)   PAL (Europe)
Frame rate (frames per second)          60            50
Lines per frame                        243           287
Pixels per line                        320           384
Vertical sync pulse time (ms)           33            40
Horizontal sync pulse time (us)        4.3           4.4

2.2 Digitising procedure

The procedure for digitising a video frame is roughly as follows:

  1. Wait for a vertical sync pulse
  2. At the end of the pulse, start sampling the signal at the required speed (generally as fast as possible).
  3. Stop when a horizontal sync pulse is reached, and advance to the next line.
  4. Restart sampling at the end of the horizontal sync pulse.
  5. Stop when the required number of lines have been read.

The maximum time that this procedure can take is the transmission time for two complete frames. If the procedure starts immediately after a vertical sync pulse, the period of one complete frame will be wasted waiting for the next vertical sync pulse. This is unavoidable since these synchronisation pulses are the only way of telling which part of the frame is currently being transmitted, and so reception cannot begin until a vertical sync pulse has been met.
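A minimal C sketch of this procedure is given below. It assumes a hypothetical read_adc() routine that returns one sample from the analogue to digital converter, and that sync pulses read back as zero. The buffer sizes and the pulse-length threshold are placeholders to be tuned for your own hardware:

#include <stdint.h>

#define LINES   100            /* lines to capture                       */
#define PIXELS  64             /* samples to store per line              */

extern uint8_t read_adc(void); /* hypothetical: returns one 6-bit sample */

static uint8_t frame[LINES][PIXELS];

/* Count how long the signal stays clamped at zero (i.e. a sync pulse). */
static unsigned sync_length(void)
{
    unsigned n = 0;
    while (read_adc() == 0)
        n++;
    return n;
}

void capture_frame(void)
{
    unsigned line, pixel;

    /* 1. Wait for a vertical sync pulse - a long run of zero samples.   */
    while (sync_length() < 100)        /* threshold found by experiment  */
        ;

    /* 2-5. Sample each line, resynchronising on every horizontal pulse. */
    for (line = 0; line < LINES; line++) {
        for (pixel = 0; pixel < PIXELS; pixel++) {
            uint8_t s = read_adc();
            if (s == 0)                /* horizontal sync: line is over  */
                break;
            frame[line][pixel] = s;
        }
        while (read_adc() != 0)        /* wait for the next sync pulse   */
            ;
        while (read_adc() == 0)        /* ...and for it to finish        */
            ;
    }
}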

2.3 Preblanking and Postblanking periods

This information is not important for most robotics projects, but is worth bearing in mind. The first and last few lines of the frame contain null data and are not worth sampling if this can be avoided. Obviously, if RAM is limited and you can only afford to capture a certain number of lines of the frame, it is wasteful to spend some of those lines capturing this null data. Often, though, it is easier and safer to ignore this and start sampling immediately after the vertical sync pulse. The exact number of lines varies from one camera to another, but rarely exceeds 32 lines of null data at the top and 16 lines at the bottom.

The same phenomenon occurs with the data at the far left and right of each line. Here the effect is often negligible, although in the worst case the first and last fifths of the line can contain null data.

Generally no allowance needs to be made for either vertical or horizontal preblanks, but it is useful to be aware of their existence.

2.4 Video digitisers

The easiest way of capturing a frame of video data is to use a pre-packaged video digitiser card, which will deal with all of the signal processing and synchronisation automatically, leaving you with an array of pixels which are ready to analyse. This solution is all very well for desktop PCs, but it is very hard to find such cards that will interface with the kind of microcontrollers that mount onto robots. If your robot is controlled remotely by your PC, this is more acceptable, although even then it can be difficult to access data from your video digitiser from within your control program.

Alternatively, the digitisation can be split between hardware and software. There is a choice here of where to place the boundary between the two. Some of the capturing process must be performed by hardware, as the raw signal is in an analogue format. Thus at the very least an analogue to digital converter is needed. But to perform the entire capture by hardware means essentially building a complete video digitiser card, which involves a huge amount of electronics. A few possibilities are described below:

2.4.1 Software digitisation

It is possible to perform much of the sampling in software, requiring little more than a single analogue to digital converter in the hardware. This generally produces a lower resolution of capture, and does tie up the processor while the capture takes place (if the capture takes place entirely in hardware, the processor is free to do something else). Often, neither of these factors is important.

A suitable ADC for the job is Harris Semiconductor's CA3306. This produces 6 bit output and is capable of sampling at video speeds. It requires a clock input to time the captures - it digitises one 6 bit word for every clock pulse. If it is available, your processor's clock is ideal for this.

It is often advisable to place a coupling capacitor between the ADC and the input from the video camera. A suitable setup is shown below:

This setup has the following important features:

The software waits for a vertical sync pulse, which appears as a long string of zeros from the ADC. It then samples data from the ADC into a RAM buffer as fast as it can. Every time a zero is received from the ADC, this is a horizontal sync pulse, so the sampler skips on through the buffer to the space reserved for the next line. Thus you can form a neat array such as:

0000 - 003F Line 1
0040 - 007F Line 2
0080 - 00BF Line 3
.... - .... ...

If a horizontal sync pulse is received when the sampler is at $007C, it skips to $0080 to keep the array straight in memory.
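As a rough illustration, the sketch below (using the same hypothetical read_adc() routine as before) keeps the array straight by reserving a fixed $40-byte slot for each line and rounding the write index up to the next slot whenever a sync pulse is seen:

#include <stdint.h>

#define LINE_SLOT  0x40                 /* bytes reserved per line   */
#define NUM_LINES  100

extern uint8_t read_adc(void);          /* hypothetical, as above    */

static uint8_t buffer[NUM_LINES * LINE_SLOT];

/* Assumes a vertical sync pulse has just ended, as in the sketch above. */
void capture_aligned(void)
{
    unsigned i = 0;

    while (i < sizeof(buffer)) {
        uint8_t s = read_adc();

        if (s == 0) {
            /* Horizontal sync: round the index up to the next line slot,
               e.g. from $007C to $0080, to keep the array straight.     */
            if (i % LINE_SLOT != 0)
                i += LINE_SLOT - (i % LINE_SLOT);
            while (read_adc() == 0)     /* wait for the pulse to end     */
                ;
        } else {
            buffer[i++] = s;
        }
    }
}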

Putting this much emphasis on software capture does have its drawbacks. Every byte of data that is sampled from the ADC must be compared with zero to check for sync pulses. This slows the system down dramatically and reduces the resolution. There is an alternative method which requires more RAM and more time to capture a frame, but works well when neither of these is an important issue:

When the vertical sync pulse is received, sample data straight into a buffer for about the time of one frame. Don't monitor for sync pulses, simply sample as fast as you can. Then, go through in a second pass and transfer the data you have just captured into a second buffer which you sort into a neat array using the sync pulses. Since the data is already captured, you can take as long as you like over this.
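A sketch of this two-pass approach, under the same assumptions as the earlier examples, might look like this:

#include <stdint.h>
#include <string.h>

#define RAW_SIZE   16384               /* raw samples for roughly one frame */
#define LINE_SLOT  64
#define NUM_LINES  100

extern uint8_t read_adc(void);         /* hypothetical, as above            */

static uint8_t raw[RAW_SIZE];
static uint8_t sorted[NUM_LINES][LINE_SLOT];

/* Assumes a vertical sync pulse has just ended, as in the earlier sketch.  */
void capture_two_pass(void)
{
    unsigned i, line = 0, pixel = 0;

    /* Pass 1: sample blindly, as fast as possible - no checks at all.      */
    for (i = 0; i < RAW_SIZE; i++)
        raw[i] = read_adc();

    /* Pass 2: at leisure, split the raw data into lines at each run of
       zeros (the sync pulses) and copy it into a line-aligned array.       */
    memset(sorted, 0, sizeof(sorted));
    for (i = 0; i < RAW_SIZE && line < NUM_LINES; i++) {
        if (raw[i] == 0) {
            if (pixel > 0) {           /* end of a line of real data        */
                line++;
                pixel = 0;
            }
        } else if (pixel < LINE_SLOT) {
            sorted[line][pixel++] = raw[i];
        }
    }
}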

2.4.2 Hardware synchronisation

The synchronisation method used by the software capture technique above can prove unreliable at times. If your sampler misses horizontal sync pulses the pixel array that you store will drift from side to side. Even if it catches them all, there will be "pixel jitter" to some extent. This is where pixels are seen to jump from side to side between successive frames. All video digitisers produce some level of jitter, but serious cases are a tell-tale sign of poor synchronisation.

It is possible, at little additional expense, to perform the synchronisation in hardware, leaving the sampling in software. The hardware required is a sync stripper, such as Elantec's EL4581CN. This takes the raw video signal and outputs pulses on two separate lines whenever a horizontal or vertical sync pulse is received. These can then be used to synchronise the software capture code. One good way of doing this is to connect the pulses to the interrupt line of the processor. The interrupt can be used to synchronise the capture software. When the microcontroller is not accessing the video data, the interrupts can be disabled either by using the microprocessor's interrupt masking facilities, or by using an available output line to switch the sync stripper hardware off.
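What follows is only a rough sketch of the idea - the interrupt handler names and the ADC access are hypothetical and depend entirely on your microcontroller - but it shows how the two sync stripper outputs might drive the capture code:

#include <stdint.h>

#define PIXELS     64
#define NUM_LINES  100

extern uint8_t read_adc(void);            /* hypothetical, as above         */

static volatile uint8_t  capturing = 0;   /* set by the vertical sync ISR   */
static volatile uint16_t line      = 0;   /* advanced by the horizontal ISR */
static uint8_t frame[NUM_LINES][PIXELS];

void vertical_sync_isr(void)              /* hypothetical vector name       */
{
    line = 0;
    capturing = 1;
}

void horizontal_sync_isr(void)            /* hypothetical vector name       */
{
    if (capturing && line < NUM_LINES)
        line++;
}

/* The ISRs keep track of which line is being transmitted; the main code
   just samples pixels into whichever line is current.                      */
void capture_with_interrupts(void)
{
    while (!capturing)                    /* wait for the next frame        */
        ;
    while (line < NUM_LINES) {
        uint16_t l = line;
        unsigned p;
        for (p = 0; p < PIXELS && l == line; p++)
            frame[l][p] = read_adc();
        while (l == line)                 /* wait for the next line         */
            ;
    }
    capturing = 0;                        /* or mask the interrupts here    */
}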

2.4.3 Hardware digitisation

If the nature of the analysis requires a high resolution image, the software capture techniques above may become unsuitable. This is also the case if the microprocessor's clock rate is too slow. The nature of the transmission makes vertical resolution easy to enhance - simply sample more lines. But increasing the horizontal resolution means sampling faster, and that often isn't possible. In this case, it may become necessary to perform the capture procedure entirely in hardware, using a video digitiser card.

There is one alternative that is sometimes possible. The timings of the video signal are determined by a crystal within the camera module. For NTSC this is usually 12MHz. If you replace this with a slower crystal, there are two effects on the video signal:

 

IMPORTANT: This is a risky operation. Always consult the relevant camera data sheets first. Neither the author nor SRS take any responsibility for screwed up cameras!

If you decide that the only solution in your particular robot is to perform the capture entirely in hardware, you need to build a video digitiser card. This is a huge topic that I will not go into the details of here. If you are interested in building a video digitiser, please contact me and I will give you further information.

Section III: Visual Analysis

The exact method that you will need for your robot will depend heavily on what you are looking for, and on what other distractions the desired target must be distinguished from. I will describe a few of the most useful visual analysis techniques.

In most situations there will be much data in the image which is irrelevant, such as objects in the background and reflections off surrounding surfaces. The computer must pick out any areas of the image which look as if they may represent the object(s) of interest. Therefore, before any detailed analysis can begin, the image must be simplified. Once again, I shall assume that a monochrome image is used, as this makes the visual detection system much simpler, and is more common anyway since the capture is so much easier.

3.1 Differential Analysis

This technique helps when it is the shape of the object that is important. Shape is normally the most important feature of an object - we humans ourselves usually immediately focus on the edges of an object when we are trying to identify it. These edges can be recognised as lines along which there is a sudden change in brightness. Thus we are not so much interested in the absolute brightness as the rate of change of brightness.

The absolute brightness is not particularly useful anyway, as it is affected by so many factors. Outdoors, it changes depending upon the weather conditions and the direction of the sun. If your camera has an automatic exposure control (AEC), the colour of the surroundings will also affect the absolute brightness of your object. But the patterns in the rate of change of brightness are affected only by the presence of edges, not lighting conditions.

In mathematical terms, the first step is to differentiate the image with respect to the x-axis. This means that each pixel has the brightness of the neighbouring pixel to the left subtracted from its brightness. The result is that the "bright" areas of the image now do not represent areas where there is bright light, but where there is a sudden increase in light intensity when looking across from left to right in the image. Similarly, dark areas now represent areas where there is a sudden change from light to dark when looking across from left to right:

3  3  4  9                    -   0   1   5
2  3  9  9     CHANGES TO:    -   1   6   0
2  9  9  3                    -   7   0  -6
9  9  4  3                    -   0  -5  -1

Note that the resultant image has one less column than the original image.
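In code, the differentiation amounts to a single pass over the captured frame. The minimal sketch below assumes frame dimensions like those used in the earlier capture examples:

#define WIDTH   64              /* pixels per captured line  */
#define HEIGHT  100             /* lines per captured frame  */

/* Differentiate with respect to x: each output pixel is the pixel to its
   left subtracted from it, so the result has one less column.            */
void differentiate_x(const unsigned char src[HEIGHT][WIDTH],
                     signed char diff[HEIGHT][WIDTH - 1])
{
    int x, y;
    for (y = 0; y < HEIGHT; y++)
        for (x = 1; x < WIDTH; x++)
            diff[y][x - 1] = (signed char)(src[y][x] - src[y][x - 1]);
}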

In the new differential image, objects stand out much better. This is illustrated by a sample image below. The same photo has been processed twice, but in the second case (the lower photo) the original was underexposed. Notice that while the underexposure had a dramatic effect on the absolute intensity of the original, its effect on the processed image is much less marked. Notice also how the circle stands out much more clearly in the differential image, as it is the only part of the image where there is a continuous pattern in the data:

Source

Differentiated

A series of passes can then be carried out over the resultant image to search for particular patterns which commonly indicate the presence of particular objects. A few of the more common objects are listed below. Note that if an object has more defined edges on the top and bottom than on the left and right, it can be more effective to differentiate with respect to the y axis. This means applying the process above, but subtracting the brightness of the pixel above, as opposed to that of the pixel to the left. This is illustrated with the example below, where a musical score is being processed:

Original

X Differential

Y Differential

The same image has been processed both horizontally and vertically. In a music score the Y differential is much more useful - all of the lines that we will want to detect run horizontally, so they have top and bottom edges but no left and right edges as such.

It may appear that the differential image is of little more use than the original - it simply looks rather more artistic! The point is that when analysing the original image, we are only interested in the edges of the notes.

3.1.1 Circles

Many targets that robots are looking for appear as circles in a camera frame. These can generally be picked out in one sweep down the image (i.e. perpendicular to the direction that you differentiated along - if you differentiated vertically, sweep from left to right instead). There will be a line of colour down each side of the circle. The colours on the two sides will be opposite, because one side is getting lighter, the other is getting darker.

 

Original

   
 

Differentiated

Light circle

Dark circle

Notice that for a light circle and a dark circle, which colour is on which side is reversed. This can be useful if you want to distinguish between the two (say between black squash balls and white golf balls).

The detection of balls for various sports seems a popular use of visual analysis in many robots. By popular demand, an algorithm for spotting these (which appear as circles) is given below. First the algorithm is given as a mock program, and then it is explained in detail:

3.1.1.2 An algorithm for spotting balls

 

Ball visual recognition algorithm

Dominic Ford 1999

 

NB: This algorithm is written in the format of a computer program, but is designed to be as flexible as possible, allowing easy translation into whatever language is desired. It is not intended to be immediately executable in any language, and is designed purely to be as readable as possible. For example, the procedures are given high-level first; low-level later. Although this is most readable, most programming languages demand that the procedures are given in the reverse order. Comments are given throughout, and are enclosed in {brackets}.

 

{Variable declarations}
constant xsize = ??? {horizontal size of image}
constant ysize = ??? {vertical size of image}
variable col: array[0..xsize,0..ysize] of Integer {the source image}
constant threshold = ??? {the threshold differential to indicate an edge. This depends on many factors, including the number of levels of greyscales sampled, the colour of the target object, and the lighting conditions}
constant ballsize = ??? {how big - in pixels across diameter - are the balls likely to appear}
variable x: Integer {horizontal position of pixel currently being scanned}
variable y: Integer {vertical position of pixel currently being scanned}
variable differential: Integer {used to calculate the differential of the current pixel}
variable ballrecord: array[0..xsize] of Integer {used to store the horizontal positions of patterns which might form balls - see comments below}

 

 

 

{Program code}
procedure spotballs {call this procedure to do the ball detection}
for y=0 to ysize {this loop scans down the image, line by line}
differentiate_line {First differentiate the current line, storing positions of any sudden changes in brightness, ie edges..}
spot_balls {..and then rescan it to see if these edges are balls}
end {for y}
end {procedure spotballs}
procedure differentiate_line {this procedure differentiates the current line and looks for large steps up or down which might be part of a ball}
for x=1 to xsize {this loop scans across the current line, pixel by pixel}
differential = col[x,y] - col[x-1,y] {get the differential for the current pixel}
if differential>threshold then ballrecord[x]=ballrecord[x]+ballsize {if we meet a positive going edge, then record it using ballrecord}
if differential<-threshold then ballrecord[x]=ballrecord[x]-ballsize {if we meet a negative going edge, then record it using ballrecord}
if ballrecord[x]<0 then ballrecord[x]=ballrecord[x]+1 {slowly decrease the magnitude of the edges stored in ballrecord, so that old edges are slowly forgotten. First, negative going edges...}
if ballrecord[x]>0 then ballrecord[x]=ballrecord[x]-1 {...then positive going edges}
end {for x}
end {procedure differentiate_line}
procedure spot_balls {scan the data in ballrecord, and see if there are any balls}
for x=1 to xsize {scan along the line of data in the array ballrecord}
if ((ballrecord[x-1]>0) and (ballrecord[x-1]<3) and (ballrecord[x]<0) and (ballrecord[x]>-3)) then is_it_a_ball(x) {If there is a change from positive-going edges to negative going edges, it may be a ball}
end {for x}
end {procedure spot_balls}
procedure is_it_a_ball(x) {a pattern similar to (+1 -1) has been found by spot_balls. Check it out - Is it a ball? See program notes}
variable symetric_pairs: Integer {Define a variable symetric_pairs so we can use it to count how many pixel pairs are symmetrical on each side - See program notes}
variable check: Integer {distance from the centre of the pair currently being checked}
symetric_pairs=0
for check=0 to ballsize {Fiddle with this value for ballsize and see what works best}
if ((ballrecord[x+check]<>0) and
    (ballrecord[x-check]<>0) and
    (abs(ballrecord[x+check]+ballrecord[x-check])<2))
    then symetric_pairs = symetric_pairs+1

{If the magnitude of the two pixels either side of the centre at distance check are roughly equal and opposite, increment symetric_pairs.}
end {for check}
if symetric_pairs>(ballsize/2) then got_a_ball
end {procedure is_it_a_ball}
procedure got_a_ball
{We've got a ball, at [x,y]}
{Do whatever here}
end

 

 

PROGRAM COMMENTS

The program scans down the lines of the image one by one, and for each one it performs two passes. Each of these passes involves scanning across that line from left to right. In the first pass, each pixel is differentiated, and the differential is compared with the threshold value that indicates the presence of an edge (the constant "threshold"). If the differential is greater than threshold, a positive-going edge is recorded; if the differential is less than minus threshold, a negative-going edge is recorded; and if its magnitude is less than threshold it is ignored.

The edges are recorded in the array ballrecord. This system is a little complicated at first. When there is a positive-going edge, the ballrecord value corresponding to the horizontal position of the edge is increased, and when a negative-going edge is detected, the ballrecord value is decreased. In the program above, the magnitude of these changes is set to how large (in terms of pixels) the ball is likely to appear. You'll see why in a minute... Fiddle with the exact value to see what works best.

It's fairly obvious that a positive-going edge in the line above the current line is much more important to the analysis of this line than a positive-going edge that was detected ten lines above. Therefore a system of decay is employed. After each line is processed, all of the values stored in ballrecord have their magnitudes reduced by one (ie 2 --> 1, -2 --> -1). Therefore both positive and negative edges are slowly forgotten, or at least become less significant.

When you're scanning a ball, you don't want to have completely forgotten the top by the time you reach the widest point (i.e. the ballrecord value for the column where the top of the ball was detected must not have decayed to 0). Thus the value by which ballrecord is altered for each edge (set in the program above to the diameter at which balls should appear) must be large enough that an edge is not forgotten in the number of lines that will be passed before the ball is recognised as a ball. But it must not be so large that the ballrecord values explode out of control. To be effective, 95% of the time the values must be zero.

Experimentation is required here - for the first few times, make the program output a list of all of the ballrecord values after each line. Then you can study exactly what happens each line. Remember if you've got more than a very few ballrecord values that are non-zero, you're spotting too many edges, and responding to them too violently.
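If your target has a serial link or similar, a debugging routine along these lines (printf here standing in for whatever output facility you have) makes it easy to watch the ballrecord values evolve line by line:

#include <stdio.h>

/* Print the non-zero ballrecord entries after processing line y.         */
void dump_ballrecord(const int ballrecord[], int xsize, int y)
{
    int x;
    printf("line %3d:", y);
    for (x = 0; x <= xsize; x++)
        if (ballrecord[x] != 0)           /* most entries should be zero  */
            printf(" [%d]=%d", x, ballrecord[x]);
    printf("\n");
}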

The second pass across each line looks for patterns in the ballrecord data which might represent a ball. When we are a few lines down into the ball, we are looking for a pattern like this:

0 0 +9 +6 +4 +3 +2 +2 +1 -1 -2 -2 -3 -4 -6 -9 0 0

 The key features are:

Firstly, look for a point where the ballrecord values change from slightly positive to slightly negative (or slightly negative to slightly positive for a dark ball). This means that we look for (+1 -1) but NOT (-1 +1) or (+6 -1) or (+6 -6). When we meet a pair such as (+1 -1) we know that a pixel that may represent the centre of the top of a ball was reached at this x co-ordinate, and that we have scanned several lines of the image since then. When such a pair is found, the program calls the procedure "is_it_a_ball".

When such a pair is found, search either side of it. Check that most of the values are close to being equal in magnitude and opposite. Allow for the fact that some values may be erroneous - maybe check that half the pairs of symmetrical values are within so many units of being equal and opposite. The requirements vary from one situation to another, so the only sure way is to study those ballrecord values.

3.1.2 Squares and other shapes

Circles are the easiest shapes to recognise because they appear the same from all angles. But sometimes it is necessary to spot crates and other less uniform objects. Let's start with a simple square lamina. This can appear as any of the below images:

In many cases it may be possible to ignore some of the more tilted appearances. If the square is mounted straight and the robot is running on horizontal ground, the square should always appear straight (not tilted) in the robot's view. This assumption cannot be relied upon, though. There is bound to be some inaccuracy in the mounting of the square, and equally some inaccuracy in the camera mounting. As the robot moves along there will be mechanical vibrations, and even swaying if suspension is employed. The overall result is that even in the most uniform conditions, it will be necessary to allow for a tilt in the region of +/-5 degrees.

In essence, the same technique is applied as was described in the section on spotting balls and other circular shapes. In the algorithm described above, a square would be recognised as two values, or groups of neighbouring values, of ballrecord which become exceptionally high (two straight vertical lines of edges - one all positive; the other all negative). The code would obviously have to be adjusted so that these "peaks" did not explode out of control - in the code above the ballrecord value would increase by ballsize for every line of the square and hence would become ridiculously high. One possibility is to set a maximum value above which ballrecord is not increased any further, as sketched below.
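One way of doing this is to funnel every update to ballrecord through a small helper that clamps the value. The cap used below is only an assumed starting point, to be tuned by experiment:

#define BALLRECORD_MAX  20      /* assumed cap - tune by experiment */

/* Add an edge to ballrecord, but never let its magnitude grow past the cap. */
void add_edge(int ballrecord[], int x, int step)   /* step = +/- ballsize    */
{
    ballrecord[x] += step;
    if (ballrecord[x] >  BALLRECORD_MAX) ballrecord[x] =  BALLRECORD_MAX;
    if (ballrecord[x] < -BALLRECORD_MAX) ballrecord[x] = -BALLRECORD_MAX;
}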

3.2 Distance estimation

It is often very useful to be able to determine the distance of an object. This allows us to predict how long it will take us to get to the object if that is our aim, and also helps with analysis if we want to know how big it is. A golf ball immediately in front of the camera looks just like a basketball a few yards back. The only way to distinguish which is which is by knowing the distance. Once we know the distance, we can do a little trigonometry involving the size of the object in the video frame (in pixels) to find its actual physical size.
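As a rough sketch of that trigonometry: if the camera's horizontal field of view is known (the figure below is only an assumed example), the physical width of an object follows from its distance and its width in pixels:

#include <math.h>

#define PI            3.14159265358979
#define FOV_DEGREES   40.0      /* assumed horizontal field of view */
#define IMAGE_WIDTH   320.0     /* assumed pixels per line          */

/* Physical size of an object from its distance and its width in pixels. */
double physical_size(double distance, double size_in_pixels)
{
    double radians_per_pixel = (FOV_DEGREES * PI / 180.0) / IMAGE_WIDTH;
    return 2.0 * distance * tan(size_in_pixels * radians_per_pixel / 2.0);
}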

3.2.1 Vertical position

This only applies when the robot is seeing objects lying on a horizontal surface such as the floor.

When analysing the frame, we look out for the bottom-most edge of the object. This is the part that will be in contact with the floor (or at least closest to the floor). If the object is near, this bottom edge is near the bottom of the frame; if it is far, the bottom edge is nearer the horizon.

The bottom edge is in many cases very difficult to see clearly, as it will be in the shadow of the object. The top edges of the object can be seen very clearly, but the lower edges fade away into darkness more gradually, which is not easy to spot by differential means. In this case it is often acceptable to use the vertical position of the top of the object instead, but this is not desirable when we want to know the physical size of the object. When the top is used, the exact distance depends not only on the measured vertical position of the top, but also on the height of the object. If the exact shape/size of the object is to be determined, the presence of this uncertainty is not acceptable, making it essential to use the bottom. However, in most cases where only a rough idea is required, it is acceptable to ignore this.

Note that the scale for converting vertical position to distance is not a linear scale, and also depends on the angle of the camera mounting. Often the easiest way is to analyse frames from objects at certain known distances, and use these to create a look-up table of distances for different positions.
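A look-up table of this kind need be nothing more than an array indexed by line number, filled in by hand from your calibration frames. The sketch below is only a skeleton with placeholder values:

#define NUM_LINES  100

/* distance_cm[y] = distance of an object whose bottom edge falls on line y.
   The entries are placeholders, to be filled in from calibration frames
   captured with objects at known distances.                               */
static const unsigned distance_cm[NUM_LINES] = {
    0 /* ...calibration measurements go here... */
};

unsigned distance_from_line(unsigned bottom_line)
{
    if (bottom_line >= NUM_LINES)
        bottom_line = NUM_LINES - 1;
    return distance_cm[bottom_line];
}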

3.2.2 Parallax

This is the system that we humans use, and has the advantage of working under more or less any conditions. The object in question does not have to be on any kind of floor. The disadvantage is that two cameras are required:

Two cameras are mounted a fixed distance apart, pointing in the same direction. The mounting need not be so precise that the angling of the two cameras is identical - any inaccuracy can be ironed out in the calibration stages - but it MUST ensure that there is no relative angular movement between the cameras. This means that the two cameras must be fixed together so that any difference in the angles at which they face is absolutely constant.

During analysis, the target object is traced in both camera images. The horizontal position of the target is found in each image - it doesn't matter which part of the object you trace the horizontal position of (left, right or average), so long as the same is done in both frames. If there is a great difference in horizontal positions, the object is close; if the difference is less, the object is further away. This is illustrated to the right - the views from each camera are shown on the screens below.

Often it is simply enough to categorise objects as near, middle-distance and far, in which case little calibration is required. Simply take a few test readings for objects on the scale of distances you will want to measure, to set suitable boundaries in the measured positional difference for your categories of distances. Where greater accuracy is required, the relationship approximates more accurately to:

 

Distance = k (d - z) / tan( c (d - z) )

Where:  k, c, z = constants, to be determined or approximated by calibration

        d = difference in horizontal position of the object between the two images

To calibrate, test your system three or four times with an object placed at known distances. For each case, measure the difference in horizontal position of this object between your two camera images. Note that it doesn't matter what scale you use to measure the distances - the constants in the relationship will adjust accordingly. You can then use your results to find the values of the three constants as follows:

Measure the most distant object you can find (a tree out of a window is usually a good bet). Your measured difference in horizontal position for this object (assumed to be infinitely distant) is the value for the constant z. This should be close to zero (or even equal to zero), as it represents any fixed inaccuracy in the mounting of the cameras.

Finding the constants c and k is a little harder - the following method is rather trial and error, but should be fairly easy to follow. Measure two fairly close objects at known distances. For the moment assume that k=1, and vary c until the output results for distance are in the same ratio as the ratio of the measured distances. The output distances do not have to be the correct values - just in the same ratio. For example, if the two objects were measured at 1m and at 3m, you would accept values of 0.1 and 0.3, but NOT 1 and 4. It may be helpful to know that increasing c increases the ratio; decreasing c decreases the ratio.

Finally, set k to be the factor by which the calculated distances must be multiplied to produce the measured distances.

You may be thinking that this arithmetic is not the easiest thing to do on a small microcontroller unit on a robot. This is true - you are unlikely to have many floating-point facilities, and certainly no built-in trigonometry functions. The solution is generally to approximate to integer arithmetic, and to use look-up tables of pre-programmed tan values.
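As a rough sketch, the relationship above can be evaluated entirely in integers if the tan values (and the constant k) are pre-computed on a PC and scaled up by a convenient power of two - here 1024. The table contents and constants below are placeholders to be filled in from your own calibration:

#include <stdint.h>

#define TABLE_SIZE  64

/* tan_table[i] holds 1024 * tan(c * i), computed off-line from the
   calibrated value of c.  K holds 1024 * k.  Both are placeholders.  */
static const uint16_t tan_table[TABLE_SIZE] = {
    0 /* ...pre-computed values go here... */
};

#define K  1024     /* assumed: 1024 * calibrated k          */
#define Z  0        /* assumed: calibrated mounting offset z */

/* d = difference in horizontal position (pixels) between the two images */
uint32_t distance(int d)
{
    int i = d - Z;
    if (i <= 0)
        return 0xFFFFFFFFu;                /* at or beyond "infinity"     */
    if (i >= TABLE_SIZE)
        i = TABLE_SIZE - 1;
    if (tan_table[i] == 0)
        return 0xFFFFFFFFu;                /* avoid dividing by zero      */
    return ((uint32_t)K * (uint32_t)i) / tan_table[i];
}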

3.3 Optimisation

It is often possible to eliminate much unnecessary information from the image by altering the mounting setup for the camera, and this obviously gives the visual analysis setup a head start.

When working outdoors, trees and clouds are a source of substantial amounts of confusing data. Notice in the sample image above how well defined the pattern made out by the tree is. If the camera is angled down more towards the ground, this pattern will be out of sight and so will not have to be eliminated by software. An alternative to angling the camera downwards is to attach a sun-shade immediately above the camera mounting which obscures the sky from view. This will also help to cut out glare from the sun, which leads to a white-out at best, and a burnt out camera when severe.

The same idea can be useful indoors, although it is less important. Overhead lighting can cause white-out just like the sun. And if our robot is only interested in things it finds on the floor, anything it can see other than the floor can only be a source of confusion.

When robots are dealing with coloured objects, these colours can be indistinguishable in a grey-scaled image from a monochrome camera. This is not good - for differential analysis to work well, objects need well-defined edges. It isn't usually necessary to invest in a colour camera, though - placing a coloured filter over the lens can often enhance the colour difference just enough to make the two offending colours look different. Which colour to use can be a challenge to work out; experiment is often the only sure way.

You can often help your robot by placing logos on objects that are difficult to spot by visual analysis. Let's say your robot must find its base after finishing a job. The base may be hard to spot, but if a logo is painted on it the robot simply has to look for that. This logo can make use of an unusual shape which is unlikely to crop up elsewhere - maybe a circle and a square side by side. Both are easy to spot, but the combination is unlikely to crop up anywhere else and be mistaken for the base. Alternatively, an unusual colour could be used that the visual analysis can look for.

Section IV: Conclusion

Working with visual sensors is without doubt a more complicated business than the ultrasonic alternatives, but the rewards are enormous. As well as increased range, and often more reliability, the realm of object-type detection suddenly becomes available. Give it a go...

 


Dominic Ford, September 1999.

Website: www.ballbot.mcmail.com

Comments? Email me on: dominic.ford@cwcom.net