Color Vision System

by Kenneth Maxon, Winter 2006

'A Hardware-Based Color Vision System for Embedded Robotics Applications'

Introduction:

Article Format:
  • Introduction
  • Overview
  • Interconnects
  • VIP
  • FPGA
  • Code
  • Algorithm
  • Image Transfer
  • Application
  • Other Functions
  • Schematics
  • Implementation
  • First Steps
  • Future Steps
  • Wrap-up
The author's past Encoder articles have reviewed an ever-expanding range of technology related to amateur robotics. Some of the more recent articles focus on digital logic such as CPLDs, and still others on advanced logic implementations such as laser range finding. This article takes the progression of advanced logic applications one step further, detailing the implementation of a generalized color vision system for embedded robotics applications.

Typical of many of the systems developed by the author, this system offloads most of the functionality from the processor, making it ideally suited to single-processor embedded robotics applications. Offloading the vision data gathering tasks makes this system truly real time. Not that it is extremely fast or resource efficient, but it is real-time in the sense that the processor is unloaded to the point that it may handle any number of other events without being bogged down servicing the vision system constantly. Compared with other vision systems available to amateur robotics enthusiasts today, there is no overhead involved in moving the data into or out of the processor's memory space.

This article attempts to cover all high-level aspects of the vision system at a level appropriate for an intermediate-to-advanced reader already familiar with some concepts relating to video imaging, programmable logic and general high-speed embedded processor design. An overview of the technology is followed by application data and wrapped up with implementation details.

Overview:

Where past vision articles presented specific implementations detailing data extraction from NTSC-style video signals, this article presents a generalized vision system. The system is more general in that it allows for full-frame, real-time video capture. As the article progresses, detailed implementation as well as some specific applications are presented, such as color blob detection and a new version of laser range finding. To start off, an overview describing the large, coarse-grained blocks of functionality follows:

This design makes use of six basic functional blocks to build the vision system (u-Processor, FPGA, SRAM, Video Input Processor, camera and an optional LCD display). Referring to fig.1.1 below, the reader can identify each of these blocks and the connectivity they share. The thicker lines with arrows represent the video paths (analog & digital).

Figure 1.1: Design Overview

The first of these functional blocks is the u-Processor system. Since this is a generic vision system, the article will not focus on any u-Processor specifically. The implementation is generic enough that just about any processor with an address bus and data bus that allow for memory-mapped operations will support it. Near the end of this article the author will present the specifics of the design implementation and processor choice; it is enough for now to understand that practically any processor will suffice. It is also relevant to note that even underpowered 8-bit processors can be used, as the high overhead of running the system is absorbed by the programmable logic.

The second functional block in the vision system is the heart of the operation. Programmable logic is used to implement a number of state machines, memory-mapped registers and other functions to off-load the routine image capture functions from the processor. The logic quantity and density required to implement all of the functionality of the color vision system demand devices much larger than the CPLDs used in the author's previous projects. In this project an FPGA produced by Xilinx is used. In choosing an FPGA it was important to find one that used voltage levels compatible with the processor; the need for voltage-level conversion chips just drives up board size and debugging complexity. The choice that best suited the color vision system comes from the Spartan-IIE family by Xilinx, the XC2S300E. A much smaller FPGA could have been used in its place (even the smallest member of the family, the XC2S50E); however, the robot runs several vision systems concurrently in the same chip, as well as other features not discussed in this article, that need the extra logic.

The third functional block in the vision system is the analog front end. In a sense it represents an analog-to-digital conversion function. Since the input to this system is color NTSC video and the output from this block is raw digital video, it takes a bit more circuitry than a standard high-speed analog-to-digital converter. If the goal were black-and-white image capture, a simple A-to-D would be enough, but capturing the phase relationship to the high-speed carriers in color video requires much more processing power. This raw silicon power comes packaged as a VIP (Video Input Processor). The VIP used in this design is the SAA7111A produced by Philips Semiconductor.

Figure 1.2: Camera

The next (fourth) functional block in the vision system is optional in some cases. The block represents a static RAM pool. It is connected directly to, and only to, the FPGA. Data is moved into and out of the RAM by a state machine within the FPGA; in this case, the data is that of the video image being captured. This component is optional depending on the design constraints. Today's modern FPGAs contain large amounts of RAM, and the amount of RAM required to store an image depends on the size (X-Y resolution plus color depth) of the image the system is designed to capture. The FPGA used in this example has enough internal RAM (referred to by Xilinx as Block RAM) to store an entire video image of a size sufficient for many robotics applications, such as 1/4-D1. This system hosts external SRAM due to system requirements to implement multiple concurrent video systems in the same FPGA as well as other functions. To simplify the interface, a pair of SRAMs was chosen, giving 16-bit-wide access to accommodate color depth and timing issues. More on this later... The parts are part number AS7C34096 from Alliance Semiconductor.

The fifth functional block in the vision system is the color video camera. The choice of video camera, made many years back, centered around the physical size required to fit the space available on the robot. In this application, as long as the optics and CCD are of reasonable quality, the NTSC - CCIR/ITU601 definition assures that most cameras will work; cameras covering the entire range from Jameco to Edmund Scientific will fit the bill. The image to the right (fig.1.2) shows the camera, including optics, installed in a test fixture on the author's bench. The laser, sonar and IR sensors are also included in the picture to offer the reader a sense of just how small high-quality color NTSC cameras can be packaged these days.

The sixth and final functional block in the vision system is a built-in color LCD direct-drive display. It is drawn with a dotted line in the figure above, as it will not be detailed in full in this article; the complexity of the vision system compounded with that of the display system and their interactions with the same SRAM will have to wait for yet another article. It is noteworthy that built-in display technology greatly simplifies the debugging of video systems.


Table 1: Interconnect Signals
  • Camera Interface:
    • Power +12VDC & Return
    • Composite video & signal return

  • VIP Digital Output:
    • Digital Color Space Data
      • 5-bits Red
      • 5-bits Green
      • 5-bits Blue
    • H-sync
    • V-sync
    • data clock

  • SRAM Interface:
    • 16-data bits (15 used, Bi-Dir)
    • 19-address bits
    • Write Enable
    • Output Enable
    • Chip Select

  • Processor Interface:
    • 8-data bits (bidirectional)
      • Processor Data Bus [31:24]
    • 18-address bits
    • Read/Write
    • Chip Select (8 bit access)

Interconnects:

In fig.1.1, above, the arrows represent electrical interconnection. The thicker lines depict data paths for video data, both analog and digital. This data takes on different formats between the different blocks in the picture. The easiest data path to follow is that of data capture (right to left), from the camera to the processor. Between blocks 5 & 3 (Video Camera & VIP) the video is in analog NTSC format and is conducted through impedance-matched 75-ohm coaxial cable. Between blocks 3 & 2 (VIP & FPGA) the data is in a multi-wire, time-synchronous digital format with separate syncs (H & V), a clock wire and 16 bits of color data. Between blocks 2 & 4 (FPGA & SRAM) the video data is only content data; the time content (H-Sync, V-Sync & Clock) has been abstracted into the address at which the data is stored. (More about this later.) Finally, between blocks 1 & 2 (u-Processor & FPGA) the video data is randomly accessed, memory-mapped image data (across the processor's address & data bus). The application outlined in this article will demonstrate an 8-bit data bus, to reach the largest reader base directly, but any width can be used.

The table, right, outlines the number and type of signals represented by each of the thick black interconnect lines in Fig.1.1, above.

There are two other types of interconnect depicted in Fig.1.1, represented by thin lines. The first of these, between blocks 1 & 2, connects some general-purpose IO pins on the u-Processor to the programming pins used to load the FPGA at boot time. The second, between blocks 2 & 3, is an I2C connection. Philips uses I2C to interface to the control registers in most of their video processing ICs, so it is no surprise to find it here. The FPGA in this application implements several concurrent I2C engines. (More on this later in section 9, Other Verilog Functions.)

Operational Overview:

Referring to Fig.2, below, the reader can see the physical instantiation of each of the six elements from the preceding discussion. There are three major differences between the physical implementation and the overview depicted in Fig.1.1. First, the vision system is a daughter card, so the u-Processor system resides on another circuit card. The 64-pin expansion header carries the u-Processor address and data bus, along with a handful of control signals, between PCB assemblies; bi-directional bus buffers on the u-Processor circuit card maintain signal drive integrity, at speed, through the connector. The second difference is that there are 3x VIPs installed on this circuit card, interfacing to 3x video subsystems all running concurrently within the FPGA. The third and final deviation from the design template of Fig.1.1 is the connection of the I2C busses. These do not run from the FPGA on this board as indicated in the overview of Fig.1.1; rather, these I2C channels run from a separate FPGA on the main (u-Processor) circuit card assembly and connect to the 3x VIPs via discrete signals on the 64-pin expansion stacking bus. Even though a 208-pin TQFP package is used in this design, the article will demonstrate a little later that a limitation of available IO pins drove this 'awkward' design choice.

Figure 2: Design Footprint

Looking at these two figures (1.1 & 2) as data flow templates, the reader can identify a right to left data path for the video as it passes through the system. (5=>3=>2=>4=>2=>1)

VIP: (Video Input Processor)

SAA7111 Data Sheet

VIP Config 'C'-code

The article focuses on digital FPGA capture and processing of video images; however, as indicated in the section above, the connection to the video camera is analog NTSC video. It follows that somewhere in between, a conversion from analog to digital must take place for the capture process to begin. The device for the task is called a VIP, or Video Input Processor.

The VIP used in this project is the SAA7111A made by Philips Semiconductor. A link to the PDF data sheet is available, right. Given the complexity of a 72-page data sheet and a couple dozen configuration registers controlling wide-ranging filter values and digital interface settings, the 'C' code used in this project to initialize and configure the VIP has also been provided.

The i2c_write and i2c_read functions used in the 'C' source to the right are presented later in the article, in the section that deals with the I2C Verilog functions. For now it is enough to know that the parameters passed to i2c_write are as follows: (1) the I2C bus number to write to, (2) the address of the device on that bus, (3) the sub-address to access, and (4) the data value to write.
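
For orientation, a single call might look like the sketch below. The i2c_write signature shown is only inferred from the parameter list above, the device address reflects one assumed SAA7111A strapping, and the sub-address / data value are placeholders rather than entries from the project's real configuration table (which is in the linked 'C' file).

    /* Illustrative only: signature, device address and register value are
       assumptions, not taken from the project's configuration code. */
    extern void i2c_write(int bus, int address, int sub_address, int data);

    #define VIP_I2C_BUS   0       /* which of the concurrent I2C engines to use */
    #define VIP_I2C_ADDR  0x48    /* SAA7111A device address (assumed strapping) */

    void vip_write_example(void)
    {
        /* bus number, device address, sub-address, data value */
        i2c_write(VIP_I2C_BUS, VIP_I2C_ADDR, 0x02, 0xC0);
    }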

Table 2:
  • R = Y + 1.371 Cr
  • G = Y - 0.336 Cb - 0.698 Cr
  • B = Y + 1.732 Cb
The SAA7111A proved the right choice for this project due to its support of the legacy raw digital RGB video format with separate syncs (H & V) plus clock. In contrast, most of the newer VIPs on the market today use the YUV color space. The SAA7111A provides the color space conversion for the designer, which removes computational complexity from the FPGA; the conversion from Y,Cr,Cb to RGB is governed by the equations in Table 2, left. Various blocks of open-source Verilog code for FPGAs exist to convert between color spaces. Another design criterion considered was the parallel 19-bit-wide data output format of the SAA7111A vs. the 8-bit time-multiplexed CCIR-656 format supported by so many other VIPs on the market. At the VIP decision point of the design, this project was already somewhat complex as far as amateur projects go, so the decision was made to simplify and choose a chip set with the desired output format.

In this application, the RGB color space is preferred for the ease with which color "closeness" can be determined. Color closeness has application in edge detection and advanced blob detection. The relationship is governed by projecting the three components of color (RGB) as Cartesian coordinates into a three-dimensional color space (cube). The closeness of a color match between two differently colored pixels is then given by the radius of the 3D sphere centered on one pixel and encompassing the other. The quantitative value is governed by the standard 3D distance formula, sqrt((R2-R1)^2 + (G2-G1)^2 + (B2-B1)^2), where R2,G2,B2 and R1,G1,B1 are the color components of the respective pixels being compared. Future developments related to this vision system will take advantage of this convenient relationship.

Figure 3: Color Space Distance Equ.
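
The following is a minimal 'C' sketch of that distance calculation; the structure and function names are illustrative, and working with the squared distance avoids the square root when only relative closeness matters.

    #include <math.h>

    /* 5-bit color components (0..31), matching the [5,5,5] encoding used here */
    typedef struct { unsigned char r, g, b; } rgb555_components;

    /* squared distance in the RGB cube; cheap enough for per-pixel use */
    static unsigned int color_dist_sq(rgb555_components p1, rgb555_components p2)
    {
        int dr = (int)p2.r - (int)p1.r;
        int dg = (int)p2.g - (int)p1.g;
        int db = (int)p2.b - (int)p1.b;
        return (unsigned int)(dr * dr + dg * dg + db * db);
    }

    /* the true radius from Figure 3, only needed when an absolute value matters */
    static double color_dist(rgb555_components p1, rgb555_components p2)
    {
        return sqrt((double)color_dist_sq(p1, p2));
    }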

The connectivity of the SAA7111A is pretty straightforward, as the snippet from the app-note below (Fig.4.1) indicates. Complexity arises when dealing with the volume of data produced by the VIP in real time. In the digital configuration used, the VIP outputs a 16-bit word of data plus two syncs and a data clock at a rate of 13.5M words/sec. Storing and algorithmically processing data at these volumes is where the FPGA, which the article addresses next, comes into play. First, a quick review of some issues when mixing analog and very high-speed digital circuitry.

Figure 4.1: Implementation Snippet

Figure 4.2: Plane Separation

Working with a PCB assembly like this, one that combines high-frequency digital edges with analog video conversion, it is extremely important to keep the analog and digital zones separated. The CAD snippet depicted in Fig.4.2, above, shows one of the inner copper layers of the board. This layer is the ground plane for the right side of the board. If the reader compares this edge of the board artwork to the image of Fig.2, it should stand out that the separation of the ground planes runs directly under the VIPs. At its narrowest point the separation is 0.036".

The Video Input Processor is designed with all of the analog connections emanating from one side, grouped conveniently for shielding and termination. The analog ground plane is mirrored on a second internal layer by an analog power plane. The connection between these analog power and return planes and the digital ones takes place via a PI filter with its center notch tuned to the 5th harmonic of the digital edges from the FPGA's driving circuitry. Note: Thinking back to Fourier theory and a signals class, the critical piece in the design of the filter is that it is tuned to the frequency content of the rise time of the digital edges, NOT the clock rate of the signals running through the drivers.

Keeping digital noise out of the analog signals is only half the battle when working with the board layout for this project. The digital signals themselves contain very high-speed edges, and the setup / hold margins for the 12-ns SRAMs are very tight, even when accessed every other cycle, which requires careful board layout. Finally, this is compounded by the fact that the power supplies are two PCBs away, which drives the need for adequate low-impedance decoupling; again, PCB layout becomes extremely important.

When configured for 16-bit R,G,B operation, the SAA7111A VIP actually outputs 6 bits of green, and only 5 bits of blue and red. Since green has the most direct correlation with black and white video (contrast) many video systems encode it with a higher bit density. For this application, the extra LSB of green is discarded to deal with the color space as flat [5,5,5] bit encoding. The VIP supports other modes including [8,8,8] however these are unused in this application. The physical connections for [8,8,8] data are the same, but the data rate increases from 13.5M-words/sec to 27M-words/sec. The FPGA can easily handle this data rate, but the author could not justify this high level of quality in an amateur robotics implementation.

FPGAs:

Xilinx Data Sheet

The Video Input Processor reviewed above spews forth data at a very high rate. The remaining components of the video capture system are left to handle a very high-rate (13.5M words/sec) stream of digital video data plus clock and syncs, which needs to be transformed before it can be used. The next functional piece in the design, the FPGA, implements these transformations as well as the rest of the functionality in the design.

It is not the intent of this article to provide an entry-level introduction to FPGAs; rather, the article focuses on the specifics of implementing a generic color vision system. For those wanting to dig a little further into the inner workings of an FPGA, the link, right, is a PDF data sheet from Xilinx. This of course will need to be supplemented with further reading on the languages used to configure FPGAs and a firm understanding of how those languages synthesize into logic constructs within the fabric of the device. Several references for further reading are provided at the end of this article.

As mentioned above, the FPGA used in this design is from the Spartan-IIE family by Xilinx; in particular, this design is implemented with an XC2S300E. All logic internal to the FPGA is clocked from a single 50MHz source through the FPGA's clock distribution network. The clock source is PLL-locked to the processor clock, so the interface to those signals is direct. The external VIPs, however, operate on separate asynchronous clocks, so their input control signals require double buffering to cross clock boundaries. Note: Driving signals across clock boundaries in a completely synchronous design is a deceptively complex problem and warrants further research on the part of the reader.

Figure 5.1: FPGA Loading

Xilinx FPGAs are RAM-based devices. RAM-based technology requires that all RAM cells within the device be re-configured (set/cleared) upon every power cycle. There are many ways to configure an FPGA device. For this robot application, the u-Processor is used to configure the FPGA through the connection of a few general-purpose IO pins on the u-Processor to special programming port pins on the FPGA. With this set-up the FPGA's configuration (code/logic) is stored in a static array which is compiled and linked with the processor's 'C' code, so that each time new processor code is loaded into FLASH, a new copy of the FPGA logic configuration is also loaded.

Loading the configuration from a u-Processor is an acceptable means of FPGA configuration; however, there are a few items to take note of. The XC2S300E FPGA used in this project requires over 234K bytes of configuration storage space, which exceeds the address range of a typical 8-bit processor. In such cases, a flash chip loaded directly by a PC using JTAG or some other means, and read automatically by the FPGA at start-up, may be better suited. Of course, smaller FPGAs require much less configuration space. The data output from the Xilinx tools (after some parsing / massaging) used to configure the FPGA looks like the following:

unsigned char fpgadata_Data[] = {
    0xff, 0xff, 0xff, 0xff, 0x55, 0x99, 0xaa, 0x66,
    0x0c, 0x80, 0x06, 0x80, 0x00, 0x00, 0x00, 0x88,
    0x0c, 0x00, 0x03, 0x80, 0x00, 0x00, 0x00, 0x00,
    .
    .
    .};
unsigned long fpgadata_ByteCount = 180252;
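
The loading itself amounts to shifting this array into the FPGA's serial programming port. The sketch below assumes slave-serial configuration and simple GPIO helper functions (set_prog, set_cclk, set_din, get_init, get_done) that are not part of the project's code; the bit order must also match however the tool output was parsed / massaged. The project's actual loader is in the 'C' file linked below.

    /* Hedged sketch of a slave-serial FPGA loader; helper functions and
       bit ordering are assumptions, not the project's implementation. */
    extern void set_prog(int level);   /* drive the FPGA /PROGRAM pin   */
    extern void set_cclk(int level);   /* drive the configuration clock */
    extern void set_din(int level);    /* drive the serial data pin     */
    extern int  get_init(void);        /* read the INIT pin             */
    extern int  get_done(void);        /* read the DONE pin             */

    static void fpga_load(const unsigned char *data, unsigned long count)
    {
        unsigned long i;
        int bit;

        set_prog(0);                    /* pulse /PROGRAM low to clear the device */
        set_prog(1);
        while (!get_init())             /* wait until the FPGA is ready for data  */
            ;

        for (i = 0; i < count; i++) {   /* shift each byte out, here MSB first    */
            for (bit = 7; bit >= 0; bit--) {
                set_din((data[i] >> bit) & 1);
                set_cclk(1);            /* data is sampled on the rising edge     */
                set_cclk(0);
            }
        }

        while (!get_done()) {           /* extra clocks until DONE goes high      */
            set_cclk(1);
            set_cclk(0);
        }
    }

    /* usage: fpga_load(fpgadata_Data, fpgadata_ByteCount); */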

Another detail for the amateur to take note of is the state of the FPGA's IO pins upon power-up. Since the device is RAM based and does not retain its configuration, it makes sense that the IO pins are not configured out of reset either. In this implementation, a boot mode is applied that brings the device up with all of its IO pins in tri-state mode with very weak pull-ups.


XApp176 FPGA Config

C-Code for FPGA loading

Free Xilinx Software:

From their web site, Xilinx provides several excellent application notes describing the different algorithms used to configure their FPGAs. The 'C' code used to load the FPGA from the u-Processor in the generic color vision application is provided through the link to the right.

Xilinx offers free tools to compile and work with their FPGAs up to a certain size; unlike most companies' free offerings, these tools can handle some pretty large and complex programmable logic. The user will have to register the tools with Xilinx to get them up and running, but there is no charge for them. The tools will compile ABEL, VHDL and Verilog, and require either Windows 2000 or Windows XP. All of the code examples, including the final robot application reviewed in this article with 3x vision systems and display drive functionality operating concurrently, can be synthesized and fit by the freely downloaded Xilinx WebPACK tools.



Verilog Source:

dj_top.v

ram_scheduler.v

fifo_34.v

video_capture.v

blob_detection.v

serial_divide.v

dj_vid_pins.ucf

A complete project file (600+ KBytes when zipped) with all of the source, synthesis, fitting and place & route files is available on the author's web site. Files #15
As the reader might imagine, programmable logic of this complexity takes a substantial amount of HDL code to describe. The language used by the author to describe the logic in this application is Verilog, a 'C'-like hardware description language. The files and their structure are outlined in Fig.5.3, below.

The overview below, Fig.5.2, graphically depicts the connectivity between the files listed at left. Each file encompasses a block of functionality, and the arrows delineate the interconnectivity of the ports on these modules. As the graphic shows, the highest level of data traffic is through the RAM scheduler module. The data flow through that module is compounded when the other two vision systems (not shown here) and the LCD display drivers are layered in with their respective FIFOs. Each of these modules moves data to and from the external SRAM through the same RAM scheduler function.

Figure 5.2: FPGA Functional Structure


Fig.5.2, above, takes the reader a long way towards an understanding of the data paths between the different modules used within the FPGA. Understanding these linkages is important to getting the "bigger picture" of the FPGA's overall architecture. Each module (green block) within the FPGA represents a separate function, and a separate Verilog source file, above-left.

As the article progresses, figure 5.2 will be revisited many times. Each time a new layer of functionality will be overlaid onto the existing code base. The modularity / encapsulation of the language makes for easy work in these matters as the article will demonstrate in coming sections.

Fig.5.3, right, depicts the project files and structure. The structure is important as it delineates the hierarchical relationship and port / instantiation inter-connectivity between Verilog modules.

Figure 5.3: Verilog Hierarchy

  1. The top level file contains the processor bus interface as well as the instantiation templates for all of the other hierarchically nested modules. As mentioned previously, there is nothing special about this u-Processor bus interface. It could be interfaced to any standard 8-bit processor with an external address & data bus.

  2. The next file (ram_scheduler.v) contains the logic required to manage the external SRAM interface. This logic moves data bi-directionally into and out of the external SRAM upon prioritized request from several sources / destinations within the design. Examples include data coming in from the 3x VIPs, data going out to the LCD display being re-drawn 80 times per second, and data coming in for video overlay from the u-Processor. This version does not implement a fairness algorithm, as calculations have shown no bottlenecks at maximum throughput loading.

  3. Fifo_34.V is a very shallow (4 entries deep) FIFO that is 33 bits wide. It is used to store data waiting in line, as well as the addresses at which to store that data, on its way out to the external SRAM from each of the aforementioned sources. Multiple instantiations of this function are implemented in this design. Due to their small size, these FIFOs are implemented using distributed RAM bits across the FPGA's fabric rather than being extracted to dedicated block RAMs, which would consume those resources and entail longer routing paths.

  4. The file Video_Capture.V implements the basic conversion algorithm and will be reviewed in detail below.

  5. Blob_Detection.V is a simple function that stores a running sum in X & Y of the coordinates of all pixels that meet the filtering criteria. The filtering criteria are merely upper and lower cut-off values for each of R, G & B, set by writing to registers through the u-Processor bus interface.

  6. Serial_Divide_UU.V is a function that, as its name implies, implements a straightforward integer divide unit capable of giving binary-weighted fractional results. The serial divide is used to average the sums (both X & Y) tracked in blob detect; two parallel instantiations are implemented, corresponding to the calculations for X & Y. The function is considered a serial divider as it generates one bit of output for each clock cycle. It is scalable to any number of bits for both inputs and outputs. (This function was originally obtained from www.opencores.org and was written by John Clayton. The file is redistributed here with his permission and GNU GPL notice intact.)

  7. The last file is a pin definition file. It maps the port connections of the top level Verilog file to make physical connections with pins on the physical device. It can also be used to implement different signal drive strengths to compensate for routing or loading purposes.

Figure 6.1: State Diagram

video_capture.v

Figure 6.2: D1 Res State Diagram

Modified Algorithm

Algorithm:

The primary algorithm that "captures" the digital video stream coming from the SAA7111A Video Input Processor is contained in the Verilog file video_capture.v, presented above.

The video capture function consists of a conversion process, where the color space information is transformed from a data-versus-time context to a data-versus-storage-location context. In this translation process, the timing of the data coming into the FPGA from the VIP must be translated into the address at which to store the data in the external SRAM. The algorithm is just that simple.

The state machine that implements the video capture algorithm consists of only 10 discrete states. In the coding style used, it is implemented in two pieces: the first half is dedicated to state transitions and latched register operation, while the second half (dependent only upon occupation of a particular state) uses continuous assignment to implement its functionality.

The state machine simply keeps three running counters, a pixel_counter that tracks pixels along the video image, a line_counter that tracks lines down the video image, and an addr_index_counter that tracks the address where the next "pixel" will be stored in the external SRAM.

The detailed list below is best read side by side with the Verilog file video_capture.v and the state machine flow diagram outlined in fig.6.1. (A compact 'C' model of the same sequence follows the list.)

  1. In state one, the system is dormant. It loops back on itself, always returning to state one until it receives the go-ahead from the processor bus interface. This transition is denoted by an active-high signal called begin_capture, which is generated by the processor bus interface in the top-level file dj_top.v.
  2. State two initializes a variable that keeps track of the number of video lines (line_count).
    • It also initializes the address variable (addr_index_count) that points to the location in SRAM to store the first video pixel.
    • Finally, this state loops back on itself until it recognizes a valid vertical sync pulse, thus successfully locking the capture function to the beginning of a valid video frame. (This is indicated by the active-high signal vsync_edge_found.) This signal becomes active after the first 25 lines of the video frame have passed.
  3. State three initializes a variable (pixel_count) that keeps track of the "pixel" or time increment across the video line being captured.
    • This state loops back on itself until the signal hsync_edge_found becomes active high. This signal becomes active after the front porch and color burst have expired, indicating the first visible pixel of any given line.
  4. State four waits repetitively on valid data clock signals (llc_edge_found). When one is found:
    • The address where the video data will be stored (addr_index_count) is latched.
    • The data to be stored (video1_raw) is latched.
    • This state is exited.
    • Otherwise this state loops back on itself awaiting the assertion of llc_edge_found.
  5. State five accomplishes three tasks.
    • From the background continuous assignment, a pulse is generated to send the latched address and latched data to the RAM scheduler for storage in the SRAM.
    • The address at which to store the next data value in the SRAM (addr_index_count) is incremented.
    • The variable that tracks the current pixel within the video line (pixel_count) is incremented.
    • Finally, this state passes through to the next state automatically.
  6. State six waits for the next activation of the signal llc_edge_found. Waiting for this signal in a repetitive looping fashion, and then doing nothing with the data, effectively skips every other "pixel".
  7. State seven checks whether the variable pixel_count has been incremented to the end of a valid line length of pixels. If not at the end of a video line yet, loop back to state four. Otherwise the end of a video line has been reached and the state machine vectors to state eight.
  8. State eight prepares for a new video line.
    • The address at which to store the next pixel in the SRAM (addr_index_count) is incremented.
    • The variable that tracks the number of lines down the video image (line_count) is incremented.
    • The state is automatically exited to state nine.
  9. State nine checks whether the current image capture is complete. This is done by comparing the variable line_count to a constant that represents the number of lines expected in the image. The state machine loops to state three if the number of video lines has not yet expired; otherwise, it progresses to state ten.
  10. State ten outputs a signal pulse through the variable end_of_screen_capture in the background continuous assignment phase and then loops all the way back to state one to await the launch of another capture cycle.
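
The compact 'C' model below mirrors that sequence in software. The helper functions and the two constants are stand-ins (the real line length and line count are the comparison constants in video_capture.v); it models behaviour only and is not a translation of the Verilog.

    /* Behavioural model only; helpers and constants are hypothetical. */
    extern void wait_for_vsync(void);               /* states 1-2: frame lock   */
    extern void wait_for_hsync(void);               /* state 3: line lock       */
    extern void wait_for_llc(void);                 /* states 4 & 6: data clock */
    extern unsigned short read_vip_pixel(void);     /* latched video1_raw       */
    extern void signal_end_of_screen_capture(void); /* state 10's output pulse  */

    #define PIXELS_PER_LINE  320   /* stand-in only */
    #define LINES_PER_FRAME  240   /* stand-in only */

    static void capture_model(unsigned short *sram)
    {
        unsigned int line_count, pixel_count;
        unsigned int addr_index_count = 0;

        wait_for_vsync();                                      /* states 1-2 */
        for (line_count = 0; line_count < LINES_PER_FRAME; line_count++) {
            wait_for_hsync();                                  /* state 3 */
            for (pixel_count = 0; pixel_count < PIXELS_PER_LINE; pixel_count++) {
                wait_for_llc();                                /* state 4 */
                sram[addr_index_count++] = read_vip_pixel();   /* state 5 */
                wait_for_llc();                                /* state 6: skip one */
            }                                                  /* state 7: line test */
            addr_index_count++;                                /* state 8 */
        }                                                      /* state 9: frame test */
        signal_end_of_screen_capture();                        /* state 10 */
    }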

At the top of the Verilog file, (video_capture.v) there are three other pieces to review. The signals video1_llc, video1_hsync, video1_vsync are each latched into a register on rising clock edges. On successive clock edges these signals are again latched into a second set of registers before use. This is a technique called double buffering. This technique is required to assure proper setup and hold times for signals that cross clock boundaries. Since these signals come from external circuitry run from independent (asynchronous) clock sources, they qualify.

Challenge to the reader: First, read through the Verilog file, review the state machine in fig.6.1 and the detailed explanation above. Once you have a good understanding, try to answer the following question without reading ahead for the answer: How many horizontal pixels and how many vertical lines are being captured?

If you could not find the answer and need a hint, the answers are in the file video_capture.v, above, in state 7 (title: VID_CAP5_CASE) and in state 9 (title: VID_CAP7_CASE).

Changing the Algorithm:

The Verilog files presented herein were designed to work at approximately 1/4-D1 resolution. The constraint was arbitrarily determined by aligning the image capture size to the display pixel resolution of the LCD the author is installing on the robot that will first implement this vision system. Making changes to the captured digital image resolution is easy; next, a look at how to change the capture resolution to full CCIR/ITU601 D1 resolution.

The system is capable of capturing full D1 video frames and has been tested doing so. With the configuration file provided in this article, the SAA7111A Video Input Processor is already configured to convert analog data at D1 resolution. It merely requires a bit of cut and paste, with a few minor changes to the state machine logic running in the FPGA, to store the additional data properly. Fig.6.2, right, illustrates these changes (highlighted in green) and the rearranging denoted by the state numbers.

First, remove state 6 in the video_capture.v state machine; this state was merely throwing away every other pixel. At the end of each video line, the address must be incremented by a full line width to bypass the line of video that will be filled in during the next interlaced field. To fill in all of these interlaced lines, states 2 through 10 are replicated at the end of the state machine. State 10 can be discarded, as that point in the state machine only indicates the end of the first interlaced field, not the end of the complete D1 image. In the new states appended to the end of the state machine, the line_counter and pixel_counter variables are still initialized to zero and treated as they were in the first half of the state machine. The only piece that changes is that the initial address stored into addr_index_count is offset by one line length, to start filling the interlaced lines. Finally, the numbers compared against for looping are adjusted appropriately.

Of course, before the second copy of the state machine can be implemented, the state numbers must be linearized. One simple way to do this, as the author has done, is to give parametrically defined names to the states and then rearrange the numerical values in the parameter list. An example of this can be seen in the two parameter lists [9:0] near the top of the file video_capture.v.

Now that the reader has seen the physical components, HDL code and algorithmic approach to this project, it is time to see how all of these come together in a completely functional system.


Application 1: Colored Blob Tracking

Figure 7.1: Color Blob Detection

Figure 7.2: Color Filter Example (blue)

One of the many applications the vision system currently implements is simple color blob detection. The image, right, is a digital snapshot of a color LCD display screen that is also driven by the FPGA on this board. In this image, the system is identifying the green centroid. Green is defined to the system as follows:
  • red_bounds {8:2}
  • green_bounds {31:10}
  • blue_bounds {24:10}

These bounds are specified to the system through registers in the top-level file called red_upper, red_lower, green_upper, green_lower, blue_upper & blue_lower. The reader can identify them easily by name in the file dj_top.v. These six 5-bit registers are written from the u-Processor bus interface (memory mapped), and their contents in turn are output to the blob_detect module for comparison against incoming raw video.

Color depth in this system is 5-bits in each of R,G & B. This leads to a color span of 0~31, with the color white being represented by [31,31,31]. For storage and processing purposes, these are dealt with as a single 16-bit word formatted MSB first as [0, R4, R3, R2, R1, R0, G4, G3, G2, G1, G0, B4, B3, B2, B1, B0]
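
A small helper that matches this packing (illustrative only, not taken from the project sources):

    /* Pack three 5-bit components into the [0,R4..R0,G4..G0,B4..B0] word
       described above; bit 15 stays clear. */
    static unsigned short pack_rgb555(unsigned char r, unsigned char g, unsigned char b)
    {
        return (unsigned short)(((r & 0x1F) << 10) | ((g & 0x1F) << 5) | (b & 0x1F));
    }

    /* white: pack_rgb555(31, 31, 31) == 0x7FFF */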

The second image, right (fig.7.2), displays an image from the system tuned to filter only blue values. In this image, the FPGA has removed all color information that does not pass the six filter criteria. The reader can also see how other, unexpected objects tend to contain bits of the color in question; this can be observed in the circular object, which is actually the outer reflection of the lamp in the logic analyzer's display. In conjunction with this image, the following filter values are used; notice the substantial green content in the blue filter values:

  • red_bounds {16:5}
  • green_bounds {30:16}
  • blue_bounds {31:12}

The text / font, cursor and locating lines are overlaid on the image in real time via information sent to the FPGA from the u-Processor. The entire section of the SRAM that contains the video image being captured is memory mapped into the u-Processor's memory space. The X & Y data points (blob centroid) are read directly from the FPGA through memory-mapped registers available to the u-Processor after video frame conversion is complete. All that is required of the u-Processor system to overlay text and graphics, then, is to write to the corresponding memory locations.
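
To make the processor-side sequence concrete, a hedged sketch follows. The base address, register offsets and the one-byte centroid reads are all hypothetical (the real names and offsets are defined in dj_top.v); only the bound values come from the green example above.

    /* All addresses/offsets here are placeholders, not the project's map. */
    #define FPGA_BASE  0x00400000UL
    #define REG(off)   (*(volatile unsigned char *)(FPGA_BASE + (off)))

    enum {                    /* hypothetical offsets for the dj_top.v registers */
        RED_LOWER, RED_UPPER, GREEN_LOWER, GREEN_UPPER,
        BLUE_LOWER, BLUE_UPPER, BLOB_X, BLOB_Y
    };

    static void track_green(unsigned char *x, unsigned char *y)
    {
        REG(RED_LOWER)   = 2;   REG(RED_UPPER)   = 8;    /* bounds from fig.7.1 */
        REG(GREEN_LOWER) = 10;  REG(GREEN_UPPER) = 31;
        REG(BLUE_LOWER)  = 10;  REG(BLUE_UPPER)  = 24;

        /* ...start a capture here and wait for the end-of-frame indication... */

        *x = REG(BLOB_X);   /* averaged centroid; a result wider than 8 bits */
        *y = REG(BLOB_Y);   /* would need two reads on this 8-bit bus        */
    }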

There is some hidden complexity in action here, as there are really two storage spaces in the external SRAM for each image being captured. The first is the location for the image currently being captured. This location is swapped (ping-pong style) with the secondary location, which is the one currently being drawn on the display screen and also being used for processor graphics & font overlay. This necessary evil ensures that torn / partially captured video frames are never output to the LCD, which is also a real-time system.

Figures 7.3 ~ 7.6:

If the reader is paying close attention, they may notice that the numbers graphically overlaid (fig.7.1 & fig.7.2) are a little "off" for the locations indicated on a typical CCIR/ITU601 video signal. In this application, half of the vertical video lines and a little more than half of the horizontal video samples are being thrown away. This is done only to fit the 320x240 resolution of the LCD being used. The discussion above in the section on the capture algorithm could be applied if a higher-resolution LCD were used.

The next four images in this section (figures 7.3 ~ 7.6) are direct digital captures from the vision system. The first pair of images shows the laser light tracked on the author's finger at distances between ~2' and 4'. The second pair shows the differences tracked between 4' and ~6' with a lateral shift. Graphics overlay takes place in the FPGA prior to image download. In this set-up, the vertical displacement between the laser field generator and the camera is small (~2"), leading to a small parallax displacement between images. At this point the reader may be wondering about the color content of these images and the fact that it looks a little blotchy; these images were captured before a misalignment in the passing of the 4 LSB green bits was discovered.


Application 2: Structured Light Extraction

If the reader has read the author's earlier articles, like the one from Oct of '91 on laser range finding, the concepts and importance of this functionality to mobile robotics will be familiar. As a second application, this system has been used to implement a laser range finding function in the FPGA as well.

video_ranging.v

figure 8.1: Additional Verilog Structure

Given all of the structure, video capture and blob detection implemented in the Verilog code above, it takes surprisingly little additional code, presented left, to implement laser range finding. This code is instantiated from the file video_capture.v. This module and the dual-ported block RAM module it instantiates are highlighted red in fig.8.1, right. One piece that really shines through here is the encapsulation and layering functionality offered by a modeling language like Verilog.

The premise of operation is simple. The code instantiates a 512x8 block RAM (dual ported) using one of the dedicated hardware resources of the FPGA. The module monitors the incoming address during valid video captures. As long as the video line_counter variable is zero, each successive pixel location scanned is stored as a zero, effectively initializing the function's data space. After line zero is done, pixels are only stored if they pass the window comparator values set through registers from the processor bus; this is just another way of saying that a minimum and maximum setting for each of R, G and B are compared against the color space value of each pixel. If a match is found, the line number (line_counter) is stored in the linear array indexed by the variable pixel_counter.

The fact that video presents, time-wise, from the top of the screen to the bottom is very handy in this case. The laser range finding function only keeps a 2-dimensional record of data points in front of the robot. The concern is always with objects that are closer, which will overwrite objects (false readings, etc.) that were recorded previously (farther up the video image); the important correlation here is that objects closer to the robot occur later in time in the video stream. Note: The previous assertion only holds true for the physical / mechanical configuration of laser and camera depicted in fig.8.4, below. For a discussion of the mechanics of laser range finding, refer to the author's previous article on the subject, which covers this and other topics in detail. Later, to read the data back out of this linear array, the processor bus latches an address into the second side of the dual port, and the data is fed directly out.
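
A sketch of that read-back from the processor side follows; the offsets are hypothetical, and the column index is split across two writes purely because of the 8-bit bus.

    /* Placeholder register map; the real decode lives in the Verilog top level. */
    #define FPGA_BASE       0x00400000UL
    #define REG(off)        (*(volatile unsigned char *)(FPGA_BASE + (off)))
    #define RANGE_INDEX_LO  0x10   /* latch the column index into the dual-port RAM */
    #define RANGE_INDEX_HI  0x11
    #define RANGE_DATA      0x12   /* line number recorded for that column          */

    static void read_laser_profile(unsigned char profile[], unsigned int columns)
    {
        unsigned int i;
        for (i = 0; i < columns; i++) {
            REG(RANGE_INDEX_LO) = (unsigned char)(i & 0xFF);
            REG(RANGE_INDEX_HI) = (unsigned char)(i >> 8);
            profile[i] = REG(RANGE_DATA);   /* still zero if no match was stored */
        }
    }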

The two images below come from a piece of 2D graphing terminal software. They show laser returns, recorded with this color video system, looking at both a square item (fig.8.2) and a cylindrical item (fig.8.3). In this application, two methods were used. First, for a small robot that performs in a controlled environment like the Trinity Fire Fighting competition, it is sufficient to simply set the optical filter values within the FPGA. These filter values are memory-mapped registers written through the u-Processor bus interface. The filter values are configured to "look for" / pass the red color of the laser light, and the function in the FPGA can pull the data out of the video image automatically. For implementation in other, noisy real-world scenarios (contrasted against the engineered environment of the Trinity contest), the reader should consider the use of optical filters like those suggested in the author's other articles on laser range finding (link below).


Figure 8.2: Laser Range Image Square

Figure 8.3: Laser Range Image Spherical

Figure 8.4: Physical Setup for Laser Ranging

Referring to the author's earlier articles, the images above (8.2 & 8.3) are returns from a physical setup modeled as shown in fig.8.4, right. The difference between that setup and the one used with the color FPGA implementation is that here the camera is not rotated 90 degrees onto its side. Additionally, the signal acquisition is performed by extraction of color / signal level within the FPGA, as opposed to the use of comparators referenced against the analog black-and-white signal of the previous version. To read more on that implementation, follow this link to a previous SRS-Encoder article: A Real Time Laser Range Finding Vision System.

This graphing terminal program is invaluable for developing distance-based robotics sensors. The program, G-Term, was developed by a friend of the author, Alan Erickson, and is available for download by amateur robotics enthusiasts. It was used to generate fig.8.2 and fig.8.3, above.

G-Term


Figure 8.5: Laser range data overlaid on the image through FPGA / processor interaction

The last image in this section (figure 8.5) is a direct digital capture from the vision system. The image shows the structured light output, extracted by the FPGA in real time and provided to the processor system at the same time as the captured image, overlaid as a tracking line on that image. This image was captured through the use of a second piece of software, I2Term, also written by Alan Erickson; it is presented at the bottom of the next section.



Application 3: FPGA <=> Processor Image Transfer

Application sections one and two, above, focus on blob detection and structured light algorithms with the goal of offloading as much of the processing task as possible from the on-board embedded processor to external operators. Even with that goal realized, there are sequentially based processing algorithms that are well suited to offline application within an embedded processor. In this section the project review focuses on the mechanics of retrieving image data from the high-speed SRAM attached to the FPGA and moving that data into the processor's memory system.

Updated Verilog: dj_top.v

Updated Verilog: ram_scheduler.v

'C' Function: Retrieve_Image

Any number of approaches are available for this work, including not moving the data at all and instead applying sequential algorithmic processing from the embedded processor system to the data through an FPGA "tunnel". Of course, there are trade-offs to be considered, like processor bus speed and path delays due to the multitude of other accesses that must continue to the same SRAM in real time (continued image capture of the next frame, data extraction for display purposes, and the 2x additional vision systems running concurrently with their associated processing members).

Given the system-level constraints, data path constraints and timing issues, the decision was made to transfer the captured image sequentially through the processor's data bus interface for storage in locally mapped external memory. Two replacement files, updated versions of dj_top.v and ram_scheduler.v, are provided, left. These files implement sequential image retrieval from the FPGA to the processor over the standard 8-bit processor data bus. The primary changes / additions from the two previously presented versions of these files are highlighted in bold / blue.

From an architecture standpoint, when the processor is writing to the image data SRAM for image overlay, the interface is that of a wide, parallel, memory-mapped space allowing instantaneous random access to any pixel in the array. This functionality is made possible by the wide FIFOs that can store both the address and the data to be written, for later writing to the SRAM when alternate accesses are being made. This is taken care of by the ram_scheduler.v module.

Figure 8.9

In the case of reading data in a random manner, there is no way to know the address for pre-fetch operations; data must instead be accessed sequentially and run, in a similar manner to storage, through a fetching FIFO. This allows data to be present whenever the processor requests it and still allows accesses to the shared SRAM resource when the time slotting by ram_scheduler.v permits. Further, this reuses the same sequential bus interface methodology already implemented in other sections, without relying on features unavailable on lower-end processors such as PICs, AVRs or the 8051, thus making the design accessible to a larger body of amateur roboticists.

Layered into the data flow diagram, image 8.9, right, the two new blocks required to implement FPGA-to-processor image data transfer are highlighted in red. The code changes to implement the functionality are confined to two modules (dj_top.v, ram_scheduler.v). The listing below outlines the theory of operation:

  • Variable retrieve_addr[17:0] stores the address of the next data point to retrieve.
  • The processor bus provides a means to reset this address, corresponding to a write to address PROC_RESET_READ_PROCESS.
  • The retrieve FIFO's full flag is used to trigger an additional request to the RAM scheduler for more data.
  • The RAM scheduler retrieves data, off line, based on the address provided by the retrieve engine and drives it into the FIFO.
  • Within the RAM scheduler module, a new case has been added to the dispatch state, and two new states have been added to the state machine to handle requests to the external SRAM for raw video samples.
  • The second state added to the RAM scheduler triggers a write into the retrieve engine's FIFO through assertion of the bit-wide variable retrieve_data_ready_strb.
  • The retrieve engine's output is made available to the existing processor bus read mux in two halves.
  • Two addresses are provided due to the 15-bit nature of the data and the 8-bit processor bus.
  • Successful reads from the processor bus at the second address advance the FIFO's output index, changing the state of its full flag and restarting the process.***
*** Note: some buffering is added to detect the de-assertion of the processor's /CS (read = rising edge) and hold off the incrementing of the FIFO, so that the data driven onto the bus does not change until the cycle is over.
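
Put together, the processor-side read loop can be as simple as the sketch below. PROC_RESET_READ_PROCESS is the name used in the updated dj_top.v; the base address, the two read-port offsets and which of the two reads advances the FIFO are assumptions here.

    #define FPGA_BASE               0x00400000UL
    #define REG(off)                (*(volatile unsigned char *)(FPGA_BASE + (off)))
    #define PROC_RESET_READ_PROCESS 0x20   /* placeholder offset                 */
    #define PIXEL_READ_LOW          0x21   /* low byte of the 15-bit pixel       */
    #define PIXEL_READ_HIGH         0x22   /* second address: advances the FIFO  */

    static void retrieve_image(unsigned short *dest, unsigned long pixels)
    {
        unsigned long i;

        REG(PROC_RESET_READ_PROCESS) = 0;        /* any write resets retrieve_addr */

        for (i = 0; i < pixels; i++) {
            unsigned char lo = REG(PIXEL_READ_LOW);
            unsigned char hi = REG(PIXEL_READ_HIGH);
            dest[i] = (unsigned short)((hi << 8) | lo);
        }
    }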



Figure 8.10: Captured scene

The image, left (8.10), is yet another example of direct digital capture through the system and transfer directly to the PC. In this image the back wall is some 30' back, and the vertical separation between the camera and laser field generator is less than 2 inches.

I2Term.zip

The work to this point centers around FPGA firmware with minimal support from the embedded processor system. At some point in the development it becomes helpful to move digital images directly into a PC for further research / work. To the rescue came Alan Erickson with a gem of a piece of software he developed to aid in exactly this situation: I2Term. This software combines a terminal program with digital image capture / display functionality and auto image save features, and was used to capture the images in figures 7.3 ~ 7.6 and 8.5, above. The software begins capturing data after receipt of the character 0x07 in the serial stream. Data to be captured is sent as R,G,B color triples, 8 bits each. That is all there is to it; just send enough data to fill the matrix defined by the image size in the image settings dialog box. The software is available for use by amateur roboticists by downloading the zip file to the right.
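
A sketch of pushing one captured frame out a serial port to I2Term follows. serial_putc() and the 320x240 frame size are assumptions; only the 0x07 trigger character and the 8-bit R,G,B triple format come from the description above.

    extern void serial_putc(unsigned char c);   /* assumed UART helper */

    #define IMG_W 320   /* must match the size set in I2Term's image settings dialog */
    #define IMG_H 240

    /* expand a 5-bit component (0..31) to 8 bits by replicating the top bits */
    static unsigned char expand5(unsigned char c5)
    {
        return (unsigned char)((c5 << 3) | (c5 >> 2));
    }

    static void send_frame_to_i2term(const unsigned short *frame)
    {
        unsigned long i;

        serial_putc(0x07);   /* tells I2Term to start capturing */

        for (i = 0; i < (unsigned long)IMG_W * IMG_H; i++) {
            unsigned short p = frame[i];
            serial_putc(expand5((p >> 10) & 0x1F));   /* R */
            serial_putc(expand5((p >> 5)  & 0x1F));   /* G */
            serial_putc(expand5(p & 0x1F));           /* B */
        }
    }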


Other Verilog Functions:

FPGA_I2C Instantiation:

'C' I2C Calling Functions:

I2C Verilog Source:

In this implementation the I2C bus connections (3x of them) run between the VIPs and the secondary FPGA (off board). This is where some of the IP (intellectual property) available through www.opencores.org can be leveraged. Since the logic description in Verilog is abstracted as a high-level language, it can easily be re-targeted to Xilinx (or any other) fabric and used as a building block within any given HDL design.

  • The first of three files, right, shows how to instantiate (set up and call) the I2C IP function.
  • The second file, right, contains the u-Processor-side 'C' language calling functions used to interface to the I2C engine.
  • The final file is the actual I2C code from the opencores web site with license statements intact. It was originally four separate source files that have been pasted together into a single file. It was written by Richard Herveille and is redistributed here with his permission.


The photo, left, was shot during the debug process of the I2C IP engine. Depicted is a screen shot of one of the author's digital logic analyzers monitoring the SDA and SCL lines; the analyzer used to capture these signals is an HP1662A by Hewlett-Packard. This piece of digital trouble-shooting equipment plays an invaluable role in debugging the system without the need for extensive simulation. The trade-off of logic analyzer time versus simulation time does come with the penalty of having to sacrifice some IO pins.

An interesting undocumented note with respect to the SAA7111A Video Input Processor data sheet is that the ACK at the end of a read operation is actually a NACK. Debugging with the logic analyzer really helps to find little missing bits like this. Note the NACK is the wide high pulse at the end of the top trace in the picture. The I2C specification clearly defines this mode of operation for standard bus transactions, while the SAA7111A's datasheet omits it.

Another interesting note with respect to the Verilog I2C engine: it must be disabled before new data is written to it, each time. It took a while in debugging to discover this; reading through the source for the core turned up this fact, and once realized, all remaining implementation challenges were trivial. Richard did a great job, and I was grateful for his answers to my questions during the development phase of the project.

One of the first levels of debugging applied to this board was to turn on and off the LED connected to the GPIO (General Purpose IO) pin on the VIP. As outlined in fig.9.1 below, many individual pieces of the design functionality had to be up and running for this to work, and the reader can see where the I2C engine in the FPGA fits into the data path. There are other functions, like level converters and bi-directional data bus buffers, not shown; however, the image still offers an appropriate overview...


Figure 9.1

Figure 9.2

The image, fig.9.2, right, shows how the I2C module (red) fits into the FPGA / system hierarchy.

Given the increased functionality / data-piping through the processor bus module, the interfacing becomes somewhat more complex. All of the functionality has been memory mapped to ease the interfacing issues. The top-level interfacing (instantiation) of this function takes place directly in the module dj_top.v. Since the Xilinx Spartan-IIE family of FPGAs does not support open-collector outputs, a tri-state interface is used: the output is enabled (OE asserted) to drive the line low, and when turned off, the output is tri-stated.

As mentioned above, the I2C engine(s) physically reside in a separate FPGA down on the processor board. This is done solely due to a pin count restriction on the FPGA package. The implementation is identical no matter whether it is instantiated in a single FPGA or 2x separate packages.


Implementation:

Leading up to this point, the article has focused on individual specialized pieces of the hardware design, connectivity and software / firmware implementations. Now it is time to present the full schematic, which nicely wraps up the system as a whole.

To make the schematic blocks a little more manageable, they have been broken up into the six pieces that follow. For those hardy readers who wish every last ounce of detail, a complete schematic in PDF format is available from the author's web site, link at the bottom of this page (Electronics/Section-3/number-56):

Figure 10.1: Schematic

This first schematic section (fig 10.1) hosts only the power connections to the FPGA, as well as the programming signals from the processor section. Of note is the large number of decoupling caps. Given the distance from the power supply (2 PCBs & 2 cable connections away), the slightly higher impedance demands this level of attention, especially at the speeds switched on this PCB.

The resistor 4-packs have been used because they greatly save on space. The four resistors are still reasonable to work with size-wise, and the package is just about the size of a standard 1206 resistor.

The FPGA comes in a 208-pin TQFP. There are also BGA packages for it, but the visibly pinned TQFP package was chosen for ease of assembly in an amateur project.

The six-pin connector, along with the other wire connectors on this board, comes from the DF13 family by Hirose. These connectors offer higher-than-standard density wire connections at the trade-off of extremely costly crimping tools. Although this connector is not required in the final design, it provides a handy place to drop on the logic analyzer to verify the contents of the FPGA data loading sequence. These signals are the same as those depicted in fig.5.1, above.

The IO voltage for the FPGA in this design is 3.3v while the core voltage is 1.8v. This is done to lower the overall power consumption of the device. The split voltage then leads to 2x sets of decoupling caps. Due to the ground and power planes in the design, which slightly inhibit very high frequency response for power connections that pass through them, the I/O caps (3.3v) must be on the top side of the board, while the core voltage caps can be mounted on the bottom.

Given that the entire operation of the FPGA is derived from a single PLL locked 50MHz clock source, and that there are 57 outputs that can all change state simultaneously sourcing 8mA each, this leads to a possible current spike approaching 1/2-Amp for the outputs alone. Note, in this design, the author has specifically implemented the low current output mode on all FPGA pins. It remains to be seen whether installation into the robot, with the display screen interface wires passing close to the shielded 1.5kV back light inverter, will require higher drive strength. Xilinx provides analysis features built into their tool suite to help establish the worst case power transition based on the design logic changes for the 1.8v core supplies.
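As a quick sanity check of that figure, using the numbers stated above: 57 outputs x 8mA each = 456mA, or roughly half an amp of simultaneous switching current.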


Figure 10.2: Schematic, cont.

The next piece of the schematic contains a large amount of the functionality in the design. Visible in the top left of the page is the 64 pin, 4-row metric vertical stacking connector. The next chunk in the upper right is another piece of the FPGA block. Here the address and data bus from the processor are connected (left side) and the address and data bus to the SRAM are connected (right side). Finally, the two pieces on the bottom are the SRAMs configured 16-bits wide.

The vertical stacking connector has metric pin spacing and is made by Samtec (ESQT family). It's a bit hard to solder the center pins, but with a really fine iron they can be reached. When ordering these, the tail height and seating height must be specified. These are semi-custom, so they cost about $6.50 each. The gold finish and multi-point contact make them especially vibration resistant for use in mobile robots.

The SRAMs used come from Alliance Semiconductor; the part number is AS7C34096-12, a 512K x 8 device, giving 1 Meg-Byte total across the pair. Yes, 12ns parts. I'm using them at about a 20ns access time (50MHz state clock); it just greatly simplifies the setup and hold timing calculations. The design could have been implemented using a single SRAM part and double reads / writes for each access, if fewer than 3x VIPs were to be used concurrently plus video output & overlay. (AS7C34096 Data Sheet)

The highest current draws on the board are the 3.3v and 1.8v power connections. Looking closely at the vertical stacking bus, the reader can see that 3x pins have been dedicated to each. There is additional bulk decoupling capacitance located close to the 64-pin connectors that is cut off in this schematic view.


Figure 10.3: Schematic, cont.

The snippet, left, contains the third and final piece of the FPGA schematic block. The two sections of connectivity here are made up of input from the 3x Video Input Processors, and the output to the LCD display.

Notice that of all the control signals coming from the 3x VIPs, only H&V syncs and data clock (LLC) are connected. These directly correlate with the three control signals that advance the state machine (video_capture.v) algorithm used to capture the raw video. They can be mapped to the state machine diagram above (Fig.6).
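To make the connection between those three signals and the capture algorithm concrete, the following is a minimal sketch of a state machine advanced only by VS, HS and LLC. It is not the author's video_capture.v; the signal polarities, the treatment of HS as an active-line gate, and the address width are assumptions for illustration only.

module capture_sketch (
        input  wire        llc,    // line-locked pixel clock from the VIP
        input  wire        vs,     // vertical sync, assumed active high
        input  wire        hs,     // horizontal sync / active-line gate, assumed active high
        input  wire [7:0]  vpo,    // digitized video byte from the VIP
        output reg  [18:0] addr,   // SRAM write address
        output reg  [7:0]  data,   // SRAM write data
        output reg         we      // SRAM write strobe
        );
    localparam WAIT_FRAME = 2'd0, WAIT_LINE = 2'd1, STORE_LINE = 2'd2;
    reg [1:0] state = WAIT_FRAME;

    always @(posedge llc) begin
        we <= 1'b0;                               // default: no SRAM write this cycle
        case (state)
            WAIT_FRAME:                           // wait for the first vertical sync
                if (vs) begin
                    addr  <= 19'd0;
                    state <= WAIT_LINE;
                end
            WAIT_LINE: begin
                if (vs)      addr  <= 19'd0;      // a new frame restarts the buffer address
                else if (hs) state <= STORE_LINE; // an active line has begun
            end
            STORE_LINE:
                if (hs) begin                     // one byte stored per LLC while the line is active
                    data  <= vpo;
                    we    <= 1'b1;
                    addr  <= addr + 1'b1;
                end else
                    state <= WAIT_LINE;           // end of line
        endcase
    end
endmodule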

The connector for the display screen connection doesn't actually carry the signals directly to the screen. Rather, the connector carries the signals to an intermediate board that maps them to the correct pins, adds power and other signals, and then connects directly to the LCD through a board-to-board connector (KX15). The connector is incorrectly marked on the schematic; it is in reality a dual-row DF13 Hirose connector as well.


Figure 10.4: Schematic, cont.

In image 10.4, the clock drive circuitry for the 3x VIPs is shown. The resistors are used to soften the edges on these signals. Significant care must be taken during the board layout of these critical components.

The CY2309 family of parts is made by Cypress Semiconductor and is specifically designed for zero-delay clock distribution. Routing a high-speed clock to multiple destinations on a board is problematic, with stubs, reflections and multiple termination impedances, and the problem worsens with frequency. The clock distribution chip internally implements a PLL that keeps all of its output signals correctly phased. The 22-ohm series resistors soften the edges for EMI purposes.

The 3x capacitive decoupling on the XTAL oscillator is due to the large (30mA) current swing from the part. The clock rate (24.576MHz) is derived directly from the ITU-601 timing definition.


Figure 10.5: Schematic, cont.

Shown left is the connectivity for one of the 3x VIPs. There are 2x more redundant copies of this same schematic block with different signal names. Not shown here are the decoupling caps on both the digital and analog supplies. As mentioned previously, the analog and digital power supplies are driven from separate power and ground planes.

In the upper left of this view is the PI filter used to isolate the digital power and ground planes from the analog power and ground planes. One filter serves the planes shared by all three VIPs.

In the lower right corner of the VIP, there is a strapping input that allows the I2C address to be selected between two values. Since I'm using 3x of these VIPs, there are not enough strapping options to give each a unique address on the same I2C bus; hence the FPGA implementation of multiple concurrent I2C busses. I've got many other I2C devices on the robot this goes into, and having separate busses allows all of the devices to be used simultaneously.

Only one of the four video inputs is hooked up on each of the three VIPs. The video output is not used. The front-end termination components are implemented directly as recommended on the data sheet. These components are located on the bottom side of the board directly under each respective VIP, still over the analog ground plane. During development, on a separate test board, presented below, the video outputs were implemented to monitor the system during debugging.


Figure 10.6: Schematic, cont.

The final piece of the schematic shows the laser / LED driver and the camera connector. The drive for the LED comes from the VIP's single GPIO (general purpose IO) and is extremely useful for debugging. During actual robot operation it drives the solid state lasers used for range finding.

As with the other wire connectors on this board the 3x video, power and laser connections are DF13 Hirose 1.25-mm pitch connectors. These are used with 28 gauge stranded wire. To go any smaller would drive to DF10 connectors and solid conductor wire (between 32 & 36 gauge). It is noteworthy that the connection to the video camera is conducted via 75-ohm coaxial cable. Terminating the cable to the DF13 connector presents a substantial challenge.


The Final Design:

Add all of the above circuitry together with a couple of nights of PCB layout, a six-layer PCB, and some dedicated time soldering, and you get the board in the photograph. Don't be misled by the size of the photo: measuring 2.70" x 3.48", this PCBA is quite compact. Even with some of the components on the bottom side of the board, and given several more nights of layout time, this board could have been smaller; however, the current size meets the requirements of the robot it will be installed upon:


Figure 11.1: Completed Vision System Platform

First Steps:

As the reader might imagine, without any templates to follow, there are many steps in the development of an FPGA-based color vision system such as this. Each step offers up its own unique problems and chances to make mistakes. Building a test system helps to identify some of these errors early and gives the builder time to design them out. An example of this is the fact that the test board in fig.11.2 is the second incarnation, due to a footprint mistake on the VIP and a missing LLC signal needed at the digital logic analyzer header and FPGA connector.

Figure 11.2: Prototype VIP test board

The first steps in the development of this system involved building a test platform to evaluate the operation of the Video Input Processor, fig.11.2, right. This test PCB-Assembly was extremely useful in both debugging and learning some of the pitfalls of improper board layout for dealing with analog video.

The RCA jacks to the upper-right side of the photos are for video input and output. It is helpful to monitor the VIP's video output during development to visually see the artifacts left by the application of the various analog and digital filter pieces within the VIP. The large barrel connector next to the RCA jacks supplies 12VDC to a small camera used for testing.

The small group of header pins in the lower right of the photo is used to connect a third-party I2C interface pod from the author's desktop PC to the unit. This allowed for test interfacing before that portion of the FPGA was up and running: registers inside the VIP could be set and the resulting functionality monitored. The PC=>I2C pod used in this development process is the iPort/AFM by MCC. This pod plugs into the PC's serial port and, once its drivers are installed, provides an I2C interface.

The final double group of headers along the entire left board edge provides dual connectivity. The set facing upwards interfaces to an HP1662A digital logic analyzer, while the set facing downward interfaces this test board to a development platform, fig.11.3, below.

Left over from some previous video generation work, the combination of the development board below and the VIP test board, both built by the author, greatly reduced the duration of this project. Having all of the components laid out on the top side of the board, using off-the-shelf power supply modules, color LCD driver technology and lots of extra headers for busses and FPGA logic, goes a long way toward simplifying the task.

In this photo the reader can see the darker green section in the front-right quadrant of the VIP test board. This color indicates the ground plane fill on both the top and bottom layers of this two-layer board, used to reduce noise around the analog signals. Again, power filtering is applied even to this simple board layout. L1, L2, C4 & C5 form a PI-filter network between the analog planes and digital power traces. (There are no digital planes on the VIP test board, an EMI nightmare.)

Figure 11.3: Author's development board w/VIP tester installed

In the figure above, the u-Processor, RAM, CPLD, FLASH, FPGA and VIP are all 3.3v devices. The single voltage across ICs greatly simplifies system integration and testing. A previous development platform, not pictured, used an MC68332-based 5v processor system with level conversion circuitry on the interface to the 3.3v FPGA and VIP. Although that system was an incredibly powerful combination, moving to the Motorola ColdFire processor family (32-bit, 220 MIPS) not only increased the processing power / speed of the system, it also drastically reduced power consumption. In the realm of amateur robotics, the last statement can be interpreted to say that the battery life was greatly increased. To achieve these performance levels, both the FPGA and the u-Processor system run a 1.8v core with 3.3v IOs to further reduce power.

Throughout this article, the author has emphasized the use of an 8-bit oriented processor bus. The processor used by the author actually implements a full 32-bit wide processor bus. The reduced bus width of the vision system actually hampers processor performance when doing tasks such as video overlay, since multiple narrow writes are required over many cycles, reducing throughput. A 16- or 32-bit interface would greatly improve overall system performance. The 8-bit data bus width was selected due to the number of pins available on both the FPGA and the board stacking connector. This is more of an issue for the video output system (driving an LCD) than for the vision system in general, unless the reader attempts to implement full video image read-back.

Figure 11.4: Development Fixture

The final processor board that the vision system plugs into is of course much smaller than the development board above. It measures in at 4.35" x 3.6". For reference, all of the functionality discussed above, including the vision system fits in the small robot pictured in fig.12, below. Given the complexity and functionality of that system, and the fact that the vision system is processor independent, the author's processor system is not covered here, but can be viewed at the author's web site.

Even after building the development boards, above, time was still taken to develop a full debugging fixture. See fig.11.4, right. This fixture mounts all of the PCB assemblies, motors, encoders, and most of the sensors to a single portable unit for software debugging and test. This platform is where the final data to generate this article comes from.

The development platform mounts eight discrete PCBs, four sonar sensors, four IR sensors, three video cameras, three lasers w/line generating optics, one compass module, one tilt sensor, two servo motors, two DC motors, two 256-line encoders and one LCD display. The eight PCBs in the system include the power supply board, the power supply interface board, the processor board, the gyro / accelerometer board, the vision system board, the display interface board, the LCD back light inverter board, and the radio interface board. The large cable bundles connecting the different PCBAs are only for use on this test fixture. In the actual robot, the boards stack / connect together directly. In that configuration testing is impossible, so the boards are strapped to this test fixture, spread out where debugging and test probes can reach all of the circuitry. Complete details on all of these boards & devices are available on the author's web site. Once the reader has seen the robot in fig.12, and compared the size / volume of the components mounted on the development board, the scope of building this robot should become apparent.

Future Steps:

Figure 11.5: Convolution Kernel Def.

Current / on-going work is centered on FPGA algorithmic implementation of Laplacian edge finding techniques using horizontal & vertical convolution kernels. This work is still very much in development, and as such is listed in the future steps section. Given the importance of this work and its broad-reaching application, much effort has been devoted to it in industry and in institutes of higher education as well; a quick Google search will turn up a plethora of further reading on the subject. Given that the implementation mathematics are relatively simple, the real challenge in this work is data flow scheduling, to assure that the right data is accessible in internal FPGA RAM at the time of processing. Current work involves FIFO buffering of previous data lines so the kernel can be applied across several lines at once. Where the author's work departs from the classical Laplacian edge detection application is through the use of the color closeness idea presented earlier in the article. Rather than reducing the image to black & white picture elements, applying the kernel to a pre-solved field of "color closeness" elements makes use of the additional information afforded by the color in the image.
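As a rough illustration of the data-flow problem, the sketch below applies a 4-neighbour Laplacian to a stream of 8-bit "color closeness" samples, using two line buffers so that the previous two lines remain available inside the FPGA. The line length, bit widths and kernel weights are arbitrary assumptions for illustration; this is not the author's in-progress implementation.

module laplacian3x3 #(parameter LINE_LEN = 640) (
        input  wire               clk,
        input  wire               pix_valid,
        input  wire [7:0]         pix_in,    // current "color closeness" sample
        output reg  signed [11:0] edge_out   // signed Laplacian response
        );
    // Two line buffers hold the samples one and two lines back.
    reg [7:0] line1 [0:LINE_LEN-1];
    reg [7:0] line2 [0:LINE_LEN-1];
    reg [9:0] x = 0;

    // 3x3 window registers, w<row><col>, shifted one pixel per clock.
    reg [7:0] w00, w01, w02, w10, w11, w12, w20, w21, w22;

    always @(posedge clk) if (pix_valid) begin
        // Shift the window left and bring in one new column from each row.
        w00 <= w01; w01 <= w02; w02 <= line2[x];
        w10 <= w11; w11 <= w12; w12 <= line1[x];
        w20 <= w21; w21 <= w22; w22 <= pix_in;

        // Age the line buffers and advance the column counter.
        line2[x] <= line1[x];
        line1[x] <= pix_in;
        x <= (x == LINE_LEN-1) ? 10'd0 : x + 1'b1;

        // Classic 4-neighbour Laplacian: 4*centre - north - south - west - east.
        edge_out <= ($signed({4'b0, w11}) <<< 2)
                    - $signed({4'b0, w01}) - $signed({4'b0, w21})
                    - $signed({4'b0, w10}) - $signed({4'b0, w12});
    end
endmodule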

In addition to partial differential kernel convolution, playing with patterned kernel convolution using eight (9x9) weighted matrices is a work in progress as well. As if that were not enough, there is ongoing work exploring opening and closing functions (dilation + erosion). Using iterative application of opening and closing functions with a simple "+"-shaped kernel map, I anticipate being able to track multiple independent blobs, including edge isolation rather than just weighted averages. As of this writing there are no definitive results to report on this work, but it is proving to be realistically implementable in fabric. The only cheating implemented in the short term is using a processor support function to extract angles when re-combining the weighted pattern results. Eventually this work-around will be moved into the FPGA as well.
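A minimal sketch of a single erosion / dilation step with that "+"-shaped structuring element follows, operating on a thresholded 3x3 window. The window inputs are assumed to come from a line-buffer stage like the one sketched above, and the threshold is an arbitrary example value, not a figure from the author's design.

module plus_morphology #(parameter [7:0] THRESH = 8'd128) (
        input  wire [7:0] w01, w10, w11, w12, w21,  // north, west, centre, east, south samples
        output wire       eroded,                   // centre survives erosion
        output wire       dilated                   // centre set after dilation
        );
    // Threshold each sample of the "+"-shaped neighbourhood to a single bit.
    wire b_n = (w01 > THRESH), b_w = (w10 > THRESH), b_c = (w11 > THRESH),
         b_e = (w12 > THRESH), b_s = (w21 > THRESH);

    assign eroded  = b_c & b_n & b_s & b_w & b_e;   // all five must be set
    assign dilated = b_c | b_n | b_s | b_w | b_e;   // any of the five is enough
    // Opening is erosion followed by dilation; closing is the reverse order,
    // so iterating these two primitives yields the open / close functions.
endmodule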

Figure 11.6: Parallax Based Range Eq.

Figure 11.7: ROM Look-up Template

module crc_rom(
        input  wire [7:0]  addr,   // raw value read from the block RAM
        output reg  [15:0] data    // looked-up / linearized result
        );
// Combinational look-up: re-evaluate whenever the address changes.
always @(addr)
begin
    case (addr)
        // The entries shown are template examples; fill in all 256
        // pre-computed values for the target application.
        8'h00 : data = 16'h0000;
        8'h01 : data = 16'hc0c1;
        8'h02 : data = 16'hc181;
        ...
    endcase
end
endmodule
Another algorithmic implementation that deserves review is that of a structured light segmentation applied to the linear laser range finder output data buffer, again computationally offloaded to the FPGA. Work on this topic has not been started but, as a future step, promises great rewards to the author's robots if implemented successfully.

As discussed in the author's previous article on laser range finding, the digital returns from the laser range finding portion of the system are governed by the equation at right. Given the FPGA implementation, a distributed ROM look-up function to linearize the data returns from the system becomes exceedingly easy. Each of these steps removes just one more bit of processing (or of code space, if a look-up table were used) from the controlling u-Processor. The Verilog template is given in fig.11.7.
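For readers without the earlier article at hand, one commonly used form of this parallax / triangulation relation is reproduced here; the exact expression and calibration terms in fig.11.6 may differ:

    D = h / tan( n_pix * rho + theta_0 )

where D is the distance to the target, h is the baseline offset between the camera and the laser, n_pix is the pixel offset of the detected laser line from the image centre, rho is the angular span of a single pixel, and theta_0 is a calibration offset. Because D depends on the tangent of the pixel offset rather than varying linearly with it, a small ROM indexed by the pixel value can simply return the pre-computed distance.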

To use this piece, instantiate it in the same laser_ranging.v module as the block RAM. The data output of the dual-ported block RAM becomes the address input to the ROM module. The data output of this module is then fed back to the u-Processor bus interface as the output of the system. Once the size of the ROM look-up table reaches a certain set point, the Xilinx tools will automatically recognize it and extract it into a pre-configured block-RAM resource. If it remains small, it will be distributed across RAM cells in the logic fabric itself. The difference between the two implementations is signal delay due to routing, which affects the maximum system clock speed.
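As a concrete (though hypothetical) wiring sketch of that description, the fragment below shows how the ROM from fig.11.7 might be dropped into laser_ranging.v; the signal names are invented for illustration and are not the author's:

    wire [7:0]  range_raw;      // data output of the dual-ported block RAM (port B)
    wire [15:0] range_linear;   // linearized result handed back to the u-Processor bus interface

    crc_rom linearize (
        .addr (range_raw),      // raw range sample indexes the look-up table
        .data (range_linear)    // table output replaces the raw value on read-back
        );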


References:

None of the references below directly involve working with video in an FPGA. Rather, the Verilog references detail the language and its application to general problems. The knowledge learned through reading can then be molded by the reader to apply the language to any problem.

Book: 'Digital Television Mpeg-1, MPEG2 and principles of the DVB system', H Benoit, Publisher: Arnold 1997.

Book: 'Verilog Designer's Library', Bob Zeidman / Prentice Hall 1999.

Book: 'Verilog HDL Synthesis, A Practical Primer', J. Bhasker / Star Galaxy Publishing 1998.

Book: 'Real World FPGA Design with Verilog', Ken Coffman / Prentice Hall PTR 1999.

Book: 'Verilog Digital Computer Design, Algorithms into Hardware', Mark Gordon Arnold / Prentice Hall PTR 1999.

Book: 'The Verilog Hardware Description Language, Fourth Edition', Donald E. Thomas / Kluwer Academic Publishers 1998.

Book: ** 'Advanced Digital Design with the Verilog HDL', Michael D. Ciletti, Prentice Hall 2002. ** Note: Of the above list, it is the author's personal opinion that this Verilog book is the best on the market.

Link: Xilinx Corporation: www.xilinx.com

Link: Free HDL source: www.opencores.org

Link: Author's home page: Max's Little Robot Shop

Link: Previous Encoder Article: Laser Range Finding

Link: I2C - iPort/AFM by MCC (makers of PC to I2C interfaces)

Wrap Up:

All in all, the vision system in the photographs above required many hours of evenings and weekends, spread out over the better part of a year, to fabricate, debug and test. One of the more complex tasks was board bring-up. Debugging coding errors, and separating them from hardware errors, is, to say the least, a challenge. This was further compounded by the interface to the very complex processor board on which this PCB rides piggyback, and the interactions between them.

As the level of technology used in some of these projects increases, highly detailed descriptions become more and more difficult and lengthy. Much of the difficulty is due to the layering of technology upon technology. At some point one must cover only the higher layers, burrowing down to the details only where necessary; the rest must come from counting on the reader's knowledge to fill in the lower levels.

Many of the author's other articles wrap up with an invitation for the reader to jump right in and implement some of the discussed technologies. This one is no different, other than the following general thought: it is important to know one's own abilities. Since the technology presented here is somewhat advanced and takes special skills and tools to debug, the beginner may find themselves unable to implement it and thus be disappointed with the results. Intermediate to advanced users should take some time and play with this technology, as the benefits to all aspects of mobile robotics are well worth it.

Figure 12.1: Author's Robot-"Dohn Joe"

Leveraging intellectual property building blocks such as those available from www.opencores.org, from authors like Richard Herveille & John Clayton and others, is becoming a powerful tool. This is especially so for amateur robotics enthusiasts, as more and more high-level Verilog and VHDL functions become available free of charge. A special thank-you is in order for Richard, John and the others who work in the HDL world and support open-source projects. The author has written several of the basic functional blocks used in motion control and robotics, and these are available, sprinkled throughout his web site.

The final image, fig.12.1, right, and the associated CAD model, fig.12.2, below, show the author's small robot named 'Dohn Joe', for which this vision system has been designed. Although the cameras had not been mounted when this picture was taken, the boards had already been test fitted, and the LCD can clearly be identified in the photo. It is very hard to see, but there is a black Delrin box directly behind the LCD; it is electrically shielded inside and houses the 1.5kV back-light inverter circuitry. Two of the cameras will be mounted in the lower left of the robot, looking forward in front of the wheels, just skimming the ground, while the third camera mounts to a pan-tilt mechanism on top of the robot. Many of these pieces (camera mounts, pan-tilt head, etc.) have already been fabricated but, as of this picture some months back, had yet to be integrated with the final robot.

For more details on the processor system, other code examples or to see the details on the construction progress of Dohn Joe come visit:

Max's Little Robot Shop

Kenneth Maxon


Figure 12.2: Robot CAD model

Postscript:

I have read a few texts on stereo imaging and a few others on different applications of object extraction from captured images. A few months after writing this article I picked up the following book that I highly recommend:

Book: 'Algorithms for Image Processing and Computer Vision', J.R. Parker / Wiley 1996

This book visits, in medium-level detail, a large number of topics and gives a general overview of the work being applied to computer-based image processing and object extraction / identification. One of the underlying themes that resonated with the preparation of this article is the need for increased processing power and, where possible, the off-loading of ancillary tasks to other hardware so that the maximum possible processor throughput can be applied to vision / image tasks. The book does not address these topics in such a blunt manner; however, the mathematics and the iterative nature of several of the algorithms discussed make a strong case. J.R. Parker's work is outstanding in its broad coverage of the topic, and I was delighted while reading to see the correlation between the topics presented and the applications I have planned for my mobile robotics development. Although not directly titled as such, several sections throughout this work have direct application to extracting multiple objects under varying lighting conditions, textures, unclear boundary conditions and other related complex image artifacts.