Nikkei Electronics Asia -- April 2009
Insights
FPGAs Used in Image-Based Driver Assistance System

Apr 8, 2009 16:30 Nikkei Electronics Asia

Today's automobiles are incorporating new driver assistance systems that use cameras as the primary aid to the driver. First-generation features in production today include lane departure warning (LDW) systems, rear-view cameras with parking assistance, and night vision systems. Second-generation systems under development process several cameras for multiple in-car uses.

System developers face many challenges in designing these systems. Running the latest algorithms to meet design requirements demands a large amount of processing capability, while power constraints limit the options available. A high level of scalability is necessary to address a wide range of requirements, from the most basic systems with one camera input to complex ones, such as bird's-eye and surround view, that integrate up to four camera inputs. These systems use a wide variety of semiconductor devices, including ASSPs, DSPs, and FPGAs, or a combination of these, resulting in complex implementations that are difficult to maintain.

LDW Algorithm

The LDW software from Elektrobit Automotive GmbH, which is based on the PReVENT SAFELANE European project, features C++ floating-point source code that is not optimized for embedded implementations. The algorithm (Fig 1) processes the video by extracting relevant measurement points, identifying lane candidates, and filtering the results using information from previous frames. From this, the system identifies the position of the car within the lane. An in-car warning can then be issued based on this previously calculated data.
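
As a rough illustration of the last stage, filtering with information from previous frames can be as simple as smoothing the measured lateral offset over time. The sketch below is not the Elektrobit algorithm; the type and function names, the smoothing factor of 1/4, and the warning threshold are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical state carried from frame to frame. */
typedef struct {
    int  offset_mm;  /* smoothed lateral offset from lane center, in mm */
    bool valid;      /* false until the first measurement arrives */
} lane_state;

/* Blend a new measurement into the state: new = old + (meas - old) / 4.
 * The factor 1/4 is an arbitrary illustrative choice. */
void lane_update(lane_state *s, int measured_mm)
{
    assert(s != NULL);
    if (!s->valid) {
        s->offset_mm = measured_mm;
        s->valid = true;
    } else {
        s->offset_mm += (measured_mm - s->offset_mm) / 4;
    }
}

/* Report a warning when the smoothed offset exceeds an assumed threshold. */
bool lane_warning(const lane_state *s, int threshold_mm)
{
    assert(s != NULL);
    return s->valid && abs(s->offset_mm) > threshold_mm;
}
```

Smoothing over several frames keeps a single noisy measurement from triggering a warning on its own, at the cost of a slower reaction.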

The driver assistance development environment is based on the Platform ASSP Replacement Infotainment System (PARIS-1) development platform. The PARIS-1 platform consists of a Stratix II FPGA module and a motherboard.

The FPGA module contains two DDR2 memory chips, Flash memory, PHY devices for USB and Ethernet, and an EXM32 connector to the motherboard. The motherboard has different bus and multimedia interfaces that can be accessed by the module. The PARIS-1 platform features a hard disk drive (HDD) interface, two thin-film transistor (TFT) LCD connections (one for the driver assistance subsystem and one for the streaming subsystem) and a touchscreen interface.

FPGA Approach

Driver assistance projects start with developing algorithms and using test videos to evaluate the performance of video processing. The development environment streams real-time video recordings from the HDD and then applies the same video recordings for the algorithm validation stage on the final embedded application.

The FPGA design is divided into two separate entities, the streaming subsystem and the driver assistance processing subsystem, each with an independent DDR2 memory interface. In the FPGA implementation, the two work independently, so the load of one subsystem does not impact the performance of the other. The one requirement is that the streaming subsystem must provide input test data to the driver assistance subsystem at a sufficient rate. In the final application, the streaming subsystem is replaced by camera inputs.

The streaming subsystem has a configurable touchscreen. Its HDD interface and a software driver integrated in the file system can stream up to four uncompressed color VGA video channels from the HDD to the driver assistance subsystem in the FPGA.

The driver assistance processing subsystem is built around a Nios II soft-core processor with a performance counter for code profiling, a message buffer to interface with the streaming subsystem, and an LCD controller to display the frame buffer contents.

The processing code is instantiated within the software template. After a profiling stage, the portion of the code requiring hardware acceleration can be identified. The following two methodologies define code acceleration based on a simple observation of the algorithm: 1) If the algorithm processes the pixels of the input video stream consecutively, it can be implemented as a front-end SOPC Builder component. The input to such a component is a video streaming interface, and the output data is stored in the frame buffer of the preprocessed video data; 2) If the video processing requires random access within each video frame received, it can be implemented as a process that reads input data from a frame buffer and writes the results to another buffer. This type of component is referred to as a back-end processing SOPC Builder component.
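
The difference between the two access patterns can be seen in a few lines of C. This is only a software sketch of the distinction, not SOPC Builder component code; the function names and the two operations (a threshold and a 180-degree rotation) are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Front-end style: pixels are consumed in arrival order, one at a time.
 * The result depends only on the current pixel, so no frame buffer is
 * needed on the input side. */
uint8_t stream_threshold(uint8_t pixel, uint8_t level)
{
    return pixel >= level ? 255 : 0;
}

/* Back-end style: the output at (x, y) reads an input location that is
 * not the pixel currently arriving, so the whole frame must already sit
 * in a buffer. Here: rotate the frame by 180 degrees. */
void frame_rotate180(const uint8_t *src, uint8_t *dst, int width, int height)
{
    assert(src != NULL && dst != NULL && src != dst);
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            dst[y * width + x] =
                src[(height - 1 - y) * width + (width - 1 - x)];
}
```

The first function can produce its result as soon as each pixel arrives; the second cannot write its first output pixel until the last input pixel has been received.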

These two methodologies are not mutually exclusive, so different parts of a solution can be implemented using one or both techniques. To integrate the LDW algorithm's C++ code into the development environment, the following steps should be performed:

* Remove the PC environment portion of the code that handles input data reading and post processing display settings;
* Integrate the processing portion of the code in the new environment by taking input from a frame buffer and storing the output to another frame buffer;
* Configure the synchronization between the streaming subsystem and the driver assistance subsystem;
* Convert floating-point code to fixed-point code wherever floating-point precision is not needed. Fixed-point operations are implemented more efficiently than floating-point operations.
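
The last step can be sketched as follows. The LDW source code itself is not shown in this article, so the Q16.16 format, the helper names, and the luminance formula used as the example are all assumptions; they only illustrate the general technique of replacing floating-point coefficients with pre-scaled integers.

```c
#include <assert.h>
#include <stdint.h>

/* Q16.16: 16 integer bits and 16 fractional bits in a 32-bit integer. */
typedef int32_t q16_16;

#define Q_ONE (1 << 16)

q16_16 q_from_int(int v)
{
    assert(v >= -32768 && v <= 32767);
    return (q16_16)(v * Q_ONE);
}

/* Multiply through a 64-bit intermediate, then drop the extra 16
 * fractional bits. Right-shifting a negative value is implementation-
 * defined in C; common compilers use an arithmetic shift. */
q16_16 q_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * b) >> 16);
}

/* Floating-point original:  y = 0.299f*r + 0.587f*g + 0.114f*b;
 * Fixed-point version with the coefficients pre-scaled by 2^16. */
uint8_t luma_fixed(uint8_t r, uint8_t g, uint8_t b)
{
    const int32_t kr = 19595, kg = 38470, kb = 7471; /* sum is 65536 */
    return (uint8_t)((kr * r + kg * g + kb * b + 32768) >> 16);
}
```

Adding 32768 before the shift rounds to the nearest integer instead of truncating, which keeps the result close to the floating-point version.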

Once this setup and profiling are complete, the measurement point generation stage, which takes 70% of the cycles, is identified as a candidate for implementation as a front-end SOPC Builder component.

Implementation Methodology

The code implemented and accelerated in the driver assistance development environment may come from multiple sources. It can be embedded code that targets a different device or high-level code in a PC environment. In either case, the implementation methodology is similar (Fig 2). The main task is to reduce the code to a standard C or C++ function that processes one frame buffer and outputs the result in another. When this step is complete, the function can be added into the provided driver assistance software template and is then ready to be optimized using the following techniques and tools.
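
A function reduced to this shape might look like the sketch below. The actual software template is not reproduced in this article, so the `frame_fn` signature, the `run_frames` loop, and the pixel-inversion example are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature for an algorithm dropped into the template:
 * read one frame buffer, write the result to another. */
typedef void (*frame_fn)(const uint8_t *in, uint8_t *out, size_t n_pixels);

/* Example algorithm in that shape: invert every pixel. */
void invert_frame(const uint8_t *in, uint8_t *out, size_t n_pixels)
{
    for (size_t i = 0; i < n_pixels; i++)
        out[i] = (uint8_t)(255 - in[i]);
}

/* Stand-in for the template's per-frame loop: run the algorithm on each
 * of n_frames frames stored back to back in `in`. */
void run_frames(frame_fn fn, const uint8_t *in, uint8_t *out,
                size_t n_pixels, size_t n_frames)
{
    assert(fn != NULL && in != NULL && out != NULL);
    for (size_t f = 0; f < n_frames; f++)
        fn(in + f * n_pixels, out + f * n_pixels, n_pixels);
}
```

Because the algorithm only sees two buffers, the same function can be exercised on a PC with recorded frames and on the embedded target without changes to its body.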

Tools Review

SOPC Builder can be used to build the two subsystems. The system is configured through a GUI, and different SOPC Builder components (DDR2 SDRAM controller, Nios II processor, and performance counter) are instantiated around the Avalon memory-mapped bus. SOPC Builder provides files to the Quartus II software for elaborating the FPGA configuration and to the Nios II software development tools for building a specific board support package (BSP) library.

The Nios II tools provide software drivers for the different SOPC Builder components instantiated in the system, and support systems with multiple Nios II processors. Therefore, only application-specific code needs to be developed.

For driver assistance, the focus is on solutions that can generate SOPC Builder components. The DSP Builder software allows the developer to design SOPC Builder components using the industry-standard MATLAB and Simulink tools.

The C2H compiler is integrated into the Nios II software development tools to generate hardware acceleration SOPC Builder components directly from functions within the C code.

Basic System Optimization

Front-end processing SOPC Builder components perform operations on the video stream as it is received. Such an implementation frees the CPU from performing repetitive arithmetic operations and removes the delay previously associated with the time the CPU takes to perform these operations.

Consider a simple multiply-accumulate (MAC) operation performed on every pixel of an image of N pixels. For a processor that executes one MAC operation per clock cycle, the delay equals the number of pixels to process per frame: N cycles (i.e., 640 x 480 = 307,200 cycles for VGA). If a front-end SOPC Builder component implements this MAC operation in one cycle as each pixel arrives, the delay added to the video input is only one clock cycle.
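
The arithmetic can be checked with a short C sketch. The function names and the single-coefficient MAC are assumptions for illustration; the cycle count follows the idealized one-MAC-per-cycle model used above and ignores memory access and loop overhead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { VGA_WIDTH = 640, VGA_HEIGHT = 480 };

/* Cycles a CPU needs per frame at one MAC per clock cycle: one per pixel. */
uint32_t cpu_mac_cycles(uint32_t width, uint32_t height)
{
    return width * height;
}

/* Software MAC over a frame: acc += pixel * coeff for every pixel.
 * A 64-bit accumulator avoids overflow for a full VGA frame. */
uint64_t mac_frame(const uint8_t *pixels, size_t n, uint8_t coeff)
{
    assert(pixels != NULL);
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (uint64_t)pixels[i] * coeff;
    return acc;
}
```

In software the loop body runs N times after the frame has arrived; in a front-end component the same multiply and add happen while the pixels stream in, so the result is available one cycle after the last pixel.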

The LDW algorithm starts with a measurement point generation stage. To simplify the analysis, this stage is referred to as edge detection. The edge detection is a linear process in the flow, which goes through each frame pixel-by-pixel.

Repetitive operations are performed to retrieve edge information. The FPGA-embedded solution uses a front-end block to implement this function. The DSP Builder implementation of this front-end block exchanges 70% of the CPU load for a logic cost of approximately 2,500 logic elements (LEs) and forty 9 x 9 DSP elements.
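
To show why such a stage suits a streaming block, here is a minimal software model of a pixel-by-pixel edge detector. The article does not describe the actual measurement point generation, so the horizontal gradient test, the threshold, and all names below are assumptions standing in for it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* State a streaming block would hold in registers between pixels. */
typedef struct {
    uint8_t prev;   /* previous pixel on the current line */
    int     x;      /* column of the next pixel */
    int     width;  /* pixels per line */
} edge_stream;

void edge_init(edge_stream *s, int width)
{
    assert(s != NULL && width > 0);
    s->prev = 0;
    s->x = 0;
    s->width = width;
}

/* Consume one pixel and return 1 if the horizontal step from the previous
 * pixel exceeds `threshold`. The first pixel of each line has no left
 * neighbor, so it never reports an edge. */
int edge_push(edge_stream *s, uint8_t pixel, int threshold)
{
    int is_edge = (s->x > 0) &&
                  (abs((int)pixel - (int)s->prev) > threshold);
    s->prev = pixel;
    s->x = (s->x + 1 == s->width) ? 0 : s->x + 1;
    return is_edge;
}
```

Each call needs only the current pixel and a few registers of state, which is the property that lets the operation run on the stream without a frame buffer.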

Back-end processing SOPC Builder components perform random-access operations on the video or on a set of data. By using integrated master ports and optimized hardware implementations for specific functionality, these components process data faster than a traditional processor implementation. However, as back-end SOPC Builder components do not process the data on the input flow, the processing operation still introduces a delay. To realize the most benefit from the hardware accelerator, one can modify the code to execute tasks in parallel. With this code modification, instead of waiting for the hardware accelerator to complete its task, the processor can simultaneously process another portion of the algorithm. When the remaining portion of the algorithm depends on the data provided by the hardware accelerator, parallelization can still be achieved with pipeline scheduling techniques.
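
The benefit of pipelining can be estimated with a simple cycle model. This is an idealized sketch under stated assumptions: two stages (accelerator, then CPU), fixed per-frame costs, and enough buffering for the accelerator to work one frame ahead. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The CPU waits for the accelerator on every frame. */
uint64_t cycles_sequential(uint64_t accel, uint64_t cpu, uint64_t frames)
{
    /* Guard against overflow of the product below. */
    assert(accel + cpu == 0 || frames <= UINT64_MAX / (accel + cpu));
    return frames * (accel + cpu);
}

/* Two-stage pipeline: while the CPU works on frame k, the accelerator
 * already processes frame k+1. Once the pipeline is full, each further
 * frame costs only the slower of the two stages. */
uint64_t cycles_pipelined(uint64_t accel, uint64_t cpu, uint64_t frames)
{
    if (frames == 0)
        return 0;
    uint64_t slow = accel > cpu ? accel : cpu;
    return accel + (frames - 1) * slow + cpu;
}
```

With an accelerator cost of 3 units and a CPU cost of 5 units per frame, three frames take 24 units sequentially and 18 units when pipelined; the gain grows with the number of frames and is largest when the two stages are balanced.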

In this application, the LDW algorithm does not need back-end acceleration, as its performance already satisfies the requirements. However, this methodology can be used to implement fish-eye correction, another algorithm commonly found in video-based driver assistance solutions.
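
Fish-eye correction is a natural back-end candidate because each output pixel is fetched from a different, non-consecutive input location. One common way to express this in software is a lookup-table remap, sketched below. The function name and table layout are assumptions; the table contents would be precomputed from the lens model, which is outside the scope of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Back-end style remap: for every output pixel, `lut` holds the index of
 * the input pixel to copy. Nearest-neighbor only; no interpolation. */
void remap_frame(const uint8_t *src, uint8_t *dst,
                 const uint32_t *lut, size_t n_pixels)
{
    assert(src != NULL && dst != NULL && lut != NULL);
    for (size_t i = 0; i < n_pixels; i++) {
        assert(lut[i] < n_pixels);  /* table must stay inside the frame */
        dst[i] = src[lut[i]];
    }
}
```

The reads from `src` jump around the frame according to the table, which is why the input must be held in a frame buffer rather than processed on the incoming stream.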

FPGAs are well suited to driver assistance applications. One set of development tools supports all device families, offering unique scalability possibilities. By using parallel implementation techniques instead of increasing the system operating frequency, processing performance can be gained without compromising power consumption.

by Yann Le Henaff, Member of Technical Staff, Automotive System Solutions, Altera Corp