Embedded Vision Applications – Design approach for real-time classifiers

Overview of classification technique

Object detection/classification is a supervised learning process in machine vision that recognizes patterns or objects in data or images. It is a major component in Advanced Driver Assistance Systems (ADAS), where it is commonly used to detect pedestrians, vehicles, traffic signs, etc.
The offline training process fetches sets of selected data/images containing objects of interest, extracts features from this input, and maps them to the corresponding labelled classes to generate a classification model. In the online process, real-time inputs are categorized against this pre-trained model, which finally decides whether the object is present or not.
Feature extraction uses techniques such as Histogram of Oriented Gradients (HOG) and Local Binary Patterns (LBP) computed over integral images. The classifier uses a sliding-window, raster-scanning or dense-scanning approach to operate on the extracted feature image. Multiple image pyramids are used (as shown in part A of Figure 1) to detect objects of different sizes.
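To make the scanning stage concrete, the sketch below reduces it to nested loops over pyramid scales and window positions. The Scale descriptor and the score callback are illustrative stand-ins for this article, not part of any specific library:

#include <vector>

// Hypothetical descriptor for one pyramid scale: a feature image of
// width x height cells, each cell holding 'depth' feature values.
struct Scale {
    const float* features;   // feature image for this scale
    int width, height, depth;
};

struct Detection { int scale, x, y; float score; };

// Generic dense (sliding-window) scan: every window position in every
// pyramid scale is scored against the pre-trained model.
std::vector<Detection> scan_pyramid(const std::vector<Scale>& pyramid,
                                    int winW, int winH,
                                    float (*score)(const Scale&, int, int),
                                    float threshold)
{
    std::vector<Detection> hits;
    for (int s = 0; s < (int)pyramid.size(); ++s) {
        const Scale& sc = pyramid[s];
        for (int y = 0; y + winH <= sc.height; ++y)
            for (int x = 0; x + winW <= sc.width; ++x) {
                float v = score(sc, x, y);   // model evaluation per window
                if (v > threshold)
                    hits.push_back({s, x, y, v});
            }
    }
    return hits;
}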

Computational complexity of classification for real-time applications

Dense pyramid structures with 30-40 scales are used to obtain high detection accuracy, where each pyramid scale can have single or multiple levels of features depending on the feature extraction method used. Classifiers like AdaBoost fetch data randomly from various data points located in a pyramid image scale. Double- or single-precision floating point is used where accuracy requirements are higher, making the operations computationally intensive. In addition, a significantly higher amount of control code runs at various levels of the classification process. These complexities make the classifier a difficult module to design efficiently for real-time performance in critical embedded systems, such as those used in ADAS applications.
Consider a typical classification technique such as AdaBoost (adaptive boosting), which does not use all the features extracted from a sliding window. That makes it computationally less expensive than a classifier like SVM, which uses all the features extracted from the sliding windows in a pyramid image scale. Features are of fixed length in most feature extraction techniques such as HOG, gradient images and LBP. In the case of HOG, features contain many levels of orientation bins, so each pyramid image scale can have multiple levels of orientation bins, and these levels can be computed in any order, as shown in parts B and C of Figure 1.
Figure 1: Pyramid image scales with multiple orientation bin levels
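The difference in data access patterns is easy to see in code. A boosted classifier of decision stumps touches only a few, irregularly placed feature values per window, while a linear SVM reads every feature. Both routines below are illustrative sketches under assumed structures, not any library's API:

struct Stump { int featureOffset; float threshold, leftVal, rightVal; };

// AdaBoost-style evaluation: each weak classifier fetches one feature
// value from a data-dependent offset inside the window -- a scattered,
// hard-to-vectorize access pattern.
float adaboost_score(const float* window, const Stump* stumps, int n)
{
    float score = 0.0f;
    for (int i = 0; i < n; ++i) {
        float f = window[stumps[i].featureOffset];   // random fetch
        score += (f < stumps[i].threshold) ? stumps[i].leftVal
                                           : stumps[i].rightVal;
    }
    return score;
}

// Linear-SVM-style evaluation: a dense dot product over ALL features --
// more arithmetic, but regular, SIMD-friendly accesses.
float svm_score(const float* window, const float* weights, int len)
{
    float score = 0.0f;
    for (int i = 0; i < len; ++i)
        score += window[i] * weights[i];
    return score;
}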

Design constraints for porting classifier to ADAS SoC

Object classification is generally categorized as a high-level vision processing use case, as it operates on extracted features generated by low- and mid-level vision processing. By nature it requires more control code, since it involves comparisons at various levels, and, as mentioned earlier, it involves double/float precision. These computational characteristics make classification look like a problem for a DSP rather than for a vector processor, which offers more parallel data processing power through SIMD operations.
Typical ADAS processors, such as Texas Instruments' TDA2x/TDA3x SoCs, incorporate multiple engines/processors targeted at high-, mid- and low-level vision processing. The TMS320C66x DSP in the TDA2x SoC supports fixed- and floating-point operation, with an 8-way VLIW architecture that can issue up to eight new operations every cycle, SIMD operations for fixed point, and fully pipelined instructions. It supports up to 32 8-bit or 16-bit multiplies per cycle, and up to eight 32-bit multiplies per cycle. The EVE processor of the TDA2x has a 512-bit Vector Coprocessor (VCOP) with built-in mechanisms and vision-specialized instructions for concurrent, low-overhead processing. It has three parallel flat memory interfaces, each with 256-bit load/store bandwidth, providing a combined 768-bit-wide memory bandwidth. Efficient management of load/store bandwidth, internal memory, the software pipeline and integer precision is the major design constraint for achieving maximum throughput from these processors.
The classifier framework can be redesigned/modified to adapt to vector processing requirements, thereby processing more data in one instruction and achieving more SIMD operations.
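One common redesign is to invert the loop order so that one weak classifier is evaluated across a batch of windows at a time, turning scattered per-window fetches into contiguous, vectorizable accesses. The sketch below assumes the required feature values have already been repacked into contiguous arrays in a structure-of-arrays layout; it is a generic illustration, not TI-specific code:

#include <cstddef>

// Structure-of-arrays layout: thresholds and leaf values live in
// separate contiguous arrays, which suits vector loads. featBatch holds,
// for each stump, the fetched feature value of every window in the
// batch, repacked contiguously (an assumed preprocessing step).
void adaboost_batch(const float* featBatch,   // [numStumps][numWin]
                    const float* thresh, const float* leftVal,
                    const float* rightVal, int numStumps,
                    float* scores, int numWin)
{
    for (int w = 0; w < numWin; ++w) scores[w] = 0.0f;
    for (int s = 0; s < numStumps; ++s) {
        const float* f = featBatch + (size_t)s * numWin;
        for (int w = 0; w < numWin; ++w)   // simple, vectorizable inner loop
            scores[w] += (f[w] < thresh[s]) ? leftVal[s] : rightVal[s];
    }
}

The inner loop is now a uniform compare-select over consecutive elements, which an auto-vectorizing compiler or a vector unit such as the VCOP can process several windows at a time.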

Addressing major design constraints in porting classifier to ADAS SoC

Load/Store bandwidth management

Each pyramid scale can be rearranged to fit the limited internal memory. Functional modules and processing regions can be selected and arranged appropriately to keep the DDR load/store bandwidth within the required budget.
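In practice this usually means streaming each pyramid scale through internal memory in tiles and overlapping the DMA transfer of the next tile with computation on the current one. The following double-buffering skeleton is a sketch: dma_start/dma_wait are placeholders standing in for the platform's actual DMA API (for example an EDMA driver on TI SoCs), not real calls:

#include <cstddef>

// Placeholders for a platform DMA API -- must be mapped to the target.
extern void dma_start(void* dst, const void* src, int bytes);
extern void dma_wait();
extern void process_tile(const unsigned char* tile, int bytes);

// Double-buffered tile processing: while tile t is being processed out
// of one internal (ping-pong) buffer, tile t+1 is DMA'd into the other.
void process_image_tiled(const unsigned char* ddrImage, int imageBytes,
                         unsigned char* buf0, unsigned char* buf1,
                         int tileBytes)
{
    unsigned char* bufs[2] = { buf0, buf1 };   // ping-pong buffers in SRAM
    int nTiles = imageBytes / tileBytes;

    dma_start(bufs[0], ddrImage, tileBytes);   // prefetch first tile
    for (int t = 0; t < nTiles; ++t) {
        dma_wait();                            // current tile is ready
        if (t + 1 < nTiles)                    // prefetch next tile while
            dma_start(bufs[(t + 1) & 1],       // computing on this one
                      ddrImage + (size_t)(t + 1) * tileBytes, tileBytes);
        process_tile(bufs[t & 1], tileBytes);
    }
}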

Efficient utilization of limited internal memory and cache

Images should be processed in optimally sized blocks, and the working set should fit into the hardware buffers for efficient utilization of memory and computation resources.
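A simple budgeting computation makes this concrete: given the usable internal memory and the bytes needed per feature row, the largest tile height that still fits (with double buffering) can be fixed up front. All the numbers below are illustrative assumptions:

// Pick the largest tile height that fits the internal memory budget.
// All sizes are illustrative; real values come from the target SoC.
int max_tile_height(int bufferBytes,     // usable internal SRAM, e.g. 128 KiB
                    int bytesPerRow,     // feature bytes per image row
                    int numBuffers)      // 2 for ping-pong double buffering
{
    int perBuffer = bufferBytes / numBuffers;
    return perBuffer / bytesPerRow;      // whole rows per tile
}
// Example: 128 KiB SRAM, 2 KiB per row, double buffered -> 32-row tiles.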

Software pipeline design for achieving maximum throughput

Some of the techniques that can be used to achieve maximum throughput from software pipelining are mentioned below.
  • Loop structure and its nested levels should fit the hardware loop buffer requirements. For example, the C66x DSP in the TDA3x places restrictions on its SPLOOP buffer, such as a bound on the loop's initiation interval.
  • Unaligned memory loads/stores should be avoided: they are computationally expensive and, in most cases, take roughly twice as many cycles as aligned loads/stores.
  • Data can be arranged, or compiler directives and options set, to obtain maximum SIMD load, store and compute operations.
  • Double-precision operations can be converted to single-precision floating-point or fixed-point representations, but the offline classifier must be retrained after such precision changes.
  • The innermost loop should be kept simple, without much control code, to avoid register pressure and register spilling issues.
  • Division operations can be avoided by using a corresponding table look-up or multiplication by the reciprocal, as shown in the sketch below.
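As an illustration of the last point, a per-element division (common in histogram normalization) can be replaced by a single reciprocal multiplication, or by a precomputed reciprocal table when the divisor range is small. The Q16 scaling and table size below are illustrative choices:

#include <cstdint>

// Replace per-element division by a single reciprocal multiply.
void normalize(float* v, int n, float denom)
{
    float inv = 1.0f / denom;            // one divide instead of n divides
    for (int i = 0; i < n; ++i)
        v[i] *= inv;                     // multiplies pipeline/vectorize well
}

// Fixed-point variant with a precomputed reciprocal table:
// recipQ16[d] holds round(65536.0 / d) for d in [1, 255] (illustrative).
static uint32_t recipQ16[256];

static inline uint32_t div_by_table(uint32_t x, uint8_t d)
{
    // 64-bit intermediate keeps the product from overflowing.
    return (uint32_t)(((uint64_t)x * recipQ16[d]) >> 16);   // ~ x / d
}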

Conclusions

Classification for real-time embedded vision applications is a difficult computational problem due to its dense data processing, floating-point precision, multilevel control code and data fetching requirements. These computational complexities significantly limit a classifier's ability to exploit vector processing. However, the classifier framework can be redesigned/modified for a target platform to leverage the architecture's vector processing capability by efficiently applying techniques such as load/store bandwidth management, internal memory and cache management, and software pipeline design.

References:

Jagadeesh Sankaran and Zoran Nikolic (Texas Instruments Incorporated), "TDA2X, a SoC optimized for advanced driver assistance systems," 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Sudheesh TV
Technical Lead
Anshuman S Gauriar
Technical Lead

How to build the Angstrom Linux Distribution for Altera SoC FPGA with OpenCV & Camera Driver Support

If your real-time image processing applications on SoC FPGAs, such as a driver monitoring system, depend on OpenCV, you have to set up an OpenCV build environment for the target board. This blog will guide you through the steps to build a Linux OS with OpenCV and camera driver support for Altera SoC FPGAs.
In order to start building the Linux distribution for the Altera platform, you must first install the necessary libraries and packages. Follow the initialization steps below to set up the host PC.
The required packages to install on Ubuntu 12.04 are:
$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get install sed wget cvs subversion git-core coreutils unzip texi2html texinfo libsdl1.2-dev docbook-utils gawk python-pysqlite2 diffstat help2man make gcc build-essential g++ desktop-file-utils chrpath libgl1-mesa-dev libglu1-mesa-dev mercurial autoconf automake groff libtool xterm *

IMPORTANT:

Please note that * indicates that the command is one continuous line of text. Make sure that the command is on a single line when you paste it.
If the host machine runs 64 bit version of the OS, then you need to install the following additional packages:
$ sudo apt-get install ia32-libs
On Ubuntu 12.04 you will also need to make /bin/sh point to bash instead of dash. You can accomplish this by running the following command and selecting ‘No’ when prompted:
$ sudo dpkg-reconfigure dash
Alternatively you can run:
$ sudo ln -sf bash /bin/sh
However, this is not recommended, as it may get undone by Ubuntu software updates.

Angstrom Buildsystem for Altera SoC (Linux OS)

Download the scripts needed to start building the Linux OS. The scripts for the Angstrom build system are available at https://github.com/altera-opensource/angstrom-socfpga/tree/angstrom-v2014.12-socfpga
Unzip the files to angstrom-socfpga folder:
$ unzip angstrom-socfpga-angstrom-v2014.12-socfpga.zip -d angstrom-socfpga
$ cd angstrom-socfpga
These are the setup scripts for the Angstrom buildsystem. If you want to (re)build packages or images for Angstrom, this is the thing to use.
The Angstrom buildsystem uses various components from the Yocto Project, most importantly the OpenEmbedded buildsystem, the BitBake task executor and various application/BSP layers.
Navigate to the sources folder and comment out the meta-kde4 entry in the layers.txt file:
$ cd sources
$ gedit layers.txt &
Comment out the following line (one continuous line):
meta-kde4,https://github.com/Angstrom-distribution/meta-kde4.git,master,f45abfd4dd87b0132a2565499392d49f465d847 *
$ cd .. (navigate back to the top-level folder)
To configure the scripts and download the build metadata, run:
$ MACHINE=socfpga_cyclone5 ./oebb.sh config socfpga_cyclone5
After the build metadata has been fetched, you can download meta-kde4 from the link below and place it in the sources folder, since this layer was disabled earlier in the layers.txt file: http://layers.openembedded.org/layerindex/branch/master/layer/meta-kde4/
Source the environment file and use the commands below to start a build of the kernel/bootloader/rootfs:
$ . ./environment-angstrom
$ MACHINE=cyclone5 bitbake virtual/kernel virtual/bootloader console-image
Depending on the type of machine used, this will take a few hours to build. After the build is completed the images can be found in:
angstrom-socfpga/deploy/cyclone5/
After the build is completed, you will find the u-boot, dtb, rootfs and kernel image files in the above folder.

Adding OpenCV to the rootfs:

To add OpenCV to the console image (rootfs), we need to modify the local.conf file in the conf folder:
$ cd ~/angstrom-socfpga/conf
$ gedit local.conf &
In the local.conf file, navigate to the bottom, add the following line and save the file:
IMAGE_INSTALL += " opencv opencv-samples opencv-dev opencv-apps opencv-samples-dev opencv-static-dev "
Then build the console image again using the following commands:
$ cd ..
$ MACHINE=cyclone5 bitbake console-image
After the image is built, the rootfs will contain all the OpenCV libraries necessary for developing and running OpenCV-based applications.
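Once the new rootfs is on the board, a quick way to confirm that the libraries are usable is to build and run a trivial OpenCV program on the target. The sketch below is a minimal smoke test; the image name and the pkg-config invocation are assumptions that may need adjusting for your setup:

// Minimal OpenCV smoke test: load an image, convert to grayscale, save.
// Build, for example, with:
//   g++ test.cpp -o test `pkg-config --cflags --libs opencv`
#include <opencv2/opencv.hpp>
#include <cstdio>

int main()
{
    cv::Mat img = cv::imread("test.jpg");   // any test image on the board
    if (img.empty()) { std::printf("imread failed\n"); return 1; }
    cv::Mat gray;
    cv::cvtColor(img, gray, cv::COLOR_BGR2GRAY);
    cv::imwrite("gray.jpg", gray);
    std::printf("OpenCV %s OK: %dx%d\n", CV_VERSION, img.cols, img.rows);
    return 0;
}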

Enabling Camera Drivers in the Kernel:

The Linux kernel v3.10 has a built-in UVC camera driver which supports a large number of USB cameras. In order to enable it, you need to configure the kernel using the menuconfig option:
$ MACHINE=cyclone5 bitbake virtual/kernel -c menuconfig
The above command opens a configuration menu window. From the menuconfig window, enable the following options to enable UVC:
Device Drivers --->
    Multimedia support --->
        [*] Media USB Adapters
            [*] USB Video Class (UVC)
            [*] UVC input events device support
Save and exit the config menu then execute the following command:
$ MACHINE=cyclone5 bitbake virtual/kernel
The new kernel will be built with the UVC camera driver enabled and will be available in the deploy/cyclone5 folder.
For the camera to work, the coherent pool must be set to 4M. This can be done via the U-Boot environment variables as follows:

U-Boot Environment Variables

Boot the board, pressing any key to stop at the U-Boot console. The messages displayed on the console will look similar to the following listing:
U-Boot SPL 2013.01.01 (Jan 31 2014 - 13:18:04)
BOARD: Altera SOCFPGA Cyclone V Board
SDRAM: Initializing MMR registers
SDRAM: Calibrating PHY
SEQ.C: Preparing to start memory calibration
SEQ.C: CALIBRATION PASSED
ALTERA DWMMC: 0

U-Boot 2013.01.01 (Nov 04 2013 - 23:53:26)

CPU : Altera SOCFPGA Platform
BOARD: Altera SOCFPGA Cyclone V Board
DRAM: 1 GiB
MMC: ALTERA DWMMC: 0
In: serial
Out: serial
Err: serial
Net: mii0
Warning: failed to set MAC address

Hit any key to stop autoboot: 0
SOCFPGA_CYCLONE5 #

Configuration of U-Boot Environment Variables

SOCFPGA_CYCLONE5 #setenv bootargs console=ttyS0,115200 vmalloc=16M coherent_pool=4M root=${mmcroot} rw rootwait;bootz ${loadaddr} - ${fdtaddr} *

Save of U-Boot Environment Variables

SOCFPGA_CYCLONE5 #saveenv

Boot Kernel

SOCFPGA_CYCLONE5 #boot
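Once the board has booted, the camera path can be verified end to end with a short OpenCV capture loop. The sketch below assumes the USB camera enumerates as device 0 (/dev/video0):

// UVC camera smoke test using OpenCV's V4L2 backend: grab a few frames
// from /dev/video0 and write the last one to disk.
#include <opencv2/opencv.hpp>
#include <cstdio>

int main()
{
    cv::VideoCapture cap(0);                 // assumes /dev/video0
    if (!cap.isOpened()) { std::printf("camera open failed\n"); return 1; }
    cv::Mat frame;
    for (int i = 0; i < 10; ++i)             // let exposure settle
        cap >> frame;
    if (frame.empty()) { std::printf("capture failed\n"); return 1; }
    cv::imwrite("frame.jpg", frame);
    std::printf("captured %dx%d frame\n", frame.cols, frame.rows);
    return 0;
}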
Following all the above guidelines, you should be able to build the Angstrom Linux distribution for Altera SoC FPGAs with OpenCV and camera driver support. This build was successfully implemented on the Altera Cyclone V SoC.
Idris Iqbal Tarwala
Sr. VLSI Design Engineer