Parallelizing the HEVC Encoder

High Efficiency Video Coding (HEVC) is the next-generation video codec, promising to cut the bit rate in half compared to the H.264/AVC codec. HEVC uses bigger block sizes, more prediction modes and a host of other new algorithms to achieve this target. However, this comes at a substantially higher computational cost: HEVC, as it stands today, needs about 8 times more computational power to deliver twice the compression ratio.
Keeping pace with Moore's law, general-purpose processors have been cramming more logic into less space, with evolving manufacturing processes adding ever more computational power. That power now arrives mainly as more cores per processor. So the focus of HEVC encoder design is not just the fastest algorithm with the best quality, but an algorithm that can be executed in parallel on multiple cores with a minimal quality penalty. Server-grade processors now pack more than 18 cores into a single socket, which makes system design all the more important in HEVC encoders and provides a strong impetus to transform all algorithms and code flows to fully utilize the computational power available across all the cores.

So how does HEVC fare in a multicore scenario?

HEVC includes many tools that are friendly to parallel processing. These multicore-friendly tools are discussed in detail in this whitepaper:
https://www.pathpartnertech.com/wp-content/uploads/2014/11/PathPartner_WhitePaper_Analysis-Of-HEVC-Parallel-Tools-1.pdf
Parallelizing an HEVC encoder using slices and tiles is very simple: the input frame is divided into as many slices as there are cores or threads available to process them. This is the quickest way to arrive at a working multicore encoder, and it comes with a very good multicore scaling factor.
The scaling factor on a multicore system is the speedup achieved over a single core by using multiple cores for the same job. If a job takes 1 s on a single core but only 0.5 s on an N-core system, the scaling factor is 2. If those N cores are a dual-core system, ideal (100%) scaling is achieved; if they are a quad-core system, only 50% scaling is achieved. It is almost impossible to reach ideal scaling unless the job can be split into fully independent sub-jobs.
Figure 1 : Slices and Tiles
An HEVC frame partitioned into slices or tiles and encoded on different cores should, in theory, scale ideally, because each slice or tile is independent of the others. In practice this is not achieved: the blocks in each slice or tile differ in complexity, so different cores take different amounts of time to encode them. A core that encodes a simpler slice or tile finishes earlier and waits longer for the others, and the more the cores wait, the lower the scaling. This is aggravated further by the fast algorithms present in encoders, which predict the encoding modes accurately for some blocks and thereby widen the spread in encoding times.
Also, when encoding with slices or tiles, the frame effectively becomes a collection of independently encoded segments with no interlink between them. This has a large impact on visual quality, with visible compression artefacts at the slice or tile edges. These artefacts can be partially mitigated by applying the de-blocking and the new SAO filters across slice or tile boundaries, but when encoding a high-motion sequence at a challenging bit rate they will still be noticeable.
The challenge in multicore HEVC encoding is always to achieve the best possible scaling while sacrificing as little video quality as possible. Performance and quality are always at war in a video encoder, and with multicore the battle gets more ammunition. Diplomacy, in the form of Wavefront Parallel Processing (WPP), keeps the peace to a certain extent.

WPP

Figure 2 : Wavefront Parallel Processing
One of the major serial processing blocks in any video encoder is the block-by-block arithmetic coding of syntax elements and transform coefficients in raster-scan order. With slices and tiles this too can be parallelized, but with a penalty in the bits taken to encode. With Wavefront Parallel Processing, or entropy sync, the arithmetic coding is parallelized with a catch: each row takes its seed context from its top-right neighbour (the state after the second CTU of the row above) before starting its own encoding. The resulting penalty is smaller than with slice or tile encoding.
The same approach is taken to parallelize the mode-decision stage of the encoder, where each CTU waits for the completion of its top-right neighbour.
Figure 3 : Parallel row HEVC encoding
The above figure shows a quad-core encoder whose pipeline has been built up and is operating in steady state. Before encoding each CTU, the completion of its top-right neighbour is checked, and since the row above is ahead of the current row, the check is almost always positive. This design preserves the data dependencies of the original encoder and gives the best possible performance while sacrificing the least quality. It carries the overhead of pipeline build-up and tear-down stages, but when a huge number of CTUs is encoded these can be neglected. There will also be small scaling losses due to differences in CTU encoding times.
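As a rough illustration of this row-level scheduling, the toy sketch below (not PathPartner's encoder code; the CTU counts and the empty encode_ctu body are placeholders) gives each worker thread one CTU row and blocks it until the row above has advanced past the top-right neighbour of the CTU it is about to encode:

import threading

NUM_ROWS, CTUS_PER_ROW = 8, 16
progress = [0] * NUM_ROWS          # CTUs completed so far in each row
cond = threading.Condition()

def encode_ctu(row, col):
    pass                           # placeholder for the real mode decision / RDO work

def encode_row(row):
    for col in range(CTUS_PER_ROW):
        if row > 0:
            with cond:
                # wait until the top-right CTU (row-1, col+1) has been encoded
                cond.wait_for(lambda: progress[row - 1] >= min(col + 2, CTUS_PER_ROW))
        encode_ctu(row, col)
        with cond:
            progress[row] += 1
            cond.notify_all()

threads = [threading.Thread(target=encode_row, args=(r,)) for r in range(NUM_ROWS)]
for t in threads: t.start()
for t in threads: t.join()

A real encoder would use a thread pool sized to the core count rather than one thread per row, but the dependency check is the same: row 0 never waits, so every row eventually completes and the wavefront sweeps across the frame.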
The multicore parallelization schemes presented here scale up with more cores and bigger input resolutions, albeit with a drop in scaling efficiency. When the cores span more than one socket, the locality of the data used by the encoder becomes a major contributor to performance. PathPartner's multicore HEVC encoder is NUMA-optimized to achieve the best possible scaling on systems with more than one socket.
Shashikantha Srinivas
Technical Lead

Microphone Array Beamforming

Figure 1: Typical conference scenario using Microphone Array Beamformer
Beamforming, also known as spatial filtering, is a signal processing technique used in microphone array processing. Beamforming exploits the spatial diversity of the microphones in the array to detect and extract desired source signals and suppress unwanted interference. It steers the array's composite directivity beam towards the direction of the signal source, which boosts the effective range of the microphones and increases the signal-to-noise ratio (SNR). Figure 1 shows how a beamformed microphone array picks up the signal source of interest. In simple terms, the signals captured by multiple microphones are combined to produce a stronger one. To sum up, the advantages of beamforming are:
  • Microphone array beamforming provides a high-quality, intelligible signal from the desired source location while attenuating interfering noise.
  • Beamforming avoids the need for a head-mounted or desk-stand microphone per speaker, and avoids any physical movement by either the speaker or the microphones.
  • Compared to a single directional microphone, microphone array beamforming lets us locate and track the signal source.
These advantages make beamforming very useful in fields such as underwater acoustics, ultrasound diagnostics, seismology and radio communications. In this article, however, we will focus on speech enhancement using microphone array beamforming.
Figure 2: Beamformer picking signals of interest. Noise sources like projector, windows, etc. are not picked by Beamformer
In a microphone array, the signals captured by the individual microphones interfere with one another when combined. The microphones are placed, and their signals combined, in such a way that energy arriving from a particular direction adds up through constructive interference while energy from all other directions undergoes destructive interference. This is the fundamental principle of beamforming: the array simulates a directional microphone whose directivity pattern can be steered dynamically towards the desired signal source.
There are products in which multiple cardioid microphones are used to pick up signals from different desired directions. These also provide an option to tune the directions from which signals are suppressed, such as a door, window or projector, which are likely noise sources.
Following are 2 major types of Beamforming techniques –
  1. Fixed Beamformers
  2. Adaptive Beamformers
Fixed beamformers – the category of beamforming techniques in which the signal source and noise source locations are fixed with respect to the microphone array. Delay-and-Sum, Filter-and-Sum and Weighted-Sum are some examples of fixed beamformers.
Adaptive beamformers – the category of beamforming techniques in which the signal and noise sources can move, which makes them useful for applications where the beamformer must adapt itself, steering towards the signal of interest and attenuating noise from other directions. Generalised Sidelobe Canceller (GSC), Linearly Constrained Minimum Variance (LCMV, Frost) and In-situ Calibrated Microphone Array (ICMA) are some examples of adaptive beamformers.
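To make the simplest fixed technique concrete, the sketch below is a minimal delay-and-sum illustration (an assumed far-field, plane-wave setup with integer-sample delays, not PathPartner's implementation): the microphone signals are time-aligned for a chosen look direction and averaged, so the desired source adds coherently while off-axis sound partially cancels.

import numpy as np

def delay_and_sum(mic_signals, mic_positions, look_dir, fs, c=343.0):
    # mic_signals: (M, N) array of samples, mic_positions: (M, 3) in metres,
    # look_dir: unit vector pointing from the array towards the source.
    delays = mic_positions @ look_dir / c       # relative arrival-time offsets in seconds
    delays -= delays.min()                      # make every delay non-negative
    M, N = mic_signals.shape
    out = np.zeros(N)
    for m in range(M):
        k = int(round(delays[m] * fs))          # integer-sample approximation of the delay
        out[k:] += mic_signals[m, :N - k]       # delay each channel so the wavefronts line up
    return out / M                              # average: coherent gain for the look direction

Fractional-delay filters and per-channel weights (as in Filter-and-Sum and Weighted-Sum) refine this basic idea.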
The optimal beamforming technique for a particular application largely depends on the spectral content of the source signal and the background noise. In practice, this assumed information can change, and in some cases it is not available at all, so there is always a trade-off in the choice of beamforming technique.

Applications of Beamforming

The applications of beamforming are immense. Beamforming was initially used in military radar; with advances in microprocessors and digital signal processing it is now also used to capture high-quality, intelligible audio. Beamformed signals are used in speech processing, hearing aids, sonar, radar, biomedical applications, direction-of-arrival detection, aiming a camera or set of cameras at the active speaker in a video conference, and acoustic surveillance.
PathPartner has ported, tuned and optimized beamforming for real-time applications such as conference speakerphones with microphone arrays. The acoustic environment of a conference speakerphone presents immense challenges: besides microphone placement in the array, wide-band speech, echo, multiple speakers, and localization and tracking of moving sources all have to be considered. Since beamforming focuses on a particular direction of interest, the Acoustic Echo Canceller (AEC) in the speakerphone also benefits, receiving mostly the signal of interest while echo paths outside the beamforming direction are excluded, as shown in Figure 3.
Figure 3: Application of Beamforming along with AEC in Conference Speakerphones
Noise creeps in from all directions, and noise outside the direction of interest is easily removed from the signal, increasing the signal-to-noise ratio (SNR). The Voice Activity Detector (VAD) is one module that benefits greatly from this increased SNR.
We have designed and implemented all modules in the speech processing chain of a conference speakerphone. Beamformer, Acoustic Echo Canceller, Double-Talk Detector, Noise Reduction, Voice Activity Detector, Non-Linear Processing, a Comfort Noise Generator that adapts to background noise, and Automatic Gain Controller are some of the modules readily available and running in real time on SHARC DSP, Hexagon DSP, ARM and x86.
We also have expertise in a wide variety of other platforms for porting and optimization. Please contact sales@pathpartnertech.com for queries.
Mahantesh Belakhindi
Technical Lead
Sravan Kumar J
Software Engineer

Unconventional way of speeding up of Linux Boot Time

Fast boot is a requirement on most embedded devices. For a good user experience it is important that a hand-held camera or a phone becomes usable seconds after it is switched on. Beyond user experience, fast boot is also a non-negotiable hard requirement for some applications: a network camera or a car rear-view camera, for example, must be accessible as soon as possible after power-up.
A brief introduction to Linux boot process:
The boot time for a device consists of:
  • Time taken by bootloader
  • Time taken by Linux kernel initialization up to the launch of first user space process
  • Time required for user space initialization, including launching all required daemons, running all Linux init scripts
Bootloader and Linux kernel init time can be brought down to less than 1 second on modern embedded platforms, and standard techniques for doing so are well documented.
In most boot time optimization problems, more than 80% of the boot time is taken by user space. Optimization of user space, on the other hand, cannot be standardized, because user space varies from application to application.
The first user space process that runs on Linux startup is called ‘init’. The init process should take care of all user space initialization required to boot up the system. This includes:
  • Mounting all filesystems
  • Creating device nodes
  • Starting all daemons
  • Performing other functions specific to the use case, like loading of firmware, starting of userspace programs
Reducing user space boot time therefore means optimizing this initialization process. The standard init process for Linux systems is SysVinit (or BusyBox init on BusyBox-based embedded systems).

Sysvinit:

Both SysVinit and BusyBox init read /etc/inittab to find out what needs to be done at boot-up. The first line of the inittab file defines the default 'run level' of the system. The default process executed by SysVinit is /etc/init.d/rcS, which in turn calls /etc/init.d/rc. The 'rc' script then executes all scripts in the /etc/init.d/rcN folder in order (where N is the run level).
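For reference, a typical SysVinit-style inittab looks roughly like this (illustrative entries; BusyBox init uses a slightly different syntax):
id:5:initdefault:
si::sysinit:/etc/init.d/rcS
l5:5:wait:/etc/init.d/rc 5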
Figure 1: sysvinit init scripts
Figure 2 : inittab file
 
The shortcomings of Linux Sysvinit are therefore apparent:
  • The order and dependencies of init scripts must be carefully assigned. As a rule, S01 will be executed before S02 and so on. Listing out the services and enforcing dependency has to be done manually, making the process tedious and error prone.
  • The init.d scripts are run in serial order. This leaves CPU resources underutilized and increases the system boot time.

Systemd:

Linux systemd is an alternative to SysVinit. Systemd replaces SysVinit's concept of runlevels with 'targets'. A target is a set of systemd 'units', and units can be roughly understood as 'services'. Each service is described by a file containing statements that define its behaviour: the process it starts, its dependencies and many other parameters. Dependencies between services are enforced by means of the 'Wants' and 'Requires' directives: everything a service depends on should be listed under 'Wants' (good to have) or 'Requires' (necessary to have).
Figure 3: example Systemd service
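A minimal unit file along these lines might look like the following sketch (the service name, binary path and dependencies are placeholders for illustration, say /etc/systemd/system/camera-app.service):
[Unit]
Description=Example camera application service
Requires=network.target
After=network.target

[Service]
ExecStart=/usr/bin/camera-app
Restart=on-failure

[Install]
WantedBy=default.target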
 
Systemd has a set of pre-defined targets that get started at system boot. The services that are needed at boot-up should be linked under the directory '/etc/systemd/system/default.target.wants'.
Systemd starts from the default boot target and launches all services required for that target, following the dependency chain. All services that a particular service requires are started before it is started, and all services that do not depend on each other are started in parallel. Thus systemd achieves aggressive parallelization during boot-up.
Figure 4: Adding custom Systemd service to boot target
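Taking the hypothetical camera-app.service above as an example, it can be pulled into the boot target either by creating the symlink by hand or by letting systemctl act on the [Install] section:
$ ln -s /etc/systemd/system/camera-app.service /etc/systemd/system/default.target.wants/camera-app.service
$ systemctl enable camera-app.service    (equivalent, using the [Install] section)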
 
Figure 5: Complete list of systemd services and dependencies on a system
 
Systemd reduces boot time in two ways:
  • With correct configuration of dependencies and targets, it is possible to perform only the initialization required for boot, leaving other activities for after boot. For example, instead of mounting all filesystems, we can make the boot target depend only on the root filesystem; the boot target is then reached without waiting for the other filesystem mounts. Since filesystem mounting usually takes a long time, this speeds up the boot. It also makes it very easy to control what gets started at boot-up and to track dependencies in a centralized way.
  • Systemd launches services in parallel unless a serialized dependency is enforced. This speeds up boot process significantly, leading to better CPU utilization during boot.
Thus systemd optimizes boot time by allowing fine-grained, easy control over dependencies and by starting services in parallel. It brings additional advantages too: clean handling of service behaviour on failure (restart vs. fail), logging, and system watchdog support.
Systemd also comes with the systemd-analyze tool, whose blame and plot views show how long each service took to start and give a pictorial representation of the boot sequence. These tools make it quite convenient to see what is happening and optimize accordingly.
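A quick example of these standard commands:
$ systemd-analyze                        (time spent in kernel vs. user space)
$ systemd-analyze blame                  (per-service start-up times, slowest first)
$ systemd-analyze critical-chain         (chain of units that gates boot completion)
$ systemd-analyze plot > bootchart.svg   (pictorial view of the whole boot sequence)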
It is easy to bring user space boot time down to less than 4 or 5 seconds on an embedded device by using systemd, and further optimizations can be squeezed out on a case-by-case basis. We have achieved a total boot time of 3 seconds on an ARM core running at 1 GHz.
Systemd is available for download here: https://www.freedesktop.org/software/systemd/ and can be cross-compiled for embedded platforms.
PathPartner offers complete boot time optimization (bootloader, kernel parameters and user space) on all embedded platforms, using methodologies such as the ones described in this article.
Komal Jaysukh Padia
Technical Lead

Blind Source Separation for Cocktail Party Problem

Blind Source Separation (BSS), as the name suggests, aims to extract the original unknown source signals from the mixed ones. This is done by estimating an approximate mixing function using only the available observed mixed signals. "Blind" here means that the mixing function of the signals recorded by the microphones is unknown. BSS techniques do not require any prior knowledge about the mixing function or the source signals, and they do not require any training data.
Figure 1: BSS block diagram
To understand BSS further, think of a person at a party. There are various disturbances: loud music, people shouting, and a lot of hustle and bustle. Yet he or she is able to hold a conversation with another person by filtering out the unnecessary signals and noise. In a similar way, BSS separates and extracts the desired signals from a mixture of unknown signals, quite similar in function to the human ear. This is what we call the "cocktail party effect".
Image 1: Cocktail party scene
Figure 2: Signal un-mixing using BSS
It all started around the early 80s, when researchers formulated BSS in the framework of neural modelling; it was later adopted in digital signal processing and communications. BSS was initially developed and tested for linear mixtures of signals, and later evolved to handle non-linear and convolutive mixtures.
The underlying principle of BSS comes from linear noise-free systems theory, according to which a system with multiple inputs (speech sources) and multiple outputs (microphones or sensors) can be inverted, under reasonable assumptions, with appropriately chosen filters (convolutive BSS filters, say, in the case of convolutive mixtures).
Figure 3: Typical BSS process flow
As shown in the figure above, BSS mainly involves two steps:
  • System identification – the filter coefficients of the mixing process are estimated; this is also called mixing matrix factorization.
  • Separation or un-mixing – the sources are separated by filtering with the coefficients estimated in step one.
Another consideration for BSS is to have at least as many microphones as there are signal sources to be separated. If there are fewer microphones than sources, the separation task becomes difficult, though not impossible; the paper "Single Microphone Source Separation Using High Resolution Signal Reconstruction" by Trausti Kristjansson, Hagai Attias and John Hershey offers one such solution.
Blind source separation can exploit linear, temporal, spatial or sparsity properties of signal sources. Based on such properties, we have different approaches for BSS –
  • Principal Component Analysis
  • Independent Component Analysis
  • Spatio-Temporal Analysis
  • Sparse Component Analysis
Many such algorithms have been developed to solve the BSS problem by assuming certain properties of the signal sources or of the mixing process that suit particular applications. When such assumptions are made, the approach is called "semi-blind source separation".
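As a toy illustration of one of these approaches, the sketch below uses FastICA from scikit-learn (an Independent Component Analysis implementation) to un-mix a synthetic, instantaneous two-source mixture. The sources and mixing matrix are made up for the example; real room recordings are convolutive and call for convolutive or frequency-domain BSS instead.

import numpy as np
from sklearn.decomposition import FastICA

fs = 16000
t = np.arange(0, 2, 1.0 / fs)
s1 = np.sin(2 * np.pi * 440 * t)            # stand-in for speaker 1
s2 = np.sign(np.sin(2 * np.pi * 3 * t))     # stand-in for speaker 2
S = np.c_[s1, s2]                           # true sources, shape (samples, 2)

A = np.array([[1.0, 0.5],                   # unknown mixing matrix
              [0.4, 1.0]])
X = S @ A.T                                 # observed "microphone" signals

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)                # estimated sources (up to scale and order)
A_est = ica.mixing_                         # estimated mixing matrix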

Applications of BSS

BSS can be used to enhance noisy speech in real-world environments, and its applications are not limited to speech and audio processing: it is also used for image, astronomical, satellite and biomedical signal analysis. The double-talk (DT) issue in Acoustic Echo Cancellation (AEC) can be addressed with a BSS algorithm, without using a double-talk detector or step-size controller.
BSS is also used in microphone array processing (multiple microphones), making it useful in diverse applications.

Pathpartner’s Expertise

We have an in-house BSS implementation for convolutive signal mixtures, currently integrated with our speech processing algorithms. Their load is eased because they receive only the signal of interest, and better-quality speech processing is achieved as a result.
Mahantesh Belakhindi
Technical Lead

How to build the Angstrom Linux Distribution for Altera SoC FPGA with OpenCV & Camera Driver Support

If your real-time image processing applications on SoC FPGAs, such as a driver monitoring system, depend on OpenCV, you have to set up an OpenCV build environment for the target board. This blog will guide you through the steps to build a Linux OS with OpenCV and camera driver support for the Altera SoC FPGA.
To start building the Linux distribution for the Altera platform, you must first install the necessary libraries and packages. Follow the initialization steps below to set up the host PC.
The required packages to be installed for Ubuntu 12.04 are
$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get install sed wget cvs subversion git-core coreutils unzip texi2html texinfo libsdl1.2-dev docbook-utils gawk python-pysqlite2 diffstat help2man make gcc build-essential g++ desktop-file-utils chrpath libgl1-mesa-dev libglu1-mesa-dev mercurial autoconf automake groff libtool xterm *

IMPORTANT:

Please note that * indicates that the command is one continuous line of text. Make sure that the command is in one line when you are pasting it.
If the host machine runs 64 bit version of the OS, then you need to install the following additional packages:
$ sudo apt-get install ia32-libs
On Ubuntu 12.04 you will also need to make /bin/sh point to bash instead of dash. You can accomplish this by running the following command and selecting 'No':
$ sudo dpkg-reconfigure dash
Alternatively you can run:
$ sudo ln -sf bash /bin/sh
However, this is not recommended, as it may get undone by Ubuntu software updates.

Angstrom Buildsystem for Altera SOC: (Linux OS)

Download the scripts needed to start building the Linux OS. The scripts for the Angstrom build system are available at https://github.com/altera-opensource/angstrom-socfpga/tree/angstrom-v2014.12-socfpga
Unzip the files to angstrom-socfpga folder:
$ unzip angstrom-socfpga-angstrom-v2014.12-socfpga.zip -d angstrom-socfpga
$ cd angstrom-socfpga
These are the setup scripts for the Angstrom buildsystem. If you want to (re)build packages or images for Angstrom, this is the thing to use.
The Angstrom buildsystem uses various components from the Yocto Project, most importantly the OpenEmbedded buildsystem, the BitBake task executor and various application/BSP layers.
Navigate to the sources folder and comment out the following line in the layers.txt file:
$ cd sources
$ gedit layers.txt &
meta-kde4,https://github.com/Angstrom-distribution/metakde.git,master,f45abfd4dd87b0132a2565499392d49f465d847 *
$ cd ..    (navigate back to the top-level folder)
To configure the scripts and download the build metadata, run:
$ MACHINE=socfpga_cyclone5 ./oebb.sh config socfpga_cyclone5
After the build metadata has been downloaded, you can fetch the meta-kde4 layer from the link below and place it in the sources folder, since it was disabled earlier in the layers.txt file: http://layers.openembedded.org/layerindex/branch/master/layer/meta-kde4/
Source the environment file and use the commands below to start a build of the kernel, bootloader and rootfs:
$ . ./environment-angstrom
$ MACHINE=cyclone5 bitbake virtual/kernel virtual/bootloader console-image
Depending on the type of machine used, this will take a few hours to build. After the build is completed, the images can be found in:
Angstrom-socfpga/deploy/cyclone5/
This folder will contain the U-Boot, DTB, rootfs and kernel image files.

Adding OpenCV to the console image (rootfs):

To add OpenCV to the console image (rootfs), we need to modify the local.conf file in the conf folder:
$ cd ~/angstrom-socfpga/conf
$ gedit local.conf &
Navigate to the bottom of local.conf, add the following line and save the file:
IMAGE_INSTALL += " opencv opencv-samples opencv-dev opencv-apps opencv-samples-dev opencv-static-dev "
Then build the console image again using the following command:
$ cd ..
$ MACHINE=cyclone5 bitbake console-image
After the image is built, the rootfs will contain all the OpenCV libraries necessary for developing and running OpenCV-based applications.

Enabling Camera Drivers in the Kernel:

The Linux kernel v3.10 has a built-in UVC (USB Video Class) camera driver which supports a large number of USB cameras. In order to enable it, you need to configure the kernel using the menuconfig option:
$ MACHINE=cyclone5 bitbake virtual/kernel -c menuconfig
The above command opens a config menu window. From the menuconfig window enable the following to enable UVC:
Device Drivers --->
    Multimedia support --->
        [*] Media USB Adapters --->
            [*] USB Video Class (UVC)
            [*]   UVC input events device support
Save and exit the config menu then execute the following command:
$ MACHINE=cyclone5 bitbake virtual/kernel
The new kernel will be built with the UVC camera drivers enabled and will be available in the /deploy/cyclone5 folder.
For the camera to work, the coherent pool size must be set to 4M. This can be done through the U-Boot environment variables as follows:

U-Boot Environment Variables

Boot the board, pressing any key to stop at the U-Boot console. The messages displayed on the console will look similar to the following listing:
U-Boot SPL 2013.01.01 (Jan 31 2014 - 13:18:04)
BOARD: Altera SOCFPGA Cyclone V Board
SDRAM: Initializing MMR registers
SDRAM: Calibrating PHY
SEQ.C: Preparing to start memory calibration
SEQ.C: CALIBRATION PASSED
ALTERA DWMMC: 0

U-Boot 2013.01.01 (Nov 04 2013 - 23:53:26)

CPU  : Altera SOCFPGA Platform
BOARD: Altera SOCFPGA Cyclone V Board
DRAM: 1 GiB
MMC: ALTERA DWMMC: 0
In: serial
Out: serial
Err: serial
Net: mii0
Warning: failed to set MAC address

Hit any key to stop autoboot: 0
SOCFPGA_CYCLONE5 #

Configuration of U-Boot Environment Variables

SOCFPGA_CYCLONE5 # setenv bootargs console=ttyS0,115200 vmalloc=16M coherent_pool=4M root=${mmcroot} rw rootwait;bootz ${loadaddr} - ${fdtaddr} *

Save of U-Boot Environment Variables

SOCFPGA_CYCLONE5 #saveenv

Boot Kernel

SOCFPGA_CYCLONE5 #boot
Following all the above guidelines, you should be able to build the Angstrom Linux distribution for an Altera SoC FPGA with OpenCV and camera driver support. This build was successfully carried out on the Altera Cyclone V SoC.
Idris Iqbal Tarwala
Sr. VLSI Design Engineer

HEVC without 4K

The consumer electronics industry wants to increase the number of pixels on every device by 4 times. Without HEVC, this will quadruple the storage requirements for video content and clog already limited network bandwidth. To bring 4K or UHD to the consumer, the HEVC standard is essential, as it can cut these requirements at least in half; there is no doubt about it. But what the consumer electronics industry wants might not necessarily be what the consumer wants. A case in point is 3D TV, which failed to take off due to lack of interest from consumers and original content creators. The same thing might happen to 4K. If 4K/UHD devices fail to take off, where will HEVC be?
Here, we make a case for HEVC.
 
According to data released by Cisco, video will be the biggest consumer of network bandwidth in the years to come, and it has the highest growth rate across all segments. With HEVC-based solutions, these over-clogged networks can take a breather.
Let’s look at how deploying HEVC solutions is advantageous to the major segments of the video industry.

1. Broadcast:

The broadcast industry is a juggernaut that moves very slowly. It stands to gain the most in cost savings and consumer satisfaction by adopting HEVC, but it also has to make the largest initial investment. There are claims that UHD alone does not add much to the visual experience at a reasonable viewing distance on a 40-50 inch TV, so the industry is pushing additional video enhancement tools such as higher dynamic range, more colour information and better colour representation (BT.2020). UHD support in broadcast is not feasible without upgrading the infrastructure, and given the advantage HEVC brings to UHD video, including HEVC in that infrastructure upgrade is the optimal choice. But if UHD fails to attract consumers, what happens to HEVC? Without UHD broadcast becoming a reality, the introduction of HEVC into broadcast infrastructure could be delayed heavily. Contribution encoding, on the other hand, can benefit greatly from HEVC with only reasonable changes to the infrastructure; whether broadcasters adopt HEVC just for contribution encoding, without UHD, depends purely on the cost of adoption versus the savings it brings.

2. Video surveillance:

Surveillance applications are increasing every day, and there is now the added overhead of backups on the cloud. The advantage in video surveillance is that it does not need backward compatibility, so HEVC is an ideal solution for the industry, either to cut the storage costs of current systems or to keep costs at the same level for a new generation of systems that store more surveillance data at higher resolutions. ASIC developers are already building HEVC encoders and decoders, and it is just a matter of time before video surveillance systems based on HEVC hit the market. Upgrading current video surveillance systems to HEVC, however, may not be feasible, given the struggle of making legacy hardware support it.

3. Video Conference:

Video conferencing is a trickier case, as it needs to be highly interoperable and backward compatible with existing systems. Professional video conferencing systems might have to support both HEVC and earlier codecs to work with equipment already in the field. General-purpose video conferencing solutions such as Google Talk or Skype, on the other hand, face the problem of licensing HEVC, and none of the current browsers or operating systems (except Windows 10, which most probably ships only the decoder) has announced support for HEVC. Yet the advantage HEVC can bring to video conferencing is substantial: despite advances in bandwidth availability and the introduction of high-speed 3G and 4G services, the quality of the video conferencing experience has remained poor. This can be massively improved with HEVC, which has the potential to enable HD video calling over a 3G network. With or without UHD, at least the professional video conferencing systems will adopt HEVC, unless another codec (the likes of VP9 or Daala) promises better advantages.

4. Storage, streaming and archiving:

The advantage of converting archived databases to HEVC needs no explanation: imagine the petabytes of storage that could be saved if all archived videos were converted to HEVC. OTT players like Netflix are already in the process of upgrading to HEVC, as it helps them reduce the burden on ISPs and also cuts storage and transmission costs for video streaming. Converting such a huge database from one format to another will not be easy, though. OTT and video streaming applications need scalable delivery, where each video is encoded at different resolutions and different data rates, which requires multi-instance encoders running continuously to encode these videos at very high quality. Nevertheless, the cost savings from adopting HEVC in these applications are very large, and upgrading storage to HEVC will become inevitable.

5. End consumer:

The end consumer gets exposed to HEVC at different levels:
Decoders come in the latest gadgets: TVs, mobile phones, tablets, gaming consoles and web browsers. Encoders are built in wherever there is a camera, be it video calling on a mobile phone, tablet or laptop, or video recording on a mobile phone or standalone camera.
It is difficult to convince the less tech-savvy end consumer of the advantages of HEVC, but they are a huge market. In the US it costs around $1 for 10 megabytes of data; HEVC can give consumers higher video quality for the same cost, or the current video quality at half the cost. Because HEVC matters to consumers, almost all chip manufacturers either already support HEVC or have it on their roadmap, and there are already consumer electronics products on the market with HEVC encoders built in. HEVC support will definitely be a differentiating factor for an end consumer looking for a new gadget.
Deploying HEVC-based solutions will yield gains in the long term, once the savings from reduced bandwidth overtake the initial investment; this is true for each of the segments discussed here. With 4K or UHD, this initial investment can be folded into higher-quality offerings and the costs offset. But even without any change in resolution or any other feature, the returns on an HEVC investment are high, and when the entire video industry adopts the newer and better codec standard, the gains will multiply.
Prashanth NS
Technical Lead
Shashikantha Srinivas
Technical Lead

Future trends in IVI solutions

In the immediate future we shall witness the next big phase of technology convergence between the automotive industry and information technology, with 'connectivity' at its epicentre. There are several reasons for this revolutionary fusion. The first is the exponential rise in consumer demand: connectivity on the road is no longer a luxury for the internet-savvy generation, who will make up about 40% of new automobile users, and they expect more than just in-vehicle GPS navigation. Second, some developed countries are mandating safety features such as DSRC over Wi-Fi, 360-degree non-line-of-sight awareness and V2V information exchange to reduce accidents and casualties; bringing these features to the dashboard is another reason why IVI systems and solutions are currently the hottest sell in the automotive industry. With these triggers, automobile manufacturers are now bridging the gap between non-IVI technologies and existing dashboard solutions.
Current dashboard solutions are led by the software giants Apple and Google, who are well ahead in providing SDKs for IVI. A few other companies also demonstrated IVI dashboards at CES 2015, but CarPlay and Android Auto have captured the world's attention with their adaptability, portability and expandability across car market segments. Market experts predict that the key IVI solutions will be Android Auto and CarPlay, and these current solutions are only the beginning of a fast-evolving technology.
Experts forecast that IVI features will evolve with the maturity of the architecture. In the first phase, smartphone-attached dashboard models using connectivity such as USB and Bluetooth will continue to be widely used; by 2017, around 60% of vehicles are expected to be connected through smartphones. The focus here is on reusing smartphone computing power and mobile technologies such as telephony, multimedia, GPS, storage, power management and voice-based systems. The next step is connecting beyond the car: the conjunction with IoT and cloud technologies will bring in ecosystems such as OEM services, healthcare, education and smart energy systems. Further into the future, external control and integration of the car ecosystem will become possible, including On-Board Diagnostics (OBD), the use of SoCs for smarter, more reliable and better-performing dashboard features, augmented reality and many more options.
Current IVI system architectures are designed to be flexible enough to adapt to future upgrades (such as a change of phone model or an OS upgrade). An IVI solution core can be broadly divided into three categories: hardware, the underlying operating system and user-experience applications. The architecture mainly consists of horizontal and vertical layers.
The vertical layer is made of:
  • Proprietary protocols.
  • Multimedia rendering solutions.
  • Core stack
  • Security and UI
Whereas the Horizontal layer includes:
  • Hardware
  • OS
  • Security Module
Proprietary portable protocols are the rudiments of IVI connectivity modules, and the UI of IVI head units can be developed with core engines such as Java, Qt and GTK. Wi-Fi, LTE, advanced HMI, and OBD tools and protocols help deliver user connectivity and a better user experience, though some of these are still under development.
From a car manufacturer's point of view, a centralized, upgradable dashboard that integrates all IVI solutions and distinct non-IVI technologies will be a prime factor in a consumer's choice of car. Here the dashboard middleware must play a crucial supporting role, integrating multiple IVI solutions and providing a device-independent experience. Flexibility in accommodating upcoming IVI technologies, such as data interfaces for application developers, is another deciding factor.
Although IVI core solutions are the fundamental differentiating factor for the end user, car OEMs need a single-roof vendor with expertise across IVI and non-IVI technologies, including system architecture, OS development, protocols, hardware and applications. PathPartner, a product engineering services provider for automotive embedded systems, meets these requirements well. With visionary leadership, PathPartner has already stepped into the automotive infotainment industry with a strong business model, comprising development activities such as porting infotainment systems based on Android, Linux and QNX onto multiple platforms (Freescale, Texas Instruments) and building customized middleware for Android Auto, MirrorLink and CarPlay.
We at PathPartner, with a dedicated team of engineers with niche expertise in embedded products, have successfully delivered certified applications for platforms such as iOS and Windows, and continue to work on other IVI solutions for renowned OEMs.
Kaustubh D Joshi

HEVC Compression Gain Analysis

HEVC, the future video codec, needs no introduction, as it is slowly making its way into the multimedia market. Despite its very high execution complexity compared to earlier video standards, a few software implementations have been proven to run in real time on multi-core architectures. While OTT players like Netflix are already in the process of migrating to HEVC, 4K TV sales are rising rapidly as HEVC promises to make UHD resolution a reality. ASIC HEVC encoders and decoders are being developed by major SoC vendors that will bring HEVC to hand-held, battery-operated devices in the near future. All these developments are motivated by one major claim:

‘HEVC achieves 50% better coding efficiency compared to its predecessor AVC’.

Initial experimental results justify this claim, showing on average a 50% improvement with desktop encoders and approximately a 35% improvement with real-time encoders compared to AVC. This improvement comes mainly from the set of new tools and modifications introduced in HEVC, including but not limited to larger block sizes, larger transform sizes, extended intra modes and SAO filtering. Note that the 50% improvement is an average over a set of input contents; it does not guarantee half the bit rate at the same quality for every input. Now we will tear down the 'average' part of the claim and discuss in which situations HEVC is more efficient, considering four main factors.
  1. Resolution
  2. Bit rate
  3. Content type
  4. Delay configuration

1. Resolution:

HEVC is expected to enable higher-resolution video, and results confirm higher coding efficiency gains at higher resolutions. At 4K, compression gains can exceed 50%, making it possible to encode 4K content at 30 frames per second at bit rates of 10 to 15 Mbps with decent quality. The reason for this behaviour is the tool that contributes more than half of HEVC's coding efficiency gains: larger coding block sizes. At high resolutions, larger blocks lead to better compression, as neighbouring pixels are more strongly correlated. It is observed that 1080p sequences show on average 3-4% better compression gains than their 720p counterparts.

2. Bitrate:

Encoder results indicate better compression gains at low and mid-range bit rates than at very high bit rates. At low QPs, transform coefficients contribute more than 80% of the bits. Larger coding units help save the bits used for MVD and other block header coding, and bigger transform blocks give better compression gains as they have a larger number of coefficients for energy compaction. But at high bit rates, header bits make up only a small percentage of the bit stream, which suppresses the gains from larger block sizes and limits the additional gains from larger transform blocks.
We have observed that, for the ParkJoy sequence, BD-rate gains were 8% better in the QP range 32-40 than in the QP range 24-32. Similar behaviour has been found in most sequences.

3. Content type:

Video frames with uniform content show better compression gains, as uniformity favours larger block sizes. For frames with very high activity the encoder tends to select smaller block sizes, which makes the block partitioning similar to AVC and reduces HEVC's efficiency, while the lower correlation between pixels limits the gains from larger transform blocks. Streams such as BlueSky, with significant uniform content per frame, produced 10% better gains than high-activity streams like ParkJoy.
Similarly, videos with stationary content or low motion produce better gains, as larger block sizes are chosen in inter frames for such content.

4. Delay Configuration:

Video encoders can be configured to use different GOP structures based on application needs. Here, we analyse the compression gains of different delay configurations with respect to AVC. First, the all-intra case (used for low-delay, high-quality, error-robust broadcast applications) produces roughly 30-35% gain, falling short of HEVC's 50% claim. Larger block sizes are not very effective in intra frames, and the gains from the other new tools, such as the additional intra modes and SAO, keep the average gain near 35%; hence HEVC-Intra will not be 50% better than AVC-Intra. It is also observed that the low-delay IPP GOP configuration produces slightly better gains (approximately 3-4% on average) than the random-access GOP configuration with B frames; this could simply be due to implementation differences in the HM reference software.
Thus, 50% gains cannot be achieved for every video content, but many encoder implementations have demonstrated average gains of nearly 50% or more over AVC. Such compression efficiency can have a major impact on broadcast and OTT applications, where bandwidth consumption, and hence cost, can be cut in half. Other emerging video compression standards such as VPx and Daala claim similar gains, but these are yet to be proven, and it will be interesting to see how they affect the future of HEVC. Right now one thing is sure: HEVC is the future video codec, and it is here!
Prashanth NS
Technical Lead

Analysis, Architectures and Modelling of HEVC Encoder

High Efficiency Video Coding (HEVC), also known as H.265, has been the talk of the town in the video compression industry for the past two years, as it promises to significantly improve the video experience. HEVC is the next-generation video compression standard developed by the Joint Collaborative Team on Video Coding (JCT-VC), formed by the ITU-T VCEG and ISO/IEC MPEG standards bodies. HEVC claims to save 50% of the bit rate at the same video quality compared to its predecessor AVC. This comes at the huge cost of increased computational power, as HEVC is many times more complex than AVC: the computational complexity of an HEVC decoder is expected to be about twice that of AVC, while HEVC encoder complexity might be 6x-8x that of AVC.
HEVC is a block-based hybrid video codec similar to earlier standards, with several new tools to improve coding efficiency. These tools increase computational complexity, since each of them has to be configured with appropriate data for every block.
This blog is written in two parts. In the first part, we discuss possible implementations of an HEVC encoder on different homogeneous and heterogeneous architectures, comparing them on video quality, execution speed, power efficiency, memory requirement and development cost. Modelling of the HEVC encoder on these architectures will be discussed in Part 2 of the same blog.

Single CPU

Single-core CPU solutions are mainly targeted at achieving the best video quality and do not really focus on encoding time. They are used for generating benchmark video sequences and in archiving applications, which are mainly PC-based. Taking advantage of sequential processing, feedback from each stage can be used in later stages to enhance video quality. Another advantage of a single-core solution is its limited memory usage, since single instances of the data structures suffice. These solutions may not be power efficient, as complex algorithms are used to achieve the best video quality. There are H.264 encoders on the market that achieve real-time encoding with decent quality on a single core, though they may not generate benchmarking sequences. For HEVC, however, single-core real-time solutions are not available on the market at this point of time and are not practical.

Multicore CPU

HEVC encoders are highly complex due to the increased number of combinations in the encoding options. Real-time HEVC encoding can be realized on multi-core solutions with a trade-off in video quality, and the size of the trade-off depends on the type of parallelism implemented. Multi-core implementations offer the flexibility of either data partitioning or task partitioning. With data partitioning, there is a high chance of breaking neighbour dependencies, which results in a larger video quality penalty; HEVC introduced a new tool known as tiles, in addition to slices, targeted at multi-core solutions without much impact on video quality. In a task-partitioned design, achieving the right load balance among tasks is a challenging job but is necessary for good core utilization. The performance achievable on multi-core is limited by the number of cores present in a single chip. Multi-core solutions also have a bigger memory footprint, as data structures need to be replicated for different CPUs, and cache usage must be carefully managed since shared memory can be accessed by all the CPUs. Moreover, an increased number of CPUs requires more power, making the solution power inefficient. The development cost of multicore solutions is relatively high compared to single-CPU solutions, as they require complex designs for efficient task scheduling. Though power inefficient, multicore solutions can deliver real-time performance with decent video quality, finding their applications in the broadcast domain.
Fig: Comparison of Key factors across different Architectures
 

CPU + GPU

Heterogeneous architectures are one way of achieving real-time performance in an HEVC encoder. As HEVC encoders are highly complex, using a GPU for the data-parallel processing tasks helps improve performance by a great margin and reduces the load on the main CPU, freeing it for other tasks. Introducing a GPU also helps optimize the power consumption of the encoder, as GPUs are highly power efficient for such workloads, and it boosts hardware utilization, since GPUs are generally idle during video processing. All these advantages come at the cost of video quality: a heterogeneous architecture poses challenges in handling the functional and neighbour data dependencies of a block-based codec like HEVC, which leads to reduced video quality compared to sequential execution. The independent execution nature of many-core GPU architectures can therefore degrade video quality significantly. Furthermore, using GPUs for sequential functionality, such as entropy coding, is inefficient, which poses the greater challenge of synchronizing between CPU and GPU. Alongside the synchronization issue, heterogeneous systems must manage distributed memories; in highly bandwidth-intensive video processing, memory requirements and bandwidth grow with distributed memories. These solutions require in-depth knowledge of the distributed memory architecture and of the frameworks supporting heterogeneous platforms, resulting in increased development cost and time. Most of the SoCs used in consumer electronics devices have GPUs that can be used for video processing.

ASIC

Application-Specific Integrated Circuits (ASICs) are the best way to achieve real-time, power-efficient HEVC encoder solutions. In an ASIC, hardware IPs are built for the different functionalities of the HEVC codec. Though hardware solutions are much faster than their software counterparts, the functional and neighbour data dependencies required by the HEVC codec limit their capabilities, as an intelligent pipeline must be built between the hardware modules. These modules have their own memory, which increases the memory footprint. Such limitations eventually lead to a drop in video quality, since they compromise the need for sequential execution; the drop can be minimized to a great extent by proper implementation of the hardware pipeline and efficient video algorithms. Generally these ASIC solutions are highly power optimized and targeted at consumer electronics with huge volume requirements. Complex design, verification, validation and silicon manufacturing increase the development cost, but the result is the best-performing, most power-efficient encoder. Not many ASIC HEVC solutions are expected, owing to the increased complexity of the codec, the cost of SoC manufacturing with advanced process nodes, the lack of VC funding for semiconductor start-ups and the limited-volume broadcast market where HEVC is required. It is also a well-known fact that video encoders evolve over time, and ASIC solutions fail to adopt the new algorithms; for this reason, ASIC solutions may only be adopted once HEVC encoders achieve greater maturity in software.
FPGAs can be placed between GPU solutions and ASICs in terms of performance, with similar video quality. An FPGA provides a hardware implementation yet can adapt to evolving encoder algorithms. For teams with a software background it is easier to implement an HEVC encoder on a GPU than on an FPGA, but greater performance can be achieved with an FPGA while maintaining video quality. Though GPUs have a price advantage, their power consumption is much higher than that of FPGAs. FPGAs require a larger die area, making them unsuitable for consumer electronics. The cost and time needed to develop FPGA-based video encoders are higher than for GPU solutions because of the hardware programming involved, but lower than for ASIC solutions, as complex hardware designs are not necessary. FPGA video encoder solutions find use in low-volume markets where an ASIC is not a cost-effective option. With the right combination of power, performance, video quality and development cost, these solutions are effective for video surveillance and broadcast applications.
Each of these solutions has its own pros and cons. A single CPU is the best solution for archiving applications where real-time encoding is not needed, while ASIC SoCs find the best use in consumer electronics, as they provide power-efficient real-time performance. Every solution has its place, depending on the application's requirements for speed, quality, power consumption and development time and cost. (Stay tuned for part 2 of this blog.)
Prashanth NS
Technical Lead
Praveen GB
Sr. Software Engineer