Analysis, Architectures and Modelling of HEVC Encoder
June 23, 2014
High Efficiency Video Codec (HEVC) also known as H.265 has become the talk of the town in video compression industry for the past two years as it promises to significantly improve the video experience. HEVC is the next generation video compression standard being developed by Joint Collaborative Team - Video coding (JCT-VC) formed by ITU-T VCEG and ISO/IEC MPEG standard bodies. HEVC claims to save 50% bit-rate for same video quality compared to its predecessor AVC. This comes at a huge cost of increased computation power requirement as complexity of HEVC is higher in multiple magnitude, compared to AVC. Computational complexity of HEVC decoder is expected to be twice as that of AVC, but HEVC encoder complexity might be 6x - 8x of AVC complexity.
HEVC is a block based hybrid video codec similar to earlier standards with few new tools to improve coding efficiency which will increase the computational complexity, as configuration of these new tools with appropriate data is required for every block.
This blog is written in two parts. In the first part, we discuss the possible implementation of HEVC encoder on different homogeneous and heterogeneous architectures, by comparing them for Video quality, Execution speed, Power efficiency, Memory requirement and Development cost. Modelling of the HEVC Encoder on various architecture will be discussed in Part 2 of the same blog.
Single core CPU solutions are mainly targeted to achieve best video quality and do not really focus on encoding time. These solutions are used in generating benchmark video sequences and in archiving applications which are mainly PC-based. Taking advantage of sequential processing, feedback from each stage can be utilized in further stages to enhance the video quality. Another advantage with single core solution is limited memory usage as single instances of data structures would suffice. These solutions may not be power efficient as complex algorithms are used to achieve best video quality. There are H.264 video encoders in market which run on single-core that achieves real-time encoding with decent quality but may not generate benchmarking sequences. But when it comes to HEVC, single core real- time solutions are not available in the market at this point of time and are not practical.
HEVC encoders are highly complex due to increased number of combinations in encoding options. Real- time HEVC encoding can be realized in multi-core solutions with trade-off in the video quality. Amount of trade-off purely depends on the type of parallelism implemented. Multi-core implementations offer flexibility to either have data partitioning or task partition. In case of data partitioning, there are high chances of breaking neighbor dependencies which will result in higher trade-off in terms of video quality. HEVC introduced new tool known as TILES in addition to slices, which is targeted for multi-core solutions without much impact on video quality. In task partition kind of design, achieving right load balance among tasks is a challenging job for better core utilization. The performance that could be achieved on multi-core gets limited due to number of cores present in a single-chip. Mutli-core solutions have bigger memory footprint as data structure needs to be replicated for different CPUs. Usage of cache needs to be carefully managed as shared memory can be accessed by all the CPUs. And also, increased number of CPUs will require higher amount of power making the solution power inefficient. Development cost of multicore solutions are relatively high compared to single CPU solutions as it requires complex designs for efficient task scheduling. Multicore solutions though power inefficient, can produce real time performance with decent video quality finding its applications in broadcast domain.
CPU + GPU
Heterogeneous architectures are one of the solutions for achieving real time performance in HEVC encoder. As HEVC encoders are highly complex, using GPU for performing multiple data processing tasks would help in improving the performance by a great margin. This will reduce the load on main CPU thus freeing it for other tasks. Introduction of GPU helps in power optimizing the encoder solution as GPUs are highly power efficient. Usage of GPU for video encoding will further boost the hardware resource utilization as GPUs are generally idle during video processing. All these advantages comes with the cost of video quality. Usage of heterogeneous architecture pose challenges in handling functional dependencies and neighbor data dependencies in block based HEVC codec, which will lead to reduction in video quality compared to sequential execution. Hence, the independent execution nature of many-core GPU architecture might degrade video quality by significant level. Further, usage of GPUs for sequential functionalities -- such as entropy coding -- is inefficient. This posses a greater challenge of syncing between CPU and GPU. Along with syncing issue, heterogeneous systems have another challenge of managing distributed memories. In highly band-width intensive video processing, problems like memory requirements & bandwidth increase with distributed memories. These solutions require in depth knowledge of the distributed memory architecture and frameworks supporting heterogeneous platforms, resulting in increased development cost and time. Most of the SOCs used in consumer electronic devices have GPUs which can be used for video processing.
Application Specific Integrated Circuit (ASIC) are the best way for achieving real time & power-efficient HEVC encoder solutions. In case of ASICs, hardware IPs will be built for different functionalities of HEVC codec. Though hardware solutions are much faster than their software counterparts, functional and neighbor data dependencies required by HEVC codec would limit its capabilities, as it is necessary to build an intelligent pipeline between these hardware modules. These hardware modules will have their own memory there by increasing memory footprint. These limitations will eventually lead to drop in video quality as it compromises the need of sequential execution. Video quality drop can be minimized to a great extent by proper implementation of hardware pipeline and efficient video algorithms. Generally these ASIC solutions are highly power optimized, targeted at consumer electronics with huge volume requirement. Complex designs, verification, validation and silicon manufacturing will increase the development cost, but provides best performing power efficient encoders. Not many ASIC HEVC solutions are expected due to increased complexity of the codec, increased cost of SOC manufacturing with advanced process, lack of VC funding in semiconductor startups and limited volume broadcast market where HEVC is required. Also, it is well known fact that video encoders evolve over the time and ASIC solutions fail to adopt the new algorithms. Because of this particular reason, ASIC solutions ar solutions may be adopted once HEVC encoders achieve a greater maturity in software solutions.
FPGA can be placed in between GPU solutions and ASICs in terms of performance, with similar video quality. FPGA providfigcaptionardware implementation and can adapt to evolving encoder algoritfigcaptionms. With software background, it would be easier to implement HEVC encoder on GPU rather than on FPGA, but greater performance can be achieved using FPGA while maintaining the video quality. Though there is a price advantage in using GPUs, the power consumption is much higher when compared to FPGA. FPGAs require larger die area making it unsuitable for consumer electronics. Cost and time needed for developing FPGA based video encoders are higher than GPU solutions due to hardware programming, but less than ASIC solutions as complex hardware designs are not necessary. FPGA video encoder solutions find its usage in markets with low volume requirement where ASIC are not a cost effective option. Due to right combination of power, performance, video quality and development costs, these solutions are effective for video surveillance and broadcast applications.
Each one of these solutions has its own pros and cons. Single CPU is the best solution especially for archiving applications where real time encoding is not needed. And ASIC SOCs finds the best usage in consumer electronics as it provides power efficient real time performance. Every solution has its own importance based on the application requirements about speed, quality, power consumption and development time & costs. (Stay tuned for the part-2 of this blog…….)