High Efficiency Video Coding (HEVC) is the next-generation video codec that promises to cut the bit rate in half compared with H.264/AVC. HEVC has bigger block sizes, more prediction modes and a number of other tools to reach that target. However, this comes at a substantially higher computational cost. HEVC, as it stands today, needs about 8 times more computational power to deliver twice the compression ratio.
Keeping pace with Moore’s law, general-purpose processors have been packing more logic into less space, thanks to evolving manufacturing processes. The added computational power is available mainly in the form of more cores per processor. So the focus of HEVC encoder design is not just on having the fastest algorithm with the best quality, but on having one that can be executed in parallel on multiple cores with a minimal penalty on quality. Server-grade processors now pack more than 18 cores into a single socket, which increases the importance of system design in HEVC encoders. This provides a strong impetus to transform all algorithms and code flows so that they make maximal use of the computational power available across all the cores.
So how does HEVC fare in a multicore scenario?
HEVC includes many tools that are friendly to parallel processing. These multicore-friendly tools are discussed in detail in this whitepaper:
Parallelizing an HEVC encoder using slices and tiles is very simple. An input frame is divided into a number of slices or tiles equal to the number of cores or threads available for processing. This approach yields a working multicore encoder in the shortest development time, with a very good multicore scaling factor.
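The partitioning step can be sketched as follows. This is a minimal illustration, not an encoder's actual partitioning logic: the function name split_into_slices is hypothetical, each slice is assumed to cover whole rows of coding tree units (CTUs), and a 64x64 CTU size is assumed.

```python
import math

def split_into_slices(frame_height, num_slices, ctu_size=64):
    """Spread the CTU rows of a frame as evenly as possible over slices.

    Returns a list of (first_ctu_row, row_count) pairs, one per slice.
    Assumption: a slice covers whole CTU rows; real encoders may split
    at any CTU boundary.
    """
    ctu_rows = math.ceil(frame_height / ctu_size)
    # There cannot be more non-empty slices than CTU rows.
    num_slices = min(num_slices, ctu_rows)
    base, extra = divmod(ctu_rows, num_slices)
    slices = []
    start = 0
    for i in range(num_slices):
        # The first `extra` slices take one additional row each.
        count = base + (1 if i < extra else 0)
        slices.append((start, count))
        start += count
    return slices

if __name__ == "__main__":
    # A 1080-line frame has 17 CTU rows of height 64.
    print(split_into_slices(1080, 4))  # [(0, 5), (5, 4), (9, 4), (13, 4)]
```

Equal row counts only balance the amount of picture area per core, not the amount of work, since the encoding effort per row depends on the content.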
The scaling factor on a multicore system is the speedup achieved over a single core by using multiple cores for the same job. If a job takes 1 s on a single core but only 0.5 s on an N-core system, the scaling factor is 2. If N is 2 (a dual-core system), this is the ideal scaling of 100%. But if it is a quad-core system, only 50% of the ideal scaling is achieved. It is almost impossible to achieve ideal scaling unless the job consists of fully independent sub-jobs.
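The arithmetic in this example can be checked with a few lines of Python. The function name scaling is hypothetical; it simply returns the speedup and the fraction of ideal scaling for a given core count.

```python
def scaling(single_core_time, multi_core_time, num_cores):
    """Return (speedup, fraction_of_ideal) for a job run on num_cores cores."""
    speedup = single_core_time / multi_core_time
    # Ideal speedup on N cores is N, so divide by the core count.
    fraction_of_ideal = speedup / num_cores
    return speedup, fraction_of_ideal

if __name__ == "__main__":
    print(scaling(1.0, 0.5, 2))  # (2.0, 1.0): ideal scaling on a dual core
    print(scaling(1.0, 0.5, 4))  # (2.0, 0.5): 50% scaling on a quad core
```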
Figure 1 : Slices and Tiles
An HEVC frame that is partitioned into slices and tiles and encoded on different cores should scale ideally, because each slice or tile is independent of the others. In practice this is not achieved, because slices and tiles differ in the total complexity of the blocks they contain, and hence different cores take different amounts of time to encode them. A thread that finishes early has to wait for the slowest one, so the less complex its slice or tile, the longer it waits. The scaling factor in turn falls as the wait time rises; i.e., the more the cores wait, the lower the scaling. This is further aggravated by the fast algorithms that encoders use to predict the coding modes accurately, since these tend to widen the gap in encoding time between simple and complex regions.
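The effect of uneven slice complexity on scaling can be illustrated with a small sketch. The per-slice encoding times used here are made-up numbers, not measurements, and the model assumes one slice per core with all cores starting together and the frame finishing when the slowest slice finishes.

```python
def slice_scaling(slice_times):
    """Model one frame encoded with one slice (or tile) per core.

    slice_times: encoding time of each slice on its own core.
    Returns (speedup, fraction_of_ideal, wait_times).
    """
    num_cores = len(slice_times)
    serial_time = sum(slice_times)    # one core encoding every slice
    parallel_time = max(slice_times)  # the frame is done when the slowest core is
    speedup = serial_time / parallel_time
    # Each core idles from the end of its own slice until the slowest one ends.
    wait_times = [parallel_time - t for t in slice_times]
    return speedup, speedup / num_cores, wait_times

if __name__ == "__main__":
    # Hypothetical times: the first slice is the most complex.
    print(slice_scaling([4, 2, 3, 1]))  # (2.5, 0.625, [0, 2, 1, 3])
    # Perfectly balanced slices give ideal scaling.
    print(slice_scaling([2, 2, 2, 2]))  # (4.0, 1.0, [0, 0, 0, 0])
```

In this model the core with the simplest slice waits the longest, and the scaling drops to 62.5% of ideal even though the slices are fully independent.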
Also, encoding with slices or tiles turns the frame into a collection of independently encoded segments with no links between them. This has a large impact on visual quality, with visible compression artefacts at the edges of slices or tiles. These artefacts can be partially avoided by applying the de-blocking filter and the new SAO (sample adaptive offset) filter across the slice or tile boundaries. But when encoding a high-motion sequence at a challenging bit rate, the artefacts will definitely be noticeable.
The challenge in multicore HEVC encoding is always to achieve the best possible scaling while sacrificing the least possible video quality. The performance and quality of a video encoder are always in battle with each other, and with multicore the battle gets more ammunition. Wavefront Parallel Processing (WPP) plays the diplomat here and tries to keep the peace to a certain extent.
Figure 2 : Wavefront Parallel Processing