Cameras have been in use for as long as one can remember, and from still photography to motion pictures they have evolved rapidly. With the advent of newer and improved camera technology, the data transmission problem has also grown: better cameras produce significantly larger video streams, leading to latency issues. To overcome this, data compression is required. However, compression may result in the loss of fine details that cannot be seen by the human eye but are picked up by computer vision algorithms. Machine vision, or computer vision, is seeing a lot of traction in everyday applications, and the next logical step is to use it in camera-based applications across domains such as industrial (production lines), surveillance, broadcasting, automotive and more. Computer vision algorithms have both advantages and disadvantages: their accuracy depends entirely on the camera used, its quality, and its throughput, since seamless, lossless data transmission is essential and delays caused by bandwidth constraints cannot be afforded. The trade-off between the two is managed using compression techniques, which reduce the size of the video stream consumed by computer vision algorithms.
Compression can be broadly classified into lossless and lossy compression. Lossless compression allows perfect reconstruction of the compressed data, with no loss of the original data. Lossy compression, in contrast, discards coarse details that are considered less important, so the original data cannot be reproduced exactly; a well-designed lossy compression algorithm, however, can minimize this loss. The majority of applications need to run advanced machine learning algorithms to detect complex events in real time, and any loss in data fidelity can affect the algorithm's outcome. Hence it is imperative that camera data is transmitted to the central processing unit (which houses the computer vision algorithms) in a lossless manner.
Figure 1: Data compression types
Video compression codecs used for general applications like multimedia transmission and archiving use lossy compression methods, in which the redundant parts of the captured video stream that are not perceptible to human viewers are removed. Unlike applications such as medical imaging, general multimedia applications do not need to retain all the captured content. In the lossy compression domain, state-of-the-art image and video compression methods like JPEG and High-Efficiency Video Coding (HEVC) rely on quantization and rate control that do not affect the perceptual quality for a human observer. Especially after the widespread use of vision algorithms for analysis, future development approaches in this area can be classified into three major categories:
1. Bottom-Up Approach
2. Top-Down Approach
3. Hybrid Approach
Figure 2: Three major categories in lossy compression
This is the traditional video compression approach, which relies purely on the visual information captured by the camera and makes no assumptions about the application scenario. Hence, the compression algorithms developed in this approach are generic and suit any kind of video data. To remain generic, the compression methods in this approach exploit only the limitations of human visual perception and rely on efficient data coding methods to achieve high compression rates. Being generic, these methods operate only on "low-level" visual features and reduce only objective redundancy in the visual data. The compression depends fully on the content of the captured video stream rather than on the high-level cues the application context provides; even methods that use the application context to choose the encoding style, bit rate, etc. do not make use of high-level cues such as objects of interest. Recently developed methods in this approach achieve higher compression rates than HEVC by choosing data quantization and encoding styles suited to the underlying hardware platform, such as an FPGA. Application-specific customized encoding schemes in this approach achieve medium compression rates and affect the quality of different vision algorithms differently.
Advantages: More generic; medium compression.
Disadvantages: Difficult to quantify, as they affect the accuracy of various vision algorithms differently.
Figure 3: Bottom-up approach of lossy compression
Unlike the traditional methods, which rely mostly on the low-level content of the video to achieve compression, the methods in the top-down approach are driven by the "high-level" application context. They are therefore highly application specific and cannot be used to compress generic video data. For example, object detection algorithms (for faces, people, cars, etc.) can be used to identify the objects of interest and encode them at a high bit rate. As vision algorithms such as object detection, face detection and semantic segmentation achieve reliable accuracy in a given application context, their results can be used to drive the quantization and encoding levels of the compression algorithm, achieving higher compression ratios for that application. The variable-size block encoding available in modern codecs can be used to accommodate the non-geometric Regions of Interest (ROIs) generated by vision algorithms like segmentation.
Advantages: High Compression.
Disadvantages: Application-specific methods; require hardware acceleration.
Figure 4: Top down approach of lossy compression
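As a concrete sketch of this ROI-driven encoding, the hypothetical helper below (illustrative only, not part of any codec API; all names and default values are assumptions) builds a per-block quantization-parameter (QP) map for a frame: 16x16 blocks that overlap a detected ROI bounding box receive a lower QP (finer quantization, higher quality), while background blocks receive a higher QP.

```python
def qp_map(width, height, rois, qp_roi=22, qp_bg=40, block=16):
    """Assign a quantization parameter to each block of the frame:
    blocks overlapping a region of interest get finer quantization."""
    cols, rows = width // block, height // block
    qmap = [[qp_bg] * cols for _ in range(rows)]
    for (x, y, w, h) in rois:                 # each ROI as (x, y, width, height)
        for r in range(rows):
            for c in range(cols):
                bx, by = c * block, r * block
                # Axis-aligned rectangle overlap test between block and ROI
                if bx < x + w and bx + block > x and by < y + h and by + block > y:
                    qmap[r][c] = qp_roi
    return qmap
```

In a real encoder, such a map would be fed to the rate-control stage; the point here is only that detector output can translate directly into per-block quality decisions.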
As explained in the previous sections, neither the generic bottom-up methods nor the application-specific top-down methods have all the advantages. A trade-off exists between the generality of the compression methods and the achievable compression rates. Though application-specific methods can achieve higher compression ratios, they may be tied closely to particular use cases, depending on the method. The hybrid approach takes a middle path, where "low-level" content-based compression methods are driven by "high-level" cues from application-specific object detectors.
In this approach, "mid-level" visual features such as interest points, visual saliency, Histograms of Oriented Gradients (HOG) and neural-network encoder features are used to represent the video frames and are compressed using various data quantization and encoding methods. These features, or visual descriptors, which are suitable for a wide range of vision applications, are encoded and transmitted instead of the whole frame data. The compression methods in this approach are applicable more widely than top-down methods while maintaining almost the same level of compression.
MPEG-7 is a multimedia content description standard that defines description schemes for various application scenarios. Similarly, application specific description schemes can be defined for a range of applications that use a common set of features for their analysis.
Figure 5: Hybrid approach of lossy compression
Lossless compression has always played a pivotal role in domains like broadcast and automotive, which revolve around machine vision algorithms and multimedia applications. Lossless compression aids in improving the accuracy of these algorithms by removing redundancies in the spatial and temporal domains, thus transferring more data in less time over any transmission medium. This compression technique results in no loss of input data; hence it is used predominantly for such video compression. Some of the lossless compression techniques are:
- Huffman compression - Huffman coding is a statistical coding method. This lossless technique assigns variable-length codes (bit sequences) to the characters of the input data, with each code's length depending on the frequency of occurrence of the character under consideration. The codes are assigned in such a way that no code is the prefix of another.
- Run-length encoding/RLE - Run-length encoding is one of the more popular lossless compression techniques because of its reduced hardware and resource requirements. It is based on the idea that a series of identical data values can be replaced by one shorter sequence. Its major drawback is that it is effective only if the data under consideration contains a lot of repetition.
- Lempel Ziv/LZW - This technique was predominantly used for GIF and other formats, but it also works seamlessly for text. LZW converts the given data using a table-based lookup technique: a dynamic table is built from the data in the original file. Data from the original file is compared against the table; if a match is found, a reference to the table entry is emitted, while if no match is found, a new entry is made in the table.
- Shannon Fano - The first step in this lossless method is to count the frequency of each character in the input data; the next is to arrange the characters in decreasing order of frequency. The list is then split into two subgroups of roughly equal total frequency, with the first subgroup assigned a 0 and the second a 1. This process is repeated recursively within each subgroup until every symbol has its own code.
- Arithmetic coding - Rather than assigning a code to each symbol, this method replaces an entire string of input data with a single code, typically represented as a floating-point number. Simply put, a batch of input data is represented by a single code in arithmetic coding.
- CAVLC - Context-adaptive variable-length coding (CAVLC) is a type of entropy coding used in H.264. CAVLC can be used as an alternative to CABAC, though it is not as effective as CABAC. CAVLC is supported in all H.264 formats.
- CABAC - Context-based adaptive binary arithmetic coding (CABAC) is another type of entropy coding, used in HEVC (High Efficiency Video Coding). CABAC provides better compression than its predecessor, CAVLC, and than other entropy coding schemes.
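To make the Huffman idea above concrete, the minimal sketch below (function name and representation are my own, not from any standard library's codec API) builds a prefix-free code table by repeatedly merging the two least-frequent subtrees, so frequent symbols end up with shorter codes.

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a prefix-free code table: frequent symbols get shorter codes."""
    freq = Counter(data)
    if len(freq) == 1:                       # degenerate single-symbol input
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tie-breaker, {symbol: partial code})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)      # two least-frequent subtrees
        f2, _, t2 = heapq.heappop(heap)
        # Prepend '0' to codes in one subtree and '1' in the other
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]
```

Because no code is a prefix of another, the concatenated bit stream can be decoded unambiguously by walking it left to right.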
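Run-length encoding, described above, is simple enough to sketch in a few lines; this toy version (names are illustrative) collapses runs of identical symbols into (symbol, count) pairs and expands them back losslessly.

```python
def rle_encode(data):
    """Collapse runs of identical symbols into (symbol, run-length) pairs."""
    runs = []
    for ch in data:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((ch, 1))               # start a new run
    return runs

def rle_decode(runs):
    """Expand (symbol, run-length) pairs back to the original string."""
    return "".join(ch * n for ch, n in runs)
```

Note how the drawback mentioned above shows up directly: on input with no repetition, every symbol becomes its own pair and the "compressed" form is larger than the original.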
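The LZW table-building step can likewise be sketched compactly. In this compressor-only toy (the dictionary seeding from the input's own alphabet is a simplification; real LZW seeds with a fixed alphabet such as all 256 byte values), the longest phrase already in the table is emitted as an index, and the phrase extended by one character is learned as a new entry.

```python
def lzw_compress(data):
    """Emit dictionary indices, growing the dictionary with each new phrase."""
    # Simplification: seed the dictionary with the input's distinct characters.
    table = {ch: i for i, ch in enumerate(sorted(set(data)))}
    w, out = "", []
    for ch in data:
        if w + ch in table:
            w += ch                      # keep extending the current phrase
        else:
            out.append(table[w])         # emit the longest known phrase
            table[w + ch] = len(table)   # learn the new, longer phrase
            w = ch
    if w:
        out.append(table[w])
    return out
```

On repetitive input the output index stream is shorter than the input, which is exactly why the technique suited formats like GIF.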
As computer vision-based analytics has reached practical accuracy with the advent of deep neural networks, many applications require high processing throughput with low energy consumption while running on embedded platforms. Such applications need higher compression than the generic state-of-the-art video codecs provide. Top-down video compression methods can be used to achieve the required compression levels if the application context can be defined accurately. Hybrid compression techniques can be used for a range of similar applications that share a common set of features for analysis.
With early adopters increasing day by day, camera technology and video recording are improving drastically. However, the fundamental challenge of data transmission remains. Thanks to these compression techniques, users and developers can choose the type of compression to use depending on the application.