All You Need To Know About Bluetooth MESH Technology

The traditional Bluetooth network was limited in its applications by its one-to-one connections. The introduction of mesh networking support alongside Bluetooth 5.0 resolves this limitation. In this article, we introduce the mesh network and how it works.

MESH vs Peer-to-Peer

A traditional Bluetooth device communicates in a peer-to-peer manner and forms its own cluster called a piconet. In this mode, a device can communicate only with the nodes connected to it and not with other nodes.
A mesh network has a many-to-many topology. Each device in the mesh network can communicate with any other device in the network. Thus, end-to-end communication extends beyond the radio range of each individual node.

Mesh Architecture

Devices inside the mesh network are called 'nodes', and devices outside the mesh are called 'unprovisioned nodes'. The process of transforming an 'unprovisioned node' into a 'node' by adding it to the mesh is called 'provisioning'.
Figure 1: Unprovisioned Nodes

Messages

The communication between nodes is carried out through messages. Nodes can transmit and receive messages of several types with different opcodes. 'Acknowledged' messages require a response from the receiver; 'unacknowledged' messages require no response.
There are three types of messages:
  • GET messages request the value of a given state from one or more nodes.
  • STATUS messages are sent in response to GET messages, carrying the state value.
  • SET messages change the value of a given state.

Addresses

Nodes send messages to, and receive messages at, addresses. There are three types of addresses:
  • A 'virtual' address is assigned to one or more elements spanning one or more nodes; virtual addresses are typically pre-configured at manufacture.
  • A 'unicast' address uniquely identifies a single element. It is assigned during provisioning.
  • A 'group' address is a multicast address which represents one or more elements; group addresses can be SIG-defined or dynamically assigned.

Publish/Subscribe

The act of sending a message is called 'publishing'. Nodes can be configured to process messages sent to particular addresses; this is called 'subscribing'.

States and Properties

Each element contains a value which defines its state. For example, a Light element takes values of type On/Off: the On value switches the light on and the Off value switches it off.
A property is also a value associated with an element, and it provides context for interpreting a value. 'Manufacturer' properties are read-only, while 'Admin' properties allow read-write access. A change from one state to another is called a 'state transition'; it can be instantaneous or executed over a period of time called the transition time. When a change in one state triggers a change in another state, the two are called 'bound states'.

Models

Models define some or all of the functionality of an element as it relates to the mesh network.
There are four types of models:
  • Server model – Defines the states, state transitions, state bindings and messages which the element may send or receive.
  • Client model – Defines the messages sent or received in order to GET, SET or acquire the STATUS of the states defined in the corresponding server model.
  • Control model – Contains both client and server models.
  • Root model – A model which cannot be extended.

Generics

Semantically equivalent states across different devices are categorized as Generics. Examples: GenericOnOff, GenericLevel.
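To make the model concepts concrete, here is a minimal illustrative sketch in C of how a Generic OnOff server might handle GET, SET and STATUS messages. The opcode values and the mesh_publish function are hypothetical; this is not any vendor's actual mesh SDK API.

/* Illustrative sketch only: hypothetical opcodes and publish hook,
 * not an actual Bluetooth mesh SDK API. */
#include <stdint.h>
#include <stdbool.h>

enum onoff_opcode {
    ONOFF_GET    = 0x01,   /* acknowledged: requests the current state    */
    ONOFF_SET    = 0x02,   /* acknowledged: changes the state             */
    ONOFF_STATUS = 0x03    /* sent in response, carrying the state value  */
};

static bool light_on;      /* the element's OnOff state */

/* hypothetical transport hook: publish a message to an address */
extern void mesh_publish(uint16_t addr, uint8_t opcode, const uint8_t *data, uint8_t len);

/* Called when a message addressed to this element is received. */
void onoff_server_rx(uint16_t src_addr, uint8_t opcode, const uint8_t *data, uint8_t len)
{
    switch (opcode) {
    case ONOFF_SET:
        if (len >= 1)
            light_on = (data[0] != 0);          /* state transition */
        /* fall through: an acknowledged SET answers with STATUS */
    case ONOFF_GET: {
        uint8_t state = light_on ? 1 : 0;
        mesh_publish(src_addr, ONOFF_STATUS, &state, 1);
        break;
    }
    default:
        break;                                   /* unknown opcode: ignore */
    }
}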

Scenes

Scenes are stored collections of states which can be recalled and made current by the receipt of a special type of message or at a specified time. They allow a series of nodes to be set to complementary states in one coordinated action.

Provisioning Process

The process of a device joining the mesh network and becoming a node is called provisioning. The device used to drive provisioning is called a provisioner, for example a mobile app.
The provisioning process involves five steps:
  • Beaconing – The device advertises its availability using a special advertising message.
  • Invitation – The provisioner sends an invitation to the device.
  • Public key exchange – The device and the provisioner exchange public keys, either directly or via an out-of-band (OOB) method.
  • Authentication – The device and the provisioner authenticate each other.
  • Provisioning data distribution – Mesh security parameters, the IV Index and a unicast address allocated by the provisioner are distributed to the device.

Nodes

All nodes can transmit and receive messages.
There are four types of nodes:
  • Relay nodes – They retransmit received messages, extending the reach of the network.
  • Low power nodes – They conserve energy by sending messages only when needed and by waking to receive messages only periodically.
  • Friend nodes – They store messages intended for low power nodes while the latter are not in receiving mode.
  • Proxy nodes – They allow legacy devices with no mesh support to use the GATT protocol to interact with nodes in the mesh network.

Software OSI Layer Architecture

Figure: Software OSI layer architecture of the Bluetooth mesh stack

Security

Security is mandatory in MESH, unlike in BLE. The following are the security fundamentals in Bluetooth mesh:
Encryption and Authentication – All mesh messages are encrypted and authenticated. Mesh uses AES-CCM to authenticate and encrypt the access payload in the upper transport layer, and network parameters such as the destination address and the transport PDU in the network layer.
Separation of concerns – Network security, application security and device security are all addressed independently. The access payload is encrypted using an application key or a device key, and network parameters are encrypted using a network key.
Key Refresh – Security keys can be changed during the life of the mesh network via a Key Refresh procedure.
Message Obfuscation – Obfuscation makes it difficult to track messages sent within the network, providing a privacy mechanism that also makes it difficult to track nodes.
Replay Attack Protection – Mesh security protects the network against replay attacks. This is achieved using two network PDU values: SEQ (the sequence number) and the IV Index. Elements increment the SEQ value every time they publish a message; elements within the same node may or may not share the sequence number space with each other. A node discards a message received from an element whose SEQ value is less than or equal to that of the last valid message from that element. The IV Index carried in messages from an element must always be greater than or equal to that of the last valid message from that element. All nodes in a mesh network share the same IV Index value and use it for all subnets they belong to. (A small sketch of this check follows this list.)
Secure Device Provisioning – The process by which devices are added to the mesh network to become nodes is itself secure.
Trashcan Attack Protection – Nodes can be removed from the network securely, in a way which prevents trashcan attacks.
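The replay check described above can be illustrated with a short sketch in C, assuming a per-element cache of the last accepted IV Index and SEQ values (illustrative only, not the actual stack implementation):

#include <stdint.h>
#include <stdbool.h>

/* Last accepted values, kept per source element address. */
struct replay_entry {
    uint32_t iv_index;
    uint32_t seq;        /* 24-bit sequence number in practice */
    bool     valid;
};

/* Returns true if the message should be accepted, and updates the cache. */
bool replay_check(struct replay_entry *e, uint32_t iv_index, uint32_t seq)
{
    if (e->valid) {
        if (iv_index < e->iv_index)
            return false;                /* older IV Index: treat as replay */
        if (iv_index == e->iv_index && seq <= e->seq)
            return false;                /* SEQ did not increase: replay */
    }
    e->iv_index = iv_index;              /* remember the newest accepted values */
    e->seq      = seq;
    e->valid    = true;
    return true;
}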

MESH vs WiFi

A quick comparison of mesh and Wi-Fi:
  • Network communication – Mesh: via multiple nodes; messages are broadcast to all nodes, so a message reaches its destination through multiple paths. Wi-Fi: via a central device (the router); if the router is unavailable, the network becomes unavailable.
  • Battery life (days) – Mesh: 100-1000+. Wi-Fi: 0.5-5.
  • Data transmission rate (kbit/s) – Mesh: 20-250. Wi-Fi: 11,000+.
  • Transmission range (meters) – Mesh: 1-100+. Wi-Fi: 1-100.
  • Application – Mesh: monitoring and control. Wi-Fi: web, video, entertainment, etc.
  • USP – Mesh: reliability, power, cost. Wi-Fi: speed, flexibility.

Applications of Bluetooth Mesh

  • Robotics
  • Energy Management
  • Smart City Applications
  • Facilities Management

Parallelizing the HEVC Encoder

High Efficiency Video Coding (HEVC) is the next-generation video codec which promises to cut the bit rate in half compared to the H.264/AVC codec. HEVC has bigger block sizes, more prediction modes and a whole lot of other algorithms to achieve this target. However, this comes at a substantially higher cost in terms of computational power: HEVC, as it stands today, needs about 8 times more computational power to deliver twice the compression ratio.
Keeping pace with Moore's law, general-purpose processors have been cramming more logic into less space, thanks to evolving manufacturing processes that keep adding computational power. This computational power is available in the form of more cores per processor. So the focus of HEVC encoders is not just on having the fastest algorithm with the best quality, but also on one which can be executed in parallel on multiple cores with a minimal penalty on quality. Server-grade processors now pack more than 18 cores into a single socket, increasing the importance of system design in HEVC encoders. This provides a strong impetus to transform all algorithms and code flows to make maximal use of the computational power available across all the cores.

So how does HEVC fare in a multicore scenario?

HEVC has included many tools which are parallel processing friendly. These multi-core friendly tools are discussed in detail in this whitepaper:
https://www.pathpartnertech.com/wp-content/uploads/2014/11/PathPartner_WhitePaper_Analysis-Of-HEVC-Parallel-Tools-1.pdf
Parallelizing an HEVC encoder using slices and tiles is very simple: an input frame is divided into a number of slices equal to the number of cores or threads available for processing. This is the quickest way to arrive at a working multicore encoder, and it delivers a very good multicore scaling factor.
Scaling factor on a multicore system is the speedup achieved over a single core by using multiple cores for the same job. If a job takes 1 s on a single core but only 0.5 s on an N-core system, the scaling factor is 2. If the N-core system is a dual-core system, an ideal scaling of 100% is achieved; but if it is a quad-core system, only 50% scaling is achieved. It is almost impossible to achieve ideal scaling unless the job being done can be split into fully independent sub-jobs.
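As a small worked illustration of these definitions (with assumed timings, not measured data):

#include <stdio.h>

int main(void)
{
    double t_single = 1.0;   /* time on one core, in seconds (assumed) */
    double t_multi  = 0.5;   /* time on N cores, in seconds (assumed)  */
    int    n_cores  = 4;

    double scaling    = t_single / t_multi;          /* speedup: 2.0         */
    double efficiency = scaling / n_cores * 100.0;   /* 50% on a quad core   */

    printf("scaling factor = %.2f, efficiency = %.0f%%\n", scaling, efficiency);
    return 0;
}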
Figure 1 : Slices and Tiles
An HEVC frame, when partitioned into slices or tiles and encoded on different cores, should in principle scale ideally, because each slice and tile is independent of the others. In practice this cannot be achieved: the blocks being encoded in each slice or tile differ in complexity, so different cores take different amounts of time to encode them. The time a thread waits after its task is done is inversely proportional to the complexity of the slice or tile it encodes, and the scaling factor falls as the wait time grows: the more a core waits, the less the scaling. This is further aggravated by the fast algorithms present in encoders, which predict encoding modes accurately and make encoding time even more content-dependent.
Also, when encoding with slices or tiles, the frame essentially becomes a collection of independently encoded segments with no links between them. This has a large impact on visual quality, with visible compression artefacts at the edges of slices or tiles. These artefacts can be partially mitigated by applying the de-blocking and the new SAO filters across slice or tile boundaries, but when encoding a high-motion sequence at a challenging bitrate, the artefacts will definitely be noticeable.
The challenge in multicore HEVC encoding is always to achieve the best possible scaling while sacrificing the least possible video quality. Performance and quality measures of a video encoder are always in battle with each other, and with multicore the battle gets more ammunition. But diplomacy, in the form of Wavefront Parallel Processing (WPP), keeps the peace to a certain extent.

WPP

Figure 2 : Wavefront Parallel Processing
One of the major serial processing blocks in any video encoder is the block-by-block arithmetic coding of syntax elements and transform coefficients in raster-scan order. As with slices and tiles, this can be parallelized, but with a penalty on the bits taken to encode. With Wavefront Parallel Processing, or entropy sync, the arithmetic coding is parallelized with a catch: each row takes its seed context from its top-right neighbour in the row above before starting its own row encoding. This results in a penalty which is smaller than with slice or tile encoding.
The same approach is taken to parallelize the mode-decision part of the encoder, where each CTU waits for the completion of its top-right neighbour.
Figure 3 : Parallel row HEVC encoding
The figure above shows a quad-core encoder whose pipeline has been built up and is operating in a steady state. Before starting the encoding of each CTU, the completion of its top-right neighbour is checked; since the top row is ahead of the current row, the check is almost always positive. This design preserves the data dependencies present in the original encoder and gives the best possible performance while sacrificing the least amount of quality. It has the overhead of pipe-up and pipe-down stages, but when encoding a huge number of CTUs these can be neglected. There will also be small scaling losses due to differences in CTU encoding times.
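A minimal sketch of such a per-row dependency check, assuming one worker thread per CTU row and an atomic progress counter per row (illustrative C only, not PathPartner's encoder code):

#include <stdatomic.h>

#define MAX_ROWS 64

/* Number of CTUs already completed in each row, updated by that row's worker. */
static atomic_int ctus_done[MAX_ROWS];

/* Hypothetical per-CTU encode routine. */
extern void encode_ctu(int row, int col);

/* Worker for one CTU row: wait on the top-right dependency, then encode. */
void encode_ctu_row(int row, int row_width)
{
    for (int col = 0; col < row_width; col++) {
        if (row > 0) {
            /* The top-right CTU (col + 1) of the row above must be finished;
             * for the last column only the CTU directly above exists. */
            int need = (col + 1 < row_width) ? col + 2 : row_width;
            while (atomic_load(&ctus_done[row - 1]) < need)
                ;   /* spin; a real encoder would block on a condition variable */
        }
        encode_ctu(row, col);
        atomic_fetch_add(&ctus_done[row], 1);
    }
}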
The multicore parallelizing schemes presented here scale up with more cores and bigger input resolutions, but with a drop in scaling efficiency. When cores span more than one socket, the locality of the data used by the encoder contributes a major chunk of the encoder performance. PathPartner's multicore HEVC encoder is NUMA-optimized to achieve the best possible scaling on systems with more than one socket.
Shashikantha Srinivas
Technical Lead

Microphone Array Beamforming

Figure 1: Typical conference scenario using Microphone Array Beamformer
Beamforming, also known as spatial filtering, is a signal processing technique used in microphone array processing. Beamforming exploits the spatial diversity of the microphones in the array to detect and extract desired source signals and suppress unwanted interference. It is used to direct and steer the composite directivity beam of the microphones in a particular direction, based on the direction of the signal source. This technique helps to boost the composite range of the microphone array and increases the signal-to-noise ratio (SNR). Figure 1 shows how a beamformed microphone array picks up the signal source of interest. In simple terms, the signals captured by multiple microphones are combined to produce a stronger one. To sum up, the advantages of Beamforming are:
  • Microphone array Beamforming provides high quality and intelligible signal from desired source location while attenuating the interfering noise.
  • Beamforming avoids having head-mounted or desk-stand microphones locally per speaker and also avoids any physical movement by both speaker and microphones.
  • Compared to a single directional microphone, microphone array Beamforming enables us to locate and track the signal source.
The above advantages make Beamforming very helpful in fields such as underwater acoustics, ultrasound diagnostics, seismology and radio communications. However, in this article we will talk about speech enhancement using microphone array Beamforming.
Figure 2: Beamformer picking signals of interest. Noise sources like projector, windows, etc. are not picked by Beamformer
In a microphone array, the signals captured by the individual microphones interfere with each other. The microphones are placed in such a way that the resultant pattern from all the microphones passes signal energy from a particular direction: signals arriving from that direction combine through constructive interference, while signals from all other directions undergo destructive interference. This is the fundamental principle of Beamforming. In effect, Beamforming combines the microphone array into a simulated directional microphone whose directivity pattern can be steered dynamically towards the desired signal source.
Some products use multiple cardioid microphones to pick up signals from different desired directions. These also provide an option to tune out the directions from which signals should be suppressed, such as a door, window or projector, which are possible noise sources.
Following are 2 major types of Beamforming techniques –
  1. Fixed Beamformers
  2. Adaptive Beamformers
Fixed Beamformers – the category of Beamforming techniques where the signal source and noise source locations are fixed with respect to the microphone array. Delay-and-Sum, Filter-and-Sum and Weighted-Sum are some examples of Fixed Beamformers.
Adaptive Beamformers – the category of Beamforming techniques where the signal source and noise source can be moving. This makes the technique useful for applications where the Beamformer needs to adapt itself, steering towards the direction of the signal of interest while attenuating noise from other directions. The Generalised Sidelobe Canceller (GSC), Linearly Constrained Minimum Variance (LCMV, Frost) and In-situ Calibrated Microphone Array (ICMA) are some examples of Adaptive Beamformers.
The optimal Beamforming technique for a particular application largely depends on the spectral content of the source signal and the background noise. In practice, however, this assumed information can change, and in some cases it is not available at all, so there is always a trade-off in the choice of Beamforming technique.
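As an illustration of the simplest fixed technique, here is a minimal Delay-and-Sum sketch in C, with the steering delays assumed to be precomputed (in whole samples) from the array geometry and the desired look direction. This is an illustrative sketch, not PathPartner's production code:

/* Delay-and-sum beamformer for one block of samples.
 * mic[m][n] : sample n of microphone m
 * delay[m]  : steering delay of microphone m, in whole samples
 *             (precomputed from array geometry and look direction)
 * out[n]    : beamformed output
 */
void delay_and_sum(const float *const mic[], const int delay[],
                   int num_mics, int num_samples, float *out)
{
    for (int n = 0; n < num_samples; n++) {
        float sum = 0.0f;
        for (int m = 0; m < num_mics; m++) {
            int idx = n - delay[m];
            if (idx >= 0 && idx < num_samples)
                sum += mic[m][idx];       /* align each channel to the look direction */
        }
        out[n] = sum / (float)num_mics;   /* desired signal adds coherently, noise does not */
    }
}

In a real system the delays would be fractional and implemented with interpolation or in the frequency domain, but the principle of aligning and averaging the channels is the same.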

Applications of Beamforming

The applications of Beamforming are immense. Beamforming was initially used in military radar; with advances in microprocessors and digital signal processing it is now also used to capture high-quality, intelligible audio signals. Beamformed signals are used in speech processing, hearing-aid devices, sonar, radar, biomedical applications, direction-of-arrival detection, aiming a camera or set of cameras towards the speakers in a video conference, and acoustic surveillance.
PathPartner has ported, tuned and optimized Beamforming for real-time applications like conference speakerphones with microphone arrays. The challenges are immense for conference speakerphones in an acoustic environment: apart from microphone placement in the array, wide-band speech, echo, multiple speakers, and localization and tracking of moving signal sources are some of the major challenges to be considered. Since Beamforming focuses on a particular direction of interest, the Acoustic Echo Canceller (AEC) part of the speakerphone benefits by receiving mostly the signal of interest, with echo paths outside the Beamforming direction excluded, as shown in Figure 3.
Figure 3: Application of Beamforming along with AEC in Conference Speakerphones
Noise creeps in from all directions, but because the signal of interest comes from a particular direction it can be removed easily, thereby increasing the signal-to-noise ratio (SNR). The Voice Activity Detector (VAD) is one of the modules that benefits from the increased SNR to a large extent.
We have designed and implemented all modules in the speech processing chain as part of the conference speakerphone. Beamformer, Acoustic Echo Canceller, Double Talk Detector, Noise Reduction, Voice Activity Detector, Non-Linear Processing, Comfort Noise Generator adapting to background noise and Automatic Gain Controller are some of the modules readily available working in real-time on Sharc DSP, Hexagon DSP, ARM and x86.
We have expertise in a wide variety of other platforms for porting and optimization. Please contact sales@pathpartnertech.com for queries.
Mahantesh Belakhindi
Technical Lead
Sravan Kumar J
Software Engineer

Embedded Vision Application – Design approach for real time classifiers

Overview of classification technique

Object detection/classification is a supervised learning process in machine vision used to recognize patterns or objects from data or images. It is a major component in Advanced Driver Assistance Systems (ADAS), where it is commonly used to detect pedestrians, vehicles, traffic signs, etc.
The offline classifier training process fetches sets of selected data/images containing the objects of interest, extracts features from this input and maps them to the corresponding labelled classes to generate a classification model. Real-time inputs are then categorized against this pre-trained classification model in an online process, which finally decides whether the object is present or not.
Feature extraction uses many techniques, such as Histogram of Oriented Gradients (HOG) and Local Binary Pattern (LBP) features extracted using an integral image. The classifier uses a sliding-window, raster-scanning or dense-scanning approach to operate on the extracted feature image. Multiple image pyramid scales are used (as shown in part A of Figure 1) to detect objects of different sizes.
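The overall scanning structure can be sketched as follows, assuming hypothetical feature-extraction and window-scoring helpers (illustrative C only, not an actual ADAS implementation):

#include <stdint.h>

/* Hypothetical helpers: compute the feature image for one pyramid scale and
 * score one detection window against the pre-trained model. */
extern void  extract_features(const uint8_t *img, int w, int h, float *feat);
extern float score_window(const float *feat, int stride, int x, int y, const float *model);

/* Scan every pyramid scale with a sliding window. feat_buf must be large
 * enough to hold the feature image of the largest scale. */
void scan_pyramid(const uint8_t *const scales[], const int widths[], const int heights[],
                  int num_scales, int win_w, int win_h, int step,
                  const float *model, float threshold, float *feat_buf)
{
    for (int s = 0; s < num_scales; s++) {
        extract_features(scales[s], widths[s], heights[s], feat_buf);
        for (int y = 0; y + win_h <= heights[s]; y += step)
            for (int x = 0; x + win_w <= widths[s]; x += step)
                if (score_window(feat_buf, widths[s], x, y, model) > threshold) {
                    /* Object candidate at (x, y) in scale s; a real system maps
                     * this back to original image coordinates and merges overlaps. */
                }
    }
}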

Computational complexity of classification for real-time applications

Dense pyramid image structures with 30-40 scales are used to obtain high detection accuracy, where each pyramid scale can have a single level or multiple levels of features depending on the feature extraction method used. Classifiers like AdaBoost fetch data randomly from various data points located in a pyramid image scale. Double- or single-precision floating point is used to meet higher accuracy requirements, at the cost of computationally intensive operations. In addition, a significantly higher amount of control code is used as part of the classification process at various levels. These computational complexities make the classifier a difficult module to design efficiently for real-time performance in critical embedded systems, such as those used in ADAS applications.
Consider a typical classification technique such as AdaBoost (the adaptive boosting algorithm), which does not use all the features extracted from a sliding window. That makes it computationally less expensive than a classifier like an SVM, which uses all the features extracted from the sliding windows in a pyramid image scale. Features are of fixed length in most feature extraction techniques such as HOG, gradient images and LBP. In the case of HOG, the features contain several levels of orientation bins, so each pyramid image scale can have multiple levels of orientation bins and these levels can be computed in any order, as shown in parts B and C of Figure 1.
Figure 1: Pyramid image scales with multiple orientation bin levels

Design constraints for porting classifier to ADAS SoC

Object classification is generally categorized as a high-level vision processing use case, as it operates on extracted features generated by low- and mid-level vision processing. It requires more control code by nature, since it involves comparisons at various levels, and as mentioned earlier it involves double/float precision. These computational characteristics make classification look like a problem for a DSP rather than for vector processors, which have more parallel data processing power, i.e. SIMD operations.
Typical ADAS processors, such as Texas Instruments' TDA2x/TDA3x SoCs, incorporate multiple engines/processors targeted at high-, mid- and low-level vision processing. The TMS320C66x DSP in the TDA2x SoC has fixed- and floating-point support, with a maximum 8-way VLIW issuing up to 8 new operations every cycle, SIMD operations for fixed point and fully-pipelined instructions. It supports up to 32 8-bit or 16-bit multiplies per cycle and up to eight 32-bit multiplies per cycle. The EVE processor of the TDA2x has a 512-bit Vector Coprocessor (VCOP) with built-in mechanisms and vision-specialized instructions for concurrent, low-overhead processing. There are three parallel flat memory interfaces, each with 256-bit load-store memory bandwidth, providing a combined 768-bit wide memory bandwidth. Efficient management of load/store bandwidth, internal memory, the software pipeline and integer precision are the major design constraints for achieving maximum throughput from these processors.
Classifier framework can be redesigned/modified to adapt to vector processing requirements thereby processing more data in one instruction or achieving more SIMD operations.

Addressing major design constraints in porting classifier to ADAS SoC

Load/Store bandwidth management

Each pyramid scale can be rearranged to meet limited internal memory. Functional modules and regions can be selected and arranged appropriately to limit DDR load/store bandwidth to the required level.

Efficient utilization of limited internal memory and cache

The image can be processed in optimally sized blocks, and memory requirements should fit into the hardware buffers for efficient utilization of memory and computation resources.

Software pipeline design for achieving maximum throughput

Some of the techniques that can be used to achieve maximum throughput from software pipelining are mentioned below.
  • The loop structure and its nesting levels should fit the hardware loop buffer requirements. For example, the C66x DSP in the TDA3x has restrictions on its SPLOOP buffer, such as an upper limit on the loop initiation interval.
  • Unaligned memory loads/stores should be avoided, as they are computationally expensive and in most cases take twice the cycles of aligned loads/stores.
  • Data can be arranged, or compiler directives and options set, to get maximum SIMD load, store and compute operations.
  • Double-precision operations can be converted to single-precision floating-point or fixed-point representations, but the offline classifier should be retrained after such precision changes.
  • The innermost loop can be kept simple, without much control code, to avoid register spilling and register pressure issues.
  • Division operations can be avoided by using corresponding table look-up or multiplication-by-reciprocal operations, as sketched below.
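A small sketch of the last point, hoisting a per-element division out of the inner loop and replacing it with one reciprocal multiplication (illustrative C only; a fixed-point variant would be used on integer-only pipelines):

/* Normalize a feature histogram block: the division by 'norm' is done once
 * per block, leaving only multiplications in the SIMD-friendly inner loop. */
void normalize_block(const float *hist, int num_bins, float norm, float *out)
{
    float inv_norm = 1.0f / (norm + 1e-6f);   /* one division per block; epsilon avoids /0 */
    for (int i = 0; i < num_bins; i++)
        out[i] = hist[i] * inv_norm;
}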

Conclusions

Classification for real-time embedded vision applications is a difficult computational problem due to its dense data processing, floating-point precision, multilevel control code and data-fetching requirements. These computational complexities limit how much of the classifier can exploit vector processing. However, the classifier framework on a target platform can be redesigned or modified to leverage the platform's vector processing capability by efficiently applying techniques such as load/store bandwidth management, internal memory and cache management, and software pipeline design.

References:

Jagadeesh Sankaran and Zoran Nikolic, "TDA2X, a SoC optimized for advanced driver assistance systems", Texas Instruments Incorporated, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Sudheesh TV
Technical Lead
Anshuman S Gauriar
Technical Lead

Unconventional way of speeding up Linux Boot Time

Fast boot is a requirement on most embedded devices. For a good user experience it is important that a hand-held camera or a phone becomes usable seconds after switching it on. Apart from user experience, fast boot also becomes a non-negotiable hard requirement for some applications: for example, a network camera or a car rear-view camera must be accessible as soon as possible after power-up.
A brief introduction to Linux boot process:
The boot time for a device consists of:
  • Time taken by bootloader
  • Time taken by Linux kernel initialization up to the launch of first user space process
  • Time required for user space initialization, including launching all required daemons, running all Linux init scripts
Boot loader and Linux kernel init time can be brought down to less than 1 second on modern embedded platforms. Some standard techniques for this have been documented here.
In most boot time optimization problems, more than 80% of the boot time is taken by user space. Optimization of user space on the other hand cannot be standardized, because the user space varies from application to application.
The first user space process that runs on Linux startup is called ‘init’. The init process should take care of all user space initialization required to boot up the system. This includes:
  • Mounting all filesystems
  • Creating device nodes
  • Starting all daemons
  • Performing other functions specific to the use case, like loading of firmware, starting of userspace programs
Reducing user space boot time therefore means optimizing this initialization process. The standard init process for Linux systems is Sysvinit (or BusyBox init on many embedded systems).

Sysvinit:

Both Sysvinit and BusyBox init read /etc/inittab to find out what needs to be done at boot up. The first line of the inittab file defines the default 'run level' of the system. The default process executed by Sysvinit is /etc/init.d/rcS, which in turn calls /etc/init.d/rc. The 'rc' script then executes all scripts in the /etc/init.d/rcN folder in order (where N is the run level).
Figure 1: sysvinit init scripts
 
Figure 2 : inittab file
 
The shortcomings of Linux Sysvinit are therefore apparent:
  • The order and dependencies of init scripts must be carefully assigned. As a rule, S01 will be executed before S02 and so on. Listing the services and enforcing their dependencies has to be done manually, making the process tedious and error-prone.
  • The init.d scripts are run in serial order. This leaves CPU resources underutilized and increases the system boot time.

Systemd:

Linux Systemd is an alternative to Sysvinit. Systemd replaces the Sysvinit concept of runlevels with 'targets'. A target is a set of Systemd 'units', which can be roughly understood as 'services'. Each service is a file consisting of statements that define its behavior: the process it starts, its dependencies and many other parameters. The dependencies between services are enforced by means of the 'Wants' and 'Requires' directives: all the services that a service depends on should be listed under 'Wants' (good to have) or 'Requires' (necessary to have).
Figure 3: example Systemd service
 
Systemd has a set of pre-defined targets that get started at system boot. The services that are needed at boot up should be linked under the directory '/etc/systemd/system/default.target.wants'.
Systemd starts by looking at the basic.target and launches all services required for that target, following the dependency chain. All services required by a particular service are started before that service is started. All services that are not dependent on each other are started in parallel. Thus, Systemd achieves aggressive parallelization during boot up.
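As a concrete illustration of these directives, a minimal custom service file might look like the following (a generic, hypothetical example rather than one taken from a specific product):

# /etc/systemd/system/myapp.service (hypothetical example)
[Unit]
Description=Example application started at boot
# Start only after networking is up; fail if it cannot be started
Requires=network.target
After=network.target

[Service]
ExecStart=/usr/bin/myapp --daemon
Restart=on-failure

[Install]
# 'systemctl enable myapp' links this under default.target.wants
WantedBy=default.target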
Figure 4: Adding custom Systemd service to boot target
 
Figure 5: Complete list of systemd services and dependencies on a system
 
Systemd reduces boot time in two ways:
  • By correct configuration of dependencies and targets, it is possible to do only the init that is required for boot, leaving other activities for after boot. For example, instead of mounting all file systems, we can configure the boot target to depend only on the root file system; the boot target will then start without waiting for the other filesystem mounts. Since filesystem mounting usually takes a long time, this speeds up the boot. It also makes it very easy to control what gets started at boot up and to track dependencies in a centralized way.
  • Systemd launches services in parallel unless a serialized dependency is enforced. This speeds up boot process significantly, leading to better CPU utilization during boot.
Thus Systemd optimizes boot time by allowing fine-grained, easy control over dependencies and by starting services in parallel. It comes with the additional advantages of cleanly handling service behaviour in case of failure (restart vs fail) and of handling logging and the system watchdog.
Systemd also comes with tools such as 'systemd-analyze' (with its 'blame' and 'plot' views) that give a pictorial representation of how services were started. These tools make it quite convenient to see what is happening and to optimize accordingly.
It is easy to bring user space boot time down to less than 4 or 5 seconds on an embedded device by using Systemd, and further optimizations can be squeezed out on a case-by-case basis. We have achieved a total boot time of 3 seconds on an ARM running at 1 GHz.
Systemd is available for download here : https://www.freedesktop.org/software/systemd/ and can be cross-compiled for embedded platforms.
This article has outlined the methodologies PathPartner uses for improving boot time, which include complete boot time optimization (bootloader, kernel parameters and userspace) across embedded platforms.
Komal Jaysukh Padia
Technical Lead

Blind Source Separation for Cocktail Party Problem

Blind Source Separation – BSS, as the name suggests, aims to extract original unknown source signals from the mixed ones. This is done by calculating an approximate mixing function using only the available observed mixed signals. “Blind” here means that the mixing function of signals which are recorded by microphones is unknown. BSS techniques do not require any prior knowledge about the mixing function or source signals and do not require any training data.
Figure 1: BSS block diagram
To understand BSS further, let us think of a scenario where a person is at a party. There are various disturbances like loud music, people screaming and a lot of hustle and bustle. Yet he or she is able to have a conversation with another person by filtering out the unnecessary signals and noise. In a similar way, BSS provides a solution by separating and extracting the desired signals from a mixture of unknown signals, which is quite similar in function to the human ear. This is what we term the 'cocktail party effect'.
Image 1: Cocktail party scene
Figure 2: Signal un-mixing using BSS
It all started around the early 80s, when researchers formulated BSS in the framework of neural modelling. It was later adopted in digital signal processing and communications. BSS was initially developed and tested for linear mixtures of signals, and later evolved to handle non-linear and convolutive mixtures.
The underlying principle of BSS is based on linear noise-free systems theory, according to which a system with multiple inputs (speech sources) and multiple outputs (microphones or sensors) can be inverted, under reasonable assumptions, with appropriately chosen filters (for example, convolutive BSS filters in the case of convolutive mixtures).
Figure 3: Typical BSS process flow
As shown in the figure above, BSS mainly involves two steps:
  • System Identification – where the filter coefficients of the mixing process are calculated, also called Mixing matrix factorization.
  • Separation or Un-mixing – where the sources are separated by filtering using coefficients calculated in step one.
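For the simplest instantaneous (non-convolutive) case with two sources and two microphones, the separation step reduces to applying an unmixing matrix W to each pair of samples. The sketch below assumes W has already been estimated in the identification step; it illustrates only the un-mixing step, not an actual BSS estimation algorithm:

#include <stddef.h>

/* Apply a 2x2 unmixing matrix W to two mixed microphone signals.
 * mix1, mix2 : observed mixtures (length n)
 * W          : unmixing matrix estimated in the identification step
 * src1, src2 : separated source estimates (length n)
 */
void unmix_2x2(const float *mix1, const float *mix2, size_t n,
               const float W[2][2], float *src1, float *src2)
{
    for (size_t i = 0; i < n; i++) {
        src1[i] = W[0][0] * mix1[i] + W[0][1] * mix2[i];
        src2[i] = W[1][0] * mix1[i] + W[1][1] * mix2[i];
    }
}

For convolutive mixtures, which are the realistic case in reverberant rooms, W becomes a set of filters applied per frequency band rather than a single matrix.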
Another consideration for BSS is to have at least as many microphones as there are signal sources to be separated. If the number of microphones is less than the number of signal sources, the separation task becomes difficult, if not impossible. The paper "Single Microphone Source Separation Using High Resolution Signal Reconstruction" by Trausti Kristjansson, Hagai Attias and John Hershey offers one solution for that case.
Blind source separation can exploit linear, temporal, spatial or sparsity properties of signal sources. Based on such properties, we have different approaches for BSS –
  • Principal Component Analysis
  • Independent Component Analysis
  • Spatio-Temporal Analysis
  • Sparse Component Analysis
Many such algorithms have been developed to solve the BSS problem by assuming certain properties of the signal sources or the mixing process, suited to particular applications. When such assumptions are made, we call it "Semi-Blind Source Separation".

Applications of BSS

BSS can be used for enhancing noisy speech in real-world environments, and its applications are not limited to speech/audio processing: it is also used for image, astronomical, satellite and biomedical signal analysis. The double-talk (DT) issue in Acoustic Echo Cancellation (AEC) can be addressed using a BSS algorithm, without a double-talk detector or step-size controller.
BSS is also used in microphone array processing (multiple microphones), making it useful in diverse applications.

Pathpartner’s Expertise

We have an in-house BSS implementation for convolutive signal mixtures which is currently integrated with our speech processing algorithms. The load on the speech processing algorithms is eased, since they receive only the signal of interest, and hence better quality speech processing is achieved.
Mahantesh Belakhindi
Technical Lead

How to build Angstrom Linux Distribution for Altera SoC FPGA with OPEN CV & Camera Driver Support

If your real-time image processing applications, such as a driver monitoring system on an SoC FPGA, depend on OpenCV, you have to set up an OpenCV build environment for the target board. This blog will guide you through the steps to build a Linux OS with OpenCV and camera driver support for the Altera SoC FPGA.
In order to start building the Linux distribution for the Altera platform, you must first install the necessary libraries and packages. Follow the initialization steps below to set up the host PC.
The required packages to be installed for Ubuntu 12.04 are:
$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get install sed wget cvs subversion git-core coreutils unzip texi2html texinfo libsdl1.2-dev docbook-utils gawk python-pysqlite2 diffstat help2man make gcc build-essential g++ desktop-file-utils chrpath libgl1-mesa-dev libglu1-mesa-dev mercurial autoconf automake groff libtool xterm *

IMPORTANT:

Please note that * indicates that the command is one continuous line of text. Make sure that the command is in one line when you are pasting it.
If the host machine runs 64 bit version of the OS, then you need to install the following additional packages:
$ sudo apt-get install ia32-libs
On Ubuntu 12.04 you will also need to make /bin/sh point to bash instead of dash. You can accomplish this by running the following command and selecting 'No':
sudo dpkg-reconfigure dash
Alternatively you can run:
sudo ln -sf bash /bin/sh
However, this is not recommended, as it may get undone by Ubuntu software updates.

Angstrom Buildsystem for Altera SOC: (Linux OS)

Download the scripts needed to start building the Linux OS. You can download the scripts for the angstrom build system from https://github.com/altera-opensource/angstrom-socfpga/tree/angstrom-v2014.12-socfpga
Unzip the files to angstrom-socfpga folder:
$ unzip angstrom-socfpga-angstrom-v2014.12-socfpga.zip -d angstrom-socfpga
$ cd angstrom-socfpga
These are the setup scripts for the Angstrom buildsystem. If you want to (re)build packages or images for Angstrom, this is the thing to use.
The Angstrom buildsystem uses various components from the Yocto Project, most importantly the Openembedded buildsystem, the bitbake task executor and various application/ BSP layers.
Navigate to the sources folder and comment out the following line in the layers.txt file:
$ cd sources
$ gedit layers.txt &
meta-kde4,https://github.com/Angstrom-distribution/metakde.git,master,f45abfd4dd87b0132a2565499392d49f465d847 *
$ cd .. (navigate back to the top folder)
To configure the scripts and download the build metadata, run:
$ MACHINE=socfpga_cyclone5 ./oebb.sh config socfpga_cyclone5
After downloading the build metadata, you can download meta-kde4 from the link below and place it in the sources folder, since this layer was disabled earlier in the layers.txt file: http://layers.openembedded.org/layerindex/branch/master/layer/meta-kde4/
Source the environment file and use the commands below to start a build of the kernel/bootloader/rootfs:
$ . ./environment-angstrom
$ MACHINE=cyclone5 bitbake virtual/kernel virtual/bootloader console-image
Depending on the type of machine used, this will take a few hours to build. After the build is completed the images can be found in:
Angstrom-socfpga/deploy/cyclone5/
After the build is completed, you will find the u-boot, dtb, rootfs and kernel image files in the above folder.

Adding the OPENCV Image to the rootfs:

To add OpenCV to the console image (rootfs), we need to modify the local.conf file in the conf folder:
$ cd ~/angstrom-socfpga/conf
$ gedit local.conf &
In the local.conf file, navigate to the bottom of the file, add the following line and save the file:
IMAGE_INSTALL += "opencv opencv-samples opencv-dev opencv-apps opencv-samples-dev opencv-static-dev"
Then build the console image again using the following command:
$ cd ..
$ MACHINE=cyclone5 bitbake console-image
After the image is built, the rootfs will contain all the necessary OpenCV libraries for developing and running OpenCV-based applications.

Enabling Camera Drivers in the Kernel :

The Linux kernel v3.10 has a built-in UVC camera driver which supports a large number of USB cameras. In order to enable it, you need to configure the kernel using the menuconfig option:
$ MACHINE=cyclone5 bitbake virtual/kernel -c menuconfig
The above command opens a config menu window. From the menuconfig window enable the following to enable UVC:
Device Drivers -> Multimedia support -> Media USB Adapters [*] -> USB Video Class [*] -> UVC input events device support [*]
Save and exit the config menu then execute the following command:
$ MACHINE=cyclone5 bitbake virtual/kernel
The new kernel will be built with the UVC camera drivers enabled and will be available in the /deploy/cyclone5 folder.
For the camera to work, the coherent pool size must be set to 4M; this can be done as follows:

U-Boot Environment Variables

Boot the board, pressing any key to stop at the U-Boot console. The messages displayed on the console will look similar to the following listing:
U-Boot SPL 2013.01.01 (Jan 31 2014 - 13:18:04)
BOARD: Altera SOCFPGA Cyclone V Board
SDRAM: Initializing MMR registers
SDRAM: Calibrating PHY
SEQ.C: Preparing to start memory calibration
SEQ.C: CALIBRATION PASSED
ALTERA DWMMC: 0

U-Boot 2013.01.01 (Nov 04 2013 - 23:53:26)

CPU : Altera SOCFPGA Platform
BOARD: Altera SOCFPGA Cyclone V Board
DRAM: 1 GiB
MMC: ALTERA DWMMC: 0
In: serial
Out: serial
Err: serial
Net: mii0
Warning: failed to set MAC address

Hit any key to stop autoboot: 0
SOCFPGA_CYCLONE5 #

Configuration of U-Boot Environment Variables

SOCFPGA_CYCLONE5 #setenv bootargs console=ttyS0,115200 vmalloc=16M coherent_pool=4M root=${mmcroot} rw rootwait;bootz ${loadaddr} - ${fdtaddr} *

Save of U-Boot Environment Variables

SOCFPGA_CYCLONE5 #saveenv

Boot Kernel

SOCFPGA_CYCLONE5 #boot
Following all the above guidelines, you should be able to build the Angstrom Linux distribution for the Altera SoC FPGA with OpenCV and camera driver support. This build was successfully implemented on the Altera Cyclone V SoC.
Idris Iqbal Tarwala
Sr. VLSI Design Engineer

HEVC without 4K

The consumer electronics industry wants to increase the number of pixels on every device by 4 times. Without HEVC this will quadruple the storage requirements for video content and clog the already limited network bandwidth. To bring 4K or UHD to the consumer, the HEVC standard is essential, as it can cut these requirements at least in half; there is no doubt about it. But what the consumer electronics industry wants might not be what the consumer wants. A case in point is 3D TVs, which failed to take off due to the lack of interest from consumers and original content creators. The same thing might happen to 4K. If 4K/UHD devices fail to take off, where will HEVC be?
Here, we make a case for HEVC.
According to data released by Cisco, video will be the largest consumer of network bandwidth in the years to come, and it has the highest growth rate across all segments. With HEVC-based solutions, these congested networks can take a breather.
Let’s look at how deploying HEVC solutions is advantageous to the major segments of the video industry.

1. Broadcast:

The broadcast industry is a juggernaut which moves very slowly. It stands to gain the most in cost savings and consumer satisfaction by adopting HEVC, but it also has to make the highest initial investment. There are claims that UHD alone does not add much to the visual experience at a reasonable viewing distance for a 40-50 inch TV, so the industry is pushing for additional video enhancement tools such as higher dynamic range, more color information and better color representation (BT.2020). UHD support in broadcast may not be feasible without upgrading the infrastructure, and given the advantage HEVC brings to UHD video, including HEVC in that infrastructure upgrade is the optimal choice. But if UHD fails to attract the consumer, what will happen to HEVC? Without UHD broadcast becoming a reality, the introduction of HEVC into broadcast infrastructure could be delayed heavily. On the other hand, contribution encoding can benefit greatly from HEVC with a reasonable change in infrastructure; whether broadcast companies adopt HEVC just for contribution encoding, without the push of UHD, depends purely on the cost of adoption versus the cost savings from it.

2. Video surveillance:

Surveillance applications are increasing each day, and now there is the added overhead of backups in the cloud. The advantage for video surveillance applications is that they do not need backward compatibility, so HEVC is an ideal way for the industry to cut the storage costs of current systems, or to keep them at the same level for a new generation of systems that store more surveillance data at higher resolutions. ASIC developers are already developing HEVC encoders and decoders, and it is just a matter of time before video surveillance systems based on HEVC hit the market. Upgrading current video surveillance systems to support HEVC, however, may not be feasible without the hard struggle of making legacy hardware support HEVC.

3. Video Conference:

Video conferencing is a tricky case, as it needs to be highly interoperable and backward compatible with existing systems. Professional video conference systems might have to support both HEVC and earlier codecs to work with systems already in the field. General video conferencing solutions such as gtalk or Skype, on the other hand, would have a problem licensing HEVC, as none of the current browsers or operating systems (except Windows 10, which most probably ships just the decoder) has announced support for it. But the advantage that HEVC can bring to video conferencing can be magnificent: despite advances in bandwidth availability and the introduction of high-speed 3G and 4G services, the quality of the video conferencing experience has remained poor. This can be massively improved with the help of HEVC, which has the potential to enable HD video calling on a 3G network. With or without the help of UHD, at least the professional video conferencing systems will adopt HEVC, unless another codec (the likes of VP9 or Daala) promises better advantages.

4. Storage, streaming and archiving:

The advantage of upgrading archived databases to HEVC needs no explanation: imagine the petabytes of storage that can be saved if all archived videos are converted to HEVC. OTT players like Netflix are already in the process of upgrading to HEVC, as it will help them reduce the burden on ISPs, and it will also reduce storage and transmission costs in video streaming applications. Converting such a huge database from one format to another will not be easy. OTT and video streaming applications need scalable HEVC, where each video has to be encoded at different resolutions and different data rates; this requires multi-instance encoders running continuously to encode these videos at very high quality. However, the cost saving in these applications from adopting HEVC is very large, and upgrading to HEVC in the storage space will become inevitable.

5. End consumer:

The end consumer gets exposed to HEVC at different levels:
Through decoders in the latest gadgets, viz. TVs, mobile phones, tablets, gaming consoles and web browsers; and through encoders built in wherever there is a camera, be it video calling on a mobile phone, tablet or laptop, or video recording on a mobile phone or standalone camera.
It is difficult to convince the less tech-savvy end consumer about the advantages of HEVC, but they are a huge market. In the US it costs around $1 for 10 megabytes of data; HEVC can help consumers get higher video quality for the same cost, or the current video quality for half the cost. Since HEVC matters to consumers, almost all chip manufacturers either support HEVC or have it on their roadmaps, and there are already consumer electronics products on the market with HEVC encoders built in. HEVC support will definitely be a differentiating factor for the end consumer looking for a new gadget.
Deploying HEVC-based solutions will yield gains in the long term, once the savings from reduced bandwidth overtake the initial investment. This is true for each of the segments we have discussed. With 4K or UHD, the initial investment can be folded into higher-quality offerings and the costs offset; but even without any change in resolution or any other feature, the returns on an HEVC investment are still high. When the entire video industry adopts the newer and better codec standard HEVC, the gains will multiply.
Prashanth NS
Technical Lead
Shashikantha Srinivas
Technical Lead

Future trends in IVI solutions

In the immediate future we shall witness the next big phase of technology convergence between the automotive industry and information technology, with 'connectivity' at its epicenter. There are several reasons for this revolutionary fusion. The first is the exponential rise in consumer demand: connectivity on the road is no longer a luxury for our internet-savvy generation, who will account for around 40% of new automobile users, and they expect more than just in-vehicle GPS navigation. In addition, some developed countries are mandating safety features like DSRC over Wi-Fi, 360-degree NLOS awareness and V2V information exchange to reduce the number of accidents and casualties. The inclusion of these safety features on the dashboard is another reason why IVI systems and solutions are currently the hottest sell in the automotive industry. With these trigger points, automobile manufacturers are now bridging the gap between non-IVI technologies and existing dashboard solutions.
The current dashboard solutions are provided by software giants Apple and Google, who are well ahead in providing SDKs for IVI solutions. A few other companies also demonstrated dashboards with IVI solutions at CES 2015, but CarPlay and Android Auto have captured the world's attention with their adaptability, portability and expandability across various car market segments. Market experts predict that the key IVI solutions will be Android Auto and CarPlay. These current IVI solutions are only the beginning of a fast-evolving technology.
Experts forecast that the evolution of IVI features will track the maturity of the architecture. In the primary phase, smartphone-attached dashboard models using connectivity such as USB and Bluetooth will continue to be widely used; by 2017, around 60% of vehicles are expected to be connected via smartphones. The focus will be on reusing smartphone computing power and mobile-integrated technologies like telephony, multimedia, GPS, storage, power management and voice-based systems. Next comes the progression towards connecting beyond the car: the conjunction with IoT and cloud technologies will draw in ecosystems such as OEM services, healthcare, education and smart energy systems. Further into the future, external control and integration of the car ecosystem will become possible, including On-Board Diagnostics (OBD) systems, the use of SoCs for smarter, more reliable and better-performing dashboard features, augmented reality and many more options.
The current IVI systems' architecture is designed to be flexible in adapting to future upgrades (such as changing your mobile phone model, upgrading the OS, etc.). The IVI solution core can be broadly divided into three categories: hardware, the underlying operating system, and user-experience-enhancing applications. The architecture mainly consists of horizontal and vertical layers.
The vertical layer is made of:
  • Proprietary protocols.
  • Multimedia rendering solutions.
  • Core stack
  • Security and UI
Whereas the Horizontal layer includes:
  • Hardware
  • OS
  • Security Module
Proprietary portable protocols are the rudiments of IVI connectivity modules, and the UI of IVI head units can be developed with core engines such as Java, Qt and Gtk. For user connectivity and a better user experience, Wi-Fi, LTE, advanced HMI, and OBD tools and protocols help, although some of these are still under development.
From a car manufacturer's point of view, a centralized, upgradable dashboard with all IVI solutions and distinct non-IVI technologies integrated would be a prime factor in a buyer's choice of car. Here the dashboard middleware needs to play a crucial supporting role, integrating multiple IVI solutions and providing a device-independent experience. Flexibility in accommodating upcoming IVI technology, such as data interfaces for application developers, is also a deciding factor.
Although IVI core solutions are fundamental differentiating factors for the end user, car OEMs need a satisfactory single-roof vendor with expertise in the various IVI and non-IVI technologies, including system architecture, OS development, protocols, hardware familiarity and applications. PathPartner, a product engineering service provider for automotive embedded systems, meets these requirements well. With visionary leadership, PathPartner has already crossed the threshold into the automotive infotainment industry with a strong business model, comprising development activities such as porting to multiple platforms (for example Freescale and Texas Instruments) for infotainment systems based on Android, Linux and QNX, and customized middleware for Android Auto, MirrorLink and CarPlay.
We at PathPartner, with a dedicated team of engineers with niche expertise in embedded products, have successfully delivered certified applications for platforms such as iOS and Windows, and continue to work on other IVI solutions for renowned OEMs.
Kaustubh D Joshi

HEVC Compression Gain Analysis

HEVC, the future video codec, needs no introduction, as it is slowly being injected into the multimedia market. Despite its very high execution complexity compared to earlier video standards, a few software implementations have already proven to run in real time on multi-core architectures. While OTT players like Netflix are already in the process of migrating to HEVC, 4K TV sales are increasing rapidly as HEVC promises to make UHD resolution a reality. ASIC HEVC encoders and decoders being developed by major SoC vendors will enable HEVC in hand-held, battery-operated devices in the near future. All these developments are motivated by one major claim:

‘HEVC achieves 50% better coding efficiency compared to its predecessor AVC’.

Initial experimental results justify this claim, showing on average a 50% improvement using desktop encoders and approximately 35% improvement using real-time encoders compared to AVC. This improvement is achieved mainly by the set of new tools and modifications introduced in HEVC, including but not limited to larger block sizes, larger transform sizes, extended intra modes and SAO filtering. Note that the 50% improvement is an average over a set of input contents; it does not guarantee half the bit rate at the same quality for every input. Now we will tear down this 'average' part of the claim and discuss in which situations HEVC is more efficient, considering four main factors.
  1. Resolution
  2. Bit rate
  3. Content type
  4. Delay configuration

1. Resolution:

HEVC is expected to enable higher-resolution video, and the results substantiate this with higher coding efficiency gains at higher resolutions. At 4K resolutions, compression gains can exceed 50%, which makes it possible to encode 4K content at 30 frames per second at bit rates of 10 to 15 Mbps with decent quality. The reason behind this behavior is a tool introduced in HEVC that contributes more than half of the coding efficiency gains: larger coding block sizes. At high resolution, larger block sizes lead to better compression, as neighboring pixels are more strongly correlated. We observe that 1080p sequences show on average 3-4% better compression gains than their 720p counterparts.

2. Bitrate:

Encoder results indicate better compression gains at low and mid-range bit rates than at very high bit rates. At low QPs, transform coefficients contribute more than 80% of the bits. Having larger coding units helps save the bits used for MVD and other block-header coding, and bigger transform blocks give better compression gains as they have a larger number of coefficients for energy compaction. But at high bit rates, header bits take only a small percentage of the bit stream, suppressing the gains from larger block sizes, and the low QPs leave fewer coefficients quantized to zero, which limits the gains possible from large transform blocks.
We have observed that for the ParkJoy sequence, BD-rate gains were 8% better in the QP range 32-40 than in the QP range 24-32. Similar behavior has been found in most sequences.

3. Content type:

Video frames with uniform content show better compression gains, as uniform content favors larger block sizes. The encoder tends to select smaller block sizes for frames with very high activity, which makes the block partitioning similar to AVC and reduces the efficiency advantage of HEVC. Lower correlation between pixels also limits the compression gains produced by larger transform blocks. Streams such as BlueSky, with significant uniform content in each frame, produced 10% better gains than high-activity streams like ParkJoy.
Similarly, videos with stationary or low-motion content produced better gains, as larger block sizes are chosen in inter frames for such content.

4. Delay Configuration:

Video encoders can be configured to use different GOP structures based on application needs. Here we analyze the compression gains of different delay configurations with respect to AVC. Firstly, the all-intra case (used for low-delay, high-quality, error-robust broadcast applications) produces nearly 30-35% gain, deviating from the 50% claim for HEVC. Larger block sizes are not very effective in intra frames, and the gains produced by the other new tools, such as additional intra modes and SAO, limit the average gain to about 35%; hence HEVC-Intra will not be 50% better than AVC-Intra. It is also observed that the low-delay IPP GOP configuration produces slightly better gains (approximately 3-4% on average) than the random-access GOP configuration with B frames. This could be purely due to implementation differences in the HM reference software.
Thus, 50% gains cannot be achieved for all video content, but many encoder implementations have demonstrated average gains of nearly 50% or more compared to AVC. Such compression efficiency can have a major impact on broadcast and OTT applications, where bandwidth consumption, and therefore cost, can be cut in half. Though other emerging video compression standards such as VPx and Daala claim similar gains, they are yet to be proven, and it will be interesting to see how they affect the future of HEVC. But right now one thing is sure: HEVC is the future video codec, and it is here!
Prashant N S
Technical Lead