
How we Approach Occlusion, Latency, FoV and Battery Consumption (2/2)


Thomas Amilien

As discussed in Part I, hand tracking and gesture recognition are some of the most advanced types of new HMI (human-machine interface) software, made possible by breakthrough technologies such as computer vision and machine learning.

This would not be possible, however, without addressing the challenges these technologies frequently face: occlusion and field of view, accuracy, battery consumption, and latency. Here, we address these common challenges directly and discuss how we solve them.

Occlusion and Field of View 

Occlusion happens when one object blocks another from view. In hand tracking and gesture recognition software, occlusion is a frequent problem, especially when tracking runs on cameras with a narrow field of view (FoV).

From kiosks to heads-up displays (HUDs) in Augmented Reality or Virtual Reality, interactions take place within an area that is predetermined by the device hardware, and that area often plays a large role in software performance and capability. Fortunately, we are able to run high-fidelity, machine-learning-based tracking on multiple types of cameras. This includes cameras with a large field of view, such as monochrome cameras, where hands are less likely to be occluded, enabling us to provide a solution with minimal occlusion.

Tracking Accuracy and Latency 

Hardware Accuracy vs. Software Accuracy

Accuracy has become a catch-all term, so we want to draw a distinction between hardware accuracy (the lens) and software accuracy, which refers to the software responsible for hand tracking and gesture recognition.

First, hardware accuracy refers to the ability of the camera sensor to identify an object’s position and then render it accurately on a display.

Then, software accuracy, in relation to hand tracking and gesture recognition, refers to the ability to recognize and identify a hand and gesture at each frame, and to how precisely the model’s points are positioned on the annotated hand. As these models are based on machine learning algorithms, software accuracy is measured as a percentage score.

Performance Measurements 

The key performance indicators (KPIs) of hand tracking and gesture recognition machine learning models, such as accuracy (a subset of KPIs referring to perceived precision and hand recognition success), latency (a subset of KPIs referring to perceived speed) and efficiency (CPU and GPU load), are measured independently of the device in which the solution is embedded. Once the solution is embedded, however, the model's KPI values will be affected by hardware specifications. For example:

  • a low frame rate and bandwidth between the computing box and the camera can affect the perceived latency
  • inaccurate lens calibration can affect the perceived accuracy of the hand tracking
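
To illustrate the first point, here is a rough sketch of how frame rate and camera bandwidth could add up to a perceived latency. It is a simplified model, not a description of Clay AIR's pipeline: it assumes capture, transfer and processing happen one after the other, and the function name, the frame size, the bandwidth figures and the 5 ms processing time are all hypothetical.

```python
def perceived_latency_ms(fps, frame_bytes, bandwidth_bytes_per_s, processing_ms):
    """Rough latency budget for one frame (simplified, assumed model).

    Assumes the three stages run back to back:
    waiting for the frame, moving it to the computing box, processing it.
    """
    if fps <= 0 or bandwidth_bytes_per_s <= 0:
        raise ValueError("fps and bandwidth must be positive")
    capture_ms = 1000.0 / fps                                     # time between two frames
    transfer_ms = 1000.0 * frame_bytes / bandwidth_bytes_per_s    # camera -> computing box
    return capture_ms + transfer_ms + processing_ms


# Hypothetical 640x480 monochrome frame, 1 byte per pixel
frame_bytes = 640 * 480

# Hypothetical setups: slow link at 30 FPS vs. fast link at 60 FPS
slow = perceived_latency_ms(30, frame_bytes, 30_000_000, processing_ms=5.0)
fast = perceived_latency_ms(60, frame_bytes, 300_000_000, processing_ms=5.0)

print(f"slow setup: {slow:.1f} ms")  # slow setup: 48.6 ms
print(f"fast setup: {fast:.1f} ms")  # fast setup: 22.7 ms
```

With the same model and the same processing time, the hardware alone more than doubles the delay in this example, which is why the KPIs are measured independently of the device.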

To address tracking accuracy and latency, we will next look at how to optimize both software and hardware accuracy for high performance. 

Pose recognition test sample: A snapshot of one of the 100 tests Clay AIR runs to measure our gesture recognition ML model’s performance.
For each model, 100 tests are performed, each of which consists of presenting the model with 100 different gestures and backgrounds. The pose recognition rate measures the ability of the model to recognize the presented gesture.
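
As a minimal sketch of how such a rate could be computed, assume each test is a list of predicted gesture labels paired with the labels that were actually presented. The function names, the gesture labels and the tiny data set below are hypothetical, and averaging the per-test rates is our assumption for the illustration, not a description of Clay AIR's test harness.

```python
from statistics import mean


def pose_recognition_rate(predicted, expected):
    """Percentage of presented gestures that the model labelled correctly."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("a test needs at least one presented gesture")
    hits = sum(p == e for p, e in zip(predicted, expected))
    return 100.0 * hits / len(expected)


def model_recognition_rate(tests):
    """Average of the per-test rates over all tests run for one model."""
    return mean(pose_recognition_rate(p, e) for p, e in tests)


# Hypothetical data: two tiny tests of four presentations each
# (a real run would use 100 tests of 100 presentations).
tests = [
    (["fist", "pinch", "point", "open"], ["fist", "pinch", "point", "open"]),
    (["fist", "open", "point", "open"], ["fist", "pinch", "point", "open"]),
]

print(model_recognition_rate(tests))  # 87.5
```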

How to Optimize Hand-Tracking Accuracy 

With real-time data input, our software adapts instantly to the ever-changing environment. On a technical level, this is managed with a scoring approach that is both analytical and machine learning based, applied to every incoming frame. For each frame, the competing layers in the image are analysed with proprietary techniques that also make the software highly battery efficient.

As a result, our hand tracking and gesture recognition technology achieves up to 99% accuracy on simple gestures. In addition to a ready-to-go library of 40 predetermined hand gestures, users can add fully customizable controls that are optimized for a specific user experience.


Finally, the speed at which hand tracking and gesture recognition reacts to an individual’s input determines the end user’s satisfaction with the product and its usefulness in real-world applications.

With one frame as an app cycle, Clay AIR’s software is able to keep latency lower than the human eye can perceive: about 16 ms at 60 FPS and about 8 ms at 120 FPS. Because tracking and analysis are refreshed at every frame, they run in real time, which results in the most accurate hand tracking and gesture recognition performance.
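
The per-frame figures follow directly from the frame rate: one frame lasts 1000 ms divided by the number of frames per second. A short sketch of that arithmetic (the function name is ours, for illustration only):

```python
def frame_interval_ms(fps):
    """Duration of a single frame in milliseconds at the given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


for fps in (60, 120):
    print(f"{fps} FPS -> {frame_interval_ms(fps):.1f} ms per frame")
# 60 FPS -> 16.7 ms per frame
# 120 FPS -> 8.3 ms per frame
```

The values quoted in the text are these results rounded down to whole milliseconds.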

Research Driven Innovation

Clay AIR’s R&D Center in Bordeaux, France, is where the company has its roots and where the ongoing development of its hand tracking and gesture recognition technology is centered, to bring the most advanced solutions to the market.

As a software product, hand tracking and gesture recognition is easy to integrate, providing accessible, cutting-edge interactivity. If you would like to learn more about our products (ClayReality for Augmented and Virtual Reality, ClayDrive for automotive experiences, and ClayControl for digital interfaces), our team is here to answer your questions.



Bringing natural interaction to the virtual and augmented world.