Optimized hand tracking and gesture recognition technology creates seamless interactions with digital environments such as Augmented Reality and Virtual Reality. The key focus of this technology is to optimize the end-user experience, enhancing immersion and intuitiveness in new virtual spaces for high-performance Augmented Reality and Virtual Reality use cases.
Previously on our Blog: Clay AIR SDK Latency, Accuracy and Performance
In our previous post we discussed the key performance indicators (KPIs) of hand tracking and gesture recognition, highlighting the difference between end-to-end user perception and the KPIs of Clay AIR’s software development kit (SDK). We examined these independently of the other components in the stack and identified three gesture performance indicators that allow us to monitor and improve the UX of hand tracking and gesture recognition in Augmented Reality and Virtual Reality: perceived latency, perceived accuracy, and performance.
We also explored the integrated tech stack, a hardware and software ecosystem that includes cameras, displays, operating systems, computing boxes, and SDKs, where many KPIs can be measured independently, affecting the overall UX.
The Technology Stack’s Impact on Overall User Experience
In this post, we will take a deep dive into the technology stack in Augmented Reality and Virtual Reality and its impact on the overall user experience.
We will look at how other components in the tech stack will affect the end-to-end latency (perceived speed), perceived accuracy, and performance on Augmented Reality and Virtual Reality devices.
Clay AIR’s Software in the Tech Stack
First, let’s look at the process of what happens during an interaction with hand tracking and gesture recognition.
When a user interacts with a 3D object in AR or VR with our hand tracking and/or gesture recognition software, the following process occurs:
- First, Clay AIR’s machine learning model receives an input from the sensors each time the camera refreshes.
- Then, we process the input using machine learning and/or computer vision and the result is a data output. This data can also interact with other software in a seamless ecosystem.
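The per-frame loop described above can be sketched as follows. This is a minimal illustration only: the function names and return values are hypothetical placeholders, not Clay AIR’s actual API.

```python
def capture_frame():
    """Placeholder: read one image from the device camera/ISP."""
    return [[0] * 640 for _ in range(480)]  # dummy 480x640 grayscale frame

def run_hand_tracking_model(frame):
    """Placeholder: the machine learning / computer vision inference step."""
    # A real model would return estimated hand joint positions and a gesture label.
    return {"joints": [(0.0, 0.0, 0.0)] * 21, "gesture": "open_palm"}

def publish_to_stack(result):
    """Placeholder: hand the data output to other software in the ecosystem."""
    return result["gesture"]

# One iteration runs per camera refresh: sensor input -> inference -> data output.
frame = capture_frame()
result = run_hand_tracking_model(frame)
gesture = publish_to_stack(result)
```

Because the loop is driven by camera refreshes, the camera’s frame rate sets an upper bound on how often the model can update, which matters for the latency discussion below.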
The end-user’s experience depends on the optimization of all components such as hardware, software, and external conditions.
Hardware and Software Specs that Affect End-to-End Latency (Perceived Speed)
The hardware and software configurations that can affect end-to-end user experience KPIs independently of Clay AIR’s software include the following:
The Camera’s Frame Rate: the lower the camera’s frame rate, the higher the end-to-end latency. This means that when the camera has a low frame rate, it’s not able to keep up with a user’s movements as effectively.
The Image Signal Processor (ISP) Setup: this varies depending on the device, and can affect Clay AIR’s performance. For example, features such as blur correction or autofocus require more computation to process the image.
The Bandwidth Between Components: with tethered devices, cables or USB port bandwidth can affect end-to-end latency. Additionally, the bandwidth available between the camera, computing box and display can affect the end-to-end latency.
Stack Architecture: the computer vision stack (6DoF, SLAM, development platforms, etc.) that consumes the image sent from the ISP affects end-to-end latency, as does the sequence in which the data is processed through the stack.
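One way to reason about these factors is as a latency budget: each stage a frame passes through adds to the total the user perceives. The per-stage numbers below are invented purely for illustration, not measured values from any device.

```python
# Hypothetical latency budget (milliseconds) for one frame's round trip
# from camera capture to display. All numbers are illustrative only.
stage_latency_ms = {
    "camera_exposure_and_readout": 33,  # bounded by the camera's frame rate
    "isp_processing": 8,                # blur correction, autofocus, etc.
    "transport_bandwidth": 6,           # cable/USB on tethered devices
    "hand_tracking_inference": 12,      # the ML model itself
    "stack_processing_and_render": 25,  # 6DoF, SLAM, application, rendering
    "display_scanout": 15,
}

end_to_end_ms = sum(stage_latency_ms.values())
print(end_to_end_ms)  # 99
```

The point of the sketch is that even a fast inference step is a small slice of the total: shaving the model’s 12ms would barely move a 99ms end-to-end figure dominated by capture and rendering.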
Example: End-to-End latency Versus Clay AIR’s Machine Learning Model Latency
Let’s compare end-to-end latency, which in this example comes to 99ms, with Clay AIR’s machine learning model latency of 12ms.
Our machine learning model can run at 12ms, faster than human perception, but when deployed in the tech stack ecosystem, certain hardware and software components slow down the overall process.
Clay AIR’s software refreshes at every frame, which means that if the camera runs at 30fps, the model’s effective update interval is capped at roughly 33ms, regardless of its 12ms inference time.
Therefore, while our machine learning model has capabilities that are far superior to the majority of others in the same space, it is highly affected by frame rate, a factor that is determined by camera selection.
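The frame-rate ceiling described above is simple arithmetic: the tracking update interval can never be shorter than the camera’s frame interval, so the slower of the two dominates. A minimal sketch (the helper function is ours, for illustration):

```python
def min_update_interval_ms(camera_fps: float, model_latency_ms: float) -> float:
    """Effective per-update latency floor: the model cannot refresh
    faster than the camera delivers new frames."""
    frame_interval_ms = 1000.0 / camera_fps
    return max(frame_interval_ms, model_latency_ms)

# At 30fps the camera is the bottleneck (~33ms, not the model's 12ms).
print(round(min_update_interval_ms(30, 12), 1))  # 33.3

# A faster camera shifts the bottleneck back to the model.
print(round(min_update_interval_ms(90, 12), 1))  # 12.0
```

This is why camera selection, a hardware decision made by the device maker, can dominate the perceived speed of even a highly optimized model.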
Physical Factors that Affect the Perceived End-to-End Accuracy
The hardware specifications that can affect end-to-end user experience independently of Clay AIR’s key performance indicators include the lens calibration, the display, and the distance between the user’s eyes (interpupillary distance). These specs affect what the user sees on the Augmented Reality or Virtual Reality display.
Even if Clay AIR’s machine learning model performs to its greatest potential, the overlaid representation of the hand might not perfectly match the placement of the real hand, creating a perceived inaccuracy.
Additional factors that affect end-to-end accuracy also include light conditions, user errors and the device’s field of view (FoV).
How Clay AIR Mitigates the Impact of Hardware and Software Specs to Optimize End-to-End User Experience
Our software is hardware agnostic, and therefore our team has experience integrating our software into a wide range of devices with unique specifications, such as the consumer-focused Nreal Light and the enterprise-focused Lenovo ThinkReality A6. Each device maker we partner with presents us with new hardware, meaning we optimize our software to work with varying FoVs, computational requirements, camera positions and more.
Some of the ways that Clay AIR is able to mitigate the limitations of pre-existing hardware and software specifications include reducing the image resolution, and therefore the processing time, and reducing CPU/GPU consumption by primarily leveraging the DSP (digital signal processor).
Additional options involve optimizing camera calibration, developing corrective patches, consulting on hardware, and fixing optical and physical distortions, among further physical factor optimization methods.
If you would like to learn more about how Clay AIR can be integrated into your product don’t hesitate to reach out to our team here.