Hand tracking and gesture recognition technology is usually integrated into a complex stack: a hardware and software ecosystem that includes cameras, displays, operating systems, computing boxes, and SDKs. Many gesture key performance indicators (KPIs) can be measured across this stack, and because they are interdependent, they all affect the overall user experience (UX).
The aim of this article is to highlight the difference between end-to-end user perception and the Clay AIR SDK’s own KPIs, considered independently of the other components of the stack.
To clarify the KPI definitions, we will discuss three gesture performance indicators that allow us to monitor and improve the UX of hand tracking and gesture recognition in Augmented Reality and Virtual Reality: perceived speed, perceived accuracy (also called perceived precision), and performance.
“My virtual hand doesn’t move as quickly as my real hand”
This perception is determined by end-to-end latency: the delay between the user’s action and the system’s response. It is experienced as the delay between the real hand moving, its capture and interpretation by the onboard sensors and Clay AIR’s software, and the 3D rendering on the display of the glasses. Latency is expressed in milliseconds (ms) or as a number of frames at a given frame rate, in frames per second (FPS).
End-to-end latency depends on every element of the stack described above, of which Clay AIR’s SDK latency is one.
There are two types of SDK latency that we consider in this measurement.
Clay AIR’s SDK latency: insertion latency and tracking latency
- First, Insertion Latency: the delay between the moment the user inserts their hand into the field of view of the device’s cameras and the moment the system shows the hand on the display of the VR or AR headset. This type of latency is counted in validation frames, the number of congruent (mutually consistent) frames required, with a default of 2 frames, and can be expressed in milliseconds.
- Second, Tracking Latency: the delta (a timing measurement of the hardware and network responsiveness) between the skeleton hand model displayed over the user’s hand and that hand in the raw image.
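As a rough illustration of how a latency counted in frames relates to milliseconds, the sketch below converts the default of 2 validation frames into time at a given camera frame rate. The function name and the 30 fps figure are assumptions made for this example and are not part of the Clay AIR SDK.

```python
def frames_to_ms(frames: int, fps: float) -> float:
    """Convert a number of camera frames into milliseconds at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frames / fps * 1000.0


# Hypothetical example: 2 validation frames on a 30 fps camera.
insertion_latency_ms = frames_to_ms(2, 30.0)
print(f"{insertion_latency_ms:.1f} ms")  # 66.7 ms
```

The same conversion shows why a higher frame rate lowers the time cost of the same number of validation frames.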
In summary, Perceived Speed is key to the UX because it reflects the user’s perception that the system reacts in real time to their movement and relays their actions in the virtual reality or augmented reality experience, whether that is a game, an engineering visualisation experience, or a training program that requires the highest standard of interactivity.
“The gesture I perform is accurately recognized” or “I can interact with virtual objects”
When a user performs a gesture or interacts with content, this must be successfully interpreted and carried out in the Augmented Reality or Virtual Reality experience.
Clay AIR’s SDK Gesture Accuracy
Formula: (# gestures recognized successfully)/(# gestures performed in the FoV)
In the case of gesture recognition technology, this KPI is measured as the percentage of hands or gestures recognized out of the number of hands or gestures that are in the field of view of the device’s sensors.
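The formula above can be sketched in a few lines. The function name and the sample counts are hypothetical and chosen only to show the calculation.

```python
def gesture_accuracy(recognized: int, performed: int) -> float:
    """Ratio of gestures recognized successfully to gestures performed in the FoV."""
    if performed <= 0:
        raise ValueError("performed must be positive")
    if not 0 <= recognized <= performed:
        raise ValueError("recognized must be between 0 and performed")
    return recognized / performed


# Hypothetical test session: 97 of 100 gestures in the field of view were recognized.
print(f"{gesture_accuracy(97, 100):.0%}")  # 97%
```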
Clay AIR’s SDK Hand Tracking Accuracy, or Tracking Success Rate
In the case of hand tracking technology, the KPI behind perceived precision is hand tracking accuracy, which refers to the position, on a pixel-by-pixel basis, of 22 key points of the rendered model skeleton relative to the hand in the raw image. It is the probability that the model correctly estimates the position of these points of the hand in space.
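One possible way to turn this into a number is to count a key point as a success when its predicted position falls within a pixel threshold of its annotated position in the raw image. The sketch below does that. The threshold, the function name, and the toy coordinates are assumptions for illustration; the article does not specify how Clay AIR computes this metric.

```python
import math


def tracking_success_rate(predicted, annotated, max_error_px):
    """Fraction of key points predicted within max_error_px of their annotated position."""
    if len(predicted) != len(annotated) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(
        1
        for p, a in zip(predicted, annotated)
        if math.dist(p, a) <= max_error_px  # Euclidean distance in pixels
    )
    return hits / len(predicted)


# Toy example with 4 key points instead of 22 (all coordinates are made up).
annotated = [(100, 100), (120, 90), (140, 85), (160, 95)]
predicted = [(101, 99), (122, 93), (140, 85), (175, 95)]
print(tracking_success_rate(predicted, annotated, max_error_px=5.0))  # 0.75
```

In this toy case the last key point is 15 pixels off, so three of the four points count as successfully tracked.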
Impact on the UX
Perceived precision is key to a user’s sense of immersion. If tracking accuracy is low, the software will interpret the hand to be somewhere other than where it actually is. The user may then be unable to interact with virtual objects through virtual touch, leading to an uncomfortable user experience.
Clay AIR’s machine learning model gesture accuracy ranges between 94% and 99% on gestures such as victory, call, swipe, pinch and grab.
What does 99% gesture recognition accuracy mean from a user experience standpoint?
It means that if a user holds their hand in the success zone of a 30fps camera for 3.3 seconds, the hand might go unrecognized for 1/30 of a second (one image), which is almost imperceptible to the human eye.
The reason is that a 30fps camera delivers a new image roughly every 33ms. As the Clay AIR SDK processes every frame, within 3.3 seconds it has processed about 100 input images.
A 99% gesture recognition accuracy means that, out of 100 input images, one will not be interpreted successfully. The assumption here is that the user performs the gestures clearly and in optimal conditions: in a well-lit room and in the success zone of the camera’s field of view.
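The arithmetic behind this example can be checked with a short sketch. It assumes, as a simplification, that every frame has the same chance of being recognized. The function name is hypothetical.

```python
def expected_missed_frames(fps: float, duration_s: float, accuracy: float) -> float:
    """Average number of frames not recognized over a time window."""
    frames = fps * duration_s  # frames processed in the window
    return frames * (1.0 - accuracy)


frame_period_ms = 1000 / 30
missed = expected_missed_frames(fps=30, duration_s=3.3, accuracy=0.99)

print(f"{frame_period_ms:.1f} ms per frame")       # 33.3 ms per frame
print(f"{missed:.2f} frames missed on average")    # 0.99 frames missed on average
```

At 30fps, 3.3 seconds corresponds to 99 frames, so a 99% accuracy leaves roughly one missed frame, lasting about 33ms.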
“My battery runs low” / “My battery heats up”
Finally, the third KPI we will discuss for UX in Augmented Reality and Virtual Reality is performance, which refers to the CPU (central processing unit) and DSP (digital signal processor) load.
Hand tracking and gesture recognition are based on computer vision and powered by machine learning, so the SDK necessarily uses some of the device’s processing power.
Optimizing the CPU/DSP load for a more comfortable UX
Optimizing the CPU/DSP load keeps the battery from overheating or draining too quickly, which means a cooler, more comfortable device and longer battery life.
Clay AIR’s software minimizes its load, enabling it to coexist with other SDKs on the device that run on the same processing unit.
Additional Specs Affecting KPIs
Each of these KPIs depends on the hardware specifications and software of the augmented reality or virtual reality device being used. In an upcoming article, we will discuss this interdependence and additional factors that affect UX.
If you have any questions you can also reach our team anytime here.