In a new study, Apple taught an AI model to recognize hand gestures that were not part of its original training data set. Here are the details.
Apple has published a new study on its Machine Learning Research blog, titled “EMBridge: Improving Gesture Generalization from EMG Signals Using Multimodal Representation Learning.” The study will be presented at ICLR 2026 in April.
In it, the researchers explain how they trained an AI model to recognize hand gestures, even when those specific gestures were not part of its original training data set.
To achieve this, they developed EMBridge, “a cross-modal representation learning framework that bridges the modality gap between EMG and pose.”
EMG, or electromyography, measures the electrical activity generated by muscles during contraction. Its practical applications range from medical diagnosis and physiotherapy to the control of limb prostheses.
More recently (though the field itself is not new), EMG has been explored more widely in wearable devices and AR/VR systems.
Meta’s Ray-Ban Display glasses, for example, use EMG technology in the form of what Meta calls the Neural Band, a wrist-worn device that “interprets muscle signals to navigate the functions of the Meta Ray-Ban Display,” according to the company’s description.
In the Apple study, the EMG signals used for training were not detected by a wrist-worn device. Instead, the researchers used two existing sets of data (a sketch of how such paired recordings are typically windowed follows the list):
- emg2pose: “(…) a large-scale open-source EMG dataset containing 370 hours of sEMG and synchronized hand posture data from 193 consenting users, across 29 different behavioral groups covering a wide range of discrete and continuous hand movements, such as making a fist or counting to five. Hand posture labels are generated using a high-resolution motion capture system. The complete dataset contains over 80 million hand posture labels and is similar in scale to larger computer vision equivalents. Each user completed four recording sessions per gesture category, each with a different EMG band location. Each session lasted between 45 and 120 s, during which users repeatedly performed a combination of 3 to 5 similar gestures or unconstrained free-form movements.”
- NinaPro DB2: “We use two NinaPro EMG data sets for a more comprehensive evaluation of EMBridge. Specifically, NinaPro DB2 is used for pre-training; it includes paired EMG and posture data from 40 subjects. It contains 49 hand gestures (including basic finger flexions, functional grips, and combined movements) performed by 40 healthy subjects. EMG signals are recorded from 12 electrodes placed on the forearm at a 2 kHz sampling rate, along with hand kinematic data captured by a data glove. For the subsequent gesture classification, we used NinaPro DB7, which contains data from 20 non-amputees collected with the same EMG device and gesture set as DB2.”
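To make the structure of this kind of data concrete, here is a minimal sketch (not code from the study) of how paired sEMG and pose recordings like those in emg2pose or NinaPro DB2 are typically sliced into training windows. The channel counts, window length, and hop size are illustrative assumptions:

```python
# Illustrative only: how synchronized (time, channels) EMG and
# (time, joints) pose streams can be windowed for model training.
import numpy as np

def make_windows(emg: np.ndarray, pose: np.ndarray,
                 win: int = 400, hop: int = 200):
    """Slice an EMG stream and its synchronized pose stream into
    overlapping windows. With 2 kHz EMG (as in NinaPro DB2),
    win=400 samples corresponds to a 200 ms window."""
    xs, ys = [], []
    for start in range(0, emg.shape[0] - win + 1, hop):
        xs.append(emg[start:start + win])   # raw EMG window
        ys.append(pose[start + win - 1])    # pose label at window end
    return np.stack(xs), np.stack(ys)

# Example: 10 s of 12-channel EMG at 2 kHz, paired with 20 joint angles.
emg = np.random.randn(20_000, 12).astype(np.float32)
pose = np.random.randn(20_000, 20).astype(np.float32)
X, Y = make_windows(emg, pose)
print(X.shape, Y.shape)  # (99, 400, 12) (99, 20)
```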
All that said, it’s easy to see how Apple’s EMBridge could pave the way for a future Apple Watch (or another wearable) to control devices like the Apple Vision Pro, Mac, and iPhone, as well as other wearables, including the company’s rumored upcoming smart glasses.
In practice, from new interaction methods to accessibility improvements, the possibilities could be significant.
Of course, the study itself doesn’t mention any specific Apple products or apps, but it does state the following:
A possible practical application of our framework is human-computer interaction. In scenarios such as VR/AR and prosthetic control applications, a wrist-worn device must continuously infer hand gestures from EMG to drive a virtual avatar or robotic hand.
EMBridge is how the researchers bridged the gap between real EMG muscle signals and structured hand posture data.
The model, trained using a cross-modal framework, was first pretrained with EMG and hand posture data separately.
The researchers then aligned the two representations so that the EMG encoder could learn from the pose encoder. This allowed EMBridge to learn to recognize gesture patterns from EMG signals.
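The study’s exact alignment objective isn’t quoted above, but conceptually this resembles the CLIP-style contrastive alignment common in cross-modal learning. A minimal sketch of that idea, with hypothetical encoder outputs and an assumed InfoNCE-style loss:

```python
# A generic sketch of cross-modal alignment; EMBridge's actual loss
# (and its Q-Former bridge) may differ from this assumption.
import torch
import torch.nn.functional as F

def alignment_loss(emg_emb: torch.Tensor, pose_emb: torch.Tensor,
                   temperature: float = 0.07) -> torch.Tensor:
    """Pull each EMG embedding toward its paired pose embedding and
    away from the other poses in the batch (CLIP-style)."""
    emg_emb = F.normalize(emg_emb, dim=-1)
    pose_emb = F.normalize(pose_emb, dim=-1)
    logits = emg_emb @ pose_emb.t() / temperature  # (B, B) similarities
    targets = torch.arange(emg_emb.size(0))        # i-th EMG <-> i-th pose
    # Symmetric loss: EMG-to-pose and pose-to-EMG directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Example with hypothetical 256-dim encoder outputs for a batch of 32:
loss = alignment_loss(torch.randn(32, 256), torch.randn(32, 256))
```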
Once this was done, they trained the system using masked pose reconstruction, hiding parts of the pose data and asking the model to reconstruct it using only the information extracted from the EMG signals.
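Here’s a rough sketch of what masked pose reconstruction can look like in practice. The paper’s actual masking scheme and decoder architecture aren’t described above, so the module below is an illustrative assumption:

```python
# Illustrative masked pose reconstruction: hide random pose dimensions
# and force the decoder to recover them from EMG features alone.
import torch
import torch.nn as nn

class MaskedPoseDecoder(nn.Module):
    def __init__(self, emg_dim=256, pose_dim=20, mask_ratio=0.5):
        super().__init__()
        self.mask_ratio = mask_ratio
        # Conditions on EMG features plus the visible (unmasked) pose.
        self.decoder = nn.Sequential(
            nn.Linear(emg_dim + pose_dim, 256), nn.ReLU(),
            nn.Linear(256, pose_dim))

    def forward(self, emg_feat, pose):
        mask = torch.rand_like(pose) < self.mask_ratio  # True = hidden
        visible = pose.masked_fill(mask, 0.0)
        recon = self.decoder(torch.cat([emg_feat, visible], dim=-1))
        # Loss only on hidden joints: EMG must carry their information.
        return ((recon - pose) ** 2 * mask).sum() / mask.sum().clamp(min=1)

# Example with hypothetical shapes: 256-dim EMG features, 20 joint angles.
loss = MaskedPoseDecoder()(torch.randn(32, 256), torch.randn(32, 20))
```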

The result, as the researchers explain:
“To the best of our knowledge, EMBridge is the first cross-modal representation learning framework that achieves zero-shot gesture classification from wearable EMG signals, showing potential toward real-world gesture recognition on wearable devices.”
To reduce training errors caused by similar gestures being treated as negatives, the researchers taught the model to recognize when postures represent similar hand configurations, allowing it to generate soft targets for those postures instead of treating them as unrelated.
This helped structure the model’s representation space, improving its ability to generalize to gestures it had never seen before.
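This is the “soft targets” trick from contrastive learning (plausibly what the paper’s CASCLe component formalizes, though its exact form isn’t quoted here). A minimal sketch, assuming the targets are derived from pairwise pose similarity:

```python
# Assumed soft-target construction: weight each contrastive pair by
# how similar the underlying hand poses are, instead of one-hot targets.
import torch
import torch.nn.functional as F

def soft_contrastive_targets(pose: torch.Tensor, tau: float = 0.1):
    """Turn pairwise pose similarity into a soft target distribution,
    so near-identical gestures are not punished as hard negatives."""
    pose = F.normalize(pose, dim=-1)
    sim = pose @ pose.t()                # (B, B) pose similarity
    return F.softmax(sim / tau, dim=-1)  # each row sums to 1

def soft_alignment_loss(logits, targets):
    """Cross-entropy against the soft target distribution."""
    return -(targets * F.log_softmax(logits, dim=-1)).sum(-1).mean()

# Example: 32 pose vectors (20 joint angles each) and EMG/pose logits.
targets = soft_contrastive_targets(torch.randn(32, 20))
loss = soft_alignment_loss(torch.randn(32, 32), targets)
```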

The authors evaluated EMBridge on two benchmarks, emg2pose and NinaPro, and found that it consistently outperformed existing methods, particularly in zero-shot (or never-before-seen) gesture recognition. Importantly, it did this with only 40% of the training data.
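To see why a shared EMG-pose embedding space enables zero-shot recognition at all, consider this simplified sketch: an EMG window is classified by comparing its embedding against pose-derived “prototypes,” including prototypes of gestures the EMG encoder never saw in training. All names and shapes here are hypothetical:

```python
# Simplified zero-shot classification via nearest pose prototype;
# not the paper's evaluation code.
import torch
import torch.nn.functional as F

def zero_shot_classify(emg_emb: torch.Tensor,
                       gesture_prototypes: torch.Tensor) -> torch.Tensor:
    """emg_emb: (B, D) embeddings of unseen EMG windows.
    gesture_prototypes: (C, D) pose-derived embeddings, one per gesture."""
    emg_emb = F.normalize(emg_emb, dim=-1)
    protos = F.normalize(gesture_prototypes, dim=-1)
    return (emg_emb @ protos.t()).argmax(dim=-1)  # predicted gesture ids

# Example: 8 EMG windows scored against 49 gesture prototypes (as in DB2).
preds = zero_shot_classify(torch.randn(8, 256), torch.randn(49, 256))
```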

A major limitation noted in the paper is that the model is based on data sets containing EMG signals and synchronized hand posture data. This means its training still relies on specialized data sets that can be difficult to collect.
Still, the study is interesting, particularly at a time when EMG-based device control appears to be on the rise.
For full technical details on EMBridge, including its Q-Former, MPRL, and CASCLe components, follow this link.