Wearable Sensors for Affective Computing: A Comprehensive Review
The field of affective computing, envisioned by Picard in 1997, aims to give computational systems the ability to recognize, interpret, and respond to human emotions. Early studies relied heavily on behavioral cues such as facial expressions and vocal tone to model affective states. The landscape has since evolved significantly, however, with wearable devices now playing a pivotal role.
In a recent review published in Intelligent Sports and Health, researchers from the Department of Psychological and Cognitive Sciences at Tsinghua University provide an in-depth analysis of the current state of data processing, multimodal fusion strategies, and model architectures in affective computing. They also delve into the development status and challenges of deep learning methods within this domain.
Dan Zhang, a co-author of the review, highlights the importance of both public datasets and self-collected data in affective computing studies. The two kinds of datasets are highly consistent in modality, device, signal length, number of subjects, and label acquisition. For instance, both typically use common physiological modalities such as EDA (electrodermal activity) and HR (heart rate), and both rely primarily on commercially available wearable devices, such as the Empatica E4, for measurement.
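Because these wearable streams arrive at different sampling rates, a typical first processing step is to slice them into time-aligned analysis windows. The following is a minimal sketch of that step, not a method from the review itself: the `window_signals` helper is hypothetical, and the sampling rates are the nominal rates of the Empatica E4's EDA, derived heart-rate, and accelerometer streams.

```python
import numpy as np

# Nominal Empatica E4 sampling rates (Hz): EDA at 4 Hz, the derived
# heart-rate stream at 1 Hz, and the 3-axis accelerometer at 32 Hz.
FS = {"eda": 4, "hr": 1, "acc": 32}

def window_signals(signals: dict, win_s: int = 60, step_s: int = 30):
    """Slice multi-rate wearable streams into aligned, overlapping windows.

    `signals` maps a modality name to a numpy array sampled at FS[name] Hz;
    all streams are assumed to start at the same instant. Returns a list of
    dicts, one per window, keyed by modality. (Illustrative helper, not
    part of the reviewed pipelines.)
    """
    # Common duration in seconds, limited by the shortest stream.
    duration = min(len(x) / FS[k] for k, x in signals.items())
    windows, t = [], 0.0
    while t + win_s <= duration:
        windows.append({
            k: x[int(t * FS[k]): int((t + win_s) * FS[k])]
            for k, x in signals.items()
        })
        t += step_s
    return windows

# Example: 5 minutes of synthetic data standing in for real recordings.
demo = {k: np.random.randn(fs * 300) for k, fs in FS.items()}
print(len(window_signals(demo)), "windows")  # -> 9 overlapping windows
```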
Interestingly, most labels are obtained through self-assessment tools such as the SAM (Self-Assessment Manikin) scales, and subject counts are typically in the tens. Zhang further explains that some self-collected datasets incorporate sports-related scenarios, such as walking simulations, recording multimodal signals (including EDA, ACC (acceleration), and HR) to capture affective changes during physical activity. These data hold considerable potential for applications in sports, such as monitoring emotional fatigue during training or assessing athletes' emotional regulation under competitive pressure.
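Self-assessment ratings are usually converted into discrete classes before model training. As a minimal, hedged sketch of that label-acquisition step: 9-point SAM scales and a midpoint threshold of 5 are common conventions in the literature, not prescriptions from the review, and `binarize_sam` is a hypothetical helper.

```python
import numpy as np

def binarize_sam(ratings, threshold: float = 5.0) -> np.ndarray:
    """Map 9-point SAM ratings (1..9) to binary high/low classes.

    The midpoint cut-off of 5 is a widely used convention; the review
    itself does not prescribe a specific threshold.
    """
    return (np.asarray(ratings) > threshold).astype(int)

valence = np.array([2, 5, 7, 9, 4])
print(binarize_sam(valence))  # -> [0 0 1 1 0]
```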
The review also explores multimodal fusion in affective computing, which can be implemented at several stages of the modeling pipeline: feature-level, model-level, and decision-level. Feature-level fusion is simple and straightforward to implement, model-level fusion captures cross-modal interactions within the network structure, and decision-level fusion allows each modality to be processed independently. The choice of fusion strategy depends on factors such as the temporal characteristics, complementarity, and reliability of the signals, as well as the complexity of the classification task, as the sketch below illustrates.
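To make the contrast concrete, here is a minimal PyTorch sketch of the two extremes: feature-level fusion (concatenate per-modality features before a shared classifier) and decision-level fusion (classify each modality independently, then average the logits). Model-level fusion, e.g. via cross-attention between modality streams, is omitted for brevity. The class names and feature dimensions are illustrative assumptions, not code from the review.

```python
import torch
import torch.nn as nn

class FeatureLevelFusion(nn.Module):
    """Concatenate per-modality feature vectors before a shared classifier."""
    def __init__(self, dims: dict, n_classes: int = 2):
        super().__init__()
        self.encoders = nn.ModuleDict(
            {m: nn.Sequential(nn.Linear(d, 32), nn.ReLU()) for m, d in dims.items()}
        )
        self.head = nn.Linear(32 * len(dims), n_classes)

    def forward(self, inputs: dict) -> torch.Tensor:
        feats = [self.encoders[m](x) for m, x in inputs.items()]
        return self.head(torch.cat(feats, dim=-1))

class DecisionLevelFusion(nn.Module):
    """Classify each modality independently, then average the logits."""
    def __init__(self, dims: dict, n_classes: int = 2):
        super().__init__()
        self.heads = nn.ModuleDict(
            {m: nn.Linear(d, n_classes) for m, d in dims.items()}
        )

    def forward(self, inputs: dict) -> torch.Tensor:
        logits = [self.heads[m](x) for m, x in inputs.items()]
        return torch.stack(logits).mean(dim=0)

dims = {"eda": 16, "hr": 8}  # per-modality feature dimensions (illustrative)
batch = {m: torch.randn(4, d) for m, d in dims.items()}
print(FeatureLevelFusion(dims)(batch).shape)   # torch.Size([4, 2])
print(DecisionLevelFusion(dims)(batch).shape)  # torch.Size([4, 2])
```

A practical consequence of the trade-off: decision-level fusion degrades gracefully when one sensor drops out, since each head can still vote, whereas feature-level fusion lets the classifier exploit correlations between modalities within a single representation.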
Furthermore, the authors emphasize the role of deep learning methods in extracting and modeling feature representations from input data. These models include CNNs (convolutional neural networks) for local feature extraction, LSTMs (long short-term memory networks) for capturing temporal dependencies, and transformers for addressing long-range temporal dependencies through self-attention mechanisms.
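The sketch below shows how two of these building blocks are commonly combined on a windowed physiological stream: a 1-D CNN extracts local features, and an LSTM models their temporal evolution. This is a generic, hedged illustration of the architecture families the review surveys, not a specific model from it; a transformer encoder (e.g., PyTorch's nn.TransformerEncoder) could replace the LSTM stage to capture long-range dependencies via self-attention.

```python
import torch
import torch.nn as nn

class CNNLSTMClassifier(nn.Module):
    """1-D CNN for local patterns, followed by an LSTM for temporal context."""
    def __init__(self, n_channels: int = 1, n_classes: int = 2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 16, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(4),  # downsample the time axis by 4
        )
        self.lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
        self.head = nn.Linear(32, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, channels, time] -> conv features -> [batch, time', 16]
        h = self.conv(x).transpose(1, 2)
        _, (hn, _) = self.lstm(h)  # last hidden state summarizes the window
        return self.head(hn[-1])

x = torch.randn(4, 1, 240)  # e.g., a 60 s window of a 4 Hz EDA stream
print(CNNLSTMClassifier()(x).shape)  # torch.Size([4, 2])
```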
This comprehensive review provides valuable insights into the current state and future directions of wearable sensors in affective computing, offering a wealth of information for researchers and practitioners in the field.