Abstract
This paper investigates the recognition of expressed emotion from speech and facial expressions for the speaker-dependent task. Experiments were performed to develop a baseline system for audio-visual emotion classification and to investigate different ways of combining the audio and visual information to achieve better emotion classification. The extracted features comprised 106 audio and 240 visual features. The audio features consisted of pitch, energy, duration and MFCC features, whereas the visual features were derived from the positions of the 2D marker coordinates. The Plus l-Take Away r algorithm was used for feature selection, with the Mahalanobis distance, Bhattacharyya distance and KL-divergence as selection criteria. Feature selection was followed by feature reduction using PCA and LDA, and classification using a Gaussian classifier. Both unimodal and bimodal approaches were used for emotion classification. The audio-visual fusion was investigated at two levels: feature-level and decision-level. Emotion classification results comparable to human performance were achieved on the SAVEE database.
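
The feature selection stage mentioned above can be illustrated with a short sketch. The following Python code is a minimal two-class illustration of the Plus l-Take Away r procedure driven by a Mahalanobis-distance criterion; it is not the authors' implementation, and the function names, the l = 3 and r = 2 settings, the regularization term, and the synthetic data are assumptions for demonstration (the paper's setting involves multiple emotion classes and 346 audio-visual features).

```python
import numpy as np

def mahalanobis_criterion(X, y, feats):
    """Class-separability score for a candidate feature subset:
    Mahalanobis distance between the two class means under a pooled
    covariance estimate (two-class illustration only)."""
    Xs = X[:, feats]
    X0, X1 = Xs[y == 0], Xs[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    pooled = np.atleast_2d(np.cov(X0, rowvar=False)) + np.atleast_2d(np.cov(X1, rowvar=False))
    pooled = pooled + 1e-6 * np.eye(len(feats))    # regularize for numerical stability (assumed value)
    diff = m0 - m1
    return float(diff @ np.linalg.solve(pooled, diff))

def plus_l_take_away_r(X, y, n_select, l=3, r=2, criterion=mahalanobis_criterion):
    """Plus l-Take Away r: repeatedly add the l features that most improve
    the criterion (forward steps), then discard the r features whose removal
    costs the least (backward steps), until n_select features are kept."""
    assert l > r, "l must exceed r so the selected set grows each cycle"
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_select and remaining:
        for _ in range(l):                         # forward inclusion
            if not remaining:
                break
            best = max(remaining, key=lambda f: criterion(X, y, selected + [f]))
            selected.append(best)
            remaining.remove(best)
        for _ in range(r):                         # backward exclusion
            if len(selected) <= 1:
                break
            worst = max(selected,
                        key=lambda f: criterion(X, y, [s for s in selected if s != f]))
            selected.remove(worst)
            remaining.append(worst)
    return sorted(selected[:n_select])

# Synthetic demonstration standing in for the audio-visual feature vectors
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))
y = np.repeat([0, 1], 100)
X[y == 1, :5] += 1.5                               # make the first five features informative
print(plus_l_take_away_r(X, y, n_select=5))
```

In the pipeline described in the abstract, the subset returned by such a selector would then be reduced with PCA and LDA and classified with a Gaussian classifier, either on concatenated audio-visual features (feature-level fusion) or per modality with the classifier outputs combined afterwards (decision-level fusion).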

Tariqullah Jan, Sanaul Haq, Asiya Jehangir, Muhammad Asif, Amjad Ali, Naveed Ahmad. (2015) Bimodal Human Emotion Classification in the Speaker-Dependent Scenario, Volume 52, Issue 1.