Affective interaction in computer games is a novel area with several new challenges, such as robustly detecting players' facial expressions. Many existing facial-expression datasets consist of posed face images not captured in a realistic affective-interaction setting. The contribution of this paper is an affective-interaction dataset captured while users played a game that reacted to their facial expressions. The dataset is the result of a framework designed to gather affective-interaction data and annotate it with high-quality labels. The first part of the framework is a computer game designed to elicit particular facial expressions that directly control the game outcome; the game thus creates a genuine and engaging affective-interaction scenario in which facial-expression data were captured. The proposed dataset comprises sequences of video frames in which faces were detected while users controlled the game with their facial expressions. The second part of the framework is a crowdsourcing process that asks annotators to identify the facial expression present in a given face image. Each face image was annotated with one of eight facial expressions: happy, anger, disgust, contempt, sad, fear, surprise, or neutral. We examined how annotator performance was affected by multiple variables, e.g., reward, judgment limits, and golden questions. Once these parameters were tuned, we gathered 229,584 annotations for the full set of 42,911 images. Statistical consensus techniques were then used to merge the annotators' judgments and produce high-quality image labels. Finally, we compared different classifiers trained on ground-truth (expert) labels and on crowdsourced labels: we observed no significant differences in classification accuracy, which confirms the quality of the produced labels.
Thus, we conclude that the proposed affective-interaction dataset provides a unique set of images of people playing games with their facial expressions, together with labels whose quality is similar to that of expert labels (differences are less than 9%).
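The label-merging step described above can be illustrated with a minimal sketch. This is not the paper's statistical consensus method, only a simple majority-vote baseline over the per-image annotator judgments; the function name and data layout are assumptions for illustration.

```python
from collections import Counter

def consensus_label(judgments):
    """Merge one image's annotator judgments by majority vote.

    `judgments` is a list of expression labels (strings) supplied by
    different annotators for the same face image. Ties are resolved in
    favor of the label seen first, a simplification of the statistical
    consensus techniques used in the paper.
    """
    counts = Counter(judgments)
    return counts.most_common(1)[0][0]
```

For example, `consensus_label(["happy", "happy", "neutral"])` yields `"happy"`. More elaborate consensus schemes additionally weight each annotator by their accuracy on golden questions.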
- Realistic facial expressions