Multimodal Human Activity Recognition for Industrial Manufacturing Processes in Robotic Workcells

Alina Roitberg, Nikhil Somani, Alexander Perzylo, Markus Rickert and Alois Knoll

Proceedings of the ACM International Conference on Multimodal Interaction (ICMI), pp. 259–266

November 2015 · Seattle, WA, USA · DOI:10.1145/2818346.2820738


We present an approach for monitoring and interpreting human activities based on a novel multimodal vision-based interface, aiming at improving the efficiency of human-robot interaction (HRI) in industrial environments. Multi-modality is an important concept in this design, where we combine inputs from several state-of-the-art sensors to provide a variety of information, e.g. skeleton and fingertip poses. Based on typical industrial workflows, we derived multiple levels of human activity labels, including large-scale activities (e.g. assembly) and simpler sub-activities (e.g. hand gestures), creating a duration- and complexity-based hierarchy. We train supervised generative classifiers for each activity level and combine the output of this stage with a trained Hierarchical Hidden Markov Model (HHMM), which models not only the temporal aspects between the activities on the same level, but also the hierarchical relationships between the levels.
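
To make the combination of per-frame classifier output with a temporal model more concrete, the following is a minimal sketch and not the authors' implementation. It deliberately simplifies the Hierarchical Hidden Markov Model to a flat HMM over a single activity level and applies standard forward filtering. The activity names, transition probabilities, and classifier likelihoods are all hypothetical values chosen for illustration; none of them come from the paper.

```python
# Hypothetical sub-activity labels for one level of the hierarchy.
ACTIVITIES = ["idle", "pick", "place"]

# Hypothetical transition matrix: TRANSITION[i][j] = P(next = j | current = i).
# Large diagonal entries encode that activities tend to persist across frames.
TRANSITION = [
    [0.90, 0.08, 0.02],
    [0.05, 0.90, 0.05],
    [0.08, 0.02, 0.90],
]

# Uniform prior over the first frame's activity (an assumption).
PRIOR = [1.0 / 3.0] * 3


def forward_filter(likelihoods, transition, prior):
    """Return, per frame, the posterior over activities given all frames so far.

    likelihoods[t][j] is the generative classifier's likelihood of frame t
    under activity j (any positive scale; it is normalised here).
    """
    n = len(prior)
    belief = list(prior)
    posteriors = []
    for t, frame in enumerate(likelihoods):
        if t > 0:
            # Prediction step: propagate the belief through the transitions.
            belief = [
                sum(belief[i] * transition[i][j] for i in range(n))
                for j in range(n)
            ]
        # Update step: weight by the classifier likelihood and renormalise.
        belief = [belief[j] * frame[j] for j in range(n)]
        total = sum(belief)
        belief = [b / total for b in belief]
        posteriors.append(belief)
    return posteriors


def most_likely_labels(posteriors, labels):
    """Pick the label with the highest filtered posterior in each frame."""
    return [labels[max(range(len(p)), key=p.__getitem__)] for p in posteriors]


# Hypothetical classifier output for four frames. In the third frame the
# classifier alone slightly favours "pick".
LIKELIHOODS = [
    [0.7, 0.2, 0.1],
    [0.8, 0.1, 0.1],
    [0.3, 0.4, 0.3],
    [0.8, 0.1, 0.1],
]

if __name__ == "__main__":
    filtered = forward_filter(LIKELIHOODS, TRANSITION, PRIOR)
    print(most_likely_labels(filtered, ACTIVITIES))
```

In this toy input the third frame would be labelled "pick" by the classifier alone, while the filtered estimate stays at "idle" because the temporal model makes a one-frame switch unlikely. A hierarchical model as described in the abstract would additionally condition the transitions of each level on the currently active higher-level activity, which this sketch omits.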

Keywords: robotics, smerobotics