About me

My name is Mickael Cormier and I’m a research assistant at the Karlsruhe Institute of Technology, working in close cooperation with the Fraunhofer Institute for Optronics, System Technologies and Image Exploitation IOSB in Karlsruhe (Germany), which is one of the largest research institutes in the field of image acquisition, processing and analysis in Europe.

My work focuses on two topics:

  • Machine Learning / Deep Learning
  • Computer Vision

Open Positions for Students

I’m always looking for students to support my research. Some currently available topics are listed below. Topics for theses or “Hiwi” positions are not limited to those listed, so don’t hesitate to ask me about other related positions.


Image and Video Annotation (German) (Hiwi)
Implementation of Complex Annotation Processes in Web Backend (German) (BA/MA)
Interactive Annotation in Multi-Camera System for 3D Human Pose Estimation (German) (MA)
Web Frontend / Fullstack Development for a Deep Learning Annotations Tool (German) (Hiwi)
Web Backend / Fullstack Development for a Deep Learning Annotations Tool (German) (Hiwi)
Internship Web Frontend / Fullstack Development for a Deep Learning Annotations Tool (German) (Praxissemester / BA)
Internship Web Backend / Fullstack Development for a Deep Learning Annotations Tool (German) (Praxissemester / BA)

Research Topics

Based on these general interests, I’m working in three main fields: Crowd Pose Estimation, Video Activity Recognition and Deep Learning aided Data Annotation. I develop and apply deep learning methods to tackle these tasks. You can find more information about each field below.

Ad: Are you a student in Computer Science, Mathematics, Physics or a related field looking for a research topic for your thesis? Are you motivated to work on challenging and interesting tasks? Feel free to contact me.

Person Detection and Pose Estimation within large Crowds

Video-based Action Recognition in surveillance videos remains a challenging task, even with large annotated datasets. To comprehend a given situation, a human operator will usually focus on specific details such as the motion of body parts. Groups of keypoints move together in characteristic ways that correspond to distinct actions. To integrate such temporal context and common-sense knowledge into automatic systems, models need to be able to detect persons within large crowds and estimate the 2D positions of their body parts. Moreover, estimating 3D keypoints over time provides useful information such as the velocity, acceleration and motion direction for a given time period. Here, I presently focus on two main aspects:
→ estimation and detection of the 2D position of body keypoints from pedestrians in crowds in near-real-time.
→ use of 2D human body skeletons for video action recognition.
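
As a rough illustration of how motion cues such as velocity, acceleration and motion direction can be derived from keypoints tracked over time, here is a minimal sketch using simple finite differences. It is not the method used in my research: the function name `keypoint_motion`, the input format (a list of 2D pixel positions, one per frame) and the example values are assumptions made for this illustration. The same idea carries over to 3D keypoints by adding a third coordinate.

```python
import math


def keypoint_motion(track, fps):
    """Finite-difference motion estimates for a single keypoint.

    track: list of (x, y) positions, one per frame (hypothetical format).
    fps:   frame rate of the video in frames per second.
    Returns per-step velocity vectors, speeds, directions (radians)
    and per-step acceleration vectors.
    """
    dt = 1.0 / fps
    # Velocity: change in position between consecutive frames.
    velocities = [((x1 - x0) / dt, (y1 - y0) / dt)
                  for (x0, y0), (x1, y1) in zip(track, track[1:])]
    # Speed is the magnitude of the velocity vector.
    speeds = [math.hypot(vx, vy) for vx, vy in velocities]
    # Direction is the angle of the velocity vector in the image plane.
    directions = [math.atan2(vy, vx) for vx, vy in velocities]
    # Acceleration: change in velocity between consecutive steps.
    accelerations = [((vx1 - vx0) / dt, (vy1 - vy0) / dt)
                     for (vx0, vy0), (vx1, vy1)
                     in zip(velocities, velocities[1:])]
    return velocities, speeds, directions, accelerations


# Made-up example: a wrist keypoint moving to the right and speeding up.
wrist = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
velocities, speeds, directions, accelerations = keypoint_motion(wrist, fps=2)
```

With positions in pixels, the velocities are in pixels per second. In practice, keypoint estimates are noisy, so some temporal smoothing would usually be applied before differentiating.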

Video Activity Recognition

Action recognition in surveillance video footage is an important research topic in the computer vision community because of its many practical applications. Surveillance cameras are increasingly used; however, the monitoring capacity of law enforcement agencies can’t keep up with the huge amount of data being produced. Therefore, automatic or computer-assisted surveillance plays a key role for security in public areas such as market places, shops and airports. However, tracking, understanding and reacting to what is happening in long video sequences remains very challenging.

In my work, I focus on two main aspects:
→ generation of high-quality annotations for training deep neural networks, using techniques such as Video Description and Captioning, Video Summarization and Generative Adversarial Networks (GAN).
→ investigation of Deep Multitask Architectures for Action Recognition in Surveillance Videos using 3D Convolutions, Long Short-Term Memory (LSTM) networks, multiple Frame-Rate Analysis and Graph Neural Networks (GNN).

Deep Learning aided Data Annotation

Annotation is the process of manually defining regions in an image or video and creating text-based descriptions of these regions. This is a critical first step in building the ground truth for training computer vision models. There are a variety of use cases for image annotation, such as object, activity, emotion or anomaly detection. Scientists rely on millions of annotations, ranging from image captions and bounding boxes to keypoints and pixel-wise class annotations. In the research group Video-based Safety and Assistance Systems, we are developing a web-based Deep Learning Annotations Tool that accelerates the annotation process and improves the quality of the annotations through an intuitive UI design and pre-processing with deep learning methods.