Coding Modules

Explore our collection of hands-on, didactic modules designed for working with multimodal data.
Each coding module introduces a specific topic — from automatic gesture detection and speech analysis to 3D motion tracking — providing a step-by-step guide to the tools, libraries, and techniques used in multimodal research.

For questions about a specific module, please contact the module’s authors directly.
If you’d like to contribute a new module yourself, please reach out to w.pouw@tilburguniversity.edu or other members of the core team.

All Modules

EnvisionObjectAnnotator (Python App)

This module provides a Python desktop app that leverages SAM2 to automatically track any object in videos and detect spatial overlaps between a specified target and scene objects.
By Davide Ahmar, Babajide Owoyele & Wim Pouw

Audio Processing & Speech Analysis Suite (Python)

Comprehensive toolkit for audio analysis, speech transcription, and speaker identification using multiple Python libraries.
By Marianne de Heer Kloots

Audio Analysis with Parselmouth

Extract and visualize speech features using Praat's Python interface.

Speaker Diarization with pyannote

Identify who spoke when using the pyannote-audio toolkit.

Speech-to-Text with Whisper

Generate automatic transcriptions using OpenAI's Whisper model.

EnvisionHGdetector Package Suite (Python)

Complete gesture detection and analysis toolkit for research and real-time applications using machine learning.
By Wim Pouw, Bosco Yung, Sharjeel Shaikh, James Trujillo, Gerard de Melo, Babajide Owoyele

Automatic Gesture Analysis & Visualization

Automatically annotate hand gesture stroke events, analyze kinematics, and create dashboard visualizations.

Real-Time Gesture Detection

Detect gestures in real time from a webcam feed using LightGBM.

SPUDNIG-PYTHON: Motion-Detection-Assisted Gesture Annotation (Python)

Motion detection to aid automatic gesture annotation (based on SPUDNIG).
By James Trujillo (based on Ripperda, Drijvers & Holler)
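The core idea behind motion-detection-assisted annotation can be sketched with simple frame differencing. This is a toy illustration, not the module's actual code; the function name and thresholds below are made up:

```python
import numpy as np

def moving_frames(frames, pix_threshold=10, min_changed=0.01):
    """Flag frames showing motion, via absolute pixel differences
    between consecutive grayscale frames."""
    flags = [False]  # the first frame has no predecessor to compare to
    for prev, cur in zip(frames, frames[1:]):
        diff = np.abs(cur.astype(int) - prev.astype(int))
        flags.append(bool(np.mean(diff > pix_threshold) > min_changed))
    return flags

# Synthetic frames: two identical, then one with a bright moving patch
f0 = np.zeros((32, 32), dtype=np.uint8)
f1 = f0.copy()
f2 = f0.copy()
f2[5:15, 5:15] = 255
print(moving_frames([f0, f1, f2]))  # [False, False, True]
```

Frames flagged this way can then be proposed as candidate gesture annotations for a human to verify.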

Full‑body tracking, masking/blurring & movement tracing (Python)

Track face, hands, and body using MediaPipe, with masking/blur and tracing.
By Wim Pouw & Sho Akamine

Post‑synchronizing video/audio from separate devices (Python)

Multi‑purpose pipeline for post‑synchronizing video and audio recordings from different devices.
By Hamza Nalbantoğlu & Šárka Kadavá
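One common way to post-synchronize recordings is to estimate the lag between the devices' audio tracks via cross-correlation. A minimal numpy sketch of that idea (not necessarily the module's exact pipeline; the helper name is illustrative):

```python
import numpy as np

def estimate_lag(sig_a, sig_b):
    """Estimate how many samples sig_b lags behind sig_a, via the
    peak of their full cross-correlation."""
    corr = np.correlate(sig_b, sig_a, mode="full")
    return int(np.argmax(corr)) - (len(sig_a) - 1)

# Synthetic check: b is a delayed by 5 samples
rng = np.random.default_rng(0)
a = rng.standard_normal(200)
b = np.concatenate([np.zeros(5), a])[:200]
print(estimate_lag(a, b))  # 5
```

Once the lag is known, one stream can be trimmed or padded so both start at the same moment.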

Automatic Stimuli Creation with Blocked Facial Information (Python)

Scripts for systematically masking facial information at different intensities.
By Wim Pouw & Annika Schiefner

Quantifying Interpersonal Synchrony (Python)

Calculating interpersonal movement synchrony, time lags, and pseudo‑pairs.
By James Trujillo
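A typical building block for synchrony measures is the lagged cross-correlation between two movement time series. A minimal sketch of the idea (the function name and defaults are illustrative, not the module's API):

```python
import numpy as np

def max_lagged_correlation(x, y, max_lag=10):
    """Pearson correlation between two movement time series at lags
    -max_lag..+max_lag; returns (best_r, best_lag), where a positive
    lag means y trails x."""
    best_r, best_lag = -np.inf, 0
    for lag in range(-max_lag, max_lag + 1):
        if lag > 0:
            xs, ys = x[:-lag], y[lag:]
        elif lag < 0:
            xs, ys = x[-lag:], y[:lag]
        else:
            xs, ys = x, y
        r = np.corrcoef(xs, ys)[0, 1]
        if r > best_r:
            best_r, best_lag = r, lag
    return best_r, best_lag

# Two "dancers": y performs the same movement 3 samples after x
t = np.linspace(0, 4 * np.pi, 200)
x = np.sin(t)
y = np.sin(t - 3 * (t[1] - t[0]))
r, lag = max_lagged_correlation(x, y)
print(lag)  # 3
```

The best lag indicates who leads; comparing real pairs against shuffled pseudo-pairs (as the module does) establishes a chance baseline.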

Multi‑person tracking with YOLO & social proximity (Python)

YOLO pose tracking for multiple persons, plus interpersonal distance calculation.
By Wim Pouw, Arkadiusz Białek, and James Trujillo
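To give a flavor of the distance calculation: given COCO-17 keypoints (the layout YOLO pose models output, with indices 11 and 12 for the hips), interpersonal distance can be taken between the hip midpoints of two tracked persons. A sketch; the helper below is illustrative, not the module's code:

```python
import numpy as np

def interpersonal_distance(kpts_a, kpts_b, hip_ids=(11, 12)):
    """Euclidean distance between the hip midpoints of two persons,
    given (17, 2) keypoint arrays in the COCO-17 layout
    (indices 11 and 12 are the left and right hip)."""
    mid_a = kpts_a[list(hip_ids)].mean(axis=0)
    mid_b = kpts_b[list(hip_ids)].mean(axis=0)
    return float(np.linalg.norm(mid_a - mid_b))

# Two synthetic skeletons whose hip midpoints are 4 units apart
kpts_a = np.zeros((17, 2))
kpts_a[11], kpts_a[12] = [0.0, 0.0], [2.0, 0.0]   # midpoint (1, 0)
kpts_b = np.zeros((17, 2))
kpts_b[11], kpts_b[12] = [4.0, 0.0], [6.0, 0.0]   # midpoint (5, 0)
print(interpersonal_distance(kpts_a, kpts_b))  # 4.0
```

In image coordinates this yields pixels; converting to metric distance requires camera calibration.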

ViCOM tutorial with exercises: kinematic feature pipeline (Python)

Kinematic feature extraction pipeline with exercises for communicative motion analysis.
By Wim Pouw

Behavioral Classification Using CNNs (Python)

Train a model to automatically annotate bodily gestures.
By Wim Pouw

Decision Tree‑Based Classification Algorithms (R)

Use decision trees to make sense of high‑dimensional data.
By Alexander Kilpatrick

Multimodal annotation distances (Python & R)

Compare overlap between ELAN annotations using the multimodal‑annotation‑distance tool.
By Camila Antônio Barros
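The underlying comparison boils down to interval arithmetic on annotation start and end times. A toy sketch of overlap and temporal Jaccard between two annotations (illustrative only; the multimodal-annotation-distance tool itself does much more):

```python
def interval_overlap(a, b):
    """Overlap in seconds between two (start, end) annotations."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def temporal_jaccard(a, b):
    """Overlap divided by union of two (start, end) annotations."""
    inter = interval_overlap(a, b)
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0

# Two annotators marked roughly the same event: 1 s shared of 3 s total
print(round(temporal_jaccard((0.0, 2.0), (1.0, 3.0)), 3))  # 0.333
```

Aggregating such scores over all annotation pairs on two ELAN tiers gives a measure of inter-annotator (or inter-modality) agreement.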

Video‑embedded time series animations (Python)

Create movement‑sound time series animations embedded in a video.
By Wim Pouw

Turn‑Taking Dynamics and Entropy (Python)

Calculate gaps/overlaps and entropy in conversations with two or more speakers.
By James Trujillo
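As an illustration of the quantities involved: floor-transfer offsets (negative values mean overlap, positive values mean a gap) and speaker entropy can both be computed from a list of turns. A sketch with made-up helper names:

```python
import math
from collections import Counter

def transition_times(turns):
    """Floor-transfer offsets between consecutive turns by different
    speakers. `turns` is a list of (speaker, start, end) tuples,
    sorted by start time."""
    return [s2 - e1
            for (spk1, _, e1), (spk2, s2, _) in zip(turns, turns[1:])
            if spk1 != spk2]

def speaker_entropy(turns):
    """Shannon entropy (bits) of how speaking time is distributed."""
    dur = Counter()
    for spk, start, end in turns:
        dur[spk] += end - start
    total = sum(dur.values())
    return -sum(d / total * math.log2(d / total) for d in dur.values())

turns = [("A", 0.0, 1.25), ("B", 1.0, 2.0), ("A", 2.5, 3.0)]
print(transition_times(turns))  # [-0.25, 0.5]: one overlap, one gap
```

Higher entropy means speaking time is spread more evenly across participants; for two speakers it maxes out at 1 bit.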

Gesture networks and DTW (Python)

Implement gesture networks and spaces using dynamic time warping.
By Wim Pouw
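At the heart of such gesture spaces is a pairwise distance between gesture time series, typically dynamic time warping; the resulting distance matrix then defines the network's edges. A minimal textbook DTW implementation for 1-D sequences (illustrative; the module works with multivariate kinematics):

```python
def dtw_distance(s, t):
    """Dynamic time warping distance between two 1-D sequences,
    using absolute difference as the local cost."""
    n, m = len(s), len(t)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# Same up-down gesture trace, one with a repeated sample: distance 0
print(dtw_distance([0, 1, 2, 1, 0], [0, 1, 1, 2, 1, 0]))  # 0.0
```

Because DTW warps the time axis, gestures with the same shape but different tempo still come out as similar.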

OpenPose with 3D tracking via Pose2Sim (Python)

Python pipeline for OpenPose tracking and 3D triangulation with Pose2Sim.
By Šárka Kadavá & Wim Pouw

Dynamic visualization dashboard (Python)

Example of a dynamic dashboard displaying audio‑visual and static data.
By Wim Pouw

Head rotation tracking (Python)

Track head directions using MediaPipe, alongside face/hand/body tracking.
By Wim Pouw

Running OpenPose in batches (Batch script)

Use batch scripting to run OpenPose on a set of videos.
By James Trujillo & Wim Pouw

Recording from multiple cameras synchronously while streaming to LSL

Record synchronously from multiple cameras—handy for DIY 3D motion tracking.
By Šárka Kadavá & Wim Pouw

3D tracking from 2D videos (Anipose + DeepLabCut, Python)

Set up a 3D motion tracking system with multiple 2D cameras.
By Wim Pouw

Aligning & pre‑processing multiple data streams (R)

Wrangle and preprocess multiple streams into a single time‑series dataset.
By Wim Pouw

Aligning & pre‑processing multiple data streams (Python)

Create a unified long time‑series dataset from multiple modalities.
By Wim Pouw
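A common core step here is resampling all streams onto one shared timeline by interpolation. A numpy sketch of that idea (the function name and sampling rates are illustrative):

```python
import numpy as np

def align_streams(streams, fs_out=100.0):
    """Resample several (times, values) streams onto one shared
    timeline by linear interpolation; returns (t, list of arrays).
    Only the time span covered by every stream is kept."""
    t_start = max(t[0] for t, _ in streams)
    t_end = min(t[-1] for t, _ in streams)
    t = np.arange(t_start, t_end, 1.0 / fs_out)
    return t, [np.interp(t, ti, vi) for ti, vi in streams]

# Motion at 50 Hz and an audio feature at 80 Hz onto a 100 Hz timeline
t_mo = np.arange(0, 2, 1 / 50); mo = np.sin(t_mo)
t_au = np.arange(0, 2, 1 / 80); au = np.cos(t_au)
t, (mo_r, au_r) = align_streams([(t_mo, mo), (t_au, au)])
print(len(t) == len(mo_r) == len(au_r))  # True
```

With all modalities on one clock, the aligned arrays can be stacked column-wise into a single long time-series dataset.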

Extracting a smoothed amplitude envelope (R)

Extract a smoothed amplitude envelope from a sound file.
By Wim Pouw
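The module itself is in R; as a cross-language illustration, one simple envelope recipe is full-wave rectification followed by smoothing (the R module's exact method may differ):

```python
import numpy as np

def amplitude_envelope(signal, fs, smooth_hz=5.0):
    """Smoothed amplitude envelope: full-wave rectify the signal,
    then smooth with a moving average of roughly 1/smooth_hz s."""
    rectified = np.abs(signal)
    win = max(1, int(fs / smooth_hz))
    return np.convolve(rectified, np.ones(win) / win, mode="same")

# A 200 Hz tone whose loudness ramps up: the envelope should rise too
fs = 8000
t = np.arange(0, 1, 1 / fs)
tone = t * np.sin(2 * np.pi * 200 * t)  # linearly growing amplitude
env = amplitude_envelope(tone, fs)
print(env[6000] > env[2000])  # True
```

The smoothed envelope discards the fast oscillation and keeps the slow loudness contour, which is what speech-gesture synchrony analyses typically correlate with movement.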

Motion tracking analysis: Kinematic feature extraction (Python)

Example analysis of motion tracking data using kinematic features.
By James Trujillo
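Typical kinematic features include per-frame speed, peak speed, and path length, all derivable from a position trace by differencing. A numpy sketch (the helper name and feature set are illustrative):

```python
import numpy as np

def kinematic_features(positions, fs):
    """Basic kinematic features from an (N, 2) position trace sampled
    at fs Hz: per-frame speed, peak speed, and total path length."""
    steps = np.diff(positions, axis=0)
    step_len = np.linalg.norm(steps, axis=1)
    speed = step_len * fs  # units per second
    return {
        "speed": speed,
        "peak_speed": float(speed.max()),
        "path_length": float(step_len.sum()),
    }

# A hand moving 1 unit to the right over 4 steps at 100 Hz
pos = np.array([[0.0, 0.0], [0.25, 0.0], [0.5, 0.0],
                [0.75, 0.0], [1.0, 0.0]])
feats = kinematic_features(pos, fs=100)
print(feats["path_length"], feats["peak_speed"])  # 1.0 25.0
```

In practice the position trace would usually be smoothed first, since tracker jitter inflates speed and path-length estimates.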

Feature extraction for classification & practice dataset SAGA (R)

Set up kinematic and speech‑acoustic features to train classifiers for gesture types.
By Wim Pouw

Cross‑Wavelet Analysis of Speech‑Gesture Synchrony (R)

Measure temporal synchrony of speech and gesture (or other visual signals).
By James Trujillo