YouTubePD: A Multimodal Benchmark for Parkinson’s Disease Analysis

University of Illinois Urbana-Champaign
NeurIPS Datasets and Benchmarks Track, 2023

*Indicates Equal Contribution

Abstract

The healthcare and AI communities have shown growing interest in AI-assisted systems for the automated diagnosis of Parkinson's Disease (PD), one of the most prevalent neurodegenerative disorders. However, progress in this area has been significantly impeded by the absence of a unified, publicly available benchmark, which prevents comprehensive evaluation of existing PD analysis methods and the development of advanced models. This work addresses these challenges by introducing YouTubePD, the first publicly available multimodal benchmark designed for PD analysis. We crowd-source existing YouTube videos featuring subjects with PD, exploit multimodal information including in-the-wild videos, audio data, and facial landmarks across 200+ subject videos, and provide dense and diverse annotations from a clinical expert. Based on our benchmark, we propose three challenging and complementary tasks, covering both discriminative and generative settings, along with a comprehensive set of corresponding baselines. Experimental evaluation showcases the potential of modern deep learning and computer vision techniques, in particular the generalizability of models developed on YouTubePD to real-world clinical settings, while also revealing their limitations. We hope our work paves the way for future research in this direction.

Teaser Image

Data Collection Pipeline

Our dataset collection and annotation pipeline. First, we compile a list of public figures who have publicly confirmed their PD diagnosis. We then source their videos from YouTube. From these videos, we handpick clips that are informative for PD detection. A clinical expert then reviews these clips and provides both video-level and region-level annotations, which detail the severity of the subject's PD and highlight specific symptoms of the condition.

Data Collection Pipeline Figure
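
To make the two annotation levels concrete, here is a minimal sketch of how one annotated clip could be represented in code. This is an illustration only, not the released annotation format: the class names, field names, region labels, and symptom strings below are all hypothetical placeholders, and the scale of the severity score is not specified on this page.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RegionAnnotation:
    """Region-level annotation (hypothetical schema)."""
    region: str   # hypothetical region label, e.g. "eyes"
    symptom: str  # hypothetical free-text symptom description


@dataclass
class ClipAnnotation:
    """Video-level annotation plus its region-level details (hypothetical schema)."""
    clip_id: str
    has_pd: bool
    severity: int  # hypothetical integer score; the actual scale is an assumption
    regions: List[RegionAnnotation] = field(default_factory=list)


def symptoms_by_region(clip: ClipAnnotation) -> Dict[str, List[str]]:
    """Group the symptoms highlighted in a clip by the region they refer to."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for ann in clip.regions:
        grouped[ann.region].append(ann.symptom)
    return dict(grouped)


# Toy example with made-up values, not taken from the dataset.
example_clip = ClipAnnotation(
    clip_id="clip_0001",
    has_pd=True,
    severity=2,
    regions=[
        RegionAnnotation("eyes", "reduced blinking"),
        RegionAnnotation("mouth", "reduced smile amplitude"),
        RegionAnnotation("eyes", "fixed gaze"),
    ],
)
grouped_symptoms = symptoms_by_region(example_clip)
```

Keeping the video-level label and the region-level details in one record makes it straightforward to use the same clip for both classification and symptom-localization style analyses.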

Dataset Statistics

Comparison of statistics between datasets used in prior work and our benchmark. YouTubePD is the first open-access, multimodal benchmark for PD analysis.


Tasks

Our dataset contains in-the-wild videos, audio, and facial landmarks. The benchmark defines three tasks: facial-expression-based PD classification, multimodal PD classification, and PD progression synthesis.


Experiment Results

Binary/multiclass classification results on YouTubePD in the multimodal setting. Multimodal fusion improves performance over the unimodal baselines, even when the additional modalities individually perform worse than the primary modality, facial expression.

Experiment Results Figure
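
One way to see why a weaker modality can still help is a simple late-fusion sketch, in which each modality produces its own class probabilities and the final prediction is a weighted average. This is a generic illustration and not necessarily the fusion method used in the paper; the function names, modality weights, and probability values below are made up for the example and are not experimental results.

```python
from typing import Dict, List


def late_fusion(
    probs_by_modality: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Weighted average of per-modality class probabilities (generic late fusion)."""
    total_weight = sum(weights[name] for name in probs_by_modality)
    if total_weight <= 0:
        raise ValueError("modality weights must sum to a positive value")
    n_classes = len(next(iter(probs_by_modality.values())))
    fused = [0.0] * n_classes
    for name, probs in probs_by_modality.items():
        if len(probs) != n_classes:
            raise ValueError(f"modality {name!r} has a different number of classes")
        for i, p in enumerate(probs):
            fused[i] += weights[name] * p / total_weight
    return fused


def predict(probs: List[float]) -> int:
    """Index of the most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)


# Toy binary example: class 0 = no PD, class 1 = PD. All numbers are invented.
toy_probs = {
    "face": [0.55, 0.45],       # primary modality, borderline prediction
    "audio": [0.30, 0.70],      # auxiliary modality
    "landmarks": [0.45, 0.55],  # auxiliary modality
}
toy_weights = {"face": 0.5, "audio": 0.25, "landmarks": 0.25}

fused = late_fusion(toy_probs, toy_weights)
face_only_prediction = predict(toy_probs["face"])
fused_prediction = predict(fused)
```

In this toy case the face model alone leans toward class 0, while the fused probabilities lean toward class 1, which shows how auxiliary modalities can shift a borderline decision even when they carry less weight than the primary one.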

BibTeX

@inproceedings{YouTubePD2023,
    author = {Zhou, Andy and Li, Samuel and Sriram, Pranav and Li, Xiang and Dong, Jiahua and Sharma, Ansh and Zhong, Yuanyi and Luo, Shirui and Jaromin, Maria and Kindratenko, Volodymyr and Heintz, George and Zallek, Christopher and Wang, Yu-Xiong},
    title = {YouTubePD: A Multimodal Benchmark for Parkinson's Disease Analysis},
    booktitle = {Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
    year = {2023},
}