COVID-19 Trial Leadership Equity

Publication

Analyzed 7,500+ US COVID-19 clinical trials to quantify gender gaps in study leadership.

Role: Principal Data Scientist & Senior Author

Focus: Automated Data Pipeline · Clinical Data Mining · Clinical Trial Leadership · Equity Analytics · Gender Gap · NLP

At a Glance

  • Found women made up ~37% of trial principal investigators, rising to ~49% by 2022.
  • Highlighted that women were underrepresented among COVID-19 trial leaders compared with ~50% in other disease areas, prompting equity awareness efforts.

The Problem

  • Urgent pandemic trials initially favored established male-led networks, limiting women’s opportunities.
  • Some research fields have historically had fewer women leaders, likely exacerbating COVID-19 leadership gaps.
  • Lack of data left the gender imbalance anecdotal, hindering recognition and action to address it.

The Solution

  • Compiled a comprehensive dataset of ~8,000 COVID-19 trials (2020–2022) and comparable non-COVID trials.
  • Applied an ML-based tool to infer each investigator’s gender from their name, keeping only high-confidence classifications.
  • Analyzed gender proportions over time and versus other diseases, using statistical tests to validate differences.
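
As an illustration of the cross-disease comparison, here is a minimal sketch of a chi-square test on a 2x2 table of leader counts. It is not the study's code: the function name `chi_square_2x2` and all counts are hypothetical placeholders, and the test is written with the standard library only (Pearson chi-square, no continuity correction, 1 degree of freedom).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Row 1 holds group 1 (a women, b men); row 2 holds group 2 (c women, d men).
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts, chosen only to illustrate the call; not study data.
covid_women, covid_men = 370, 630
other_women, other_men = 500, 500

stat, p_value = chi_square_2x2(covid_women, covid_men, other_women, other_men)
print(f"chi-square = {stat:.2f}, p = {p_value:.3g}")
```

A small p-value here would indicate that the share of women leaders differs between the two groups of trials by more than chance alone would explain.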

Architecture Overview

  • Automated Data Pipeline: Gathered trial records from ClinicalTrials.gov for COVID-19 and baseline disease areas.
  • Gender Inference Module: Used a name-based ML service to probabilistically assign investigator gender.
  • Data Filtering: Excluded trials lacking listed investigators or without confident gender classification.
  • Analytic Engine: Computed female-leadership percentages by time period and ran cross-disease comparisons.
  • Advanced Analysis: Linked leader gender to trial participant gender and segmented outcomes by trial type.
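
The filtering and aggregation steps above can be sketched roughly as follows. This is an assumption-laden illustration, not the published pipeline: the record layout (`year`, `gender`, `probability`) mimics the shape of a name-based classifier response, the 0.9 cutoff is an assumed threshold, and `female_share_by_year` is a hypothetical helper.

```python
from collections import defaultdict

# Assumed confidence cutoff; the actual threshold used in the study is not stated here.
MIN_PROBABILITY = 0.9


def female_share_by_year(records, min_probability=MIN_PROBABILITY):
    """Return {year: share of women} over confidently classified investigators."""
    counts = defaultdict(lambda: [0, 0])  # year -> [women, total]
    for rec in records:
        gender = rec.get("gender")
        probability = rec.get("probability") or 0.0
        # Drop investigators without a confident gender classification.
        if gender not in ("female", "male") or probability < min_probability:
            continue
        counts[rec["year"]][1] += 1
        if gender == "female":
            counts[rec["year"]][0] += 1
    return {year: women / total for year, (women, total) in sorted(counts.items())}


# Hypothetical records, one per listed investigator; not study data.
records = [
    {"year": 2020, "gender": "female", "probability": 0.98},
    {"year": 2020, "gender": "male", "probability": 0.99},
    {"year": 2020, "gender": "male", "probability": 0.95},
    {"year": 2020, "gender": "female", "probability": 0.60},  # below cutoff, excluded
    {"year": 2022, "gender": "female", "probability": 0.97},
    {"year": 2022, "gender": "male", "probability": 0.93},
    {"year": 2022, "gender": None, "probability": 0.0},  # unclassified, excluded
]

shares = female_share_by_year(records)
for year, share in shares.items():
    print(year, f"{share:.0%}")
```

Excluding low-confidence classifications trades sample size for accuracy, so the cutoff is worth reporting alongside any percentages derived this way.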

Results and Impacts

  • Women constituted ~37% of COVID-19 trial leaders overall, climbing to ~49% by 2022.
  • Female-led trials tended to enroll more female participants, suggesting leadership influences inclusivity.
  • The study spotlighted gender gaps and urged institutions to support women in research leadership roles.

Skills and Tools Used

Technique/Skill | Tools/Implementation
Data Pipeline | Automated trial data extraction (Python, APIs)
ML Tools | Name-based gender classifier service (Genderize API)
Statistical Analysis | Chi-square tests and regression modeling (R, Python)

Cross-Project Capabilities

  • Analytical Rigor: Demonstrated a robust data-analysis approach applicable to diverse research questions.
  • Data Insight: Combined domain expertise with data science, a skill applied across projects.
  • Equity Analysis: Experience quantifying representation gaps informs efforts to address bias in other domains.

Published Papers/Tools

  • Findings published in Lancet Digital Health (2023), raising awareness of gender disparities in clinical trials.
  • Reproducible analysis pipeline documented for sharing, enabling further research on trial diversity.