The Problem
- Urgent pandemic trials initially favored established male-led networks, limiting women’s opportunities.
- Some research fields historically had fewer women leaders, which likely exacerbated COVID-19 leadership gaps.
- Lack of data left the gender imbalance anecdotal, hindering recognition and action to address it.
The Solution
- Compiled a comprehensive dataset of ~8,000 COVID-19 trials (2020–2022) and comparable non-COVID trials.
- Applied an ML-based tool to infer each investigator’s gender from their name, keeping only high-confidence predictions.
- Compared gender proportions over time and against other disease areas, using statistical tests to validate the differences.
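
The comparison of female-leadership proportions between COVID-19 and baseline trials can be illustrated with a chi-square test on a 2x2 table. The sketch below is a minimal, standard-library-only version. The function name `chi_square_2x2` and the counts in the usage line are hypothetical placeholders, not figures from the study, and the project itself may well have relied on R or a statistics library instead.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test of independence for a 2x2 table.

    Assumed table layout (illustrative):
                     female-led   male-led
        COVID-19         a           b
        baseline         c           d

    Returns (chi2 statistic, p-value) for 1 degree of freedom,
    without continuity correction.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Placeholder counts for illustration only (not study data).
chi2, p = chi_square_2x2(370, 630, 450, 550)
print(f"chi2={chi2:.2f}, p={p:.4g}")
```

A small p-value would indicate that the share of female-led trials differs between the two groups by more than chance alone would explain.
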
Architecture Overview
- Automated Data Pipeline: Gathered trial records from ClinicalTrials.gov for COVID-19 and baseline disease areas.
- Gender Inference Module: Used a name-based ML service to probabilistically assign investigator gender.
- Data Filtering: Excluded trials with no listed investigator or no confident gender classification.
- Analytic Engine: Computed female-leadership percentages by time period and ran cross-disease comparisons.
- Advanced Analysis: Linked leader gender to trial participant gender and segmented outcomes by trial type.
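
The filtering and aggregation stages above could be sketched roughly as follows. This is an illustrative outline only: the `Trial` record, its field names, and the helper `female_share_by_year` are assumptions introduced here, not the project's actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trial:
    """Hypothetical, simplified trial record."""
    trial_id: str
    start_year: int
    investigator: Optional[str]  # None when no investigator is listed
    gender: Optional[str]        # "female", "male", or None when not confidently classified


def female_share_by_year(trials: list[Trial]) -> dict[int, float]:
    """Return {year: fraction of female-led trials} after filtering."""
    counts = defaultdict(lambda: [0, 0])  # year -> [female-led, total kept]
    for t in trials:
        # Data filtering: skip trials with no investigator or no confident label.
        if t.investigator is None or t.gender is None:
            continue
        counts[t.start_year][1] += 1
        if t.gender == "female":
            counts[t.start_year][0] += 1
    return {year: f / total for year, (f, total) in sorted(counts.items())}


# Made-up records for illustration (not study data).
sample = [
    Trial("T1", 2020, "Investigator A", "female"),
    Trial("T2", 2020, "Investigator B", "male"),
    Trial("T3", 2020, None, None),              # dropped: no investigator
    Trial("T4", 2021, "Investigator C", None),  # dropped: no confident label
    Trial("T5", 2021, "Investigator D", "female"),
]
print(female_share_by_year(sample))
```

Keeping the filter inside the aggregation function means excluded trials never enter the denominator, which matters when the share of unclassifiable names differs between years.
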
Results and Impacts
- Women made up ~37% of COVID-19 trial leaders overall, with their share rising to ~49% by 2022.
- Female-led trials tended to enroll more female participants, suggesting that leadership may influence inclusivity.
- The study spotlighted gender gaps and urged institutions to support women in research leadership roles.
Skills and Tools Used
| Technique/Skill | Tools/Implementation |
|---|---|
| Data Pipeline | Automated trial data extraction (Python, APIs) |
| ML Tools | Name-based gender classifier service (Genderize API) |
| Statistical Analysis | Chi-square tests and regression modeling (R, Python) |
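
As a rough illustration of the name-based classification step listed in the table, the sketch below turns one Genderize-style response into a label. It assumes each response is a JSON object with `gender` and `probability` keys. The function name `classify_gender` and the 0.9 cutoff are illustrative choices made here, not the study's documented settings, and no network call is made.

```python
import json
from typing import Optional


def classify_gender(response_json: str, min_probability: float = 0.9) -> Optional[str]:
    """Return "female" or "male" when the prediction is confident, else None.

    Assumes a Genderize-style JSON object with "gender" and "probability"
    keys; the default cutoff is an assumption for illustration.
    """
    data = json.loads(response_json)
    gender = data.get("gender")
    probability = data.get("probability") or 0.0
    if gender not in ("female", "male"):
        return None  # unknown name or missing label
    if probability < min_probability:
        return None  # prediction too uncertain to keep
    return gender


# Hand-written example payloads (not real API output).
print(classify_gender('{"name": "example", "gender": "female", "probability": 0.97}'))
print(classify_gender('{"name": "example", "gender": null, "probability": 0.0}'))
```

Returning `None` for uncertain names, rather than guessing, lets downstream steps exclude those trials explicitly instead of silently biasing the proportions.
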
Cross-Project Capabilities
- Analytical Rigor: Demonstrated a robust data-analysis approach applicable to diverse research questions.
- Data Insight: Combined domain expertise with data science, a skill applied across projects.
- Equity Analysis: Experience quantifying representation gaps informs efforts to address bias in other domains.
Published Papers/Tools
- Findings published in The Lancet Digital Health (2023), raising awareness of gender disparities in clinical trials.
- Reproducible analysis pipeline documented for sharing, enabling further research on trial diversity.