
Youth E-cig Hotspot Detection

Publication · Policy

Identified geographic hotspots of e-cigarette–related tweets across the US.

Role: Principal Investigator, Lead Data Scientist, System Architect & First Author

Focus: Behavior Monitoring · Geospatial Analytics · Hotspot Detection · ML Classification · NLP · Public Health Intervention · Social Media Analytics · Youth Tobacco Prevention

Outcome: Policy recommendations on social media oversight of e-cigarette exposure, spatiotemporal hotspot detection and analysis tools for social media data, and a PhD thesis chapter. Thesis Link

At a Glance

  • Identified geographic hotspots of e-cigarette–related tweets across the US.
  • Found that most hotspots combined high pro-vaping sentiment with a large share of underage users.
  • Highlighted West Coast regions where youth engagement with vaping messages was particularly high.

The Problem

  • Earlier e-cigarette Twitter studies had small samples and missed broader patterns.
  • No prior work had spatially mapped e-cig tweet clusters to find significant hotspots.
  • Distinguishing genuine user tweets from the overwhelming volume of commercial promotions was challenging.

The Solution

  • Collected two years of geotagged e-cigarette tweets (~83,000) nationwide.
  • Removed commercial spam via machine learning classifiers to focus on genuine user content.
  • Applied spatiotemporal scan statistics to detect clusters with unusually high e-cig tweet volumes.
  • Analyzed each hotspot's sentiment (pro- vs. anti-vaping) and its fraction of underage users.
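The spam-removal step above can be sketched as a keyword pre-filter followed by a linear SVM over bag-of-words features. This is a minimal, standard-library illustration under stated assumptions, not the project's actual classifier: the keyword list, the binary bag-of-words features, the hyperparameters, and the function names (`is_ecig_tweet`, `train_linear_svm`, `is_commercial`) are all hypothetical, and the SVM is fit with plain stochastic subgradient descent on the hinge loss instead of a dedicated solver.

```python
import re

# Hypothetical keyword list for illustration; the project's real keyword set is not listed here.
ECIG_KEYWORDS = ("e-cig", "ecig", "vape", "vaping", "ejuice", "e-juice")
TOKEN_RE = re.compile(r"[a-z0-9'-]+")


def is_ecig_tweet(text):
    """Crude case-insensitive substring match against the keyword list."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in ECIG_KEYWORDS)


def featurize(text):
    """Binary bag-of-words features: token -> 1.0 (hashtag and mention symbols are dropped)."""
    return {token: 1.0 for token in TOKEN_RE.findall(text.lower())}


def train_linear_svm(texts, labels, lam=0.01, lr=0.1, epochs=50):
    """Linear SVM (hinge loss + L2 penalty) fit by stochastic subgradient descent.

    labels: +1 = commercial/advertising, -1 = organic user tweet.
    No bias term, to keep the sketch short.
    """
    weights = {}
    data = [(featurize(t), y) for t, y in zip(texts, labels)]
    for _ in range(epochs):
        for features, y in data:
            margin = y * sum(weights.get(tok, 0.0) * v for tok, v in features.items())
            # Gradient of the L2 term: shrink every weight toward zero.
            for tok in weights:
                weights[tok] *= 1.0 - lr * lam
            # The hinge-loss subgradient is nonzero only when the margin is violated.
            if margin < 1.0:
                for tok, v in features.items():
                    weights[tok] = weights.get(tok, 0.0) + lr * y * v
    return weights


def is_commercial(weights, text):
    """Positive decision score means the tweet looks like advertising."""
    score = sum(weights.get(tok, 0.0) * v for tok, v in featurize(text).items())
    return score > 0.0
```

In a real pipeline a library implementation (for example scikit-learn's `LinearSVC` on TF-IDF features) would replace the hand-rolled trainer; the loop here only makes the hinge-loss objective visible.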

Architecture Overview

  • Twitter Streaming API captured a 1% sample of geotagged tweets (Oct 2012–Oct 2014).
  • Filtered by e-cigarette keywords, yielding 62,894 US geotagged e-cig tweets.
  • Used an SVM classifier to separate non-commercial e-cig tweets from advertising content.
  • Employed SaTScan-like spatiotemporal scanning to find statistically significant hotspots.
  • Computed sentiment scores and age metrics for each detected cluster.
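The hotspot-detection step can be illustrated with a small space-time scan statistic. The sketch assumes SaTScan's space-time permutation model (which needs only case counts, not a population baseline) and tweets already aggregated into a hypothetical grid of location cells and time periods; the function names, cell layout, and search limits (`max_neighbors`, `max_window`) are illustrative assumptions, not the project's configuration. Each candidate cylinder is a circle of nearest cells crossed with a contiguous time window, scored with Kulldorff's log-likelihood ratio, and significance comes from Monte Carlo permutation of time stamps.

```python
import math
import random


def llr(c, mu, total):
    """Kulldorff log-likelihood ratio for a cylinder with c observed and mu expected cases."""
    if c <= mu:
        return 0.0  # only excesses (high clusters) count as hotspots
    inside = c * math.log(c / mu)
    outside = (total - c) * math.log((total - c) / (total - mu)) if total > c else 0.0
    return inside + outside


def best_cylinder(coords, counts, max_neighbors, max_window):
    """Return (max LLR, (zone, start, end)) over all candidate space-time cylinders."""
    n_loc, n_time = len(counts), len(counts[0])
    total = sum(map(sum, counts))
    loc_tot = [sum(row) for row in counts]
    time_tot = [sum(row[t] for row in counts) for t in range(n_time)]
    best = (0.0, None)
    for center in range(n_loc):
        # Circular zones: the center plus its k-1 nearest cells, for k = 1..max_neighbors.
        order = sorted(range(n_loc), key=lambda j: math.dist(coords[center], coords[j]))
        for k in range(1, max_neighbors + 1):
            zone = order[:k]
            for start in range(n_time):
                for end in range(start, min(n_time, start + max_window)):
                    window = range(start, end + 1)
                    c = sum(counts[i][t] for i in zone for t in window)
                    # Permutation-model expectation: mu = sum of n_i * n_t / C.
                    mu = sum(loc_tot[i] * time_tot[t] for i in zone for t in window) / total
                    score = llr(c, mu, total)
                    if score > best[0]:
                        best = (score, (tuple(sorted(zone)), start, end))
    return best


def scan_test(coords, counts, max_neighbors=3, max_window=2, n_sims=99, seed=0):
    """Most likely cluster plus a Monte Carlo p-value under the permutation null."""
    observed, cluster = best_cylinder(coords, counts, max_neighbors, max_window)
    # Expand aggregated counts into one (location, time) record per tweet.
    cases = [(i, t) for i, row in enumerate(counts) for t, n in enumerate(row) for _ in range(n)]
    times = [t for _, t in cases]
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sims):
        rng.shuffle(times)  # permute time stamps; each tweet keeps its location
        sim = [[0] * len(row) for row in counts]
        for (i, _), t in zip(cases, times):
            sim[i][t] += 1
        if best_cylinder(coords, sim, max_neighbors, max_window)[0] >= observed:
            exceed += 1
    return cluster, observed, (exceed + 1) / (n_sims + 1)
```

This version brute-forces every cylinder and is only practical for small grids; for reported results, SaTScan itself or a vetted implementation would be the safer choice.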

Results and Impacts

  • Discovered multiple e-cigarette tweet hotspots, mostly on the US West Coast.
  • About 75% of hotspots had above-average pro-vaping sentiment and higher youth participation.
  • Identified regions with intense pro-vaping influence among youth, underscoring the need for targeted monitoring.
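The per-hotspot comparison behind the roughly 75% figure can be sketched as follows. It assumes each tweet already carries a hotspot id from the scan step, a binary pro-vaping sentiment label, and a binary underage flag from age inference; the `Tweet` fields, the use of the overall corpus mean as the "average" baseline, and the `hotspot_profile` name are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tweet:
    cluster_id: Optional[int]  # hotspot id from the scan step; None if outside every hotspot
    pro_vaping: bool           # hypothetical sentiment label (True = pro-vaping)
    underage: bool             # hypothetical age-inference label (True = likely underage)


def hotspot_profile(tweets):
    """Per-hotspot pro-vaping and underage shares, flagged against corpus-wide averages."""
    overall_pro = sum(t.pro_vaping for t in tweets) / len(tweets)
    overall_young = sum(t.underage for t in tweets) / len(tweets)
    groups = defaultdict(list)
    for t in tweets:
        if t.cluster_id is not None:
            groups[t.cluster_id].append(t)
    profile = {}
    for cid, members in groups.items():
        pro = sum(t.pro_vaping for t in members) / len(members)
        young = sum(t.underage for t in members) / len(members)
        # Flag hotspots that exceed the average on both measures.
        profile[cid] = (pro, young, pro > overall_pro and young > overall_young)
    flagged_share = sum(flag for _, _, flag in profile.values()) / len(profile) if profile else 0.0
    return profile, flagged_share
```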

Skills and Tools Used

  • Geospatial analysis: spatiotemporal scan statistics (hotspot detection)
  • Big data processing: Twitter streaming and filtering (~83K tweets)
  • Machine learning: text classification to filter commercial posts
  • Sentiment and demographic analysis: NLP plus age inference for user demographics
  • Data visualization: mapping hotspots and trends on dashboards

Cross-Project Capabilities

  • Adapted the flexible Twitter surveillance pipeline to a new topic (e-cigarettes).
  • Applied spatial analysis of social data, a technique transferable to other public health signals.
  • The content-filtering methods developed here (distinguishing organic from commercial posts) informed other contagion analyses.

Published Papers/Tools

  • PhD Thesis (Chapter 5): Find and Analyze Hotspots of E-cigarette-related Tweets. Thesis Link
  • Developed spatiotemporal hotspot detection and analysis tools for social media data.
  • Produced the first spatial hotspot analysis of e-cigarette tweets, cited in later social media health studies.