The Problem
- Earlier e-cigarette Twitter studies had small samples and missed broader patterns.
- No prior work had spatially mapped e-cig tweet clusters to find significant hotspots.
- Differentiating genuine public tweets from overwhelming commercial promotions was challenging.
The Solution
- Collected two years of geotagged e-cigarette tweets (~83,000) nationwide.
- Removed commercial spam via machine learning classifiers to focus on genuine user content.
- Applied spatiotemporal scan statistics to detect clusters with unusually high e-cig tweet volumes.
- Analyzed each hotspot's sentiment (pro- vs. anti-vaping) and its fraction of underage users.
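
The scan-statistic step above can be illustrated with a minimal sketch. It assumes a Kulldorff-style Poisson model, a coarse grid of cells with weekly counts, and the volume of all geotagged tweets as the baseline; none of these details are confirmed by the summary, and the names `poisson_llr` and `scan` are hypothetical. SaTScan-style methods normally assess significance with Monte Carlo replications, which this sketch omits.

```python
import math

def poisson_llr(observed, expected, total):
    """Poisson log-likelihood ratio for one candidate space-time cylinder.

    Returns 0.0 unless the cylinder holds more events than expected.
    """
    if expected <= 0 or observed <= expected:
        return 0.0
    inside = observed * math.log(observed / expected)
    out_obs = total - observed
    out_exp = total - expected
    outside = out_obs * math.log(out_obs / out_exp) if out_obs > 0 else 0.0
    return inside + outside

def scan(counts, baseline):
    """Find the (cell, start, end) window with the highest likelihood ratio.

    counts[cell][t]   = e-cigarette tweets in that cell and week (assumed layout)
    baseline[cell][t] = all geotagged tweets there, used as a population proxy
    """
    total_c = sum(sum(row) for row in counts.values())
    total_b = sum(sum(row) for row in baseline.values())
    best_llr, best_window = 0.0, None
    for cell, row in counts.items():
        for start in range(len(row)):
            for end in range(start + 1, len(row) + 1):
                observed = sum(row[start:end])
                # Expected count if e-cig tweets simply tracked overall volume.
                expected = total_c * sum(baseline[cell][start:end]) / total_b
                llr = poisson_llr(observed, expected, total_c)
                if llr > best_llr:
                    best_llr, best_window = llr, (cell, start, end)
    return best_llr, best_window

# Toy data: cell "A" has a burst in its last two weeks.
counts = {"A": [1, 1, 9, 8], "B": [2, 2, 2, 2]}
baseline = {"A": [100] * 4, "B": [100] * 4}
llr, window = scan(counts, baseline)
```

A real scan would also merge neighboring cells into circular zones of varying radius; the single-cell zones here keep the example short.
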
Architecture Overview
- Twitter Streaming API captured a 1% sample of geotagged tweets (Oct 2012–Oct 2014).
- Filtered by e-cigarette keywords, yielding 62,894 US geotagged e-cig tweets.
- Used an SVM classifier to separate non-commercial e-cig tweets from advertising content.
- Employed SaTScan-like spatiotemporal scanning to find statistically significant hotspots.
- Computed sentiment scores and age metrics for each detected cluster.
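
As a rough illustration of the keyword and location filtering stage, the sketch below keeps only tweets that mention an e-cigarette term and fall inside an approximate bounding box for the contiguous US. The keyword pattern, the box coordinates, and the simplified tweet record (`text`, `lat`, `lon`) are all assumptions made for this example, not the project's actual keyword list or the Twitter API schema.

```python
import re

# Hypothetical keyword pattern; the project's real keyword list is not given.
ECIG_PATTERN = re.compile(
    r"\b(e-?cigs?|e-?cigarettes?|vap(?:e|es|ing|er|ers)|e-?juice|e-?liquid)\b",
    re.IGNORECASE,
)

# Approximate contiguous-US box: (min_lat, max_lat, min_lon, max_lon).
CONUS_BOX = (24.5, 49.5, -125.0, -66.9)

def in_box(lat, lon, box=CONUS_BOX):
    """True if the coordinate lies inside the bounding box."""
    min_lat, max_lat, min_lon, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

def is_us_ecig_tweet(tweet):
    """Keep geotagged tweets that mention an e-cigarette keyword."""
    return bool(ECIG_PATTERN.search(tweet["text"])) and in_box(tweet["lat"], tweet["lon"])

sample = [
    {"text": "Trying my new e-cig today", "lat": 34.05, "lon": -118.24},
    {"text": "Love vaping", "lat": 51.5, "lon": -0.12},       # outside the box
    {"text": "Water will evaporate", "lat": 40.7, "lon": -74.0},  # no keyword
]
kept = [t for t in sample if is_us_ecig_tweet(t)]
```

A bounding box is only a coarse stand-in for country assignment; it admits parts of Canada and Mexico and excludes Alaska and Hawaii.
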
Results and Impacts
- Discovered multiple e-cigarette tweet hotspots, mostly on the US West Coast.
- About 75% of hotspots had above-average pro-vaping sentiment and higher youth participation.
- Identified regions with intense pro-vaping influence among youth, underscoring the need for targeted monitoring.
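
One way such a per-hotspot comparison could be computed is sketched below. It assumes each tweet already carries a hotspot id (or `None`), a sentiment label, and an inferred underage flag, and it treats "above average" as exceeding the share across all tweets; the field names and that definition are assumptions for illustration, not the study's documented procedure.

```python
from collections import defaultdict

def share(rows, predicate):
    """Fraction of rows satisfying the predicate (0.0 for an empty list)."""
    return sum(1 for r in rows if predicate(r)) / len(rows) if rows else 0.0

def compare_hotspots(tweets):
    """Compare each hotspot's pro-vaping and underage shares to the overall shares."""
    def is_pro(t):
        return t["sentiment"] == "pro"

    def is_minor(t):
        return t["underage"]

    base_pro = share(tweets, is_pro)
    base_minor = share(tweets, is_minor)

    by_hotspot = defaultdict(list)
    for t in tweets:
        if t["hotspot"] is not None:
            by_hotspot[t["hotspot"]].append(t)

    report = {}
    for hotspot, rows in by_hotspot.items():
        pro, minor = share(rows, is_pro), share(rows, is_minor)
        report[hotspot] = {
            "pro_share": pro,
            "underage_share": minor,
            "above_baseline": pro > base_pro and minor > base_minor,
        }
    return report

# Toy records with hypothetical fields.
tweets = [
    {"hotspot": "h1", "sentiment": "pro", "underage": True},
    {"hotspot": "h1", "sentiment": "pro", "underage": True},
    {"hotspot": "h1", "sentiment": "anti", "underage": False},
    {"hotspot": "h2", "sentiment": "anti", "underage": False},
    {"hotspot": "h2", "sentiment": "anti", "underage": False},
    {"hotspot": None, "sentiment": "pro", "underage": False},
    {"hotspot": None, "sentiment": "anti", "underage": False},
]
report = compare_hotspots(tweets)
```
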
Skills and Tools Used
| Technique/Skill | Tools/Implementation |
|---|---|
| Geospatial analysis | Spatiotemporal scan statistics (hotspot detection) |
| Big data processing | Twitter streaming and filtering (~83K tweets) |
| Machine learning | Text classification to filter commercial posts |
| Sentiment and demographic analysis | NLP plus age inference for user demographics |
| Data visualization | Mapping hotspots and trends on dashboards |
Cross-Project Capabilities
- Adapted the flexible Twitter surveillance pipeline to a new topic (e-cigarettes).
- Applied spatial analysis of social data, a technique transferable to other public health signals.
- The content-filtering methods developed here (separating organic from commercial posts) informed other contagion analyses.
Published Papers/Tools
- PhD Thesis (Chapter 5): Find and Analyze Hotspots of E-cigarette-related Tweets.
- Developed spatiotemporal hotspot detection and analysis tools for social media data.
- Produced the first spatial analysis of vaping discourse, cited in later health social media studies.
