287104
Topic discovery using discussion posts in an online cancer community
Lior Rokach, Ph.D.,
Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
Nir Ofek,
Information Systems Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel
Yafei Wang,
College of Information Sciences and Technology, Pennsylvania State University, University Park, PA
Prakhar Biyani,
College of Information Sciences and Technology, University Park, PA
Siddhartha Banerjee,
College of Information Sciences and Technology, University Park, PA
Prasenjit Mitra, Ph.D.,
College of Information Sciences and Technology, University Park, PA
We examine online peer-to-peer cancer community discussion boards to learn about issues of importance to people with cancer and cancer caregivers. The ACS Cancer Survivors Network(SM) (CSN), launched in 2000, is the oldest and largest online peer support community for cancer survivors and caregivers, with over 160,000 registered members and 85,063 discussion board posts between 2008 and 2012. Text from forum posts is processed to support topic model analysis, based on the assumption that each post is associated with one or more underlying latent topics. A Bayesian estimation algorithm is used to discover these latent topics and to assign each post a posterior probability of being related to each topic. Practical issues concerning the use and calibration of topic models are discussed, as well as insight gained about the optimal number of topic classes. Topic models are applied to initiating posts from the CSN breast cancer and colorectal cancer discussion forums. The two most frequent topics initiated in the breast cancer forum are decisions after treatment (7.7%) and surgery/mastectomy/reconstruction decisions (6.4%). The most frequent topics initiated in the colorectal cancer forum are lung scan results (6.4%) and drugs used in colon cancer treatment (6.3%). Changes in topics over time and the entropy of topic distributions are also discussed.
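As a rough illustration of how per-post posterior probabilities might be summarized into topic frequencies and an entropy value, the sketch below assigns each post to its highest-probability topic, reports the share of posts per topic, and computes the Shannon entropy of that distribution. The assignment rule, the function names, and the numbers are all assumptions made for illustration; the abstract does not state how the reported percentages or the entropy were computed, and none of the values below are CSN data.

```python
import math
from collections import Counter


def dominant_topic(posterior):
    """Return the index of the topic with the highest posterior probability."""
    return max(range(len(posterior)), key=lambda k: posterior[k])


def topic_shares(posteriors):
    """Fraction of posts whose dominant topic is each topic index (assumed rule)."""
    n_topics = len(posteriors[0])
    counts = Counter(dominant_topic(p) for p in posteriors)
    return [counts.get(k, 0) / len(posteriors) for k in range(n_topics)]


def shannon_entropy(dist):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


# Hypothetical posteriors for four posts over three topics (not CSN data).
posteriors = [
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.60, 0.30, 0.10],
    [0.25, 0.25, 0.50],
]

shares = topic_shares(posteriors)
print(shares)                   # share of posts per topic
print(shannon_entropy(shares))  # higher values mean posts are spread more evenly across topics
```

Under this reading, a forum dominated by a few topics would show low entropy, while one whose initiating posts are spread across many topics would show high entropy.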
Learning Areas:
Assessment of individual and community needs for health education
Social and behavioral sciences
Learning Objectives:
Describe topic model analysis using an analysis of ACS Cancer Survivors Network discussion forum posts to illustrate the technique and explore its value in understanding cancer patient, survivor and caregiver needs.
Keyword(s): Peer Information Network, Assessments
Presenting author's disclosure statement: Qualified on the content I am responsible for because: I have been the principal or co-principal investigator of multiple federally funded grants. Among my scientific interests has been the development of automated text analysis and network analysis methodology for improving our understanding of the impact of online forums on the well-being of cancer survivors.
Any relevant financial relationships? No
I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines,
and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed
in my presentation.