Big data in public health – key technologies and tools for the public health practitioner

Monday, November 4, 2013: 2:30 p.m. to 2:50 p.m.

Matthew Dollacker, PMP, InductiveHealth Informatics, Atlanta, GA
Data analytics is foundational to the practice of public health, yet the rapid advances in data storage and processing technologies that have swept through industry remain under-utilized in the public health arena. These big data technologies underpin the dramatic leaps in machine learning capability behind advances such as IBM's Jeopardy!-winning Watson system and services such as Google Now.

These 'Big Data' technologies enable use cases that, only a few years ago, were difficult or impossible to achieve and extremely costly to maintain. In that short time, open source tools (HBase, Voldemort, MongoDB) and commercial services (Amazon EC2, Amazon DynamoDB, Google Prediction API) have emerged that make these innovations simple and inexpensive to use, transforming the way industry and academia work with data.

These technologies require new approaches to structuring and analyzing data, but they can open new frontiers for the discipline of public health informatics. These frontiers span two main dimensions: (1) the ability to efficiently store and query immense data sets, terabytes or even petabytes in size, and (2) the ability to run computationally intensive algorithms on those data, including artificial intelligence and machine learning techniques such as cluster analysis, pattern mining, and Bayesian analysis. This talk will explore the key considerations in using these technologies for novel applications in public health.

Learning Areas:

Administration, management, leadership
Other professions or practice related to public health
Systems thinking models (conceptual and theoretical models), applications related to public health

Learning Objectives:

Identify the key emerging tools and technologies for working with very large data sets
Explain the differences and trade-offs between big data tools and conventional relational tool sets
Describe the applications that machine learning technologies enable in the public health domain

Keyword(s): Disease Data, Data/Surveillance

Presenting author's disclosure statement:

Qualified on the content I am responsible for because: I have been the principal investigator for research into distributed data analytics frameworks geared toward healthcare. In addition, I have led four major business intelligence and data warehousing programs for large healthcare and public health institutions. As a key technology and strategy leader in CSC's Federal Health practice, I have advised clients on the applicability of Big Data strategies to their businesses and mission areas.
Any relevant financial relationships? No

I agree to comply with the American Public Health Association Conflict of Interest and Commercial Support Guidelines, and to disclose to the participants any off-label or experimental uses of a commercial product or service discussed in my presentation.