Aetna Health Insurance Plans and Services Overview

Speakers
Dennis Fusaro
Lead Infrastructure Developer
Daniel Markwat
Infrastructure Developer

Helping people live healthier lives
About 46 million people rely on us to help them
make decisions about their health care and their
health care spending. Every day, we work to
make the system easier and more convenient for
our customers.
Our health insurance plans and services include:
• Medical, pharmacy and dental plans
• Life and disability plans
• Medicaid services
• Behavioral health programs
• Medical management
Aetna membership:
We proudly serve*
• 23.7 million medical members
• Approximately 15.5 million dental members
• Approximately 15.4 million pharmacy benefit
management services members
Aetna health care network:
Our network stretches across the country and
across much of the globe:
• More than 1.1 million health care professionals
• More than 674,000 primary care doctors and
specialists
• 5,589 hospitals
*information as of March 31, 2015

Use Cases
Finding Data: Data scientists spend too much time finding
the correct columns for variable selection.
• On average, column investigation takes 80% of a data scientist’s time
• That time was spent meeting with Subject Matter Experts (SMEs)
Shape of Data: Reduce the number of ad-hoc profiling
queries run.
• ~78% of the queries run on the cluster are profiling queries
Tracking Transformations: Data scientists would like to
understand how data sets are derived.
• Transformations are only tracked at a high level in documentation

Finding Data: Challenges
Hive requires manual traversal of the schema to find tables
or columns
HDFS requires traversal of the directory listing to find a
file
External documentation of the locations of data becomes
stale and unreliable as data changes
No practical means to add additional metadata

Finding Data: Solutions
Capture Hive & HDFS metadata during
runtime and store in a repository
Provide an API to interactively search &
query the metadata
Provide an API to enrich the logical metadata
with business context
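The three solutions above can be illustrated with a minimal sketch, assuming a simple in-memory store. The class names (`ColumnEntry`, `MetadataRepository`), the method names, and the sample table and column names are all hypothetical and are not taken from the presenters' system; a real repository would be a persistent service fed by Hive and HDFS at runtime.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnEntry:
    """One column's physical metadata plus optional business context."""
    source: str       # e.g. "hive" or "hdfs"
    table: str
    column: str
    data_type: str
    business: dict = field(default_factory=dict)  # enrichment added later


class MetadataRepository:
    """Hypothetical in-memory store used only for illustration."""

    def __init__(self):
        self._entries = {}

    def capture(self, source, table, column, data_type):
        # Called as jobs run, so entries follow the data as it changes.
        key = (source, table, column)
        entry = self._entries.get(key)
        if entry is None:
            entry = ColumnEntry(source, table, column, data_type)
            self._entries[key] = entry
        else:
            entry.data_type = data_type
        return entry

    def enrich(self, source, table, column, **context):
        # Attach business context (definition, owner, ...) to a captured entry.
        self._entries[(source, table, column)].business.update(context)

    def search(self, term):
        # Case-insensitive match on table, column, and business context values.
        term = term.lower()
        hits = []
        for entry in self._entries.values():
            haystack = [entry.table, entry.column]
            haystack += [str(value) for value in entry.business.values()]
            if any(term in text.lower() for text in haystack):
                hits.append(entry)
        return hits


repo = MetadataRepository()
repo.capture("hive", "claims", "mbr_id", "string")
repo.capture("hive", "claims", "svc_dt", "date")
repo.enrich("hive", "claims", "mbr_id", definition="Member identifier")

matches = repo.search("member")
```

In this sketch a search for "member" finds the `mbr_id` column only because of its business definition, which is the point of enriching physical metadata with business context.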

Metadata Repository
[Diagram labels: Physical Metadata, Business Metadata; HDFS, Sqoop, Hive; Apache Atlas]

Production Query Breakdown
Average Daily Queries:
• Production: 4%
• Exploratory: 18%
• Profiling: 78%

Shape of Data: Challenges
Constantly accessing the Hive metastore for
basic stats was affecting running production
jobs
The limited number of stats in the default
metastore was not sufficient to make an
accurate assessment of the shape of the data

Shape of Data: Solutions
Create a system to store profiling data that
can be cross-referenced with the physical
and business metadata
Create an extensible framework for data
scientists to create and add new profiling
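One way an extensible profiling framework could look is a registry that data scientists add functions to, with results stored once instead of recomputed by ad-hoc queries. This is a sketch under assumptions: the `profiler` decorator, the `PROFILERS` registry, and `profile_column` are hypothetical names, and the sample table and values are invented for illustration.

```python
from collections import Counter

# Registry of profiling functions, keyed by metric name.
PROFILERS = {}


def profiler(name):
    """Decorator that registers a function as a named profiling metric."""
    def register(func):
        PROFILERS[name] = func
        return func
    return register


@profiler("min")
def column_min(values):
    return min(values)


@profiler("max")
def column_max(values):
    return max(values)


@profiler("distribution")
def column_distribution(values):
    # Frequency of each distinct value.
    return dict(Counter(values))


def profile_column(table, column, values):
    """Run every registered profiler once and return a storable record.

    Assumes the column has at least one non-null value.
    """
    non_null = [v for v in values if v is not None]
    return {
        "table": table,
        "column": column,
        "metrics": {name: func(non_null) for name, func in PROFILERS.items()},
    }


record = profile_column("claims", "age", [34, 51, None, 34, 67])
```

Because each record carries the table and column name, it can be joined to the physical and business metadata for the same column. Adding a new metric only requires writing one more decorated function.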

Tracking Transformations: Challenges
Documenting transformations is a manual
task and cannot be done at scale
No mechanism for auditing data
pipelines
Identifying data quality and provenance
is a manual effort

Tracking Transformations: Solutions
Leverage the metadata captured for search
to construct the flow of the transformations
Provide an API for interrogating
transformation executions
Provide a means for visualizing
transformations from source to current state
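Constructing the flow of transformations from captured metadata amounts to building a graph and walking it backwards. The sketch below assumes each captured transformation lists its source data sets, its target, and the logic that was run; `LineageGraph`, its methods, and the data set names are hypothetical and only illustrate the idea.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical lineage store: target data set -> its direct inputs."""

    def __init__(self):
        self._inputs = defaultdict(list)  # target -> [(source, logic), ...]

    def record(self, sources, target, logic):
        # One call per captured transformation execution.
        for source in sources:
            self._inputs[target].append((source, logic))

    def upstream(self, dataset):
        """Breadth-first walk from a data set back toward its original sources.

        Returns (source, target, logic) edges, nearest transformations first.
        """
        seen = set()
        edges = []
        queue = deque([dataset])
        while queue:
            current = queue.popleft()
            for source, logic in self._inputs.get(current, []):
                edge = (source, current, logic)
                if edge in seen:
                    continue  # guards against cycles and repeated paths
                seen.add(edge)
                edges.append(edge)
                queue.append(source)
        return edges


graph = LineageGraph()
graph.record(["raw.claims"], "stg.claims", "Sqoop import")
graph.record(["stg.claims", "stg.members"], "mart.claims_by_member",
             "Hive join on member id")

lineage = graph.upstream("mart.claims_by_member")
```

The returned edge list is enough to answer audit questions (what fed this table, and with which logic) and can be handed to any graph renderer for visualization from source to current state.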

Mosaic
Mosaic simplifies the big data environment by providing a familiar search experience.
• Search
  • If you know how to search Google or Amazon, you can search Mosaic.
  • Search returns the most relevant results found in Hive or HDFS and displays them in
    an easy-to-understand format.
  • Get the right data, right away, by refining your results using suggested filters.
  • See business definitions and comments from other users to bring clarity to the data.
• Data Profiling
  • Profiling stores metrics about the data you are browsing (e.g. max, min, and the
    distribution of a column).
• Lineage
  • Sometimes where you’re going depends on where you’ve been. Explore the lineage
    tabs to see where your data came from, including whether it came from external systems.
  • Pull back the covers on derived tables and see the transformation logic that built
    them.

We want your feedback!
Please Rate & Review on the Hadoop Summit App
