SlideShare a Scribd company logo
1 of 2
Download to read offline
august2012 47© 2012 The Royal Statistical Society
Big data and the future
The Human Genome Project began in 1990. By
2003 it had produced a map of the approxi-
mately 20 000–25 000 genes that make the
human species what it is. It was a tremendous
effort, a defining project in humankind’s un-
derstanding of itself. The collection, analysis,
and interpretation of that huge amount of data
took 12 years of concentrated effort by the
best human minds and the best human-made
machines.
The human mind has not changed since then,
but the human-made machines have. So has
what they can do. The first “big data” research
area I was exposed to as a young undergraduate
student was genomics. Steven Salzberg, Profes-
sor of Medicine and Biostatistics at Johns Hop-
kins University, summarised today’s genomics
landscape for me: “Next-generation sequencing
technology can now generate more data in a
single day than the entire Human Genome Pro-
ject generated in 12 years. It has transformed
biomedical science.” It was apparent to us as
novices that working as a statistician in genom-
ics would require more than a deep understand-
ing of the biology involved; it would need also
a special arsenal of statistical, computational,
and technological tools. There is just too much
data for the old ways to handle.
Even just looking at the data brings prob-
lems. “Simply moving this data around presents
major challenges to many scientists and institu-
tions: their networks just aren’t fast enough”, he
says. And that is before you start working on it:
“Analysing the data is a much bigger problem.
With such large data sets, it is all too easy to find
rare statistical anomalies and to confuse them
with real phenomena.” When there are millions
of data points, and many tests, false positives
are much more likely.
“Big data” has become a hot topic across
many disparate fields, and for good reason.
One burgeoning new area that generates very
large data sets is the study of brain images. Ani
Eloyan, a postdoctoral fellow at Johns Hopkins
Bloomberg School of Public Health, is part of
the team that won a recent prediction con-
test examining attention deficit hyperactivity
disorder (ADHD), the 2011 ADHD-200 Global
Competition. They used neuroimaging data and
other information to categorise subjects into
neurotypical, ADHD primary inattentive type, or
At the beginning of her career Sherri Rose discusses big data and stands amazed at its potential.
© iStockphoto.com/Pgiam
With such large data sets, it is
all too easy to find rare statistical
anomalies and to confuse them
with real phenomena
august201248
ADHD combined type diagnoses. Understanding
the unique complexities of imaging data is no
small task. “Brain imaging data mostly consist
of collections of three-dimensional arrays (of, for
example, intensities) collected over time result-
ing in a four-dimensional array for each subject”,
she says (for a related example, see Feigelson
and Babu’s image from astronomy on page 23).
“The first major issue in analysing these data is
the simple fact that our brains are very differ-
ent in size, shape and so on. In many cases the
transformation of the matrices into a common
space – a form in which they can be compared
to each other – is still an open problem which is
hindering the analysis of the data.”
The analysis of large comprehensive medical
databases is another area where statisticians
are lending their skills. These databases are
even making appearances in the mainstream
media, thanks to a new competition. Follow-
ing in the footsteps of the $1 million Netflix
Prize (see page 40), where teams developed
algorithms to improve upon the content pro-
vider’s existing recommendation system for
movies, is an even larger big-data prize that
is perhaps more socially useful: the $3 million
Heritage Health Prize Competition. Its goal is
to predict future hospitalisations using exist-
ing high-dimensional patient data. A behemoth
example of a massive clinical database is the US
Food and Drug Administration’s Sentinel Initia-
tive, which aims to monitor drugs and medical
devices for safety over time. The end result of
this programme will be a national electronic
system, and the new system already has access
to 100 million people and their medical records.
Consider the volume of medical data that one
person can accumulate over a very few years:
repeated measurements of blood pressure, lung
function, antibody concentrations, digitalised
X-rays and scans and the rest. Multiply that by
100 million and you get an idea of the size of
the database.
As one can imagine, the sheer scale of this
project and its longitudinal nature – its following
of patients through time – provide interesting
statistical challenges. One complexity is ac-
curately defining the data. In this case, among
other considerations, it involves acknowledging
that subjects “drop out” and are not followed for
the entire time period scientists are studying. The
model has to include a mechanism that generates
these missing values, as subjects do not drop out
at random: issues such as drug toxicity could be
leading to drop out – which is a bit of a problem
if drug toxicity is the very thing you are trying to
study. The traditional assumptions of parametric
modelling are not likely to be supported by what
is known about how the data was generated.
Mark van der Laan, Professor of Biostatistics and
Statistics at the University of California, Berkeley,
and authority in statistical learning with missing
data, is working on the Sentinel Initiative. “We
need to use the state of the art in estimation
without relying on restrictive assumptions; we
need methods that aim to learn from these large
data sets as much as the data allow.”
Electronic medical records are only part of
big data. They are being linked with other big
data sets to study, for example, environmental
issues such as air quality. Examining the health
effects of air quality brings in the additional
component of geography: different regions have
different particles in the air. Cory Zigler is a
postdoctoral fellow at Harvard School of Public
Health. He has recently investigated the causal
effects of the 1990 Clean Air Act amendments
on millions of Medicare beneficiaries, with appli-
cations to health and environmental policy. Dr.
Zigler illuminated the many issues environmental
biostatisticians face:
There are satellites measuring markers of
ambient air quality at increasingly fine
spatial and temporal resolutions. Couple
that with the wealth of health data being
collected in administrative databases,
and we’re faced with a big data challenge
in environmental epidemiology. But all
the data in the world won’t change some
of the salient issues such as the fact that
people who live near one another share
many things in common in addition to
the air they breathe. Teasing out the
health effects of air pollution from other
factors requires thoughtful statistical
reasoning throughout the entire process:
you must define the right question,
choose the right spatial and temporal
resolution of the data, ultimately apply
the right analytical methods and interpret
them correctly. This must be a combined
effort from people with a wide array of
quantitative skills.
Disentangling the impact of the community,
neighbourhood, and household effects is a far-
reaching challenge found in many other fields as
well.
What will the big data allow, once we have
learned how to handle it properly? Alessio Fasa-
no, Director of the Center for Celiac Research at
the University of Maryland School of Medicine, is
an expert on gluten-related disorders. His vision
of the future is inspiring:
Imagine that you have in your hands
the ability to unveil the secrets of hu-
man biology, to establish how the human
host interacts and communicates with
the “parallel civilisation” of bacteria liv-
ing in symbiosis with us, to understand
the yin and yang between tolerance and
immune response, and the ability to
turn on and off autoimmune diseases at
will. Imagine, in other words, that you
have the power to decipher the secrets
of complex diseases, so that innovative
preventive and therapeutic interventions
can be developed. All this is theoretically
possible with celiac disease, the only
autoimmune disease for which the en-
vironmental trigger is known. However,
these goals are achievable only if robust
statistical methodologies are applied to
elaborate the enormous amount of data
that we have recently acquired, thanks
to advances in our knowledge about ce-
liac disease pathogenesis. Trying to make
sense of the complexity of celiac disease
without fundamentals in statistics is like
trying to decipher Egyptian hieroglyph-
ics without having the key to interpret
them.
Dr. Fasano is leading innovative new projects
studying the introduction of gluten in infants
and their microbial environment, science that
will advance human knowledge radically, but
only with the collaboration of medical, scientific
– and statistical – experts.
As a recent statistical trainee, a freshly-
minted (bio)statistics PhD, I believe we as
young researchers have a responsibility. Our
new projects, will be interdisciplinary. We must
ground them in the fundamentals and principles
of statistics. They will involve many new col-
leagues and disciplines, and will increasingly be
our future. Small wonder that young researchers
are getting excited about big data.
Sherri Rose is an NSF postdoctoral fellow in biostatis-
tics at the Johns Hopkins Bloomberg School of Public
Health.  She recently coauthored the book Targeted
Learning: Causal Inference for Observational and Ex-
perimental Data for the Springer Series in Statistics.
Imagine you have in your
hands the ability to understand
the yin and yang between
tolerance and immune response,
the power to decipher the
secrets of complex diseases

More Related Content

What's hot

From Big Data to Precision Medicine
From Big Data to Precision Medicine From Big Data to Precision Medicine
From Big Data to Precision Medicine Year of the X
 
Exponential Technology in Health Care
Exponential Technology in Health Care Exponential Technology in Health Care
Exponential Technology in Health Care Joseph Haslam
 
Precision Medicine in the Big Data World
Precision Medicine in the Big Data WorldPrecision Medicine in the Big Data World
Precision Medicine in the Big Data WorldCloudera, Inc.
 
Sharing of data leads to progress on alzheimers.doc
Sharing of data leads to progress on alzheimers.docSharing of data leads to progress on alzheimers.doc
Sharing of data leads to progress on alzheimers.docStudioHOF
 
Pattern diagnostics 2015
Pattern diagnostics 2015Pattern diagnostics 2015
Pattern diagnostics 2015Thomas Wilckens
 
Cancer Research Meets Data Science — What Can We Do Together?
Cancer Research Meets Data Science — What Can We Do Together?Cancer Research Meets Data Science — What Can We Do Together?
Cancer Research Meets Data Science — What Can We Do Together?Philip Bourne
 
March 2016 Connectivity of Diagnostic Technologies
March 2016 Connectivity of Diagnostic TechnologiesMarch 2016 Connectivity of Diagnostic Technologies
March 2016 Connectivity of Diagnostic TechnologiesSystemOne
 
Towards an Environmental Health Sciences Ontology: CHEAR to HHEAR and Beyond
Towards an Environmental Health Sciences Ontology:CHEAR to HHEAR and BeyondTowards an Environmental Health Sciences Ontology:CHEAR to HHEAR and Beyond
Towards an Environmental Health Sciences Ontology: CHEAR to HHEAR and BeyondDeborah McGuinness
 
BYO App: Announcing Linq from Open mHealth
BYO App: Announcing Linq from Open mHealthBYO App: Announcing Linq from Open mHealth
BYO App: Announcing Linq from Open mHealthIda Sim
 
The shared value of personal and population data
The shared value of personal and population dataThe shared value of personal and population data
The shared value of personal and population dataWessel Kraaij
 
Future of world
Future of worldFuture of world
Future of worldFaizNoor4
 
techRHEUM (edit)
techRHEUM (edit)techRHEUM (edit)
techRHEUM (edit)KMRebi
 
fall-2014-big-data-challenges
fall-2014-big-data-challengesfall-2014-big-data-challenges
fall-2014-big-data-challengesTim Michaelis
 
Multimodal Machine Learning for Quality of Life Assessment: Throwing Data at ...
Multimodal Machine Learning for Quality of Life Assessment: Throwing Data at ...Multimodal Machine Learning for Quality of Life Assessment: Throwing Data at ...
Multimodal Machine Learning for Quality of Life Assessment: Throwing Data at ...Katarzyna Wac & The QoL Lab
 
Digital Access to the World's Literature: A Blueprint to Integrate Evidence w...
Digital Access to the World's Literature: A Blueprint to Integrate Evidence w...Digital Access to the World's Literature: A Blueprint to Integrate Evidence w...
Digital Access to the World's Literature: A Blueprint to Integrate Evidence w...Elaine Martin
 
Obeid generic_2017-11
Obeid generic_2017-11Obeid generic_2017-11
Obeid generic_2017-11Jihad Obeid
 
UCSF Informatics Day 2014 - Michael Blum, "Digital Health"
UCSF Informatics Day 2014 - Michael Blum, "Digital Health"UCSF Informatics Day 2014 - Michael Blum, "Digital Health"
UCSF Informatics Day 2014 - Michael Blum, "Digital Health"CTSI at UCSF
 

What's hot (19)

From Big Data to Precision Medicine
From Big Data to Precision Medicine From Big Data to Precision Medicine
From Big Data to Precision Medicine
 
Exponential Technology in Health Care
Exponential Technology in Health Care Exponential Technology in Health Care
Exponential Technology in Health Care
 
Precision Medicine in the Big Data World
Precision Medicine in the Big Data WorldPrecision Medicine in the Big Data World
Precision Medicine in the Big Data World
 
Sharing of data leads to progress on alzheimers.doc
Sharing of data leads to progress on alzheimers.docSharing of data leads to progress on alzheimers.doc
Sharing of data leads to progress on alzheimers.doc
 
Pattern diagnostics 2015
Pattern diagnostics 2015Pattern diagnostics 2015
Pattern diagnostics 2015
 
Research
ResearchResearch
Research
 
Cancer Research Meets Data Science — What Can We Do Together?
Cancer Research Meets Data Science — What Can We Do Together?Cancer Research Meets Data Science — What Can We Do Together?
Cancer Research Meets Data Science — What Can We Do Together?
 
March 2016 Connectivity of Diagnostic Technologies
March 2016 Connectivity of Diagnostic TechnologiesMarch 2016 Connectivity of Diagnostic Technologies
March 2016 Connectivity of Diagnostic Technologies
 
Towards an Environmental Health Sciences Ontology: CHEAR to HHEAR and Beyond
Towards an Environmental Health Sciences Ontology:CHEAR to HHEAR and BeyondTowards an Environmental Health Sciences Ontology:CHEAR to HHEAR and Beyond
Towards an Environmental Health Sciences Ontology: CHEAR to HHEAR and Beyond
 
BYO App: Announcing Linq from Open mHealth
BYO App: Announcing Linq from Open mHealthBYO App: Announcing Linq from Open mHealth
BYO App: Announcing Linq from Open mHealth
 
Genomics privacy
Genomics privacyGenomics privacy
Genomics privacy
 
The shared value of personal and population data
The shared value of personal and population dataThe shared value of personal and population data
The shared value of personal and population data
 
Future of world
Future of worldFuture of world
Future of world
 
techRHEUM (edit)
techRHEUM (edit)techRHEUM (edit)
techRHEUM (edit)
 
fall-2014-big-data-challenges
fall-2014-big-data-challengesfall-2014-big-data-challenges
fall-2014-big-data-challenges
 
Multimodal Machine Learning for Quality of Life Assessment: Throwing Data at ...
Multimodal Machine Learning for Quality of Life Assessment: Throwing Data at ...Multimodal Machine Learning for Quality of Life Assessment: Throwing Data at ...
Multimodal Machine Learning for Quality of Life Assessment: Throwing Data at ...
 
Digital Access to the World's Literature: A Blueprint to Integrate Evidence w...
Digital Access to the World's Literature: A Blueprint to Integrate Evidence w...Digital Access to the World's Literature: A Blueprint to Integrate Evidence w...
Digital Access to the World's Literature: A Blueprint to Integrate Evidence w...
 
Obeid generic_2017-11
Obeid generic_2017-11Obeid generic_2017-11
Obeid generic_2017-11
 
UCSF Informatics Day 2014 - Michael Blum, "Digital Health"
UCSF Informatics Day 2014 - Michael Blum, "Digital Health"UCSF Informatics Day 2014 - Michael Blum, "Digital Health"
UCSF Informatics Day 2014 - Michael Blum, "Digital Health"
 

Viewers also liked

UI Design Patterns for the Web, Part 2
UI Design Patterns for the Web, Part 2UI Design Patterns for the Web, Part 2
UI Design Patterns for the Web, Part 2Lewis Lin 🦊
 
1 Page Salary Negotiation Cheat Sheet
1 Page Salary Negotiation Cheat Sheet1 Page Salary Negotiation Cheat Sheet
1 Page Salary Negotiation Cheat SheetLewis Lin 🦊
 
Best of email swipe file
Best of email swipe fileBest of email swipe file
Best of email swipe fileLewis Lin 🦊
 
PM interview questions book pdf
PM interview questions book pdfPM interview questions book pdf
PM interview questions book pdfLewis Lin 🦊
 
Marketing Case Interview: Cheat Sheet
Marketing Case Interview: Cheat SheetMarketing Case Interview: Cheat Sheet
Marketing Case Interview: Cheat SheetLewis Lin 🦊
 
UI Design Patterns for the Web, Part 1
UI Design Patterns for the Web, Part 1UI Design Patterns for the Web, Part 1
UI Design Patterns for the Web, Part 1Lewis Lin 🦊
 
A gentle introduction to algorithm complexity analysis
A gentle introduction to algorithm complexity analysisA gentle introduction to algorithm complexity analysis
A gentle introduction to algorithm complexity analysisLewis Lin 🦊
 
Google's Official Note to Product Management Candidates
Google's Official Note to Product Management CandidatesGoogle's Official Note to Product Management Candidates
Google's Official Note to Product Management CandidatesLewis Lin 🦊
 
Headline hacks 03-10-2012
Headline hacks 03-10-2012Headline hacks 03-10-2012
Headline hacks 03-10-2012Lewis Lin 🦊
 
Qvartz Case Interview Handbook
Qvartz Case Interview HandbookQvartz Case Interview Handbook
Qvartz Case Interview HandbookLewis Lin 🦊
 
Uxpin color theory_in_web_ui_design
Uxpin color theory_in_web_ui_designUxpin color theory_in_web_ui_design
Uxpin color theory_in_web_ui_designLewis Lin 🦊
 
Uxpin web ui_design_for_the_human_eye_2
Uxpin web ui_design_for_the_human_eye_2Uxpin web ui_design_for_the_human_eye_2
Uxpin web ui_design_for_the_human_eye_2Lewis Lin 🦊
 
Uxpin mastering the_power_of_nothing
Uxpin mastering the_power_of_nothingUxpin mastering the_power_of_nothing
Uxpin mastering the_power_of_nothingLewis Lin 🦊
 
Interview Evaluation Sheet: Operations Question
Interview Evaluation Sheet: Operations QuestionInterview Evaluation Sheet: Operations Question
Interview Evaluation Sheet: Operations QuestionLewis Lin 🦊
 
Marketplace Handbook-11-08-2015
Marketplace Handbook-11-08-2015Marketplace Handbook-11-08-2015
Marketplace Handbook-11-08-2015Lewis Lin 🦊
 
Performance Based Interviewing (PBI) Questions
Performance Based Interviewing (PBI) QuestionsPerformance Based Interviewing (PBI) Questions
Performance Based Interviewing (PBI) QuestionsLewis Lin 🦊
 
Google product manager interview questions answers
Google product manager interview questions answersGoogle product manager interview questions answers
Google product manager interview questions answersSweta Singh
 
Product roadmap-guide-by-product plan
Product roadmap-guide-by-product planProduct roadmap-guide-by-product plan
Product roadmap-guide-by-product planLewis Lin 🦊
 
100 product management interview questions and answers pdf
100 product management interview questions and answers pdf100 product management interview questions and answers pdf
100 product management interview questions and answers pdfProductManager88
 
McKinsey Slides Examples
McKinsey Slides ExamplesMcKinsey Slides Examples
McKinsey Slides ExamplesLewis Lin 🦊
 

Viewers also liked (20)

UI Design Patterns for the Web, Part 2
UI Design Patterns for the Web, Part 2UI Design Patterns for the Web, Part 2
UI Design Patterns for the Web, Part 2
 
1 Page Salary Negotiation Cheat Sheet
1 Page Salary Negotiation Cheat Sheet1 Page Salary Negotiation Cheat Sheet
1 Page Salary Negotiation Cheat Sheet
 
Best of email swipe file
Best of email swipe fileBest of email swipe file
Best of email swipe file
 
PM interview questions book pdf
PM interview questions book pdfPM interview questions book pdf
PM interview questions book pdf
 
Marketing Case Interview: Cheat Sheet
Marketing Case Interview: Cheat SheetMarketing Case Interview: Cheat Sheet
Marketing Case Interview: Cheat Sheet
 
UI Design Patterns for the Web, Part 1
UI Design Patterns for the Web, Part 1UI Design Patterns for the Web, Part 1
UI Design Patterns for the Web, Part 1
 
A gentle introduction to algorithm complexity analysis
A gentle introduction to algorithm complexity analysisA gentle introduction to algorithm complexity analysis
A gentle introduction to algorithm complexity analysis
 
Google's Official Note to Product Management Candidates
Google's Official Note to Product Management CandidatesGoogle's Official Note to Product Management Candidates
Google's Official Note to Product Management Candidates
 
Headline hacks 03-10-2012
Headline hacks 03-10-2012Headline hacks 03-10-2012
Headline hacks 03-10-2012
 
Qvartz Case Interview Handbook
Qvartz Case Interview HandbookQvartz Case Interview Handbook
Qvartz Case Interview Handbook
 
Uxpin color theory_in_web_ui_design
Uxpin color theory_in_web_ui_designUxpin color theory_in_web_ui_design
Uxpin color theory_in_web_ui_design
 
Uxpin web ui_design_for_the_human_eye_2
Uxpin web ui_design_for_the_human_eye_2Uxpin web ui_design_for_the_human_eye_2
Uxpin web ui_design_for_the_human_eye_2
 
Uxpin mastering the_power_of_nothing
Uxpin mastering the_power_of_nothingUxpin mastering the_power_of_nothing
Uxpin mastering the_power_of_nothing
 
Interview Evaluation Sheet: Operations Question
Interview Evaluation Sheet: Operations QuestionInterview Evaluation Sheet: Operations Question
Interview Evaluation Sheet: Operations Question
 
Marketplace Handbook-11-08-2015
Marketplace Handbook-11-08-2015Marketplace Handbook-11-08-2015
Marketplace Handbook-11-08-2015
 
Performance Based Interviewing (PBI) Questions
Performance Based Interviewing (PBI) QuestionsPerformance Based Interviewing (PBI) Questions
Performance Based Interviewing (PBI) Questions
 
Google product manager interview questions answers
Google product manager interview questions answersGoogle product manager interview questions answers
Google product manager interview questions answers
 
Product roadmap-guide-by-product plan
Product roadmap-guide-by-product planProduct roadmap-guide-by-product plan
Product roadmap-guide-by-product plan
 
100 product management interview questions and answers pdf
100 product management interview questions and answers pdf100 product management interview questions and answers pdf
100 product management interview questions and answers pdf
 
McKinsey Slides Examples
McKinsey Slides ExamplesMcKinsey Slides Examples
McKinsey Slides Examples
 

Similar to Big Data and the Future by Sherri Rose

Big Data and the Promise and Pitfalls when Applied to Disease Prevention and ...
Big Data and the Promise and Pitfalls when Applied to Disease Prevention and ...Big Data and the Promise and Pitfalls when Applied to Disease Prevention and ...
Big Data and the Promise and Pitfalls when Applied to Disease Prevention and ...Philip Bourne
 
의료의 미래, 디지털 헬스케어: 신약개발을 중심으로
의료의 미래, 디지털 헬스케어: 신약개발을 중심으로의료의 미래, 디지털 헬스케어: 신약개발을 중심으로
의료의 미래, 디지털 헬스케어: 신약개발을 중심으로Yoon Sup Choi
 
Volar Health PharmaVOICE Blogs 2018
Volar Health PharmaVOICE Blogs 2018Volar Health PharmaVOICE Blogs 2018
Volar Health PharmaVOICE Blogs 2018Carlos Rodarte
 
HUMAN GENOME PROJECT_Dr.Sonia.pdf
HUMAN GENOME PROJECT_Dr.Sonia.pdfHUMAN GENOME PROJECT_Dr.Sonia.pdf
HUMAN GENOME PROJECT_Dr.Sonia.pdfsoniaangeline
 
Methods to enhance the validity of precision guidelines emerging from big data
Methods to enhance the validity of precision guidelines emerging from big dataMethods to enhance the validity of precision guidelines emerging from big data
Methods to enhance the validity of precision guidelines emerging from big dataChirag Patel
 
Big Data and US healthcare
Big Data and US healthcareBig Data and US healthcare
Big Data and US healthcareVincent Laban
 
The A3BC, Biobanks and the future of precision medicine
The A3BC, Biobanks and the future of precision medicine The A3BC, Biobanks and the future of precision medicine
The A3BC, Biobanks and the future of precision medicine - A3BC -
 
Visualization Tools for the Refinery Platform - Supporting reproducible resea...
Visualization Tools for the Refinery Platform - Supporting reproducible resea...Visualization Tools for the Refinery Platform - Supporting reproducible resea...
Visualization Tools for the Refinery Platform - Supporting reproducible resea...Nils Gehlenborg
 
Sun==big data analytics for health care
Sun==big data analytics for health careSun==big data analytics for health care
Sun==big data analytics for health careAravindharamanan S
 
Modern Genomics in conjunction with Patients IP on the value chain
Modern Genomics in conjunction with Patients IP on the value chainModern Genomics in conjunction with Patients IP on the value chain
Modern Genomics in conjunction with Patients IP on the value chainJohn Peter Mary Wubbe
 
MAKING SENSE OFSTATISTICSWhat statistics tell you an.docx
MAKING SENSE OFSTATISTICSWhat statistics tell you an.docxMAKING SENSE OFSTATISTICSWhat statistics tell you an.docx
MAKING SENSE OFSTATISTICSWhat statistics tell you an.docxsmile790243
 
Deep Learning in Medicine
Deep Learning in MedicineDeep Learning in Medicine
Deep Learning in Medicineijtsrd
 
Epidemiological studies
Epidemiological studiesEpidemiological studies
Epidemiological studiesjayarajgr
 
Embi cri review-2013-final
Embi cri review-2013-finalEmbi cri review-2013-final
Embi cri review-2013-finalPeter Embi
 
Univ of Miami CTSI: Citizen science seminar; Oct 2014
Univ of Miami CTSI: Citizen science seminar; Oct 2014Univ of Miami CTSI: Citizen science seminar; Oct 2014
Univ of Miami CTSI: Citizen science seminar; Oct 2014Richard Bookman
 
Cancer and Work Design v1.01
Cancer and Work Design v1.01Cancer and Work Design v1.01
Cancer and Work Design v1.01James Repenning
 

Similar to Big Data and the Future by Sherri Rose (20)

Big Data and the Promise and Pitfalls when Applied to Disease Prevention and ...
Big Data and the Promise and Pitfalls when Applied to Disease Prevention and ...Big Data and the Promise and Pitfalls when Applied to Disease Prevention and ...
Big Data and the Promise and Pitfalls when Applied to Disease Prevention and ...
 
의료의 미래, 디지털 헬스케어: 신약개발을 중심으로
의료의 미래, 디지털 헬스케어: 신약개발을 중심으로의료의 미래, 디지털 헬스케어: 신약개발을 중심으로
의료의 미래, 디지털 헬스케어: 신약개발을 중심으로
 
Volar Health PharmaVOICE Blogs 2018
Volar Health PharmaVOICE Blogs 2018Volar Health PharmaVOICE Blogs 2018
Volar Health PharmaVOICE Blogs 2018
 
HUMAN GENOME PROJECT_Dr.Sonia.pdf
HUMAN GENOME PROJECT_Dr.Sonia.pdfHUMAN GENOME PROJECT_Dr.Sonia.pdf
HUMAN GENOME PROJECT_Dr.Sonia.pdf
 
Methods to enhance the validity of precision guidelines emerging from big data
Methods to enhance the validity of precision guidelines emerging from big dataMethods to enhance the validity of precision guidelines emerging from big data
Methods to enhance the validity of precision guidelines emerging from big data
 
Big Data and US healthcare
Big Data and US healthcareBig Data and US healthcare
Big Data and US healthcare
 
The A3BC, Biobanks and the future of precision medicine
The A3BC, Biobanks and the future of precision medicine The A3BC, Biobanks and the future of precision medicine
The A3BC, Biobanks and the future of precision medicine
 
Advancing-OSHMS High-Performance WS in OHM
Advancing-OSHMS High-Performance WS in OHMAdvancing-OSHMS High-Performance WS in OHM
Advancing-OSHMS High-Performance WS in OHM
 
Visualization Tools for the Refinery Platform - Supporting reproducible resea...
Visualization Tools for the Refinery Platform - Supporting reproducible resea...Visualization Tools for the Refinery Platform - Supporting reproducible resea...
Visualization Tools for the Refinery Platform - Supporting reproducible resea...
 
Sun==big data analytics for health care
Sun==big data analytics for health careSun==big data analytics for health care
Sun==big data analytics for health care
 
Modern Genomics in conjunction with Patients IP on the value chain
Modern Genomics in conjunction with Patients IP on the value chainModern Genomics in conjunction with Patients IP on the value chain
Modern Genomics in conjunction with Patients IP on the value chain
 
MAKING SENSE OFSTATISTICSWhat statistics tell you an.docx
MAKING SENSE OFSTATISTICSWhat statistics tell you an.docxMAKING SENSE OFSTATISTICSWhat statistics tell you an.docx
MAKING SENSE OFSTATISTICSWhat statistics tell you an.docx
 
Deep Learning in Medicine
Deep Learning in MedicineDeep Learning in Medicine
Deep Learning in Medicine
 
Reaching out to collaborators and crowdsourcing for pharmaceutical research
Reaching out to collaborators and crowdsourcing for pharmaceutical research  Reaching out to collaborators and crowdsourcing for pharmaceutical research
Reaching out to collaborators and crowdsourcing for pharmaceutical research
 
Epidemiological studies
Epidemiological studiesEpidemiological studies
Epidemiological studies
 
Embi cri review-2013-final
Embi cri review-2013-finalEmbi cri review-2013-final
Embi cri review-2013-final
 
Nature_Moffitt_Sellers
Nature_Moffitt_SellersNature_Moffitt_Sellers
Nature_Moffitt_Sellers
 
The role of big data in medicine
The role of big data in medicineThe role of big data in medicine
The role of big data in medicine
 
Univ of Miami CTSI: Citizen science seminar; Oct 2014
Univ of Miami CTSI: Citizen science seminar; Oct 2014Univ of Miami CTSI: Citizen science seminar; Oct 2014
Univ of Miami CTSI: Citizen science seminar; Oct 2014
 
Cancer and Work Design v1.01
Cancer and Work Design v1.01Cancer and Work Design v1.01
Cancer and Work Design v1.01
 

More from Lewis Lin 🦊

Gaskins' memo pitching PowerPoint
Gaskins' memo pitching PowerPointGaskins' memo pitching PowerPoint
Gaskins' memo pitching PowerPointLewis Lin 🦊
 
P&G Memo: Creating Modern Day Brand Management
P&G Memo: Creating Modern Day Brand ManagementP&G Memo: Creating Modern Day Brand Management
P&G Memo: Creating Modern Day Brand ManagementLewis Lin 🦊
 
Jeffrey Katzenberg on Disney Studios
Jeffrey Katzenberg on Disney StudiosJeffrey Katzenberg on Disney Studios
Jeffrey Katzenberg on Disney StudiosLewis Lin 🦊
 
Carnegie Mellon MS PM Internships 2020
Carnegie Mellon MS PM Internships 2020Carnegie Mellon MS PM Internships 2020
Carnegie Mellon MS PM Internships 2020Lewis Lin 🦊
 
Gallup's Notes on Reinventing Performance Management
Gallup's Notes on Reinventing Performance ManagementGallup's Notes on Reinventing Performance Management
Gallup's Notes on Reinventing Performance ManagementLewis Lin 🦊
 
Twitter Job Opportunities for Students
Twitter Job Opportunities for StudentsTwitter Job Opportunities for Students
Twitter Job Opportunities for StudentsLewis Lin 🦊
 
Facebook's Official Guide to Technical Program Management Candidates
Facebook's Official Guide to Technical Program Management CandidatesFacebook's Official Guide to Technical Program Management Candidates
Facebook's Official Guide to Technical Program Management CandidatesLewis Lin 🦊
 
Performance Management at Google
Performance Management at GooglePerformance Management at Google
Performance Management at GoogleLewis Lin 🦊
 
Google Interview Prep Guide Software Engineer
Google Interview Prep Guide Software EngineerGoogle Interview Prep Guide Software Engineer
Google Interview Prep Guide Software EngineerLewis Lin 🦊
 
Google Interview Prep Guide Product Manager
Google Interview Prep Guide Product ManagerGoogle Interview Prep Guide Product Manager
Google Interview Prep Guide Product ManagerLewis Lin 🦊
 
Skills Assessment Offering by Lewis C. Lin
Skills Assessment Offering by Lewis C. LinSkills Assessment Offering by Lewis C. Lin
Skills Assessment Offering by Lewis C. LinLewis Lin 🦊
 
How Men and Women Differ Across Leadership Traits
How Men and Women Differ Across Leadership TraitsHow Men and Women Differ Across Leadership Traits
How Men and Women Differ Across Leadership TraitsLewis Lin 🦊
 
Product Manager Skills Survey
Product Manager Skills SurveyProduct Manager Skills Survey
Product Manager Skills SurveyLewis Lin 🦊
 
Uxpin Why Build a Design System
Uxpin Why Build a Design SystemUxpin Why Build a Design System
Uxpin Why Build a Design SystemLewis Lin 🦊
 
30-Day Google PM Interview Study Guide
30-Day Google PM Interview Study Guide30-Day Google PM Interview Study Guide
30-Day Google PM Interview Study GuideLewis Lin 🦊
 
30-Day Facebook PM Interview Study Guide
30-Day Facebook PM Interview Study Guide30-Day Facebook PM Interview Study Guide
30-Day Facebook PM Interview Study GuideLewis Lin 🦊
 
36-Day Amazon PM Interview Study Guide
36-Day Amazon PM Interview Study Guide36-Day Amazon PM Interview Study Guide
36-Day Amazon PM Interview Study GuideLewis Lin 🦊
 
McKinsey's Assessment on PM Careers
McKinsey's Assessment on PM CareersMcKinsey's Assessment on PM Careers
McKinsey's Assessment on PM CareersLewis Lin 🦊
 
Five Traits of Great Product Managers
Five Traits of Great Product ManagersFive Traits of Great Product Managers
Five Traits of Great Product ManagersLewis Lin 🦊
 

More from Lewis Lin 🦊 (20)

Gaskins' memo pitching PowerPoint
Gaskins' memo pitching PowerPointGaskins' memo pitching PowerPoint
Gaskins' memo pitching PowerPoint
 
P&G Memo: Creating Modern Day Brand Management
P&G Memo: Creating Modern Day Brand ManagementP&G Memo: Creating Modern Day Brand Management
P&G Memo: Creating Modern Day Brand Management
 
Jeffrey Katzenberg on Disney Studios
Jeffrey Katzenberg on Disney StudiosJeffrey Katzenberg on Disney Studios
Jeffrey Katzenberg on Disney Studios
 
Carnegie Mellon MS PM Internships 2020
Carnegie Mellon MS PM Internships 2020Carnegie Mellon MS PM Internships 2020
Carnegie Mellon MS PM Internships 2020
 
Gallup's Notes on Reinventing Performance Management
Gallup's Notes on Reinventing Performance ManagementGallup's Notes on Reinventing Performance Management
Gallup's Notes on Reinventing Performance Management
 
Twitter Job Opportunities for Students
Twitter Job Opportunities for StudentsTwitter Job Opportunities for Students
Twitter Job Opportunities for Students
 
Facebook's Official Guide to Technical Program Management Candidates
Facebook's Official Guide to Technical Program Management CandidatesFacebook's Official Guide to Technical Program Management Candidates
Facebook's Official Guide to Technical Program Management Candidates
 
Performance Management at Google
Performance Management at GooglePerformance Management at Google
Performance Management at Google
 
Google Interview Prep Guide Software Engineer
Google Interview Prep Guide Software EngineerGoogle Interview Prep Guide Software Engineer
Google Interview Prep Guide Software Engineer
 
Google Interview Prep Guide Product Manager
Google Interview Prep Guide Product ManagerGoogle Interview Prep Guide Product Manager
Google Interview Prep Guide Product Manager
 
Skills Assessment Offering by Lewis C. Lin
Skills Assessment Offering by Lewis C. LinSkills Assessment Offering by Lewis C. Lin
Skills Assessment Offering by Lewis C. Lin
 
How Men and Women Differ Across Leadership Traits
How Men and Women Differ Across Leadership TraitsHow Men and Women Differ Across Leadership Traits
How Men and Women Differ Across Leadership Traits
 
Product Manager Skills Survey
Product Manager Skills SurveyProduct Manager Skills Survey
Product Manager Skills Survey
 
Uxpin Why Build a Design System
Uxpin Why Build a Design SystemUxpin Why Build a Design System
Uxpin Why Build a Design System
 
Sourcing on GitHub
Sourcing on GitHubSourcing on GitHub
Sourcing on GitHub
 
30-Day Google PM Interview Study Guide
30-Day Google PM Interview Study Guide30-Day Google PM Interview Study Guide
30-Day Google PM Interview Study Guide
 
30-Day Facebook PM Interview Study Guide
30-Day Facebook PM Interview Study Guide30-Day Facebook PM Interview Study Guide
30-Day Facebook PM Interview Study Guide
 
36-Day Amazon PM Interview Study Guide
36-Day Amazon PM Interview Study Guide36-Day Amazon PM Interview Study Guide
36-Day Amazon PM Interview Study Guide
 
McKinsey's Assessment on PM Careers
McKinsey's Assessment on PM CareersMcKinsey's Assessment on PM Careers
McKinsey's Assessment on PM Careers
 
Five Traits of Great Product Managers
Five Traits of Great Product ManagersFive Traits of Great Product Managers
Five Traits of Great Product Managers
 

Recently uploaded

20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 

Recently uploaded (20)

20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 

Big Data and the Future by Sherri Rose

  • 1. august2012 47© 2012 The Royal Statistical Society Big data and the future The Human Genome Project began in 1990. By 2003 it had produced a map of the approxi- mately 20 000–25 000 genes that make the human species what it is. It was a tremendous effort, a defining project in humankind’s un- derstanding of itself. The collection, analysis, and interpretation of that huge amount of data took 12 years of concentrated effort by the best human minds and the best human-made machines. The human mind has not changed since then, but the human-made machines have. So has what they can do. The first “big data” research area I was exposed to as a young undergraduate student was genomics. Steven Salzberg, Profes- sor of Medicine and Biostatistics at Johns Hop- kins University, summarised today’s genomics landscape for me: “Next-generation sequencing technology can now generate more data in a single day than the entire Human Genome Pro- ject generated in 12 years. It has transformed biomedical science.” It was apparent to us as novices that working as a statistician in genom- ics would require more than a deep understand- ing of the biology involved; it would need also a special arsenal of statistical, computational, and technological tools. There is just too much data for the old ways to handle. Even just looking at the data brings prob- lems. “Simply moving this data around presents major challenges to many scientists and institu- tions: their networks just aren’t fast enough”, he says. And that is before you start working on it: “Analysing the data is a much bigger problem. With such large data sets, it is all too easy to find rare statistical anomalies and to confuse them with real phenomena.” When there are millions of data points, and many tests, false positives are much more likely. “Big data” has become a hot topic across many disparate fields, and for good reason. One burgeoning new area that generates very large data sets is the study of brain images. Ani Eloyan, a postdoctoral fellow at Johns Hopkins Bloomberg School of Public Health, is part of the team that won a recent prediction con- test examining attention deficit hyperactivity disorder (ADHD), the 2011 ADHD-200 Global Competition. They used neuroimaging data and other information to categorise subjects into neurotypical, ADHD primary inattentive type, or At the beginning of her career Sherri Rose discusses big data and stands amazed at its potential. © iStockphoto.com/Pgiam With such large data sets, it is all too easy to find rare statistical anomalies and to confuse them with real phenomena
  • 2. august201248 ADHD combined type diagnoses. Understanding the unique complexities of imaging data is no small task. “Brain imaging data mostly consist of collections of three-dimensional arrays (of, for example, intensities) collected over time result- ing in a four-dimensional array for each subject”, she says (for a related example, see Feigelson and Babu’s image from astronomy on page 23). “The first major issue in analysing these data is the simple fact that our brains are very differ- ent in size, shape and so on. In many cases the transformation of the matrices into a common space – a form in which they can be compared to each other – is still an open problem which is hindering the analysis of the data.” The analysis of large comprehensive medical databases is another area where statisticians are lending their skills. These databases are even making appearances in the mainstream media, thanks to a new competition. Follow- ing in the footsteps of the $1 million Netflix Prize (see page 40), where teams developed algorithms to improve upon the content pro- vider’s existing recommendation system for movies, is an even larger big-data prize that is perhaps more socially useful: the $3 million Heritage Health Prize Competition. Its goal is to predict future hospitalisations using exist- ing high-dimensional patient data. A behemoth example of a massive clinical database is the US Food and Drug Administration’s Sentinel Initia- tive, which aims to monitor drugs and medical devices for safety over time. The end result of this programme will be a national electronic system, and the new system already has access to 100 million people and their medical records. Consider the volume of medical data that one person can accumulate over a very few years: repeated measurements of blood pressure, lung function, antibody concentrations, digitalised X-rays and scans and the rest. Multiply that by 100 million and you get an idea of the size of the database. As one can imagine, the sheer scale of this project and its longitudinal nature – its following of patients through time – provide interesting statistical challenges. One complexity is ac- curately defining the data. In this case, among other considerations, it involves acknowledging that subjects “drop out” and are not followed for the entire time period scientists are studying. The model has to include a mechanism that generates these missing values, as subjects do not drop out at random: issues such as drug toxicity could be leading to drop out – which is a bit of a problem if drug toxicity is the very thing you are trying to study. The traditional assumptions of parametric modelling are not likely to be supported by what is known about how the data was generated. Mark van der Laan, Professor of Biostatistics and Statistics at the University of California, Berkeley, and authority in statistical learning with missing data, is working on the Sentinel Initiative. “We need to use the state of the art in estimation without relying on restrictive assumptions; we need methods that aim to learn from these large data sets as much as the data allow.” Electronic medical records are only part of big data. They are being linked with other big data sets to study, for example, environmental issues such as air quality. Examining the health effects of air quality brings in the additional component of geography: different regions have different particles in the air. Cory Zigler is a postdoctoral fellow at Harvard School of Public Health. He has recently investigated the causal effects of the 1990 Clean Air Act amendments on millions of Medicare beneficiaries, with appli- cations to health and environmental policy. Dr. Zigler illuminated the many issues environmental biostatisticians face: There are satellites measuring markers of ambient air quality at increasingly fine spatial and temporal resolutions. Couple that with the wealth of health data being collected in administrative databases, and we’re faced with a big data challenge in environmental epidemiology. But all the data in the world won’t change some of the salient issues such as the fact that people who live near one another share many things in common in addition to the air they breathe. Teasing out the health effects of air pollution from other factors requires thoughtful statistical reasoning throughout the entire process: you must define the right question, choose the right spatial and temporal resolution of the data, ultimately apply the right analytical methods and interpret them correctly. This must be a combined effort from people with a wide array of quantitative skills. Disentangling the impact of the community, neighbourhood, and household effects is a far- reaching challenge found in many other fields as well. What will the big data allow, once we have learned how to handle it properly? Alessio Fasa- no, Director of the Center for Celiac Research at the University of Maryland School of Medicine, is an expert on gluten-related disorders. His vision of the future is inspiring: Imagine that you have in your hands the ability to unveil the secrets of hu- man biology, to establish how the human host interacts and communicates with the “parallel civilisation” of bacteria liv- ing in symbiosis with us, to understand the yin and yang between tolerance and immune response, and the ability to turn on and off autoimmune diseases at will. Imagine, in other words, that you have the power to decipher the secrets of complex diseases, so that innovative preventive and therapeutic interventions can be developed. All this is theoretically possible with celiac disease, the only autoimmune disease for which the en- vironmental trigger is known. However, these goals are achievable only if robust statistical methodologies are applied to elaborate the enormous amount of data that we have recently acquired, thanks to advances in our knowledge about ce- liac disease pathogenesis. Trying to make sense of the complexity of celiac disease without fundamentals in statistics is like trying to decipher Egyptian hieroglyph- ics without having the key to interpret them. Dr. Fasano is leading innovative new projects studying the introduction of gluten in infants and their microbial environment, science that will advance human knowledge radically, but only with the collaboration of medical, scientific – and statistical – experts. As a recent statistical trainee, a freshly- minted (bio)statistics PhD, I believe we as young researchers have a responsibility. Our new projects, will be interdisciplinary. We must ground them in the fundamentals and principles of statistics. They will involve many new col- leagues and disciplines, and will increasingly be our future. Small wonder that young researchers are getting excited about big data. Sherri Rose is an NSF postdoctoral fellow in biostatis- tics at the Johns Hopkins Bloomberg School of Public Health.  She recently coauthored the book Targeted Learning: Causal Inference for Observational and Ex- perimental Data for the Springer Series in Statistics. Imagine you have in your hands the ability to understand the yin and yang between tolerance and immune response, the power to decipher the secrets of complex diseases