EXABYTE
PETABYTE
TERABYTE
GIGABYTE
KILOBYTE
Management Issues
Career Direction
Technical Resources
Speaker:
Dr. Kang Mun Arturo Tan
Assistant Professor
Management Information Systems
Management Sciences Department
Date: Dec 23, 2012
Time: 12:20 – 13:10
Place: Room 75
Yanbu University College
Report Highlights and Units
The McKinsey Global Institute Report
Capturing Big Data Value
What is Big Data?
Various Dimensions of Big Data
Tools and Technologies
Additional Tools and Technologies
How can we benefit from big data?
Transforming the organization
Talent Specifications
Educational Courses / Training
References
Cloudera Distribution including Hadoop
Conclusion
The Units
Multiples of bytes (SI decimal prefixes)

Name        Symbol   Value
kilobyte    kB       10^3
megabyte    MB       10^6
gigabyte    GB       10^9
terabyte    TB       10^12
petabyte    PB       10^15
exabyte     EB       10^18
zettabyte   ZB       10^21
yottabyte   YB       10^24
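The unit table can be turned into a small conversion helper. The sketch below is illustrative only: the names `SI_UNITS`, `to_bytes` and `format_bytes` are hypothetical and not part of the talk, and it assumes the decimal (SI) prefixes listed above rather than binary ones.

```python
# SI decimal prefixes from the unit table: (symbol, power of ten), largest first.
# All names in this sketch are hypothetical helpers, not part of the talk.
SI_UNITS = [
    ("YB", 24), ("ZB", 21), ("EB", 18), ("PB", 15),
    ("TB", 12), ("GB", 9), ("MB", 6), ("kB", 3),
]


def to_bytes(value, symbol):
    """Convert a quantity such as (40, "TB") into a plain byte count."""
    exponents = dict(SI_UNITS)
    return round(value * 10 ** exponents[symbol])


def format_bytes(n):
    """Render a byte count using the largest SI unit that keeps the value >= 1."""
    for symbol, exponent in SI_UNITS:
        unit = 10 ** exponent
        if n >= unit:
            return f"{n / unit:.2f} {symbol}"
    return f"{n} B"


if __name__ == "__main__":
    print(format_bytes(to_bytes(235, "TB")))
    print(format_bytes(10 ** 18))
```

Keeping the prefixes in one ordered list means the same table drives both directions of the conversion.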
Talk Highlights
- CERN generates 40 TB of data per second.
- By 2009, nearly all sectors in the US economy had at least 200 TB of stored data per company, on average.
- One exabyte is roughly 4,000 times the information stored in the US Library of Congress.
- The US Library of Congress had collected 235 TB of data by April 2011.
- In 15 out of 17 US sectors, companies store more data on average than the US Library of Congress.
- By 2018, the US will face a shortage of 140,000 to 190,000 deep data analysts and 1.5 million data-savvy managers.
- Training on big data is still in its infancy.
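The figures quoted in these highlights can be cross-checked with a few lines of arithmetic. This is only a rough sanity check using the numbers as stated (235 TB, a factor of 4,000, and 40 TB per second) with decimal units; the variable names are made up for the sketch.

```python
TB = 10 ** 12  # terabyte in bytes (SI decimal prefix)
EB = 10 ** 18  # exabyte in bytes

library_of_congress = 235 * TB  # collection size quoted in the highlights
cern_rate = 40 * TB             # bytes per second, as quoted

# 4,000 Library of Congress collections, expressed in exabytes.
loc_multiple_in_eb = 4000 * library_of_congress / EB

# Time needed to accumulate one exabyte at the quoted CERN rate.
seconds_per_exabyte = EB / cern_rate
hours_per_exabyte = seconds_per_exabyte / 3600

print(f"4000 x 235 TB = {loc_multiple_in_eb:.2f} EB")
print(f"1 EB at 40 TB/s takes {seconds_per_exabyte:,.0f} s "
      f"(about {hours_per_exabyte:.1f} h)")
```

Under these assumptions, 4,000 collections come to just under one exabyte, which agrees with the "approximately" in the comparison.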
The McKinsey Global Institute (MGI) Report
- In May 2011, MGI issued a report about organizations being deluged with data.
- This large amount of data is generally referred to as “Big Data.”
- The use and analysis of Big Data will be the basis of innovation, competition and productivity.
- Corporations will use “Data Science” to properly manage and utilize “Big Data.”
- Data Scientists are an elite, specialized class of highly compensated experts in data cleaning, analysis and visualization.
Capturing its Value
- $300 billion/year – potential annual value to US health care
- €250 billion/year – potential annual value to European government administration
- $600 billion – potential annual consumer surplus from using personal location data globally
- 60% – potential increase in retailers’ operating margins possible with big data
- However, the USA alone needs 140,000 to 190,000 more people with deep analytical talent, and
- 1.5 million more data-savvy managers, to take full advantage of big data.
What is Big Data?
- Large data sets that are impossible to manage with conventional database tools.
- Size is relative. What is big today will be small tomorrow.
- In 2011, our global output of data was estimated at 1.8 zettabytes.
- Big Data consists of:
  - Structured, machine-friendly information
  - Unstructured, human-friendly information (email, social media, video, audio, click-streams and images)
Various Dimensions
- Volume – terabytes to petabytes of information
- Variety – extends well beyond structured data: text, audio, video, click streams, log files, etc.
- Velocity – big data is frequently time-sensitive and must be used as it streams into the enterprise in order to maximize its value. Example, averaging the readings 1, 3, 5:
  - Static: recompute from all stored values. 7:00 AM: 1, 3 (sum 4 / count 2 = average 2); 10:00 AM: 1, 3, 5 (sum 9 / count 3 = average 3)
  - Dynamic: keep only a running (sum, count, average) and update it as each value arrives. After 1, 3: (4, 2, 2); when 5 arrives: (9, 3, 3)
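The static versus dynamic averaging example on this slide can be sketched in a few lines. This is a minimal illustration, assuming the dynamic case keeps only a running sum and count; the names `static_average` and `RunningAverage` are hypothetical.

```python
def static_average(values):
    """Static approach: recompute the average from every stored value."""
    return sum(values) / len(values)


class RunningAverage:
    """Dynamic approach: keep only a running sum and count, update per arrival."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value):
        """Fold one newly arrived value into the state and return the new average."""
        self.total += value
        self.count += 1
        return self.total / self.count


if __name__ == "__main__":
    print(static_average([1, 3]), static_average([1, 3, 5]))

    stream = RunningAverage()
    for reading in (1, 3, 5):
        print(reading, stream.total, stream.count, stream.update(reading))
```

The dynamic version never revisits old readings, which is what makes it suitable for data that has to be used while it is still streaming in.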
Tools and Technologies
- Hadoop – a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment.
- Facebook, LinkedIn, Twitter and eBay use Hadoop.
- Hadoop is at the center of this decade’s Big Data revolution.
- In 2011, five major companies embraced Hadoop: EMC, IBM, Informatica, Microsoft and Oracle.
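Hadoop's classic processing model is MapReduce, which the slides do not spell out. The sketch below imitates the map, shuffle and reduce steps in memory in plain Python to give a feel for the idea. It does not use any Hadoop API, and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big tools"]))
```

In a real cluster the map and reduce calls would run on many machines over separate slices of the data; here they run in one process only to show the data flow.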
Additional Technologies
- Cassandra – a scalable multi-master database with no single points of failure
- Chukwa – a data collection system for managing large distributed systems
- HBase – a scalable, distributed database that supports structured data storage for large tables
- Hive – a data warehouse infrastructure that provides data summarization and ad hoc querying
- Mahout – a scalable machine learning and data mining library
- Pig – a high-level data-flow language and execution framework for parallel computation
- ZooKeeper – a high-performance coordination service for distributed applications
Commercial Technology (CDH)
CDH (Cloudera Distribution including Hadoop) components, by role:
- File System Mount (FUSE-DFS)
- UI Framework/SDK (Hue)
- Data Mining (Apache Mahout)
- Workflow (Apache Oozie)
- Scheduling (Apache Oozie)
- Metadata (Apache Hive)
- Data Integration (Apache Flume, Apache Sqoop)
- Languages/Compilers (Apache Pig, Apache Hive)
- Fast Read/Write Access (Apache HBase)
- Hadoop
- Coordination (Apache ZooKeeper)
- SCM Express (Installation Wizard)
How to benefit from Big Data?
- Choose the right data
  - Data should be in line with corporate objectives.
- Build models that predict and optimize outcomes
  - Hypothesis-based model building is better.
- Transform your company’s capabilities
  - Data Science is not a replacement for human judgment.
Transforming your company
- Leadership – companies succeed because they have leadership teams that set clear goals, define what success looks like and ask the right questions.
- Talent Management – companies need to manage a unique breed of individuals who are scientists but who are comfortable with the language of business.
- Technology – the tools to handle the volume, velocity and variety of big data are always a necessary component of a big data strategy.
- Decision Making – an effective organization puts information and the relevant decision rights in the same location.
- Company Culture – companies should NOT ask “What do we think?” but should ask “What do we know?”
Talent Specifications
- Hybrid of data hacker, communicator and trusted adviser
- Universal skill: the ability to write code
- Can communicate in a language that stakeholders understand
- Can tell a story with data, whether verbally, visually or both
- Many of the brightest data scientists hold PhDs in esoteric fields like ecology and systems biology
Talent Specifications - 2
- Roumeliotis, who holds a PhD in Astrophysics and heads the Data Science team at Intuit in Silicon Valley, approaches his search for candidates by:
  - asking candidates whether they can develop prototypes in a mainstream programming language, like Java.
  - seeking a skill set consisting of Mathematics, Statistics, Probability and Computer Science, along with certain habits of mind (curiosity, inventiveness, discipline, endurance?).
  - looking for people with a feel for business issues and empathy for customers.
  - immersing the candidate in on-the-job training, with an occasional course in a particular technology.
Talent Specifications - 3
- Many of the data scientists working in business today were formally trained in computer science, mathematics, statistics or economics.
- They can emerge from any field that has a strong data and computational focus.
- Hal Varian, the chief economist at Google, is known to have said, “The sexy job in the next ten years will be statisticians.”
Courses
- There are only a few formal courses being offered right now.
- Data Science is at the center of:
  - Computer Science, Operations Research, Statistics and Business
[Diagram: Data Science at the intersection of Statistics, Business, Operations Research and Computer Science]
Schools offering Data Science
- Master of Science in Analytics (MSA), Institute for Advanced Analytics, North Carolina State University
- The class of 2012 has the following job statistics:
  - 15 interviews per student
  - Average base salary offer with professional experience: $99,600
  - $65,000 to $160,000 for candidates with experience
  - $60,000 to $100,000 for candidates with no experience
Schools offering Data Science - 2
- Insight Data Science Fellows Program
  - a postdoctoral fellowship designed by Jake Klamka (a high-energy physicist by training) that takes scientists from academia and, in six weeks, prepares them to succeed as data scientists
- Syracuse University’s School of Information Studies (iSchool)
- Rensselaer Polytechnic’s Data Science Research Center
Conclusion
- Big Data is now a reality with huge profit potential.
- Tools and technologies are available as open source.
- Each of us can benefit from working with Big Data, whether in its pure, dynamic form or in its traditional, static form.
- Data Science is the path towards the full utilization of Big Data.
- Schools are in the process of offering Data Science programs.
- Students can pursue a career in Data Science through these programs.
- Statistical interpretation is the everyday work routine of Data Science. (Many commercial implementations exist.)
Commercial Implementations
- SAP HANA – Metscale
- Microsoft Parallel Data Warehouse
- Exadata Database Machine (Oracle)
- Exalytics In-Memory Machine (Oracle)
- Greenplum Data Computing Appliance (EMC)
- Netezza Data Warehouse Appliance (IBM)
- Vertica Analytics Platform (HP)
- solidDB (IBM)
- Terracotta BigMemory (Software AG) …
References
1. McKinsey Global Institute Report, 2011
2. A Simple Introduction to Data Science – Noreen Burlingame and Lars Nielsen, 2012
3. Big Data Now – Allen Noren, 2011 (O’Reilly Radar Team)
4. What is Data Science – Mike Loukides, 2011 (O’Reilly Media)
5. Big Data: The Management Revolution – Andrew McAfee and Erik Brynjolfsson (Harvard Bus Rev, Oct 2012)
6. Data Scientist: The Sexiest Job of the 21st Century – Thomas H. Davenport and D.J. Patil (Harvard Bus Rev, Oct 2012)
7. Making Advanced Analytics Work for You – Dominic Barton and David Court (Harvard Bus Rev, Oct 2012)
8. Various YouTube Materials / Hadoop – Stanford University
Student Profile Sample report on improving academic performance by uniting gr...
 
在线办理WLU毕业证罗瑞尔大学毕业证成绩单留信学历认证
在线办理WLU毕业证罗瑞尔大学毕业证成绩单留信学历认证在线办理WLU毕业证罗瑞尔大学毕业证成绩单留信学历认证
在线办理WLU毕业证罗瑞尔大学毕业证成绩单留信学历认证
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 

On Big Data

  • 1. Exabyte, Petabyte, Terabyte, Gigabyte, Kilobyte
      • Topics: Management Issues, Career Direction, Technical Resources
      • Speaker: Dr. Kang Mun Arturo Tan, Assistant Professor, Management Information Systems, Management Sciences Department
      • Date: Dec 23, 2012
      • Time: 12:20 – 13:10
      • Place: Room 75, Yanbu University College
  • 2. Report Highlights and Units; The McKinsey Global Institute Report; Capturing Big Data Value; What is Big Data?; Various Dimensions of Big Data; Tools and Technologies; Additional Tools and Technologies; How can we benefit from big data?; Transforming the organization; Talent Specifications; Educational Courses / Training; References; Cloudera Distribution including Hadoop; Conclusion
  • 3. The Units
      Multiples of bytes (SI decimal prefixes)
      Name (Symbol)     Value
      kilobyte (kB)     10^3
      megabyte (MB)     10^6
      gigabyte (GB)     10^9
      terabyte (TB)     10^12
      petabyte (PB)     10^15
      exabyte (EB)      10^18
      zettabyte (ZB)    10^21
      yottabyte (YB)    10^24
      Talk Highlights
      • CERN generates 40 TB/sec of data.
      • In 2009, nearly all sectors in the US economy had 200 TB of data on average.
      • One exabyte is approximately 4,000 times the information stored in the US Library of Congress.
      • 235 terabytes of data were collected by the US Library of Congress in April 2011.
      • 15 out of 17 sectors in the US have more data stored per company than the US Library of Congress.
      • By 2018, the US will have a shortage of 140,000 to 190,000 deep data analysts and 1.5 million data-savvy managers.
      • Training on big data is still in its infancy.
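The unit table and the Library of Congress comparison can be checked with a little arithmetic. The sketch below is illustrative only: the helper names `to_bytes` and `ratio` are hypothetical, and it assumes the 235 TB figure from the highlights stands in for "the information stored in the US Library of Congress".

```python
# SI decimal prefixes from the units table (powers of ten, not powers of two).
SI_BYTES = {
    "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12,
    "PB": 10**15, "EB": 10**18, "ZB": 10**21, "YB": 10**24,
}


def to_bytes(value, unit):
    """Convert a quantity such as (235, "TB") to a plain byte count."""
    return value * SI_BYTES[unit]


def ratio(a_value, a_unit, b_value, b_unit):
    """Return how many times quantity b fits into quantity a."""
    return to_bytes(a_value, a_unit) / to_bytes(b_value, b_unit)


# Assumption: 235 TB (the figure quoted in the highlights) is the reference size.
exabyte_vs_loc = ratio(1, "EB", 235, "TB")
print(round(exabyte_vs_loc))  # 4255
```

Under that assumption one exabyte is roughly 4,255 times the 235 TB figure, which is in line with the "approximately 4,000 times" stated in the highlights.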
  • 4. The McKinsey Global Institute (MGI) Report
      • In May 2011, MGI issued a report about organizations being deluged with data.
      • This large amount of data is generally referred to as “Big Data.”
      • The use and analysis of Big Data will be the basis of innovation, competition and productivity.
      • Corporations will be using “Data Science” to properly manage and utilize “Big Data.”
      • Data Scientists are an elite, specialized class of highly compensated experts in data cleaning, analysis and visualization.
  • 5. Capturing its Value
      • $300 billion/year: potential annual value to US health care
      • €250 billion/year: potential annual value to European government administration
      • $600 billion: potential annual consumer surplus from using personal location data globally
      • 60%: potential increase in retailers’ operating margins possible with big data
      • However, the USA alone needs 140,000 to 190,000 more people with deep analytical talent, and 1.5 million more data-savvy managers, to take full advantage of big data.
  • 6. What is Big Data?
      • Large data sets that are impossible to manage with conventional database tools.
      • Size is relative. What is big today will be small tomorrow.
      • In 2011, our global output of data was estimated at 1.8 zettabytes.
      • Big Data consists of:
          - Structured, machine-friendly information
          - Unstructured, human-friendly information (email, social media, video, audio, click-streams and images)
  • 7. Various Dimensions
      • Volume: terabytes to petabytes of information.
      • Variety: extends well beyond structured data to text, audio, video, click streams, log files, etc.
      • Velocity: big data is frequently time-sensitive and must be used as it streams into the enterprise in order to maximize its value.
      • Example, static average (recomputed from all stored values): at 7:00 AM the values are 1, 3 (4/2, average = 2); at 10:00 AM the values are 1, 3, 5 (9/3, average = 3).
      • Example, dynamic average (updated as each value arrives), written as (new value, sum, count, average): after 1, 3 the state is (_, 4, 2, 2); when 5 arrives it becomes (5, 9, 3, 3).
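The velocity example on this slide contrasts recomputing an average from stored data with updating it as values arrive. A minimal Python sketch of both, using the slide's own numbers (the names `RunningAverage` and `static_average` are hypothetical, chosen for illustration):

```python
def static_average(values):
    """Static approach: recompute the average from all stored values."""
    return sum(values) / len(values)


class RunningAverage:
    """Dynamic approach: keep only (sum, count) and update per new value."""

    def __init__(self):
        self.total = 0
        self.count = 0

    def add(self, value):
        """Fold one new value into the state and return the current average."""
        self.total += value
        self.count += 1
        return self.total / self.count


print(static_average([1, 3]))     # 2.0  (7:00 AM snapshot)
print(static_average([1, 3, 5]))  # 3.0  (10:00 AM snapshot)

stream = RunningAverage()
stream.add(1)
print(stream.add(3))  # 2.0  state: sum 4, count 2
print(stream.add(5))  # 3.0  state: sum 9, count 3
```

Both approaches give the same answers; the dynamic version never needs to re-read earlier values, which is the point when data arrives as a stream.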
  • 8. Tools and Technologies
      • Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment.
      • Facebook, LinkedIn, Twitter and eBay use Hadoop.
      • Hadoop is at the center of this decade’s Big Data revolution.
      • In 2011, five major companies embraced Hadoop: EMC, IBM, Informatica, Microsoft and Oracle.
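The slide does not describe how Hadoop processes data, but its programming model is MapReduce. The sketch below imitates the map, shuffle and reduce steps in plain Python on a single machine. It is an illustration of the idea only, not Hadoop's Java API, and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit (word, 1) pairs for one chunk of text, as a mapper would."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key, as a reducer would."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    """Run map over every chunk, then shuffle and reduce the results."""
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(pairs))


# In a real cluster each chunk would live on a different node.
chunks = ["big data is big", "data science uses big data"]
counts = word_count(chunks)
print(counts["big"], counts["data"])  # 3 3
```

Because each chunk is mapped independently, the map step can in principle run on many machines at once, which is what makes the model suit large data sets.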
  • 9. Additional Technologies
      • Cassandra: a scalable multi-master database with no single points of failure
      • Chukwa: a data collection system for managing large distributed systems
      • HBase: a scalable, distributed database that supports structured data storage for large tables
      • Hive: a data warehouse infrastructure that provides data summarization and ad hoc querying
      • Mahout: a scalable machine learning and data mining library
      • Pig: a high-level data-flow language and execution framework for parallel computation
      • ZooKeeper: a high-performance coordination service for distributed applications
  • 10. Commercial Technology (CDH)
      • CDH (Cloudera Distribution including Hadoop) components:
          - File System Mount: FUSE-DFS
          - UI Framework/SDK: Hue
          - Data Mining: Apache Mahout
          - Workflow: Apache Oozie
          - Scheduling: Apache Oozie
          - Metadata: Apache Hive
          - Data Integration: Apache Flume, Apache Sqoop
          - Languages/Compilers: Apache Pig, Apache Hive
          - Fast Read/Write Access: Apache HBase
          - Coordination: Apache ZooKeeper
          - SCM Express (Installation Wizard)
  • 11. How to benefit from Big Data?
      • Choose the right data
          - Data should be in line with corporate objectives.
      • Build models that predict and optimize outcomes
          - Hypothesis-based model building is better.
      • Transform your company’s capabilities
          - Data Science is not a replacement for human judgment.
  • 12. Transforming your company
      • Leadership: companies succeed because they have leadership teams that set clear goals, define what success looks like and ask the right questions.
      • Talent Management: companies need to manage a unique breed of individuals who are scientists but who are comfortable with the language of business.
      • Technology: the tools to handle the volume, velocity and variety of big data are always a necessary component of a big data strategy.
      • Decision Making: an effective organization puts information and the relevant decision rights in the same location.
      • Company Culture: companies should NOT ask “What do we think?” but should ask “What do we know?”
  • 13. Talent Specifications
      • Hybrid of data hacker, communicator and trusted adviser
      • Universal skill: ability to write code
      • Can communicate in a language that stakeholders understand
      • Can tell a story with data, whether verbally, visually or both
      • Many of the brightest data scientists hold PhDs in esoteric fields like ecology and systems biology
  • 14. Talent Specifications - 2
      • Roumeliotis, PhD in Astrophysics, Head of the Data Science Team at Intuit in Silicon Valley, begins his search for candidates by:
          - asking candidates whether they can develop prototypes in any mainstream programming language, like Java.
          - seeking a skill set consisting of Mathematics, Statistics, Probability and Computer Science, plus certain habits of mind (curiosity, inventiveness, discipline, endurance?).
          - looking for people with a feel for business issues and empathy for customers.
          - immersing the candidate in on-the-job training, with an occasional course in a particular technology.
  • 15. Talent Specifications - 3
      • Many of the data scientists working in business today were formally trained in computer science, mathematics, statistics or economics.
      • They can emerge from any field that has a strong data and computational focus.
      • Hal Varian, the chief economist at Google, is known to have said, “The next sexy job in the next 10 years will be statisticians.”
  • 16. Courses
      • There are only a few formal courses being offered right now.
      • Data Science is at the center of Computer Science, Operations Research, Statistics and Business.
  • 17. Schools offering Data Science
      • Master of Science in Analytics (MSA), Institute for Advanced Analytics, North Carolina State University
      • The class of 2012 has the following job statistics:
          - 15 interviews per student
          - Average base salary offer with professional experience: $99,600
          - $65,000 to $160,000 for candidates with experience
          - $60,000 to $100,000 for candidates with no experience
  • 18. Schools offering Data Science - 2
      • Insight Data Science Fellows Program: a postdoctoral fellowship designed by Jake Klamka (a high-energy physicist by training) that takes scientists from academia and in six weeks prepares them to succeed as data scientists
      • Syracuse University’s School of Information Studies (iSchool)
      • Rensselaer Polytechnic’s Data Science Research Center
  • 19. Conclusion
      • Big Data is now a reality with a huge profit potential.
      • Tools and technologies are available as open source.
      • Each one of us can benefit from working with Big Data in its pure form (dynamic) or in its traditional form (static).
      • Data Science is the path towards the full utilization of Big Data.
      • Schools are in the process of offering Data Science programs.
      • Students could pursue a career in Data Science.
      • Statistical interpretation is the everyday work routine of Data Science. (Many commercial implementations exist.)
  • 20. Commercial Implementations
      • SAP HANA – Metscale
      • Microsoft Parallel Data Warehouse
      • Exadata Database Machine (Oracle)
      • Exalytics In-Memory Machine (Oracle)
      • Greenplum Data Computing Appliance (EMC)
      • Netezza Data Warehouse Appliance (IBM)
      • Vertica Analytics Platform (HP)
      • solidDB (IBM)
      • Terracotta BigMemory (Software AG) …
  • 21. References
      1. McKinsey Global Institute Report, 2011
      2. A Simple Introduction to Data Science. Noreen Burlingame and Lars Nielsen, 2012
      3. Big Data Now. Allen Noren, 2011 (O’Reilly Radar Team)
      4. What is Data Science. Mike Loukides, 2011 (O’Reilly Media)
      5. Big Data: The Management Revolution. Andrew McAfee and Erik Brynjolfsson (Harvard Bus Rev, Oct 2012)
      6. Data Scientist: The Sexiest Job of the 21st Century. Thomas H. Davenport and D.J. Patil (Harvard Bus Rev, Oct 2012)
      7. Making Advanced Analytics Work for You. Dominic Barton and David Court (Harvard Bus Rev, Oct 2012)
      8. Various YouTube Materials / Hadoop, Stanford University