Dr. Brian J. Spiering
Practical Tips On
Handling Big Data
hi, brian.
Data Science Faculty @GalvanizeU
@BrianSpiering
Roadmap
1. Defining “Big Data” (aka, you probably don’t have Big Data)
2. How to avoid Big Data (and associated problems)
3. Okay, I really have Big Data. What should I do?
1. Defining “Big Data”
(aka, you probably don’t have Big Data)
What is Big Data?
“Data sets with sizes beyond the ability of
commonly used software tools to capture,
curate, manage, and process data within a
tolerable amount of time.”
Data that doesn’t fit on a single machine.
What is not Big Data?
2. How to avoid Big Data
(and associated problems)
Handling Medium Data
Cache
RAM
Disk
Data Center
Big Data Gotcha!
Scaling Out
1. Single Local Machine < 10s GB*
2. Single Cloud Machine < 2 TB*
3. Cloud Cluster of Machines > 2 TB*
* Summer 2016
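A minimal sketch of how the tiers above could be turned into a quick check. The 2 TB cutoff comes from the slide; the 32 GB local cutoff is an assumed stand-in for "10s GB", and the 8 bytes per value assumes dense 64-bit floats. The function names are hypothetical.

```python
GB = 10**9
TB = 10**12


def estimate_bytes(n_rows, n_cols, bytes_per_value=8):
    """Rough in-memory size of a dense numeric table (assumes 8-byte values)."""
    return n_rows * n_cols * bytes_per_value


def suggest_tier(n_bytes, local_limit=32 * GB, cloud_limit=2 * TB):
    """Map a data size onto the three tiers of the Scaling Out slide.

    local_limit is an assumption standing in for "10s GB";
    cloud_limit is the 2 TB figure quoted on the slide (Summer 2016).
    """
    if n_bytes < local_limit:
        return "single local machine"
    if n_bytes < cloud_limit:
        return "single cloud machine"
    return "cloud cluster of machines"


# 1MM rows x 1K attributes of 8-byte values is about 8 GB.
size = estimate_bytes(1_000_000, 1_000)
print(size / GB, suggest_tier(size))
```

The thresholds are parameters on purpose, since the slide itself dates them to Summer 2016.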
Matrix Multiplication
Matrix Multiplication:
Imperative Implementation
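The code from the original slide is not in this transcript. As an illustration only, an imperative version in plain Python might look like the following sketch, which spells out every step with explicit loops. The function name and the list-of-lists representation are assumptions.

```python
def matmul_imperative(a, b):
    """Multiply two matrices stored as lists of rows, using explicit loops."""
    n_rows, n_inner, n_cols = len(a), len(b), len(b[0])
    if any(len(row) != n_inner for row in a):
        raise ValueError("inner dimensions do not match")

    # Pre-allocate the result, then fill it one cell at a time.
    result = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(n_inner):
                result[i][j] += a[i][k] * b[k][j]
    return result


print(matmul_imperative([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```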
Matrix Multiplication:
Functional Implementation
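Again, the slide's own code is not in this transcript. One way to sketch a functional version in plain Python is to describe each output cell as a dot product of a row and a column, with no loop counters or mutation. The names are assumptions.

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def matmul_functional(a, b):
    """Multiply matrices by stating what each cell is: row dot column."""
    columns = list(zip(*b))  # transpose b so columns can be iterated
    return [[dot(row, col) for col in columns] for row in a]


print(matmul_functional([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Because callers only see `matmul_functional(a, b)`, the body could later be swapped for an optimized library routine without changing any calling code.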
Matrix Multiplication
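A hedged sketch of distributing the work by row: each row of the left matrix is handed to a worker, and the results are collected in order. Threads from the standard library stand in for the workers here purely to show the structure; for pure-Python arithmetic they will not give a real speedup, and a process pool or a cluster framework would take their place in practice. All names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def row_times_matrix(columns, row):
    """Compute one output row: the input row dotted with every column."""
    return [sum(x * y for x, y in zip(row, col)) for col in columns]


def matmul_by_row(a, b, max_workers=4):
    """Send each row of `a` to a worker; map() keeps the rows in order."""
    columns = list(zip(*b))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(partial(row_times_matrix, columns), a))


print(matmul_by_row([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```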
Head, Torso, Tail:
Separate models (and hardware)
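A minimal sketch of splitting items into head, torso, and tail by popularity, assuming a simple list of item identifiers as input. The cutoffs (head covers roughly the first 50% of traffic, torso the next 30%) are hypothetical defaults, not figures from the talk.

```python
from collections import Counter


def split_head_torso_tail(events, head_share=0.5, torso_share=0.3):
    """Assign each item to a tier by cumulative share of all events.

    Items are visited from most to least frequent; an item's tier depends
    on how much traffic the more popular items before it already cover.
    """
    counts = Counter(events)
    total = sum(counts.values())
    tiers = {"head": [], "torso": [], "tail": []}
    covered = 0
    for item, n in counts.most_common():
        share_before = covered / total
        if share_before < head_share:
            tiers["head"].append(item)
        elif share_before < head_share + torso_share:
            tiers["torso"].append(item)
        else:
            tiers["tail"].append(item)
        covered += n
    return tiers


views = ["a"] * 6 + ["b"] * 2 + ["c", "d"]
print(split_head_torso_tail(views, head_share=0.55, torso_share=0.3))
```

Each tier can then get its own model and its own hardware, as the slide suggests.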
3. Okay, I really have Big Data.
What should I do?
“But my data is more than 5TB!
(and I need it in memory)”
Your life sucks now…
You are stuck with
distributed computing
map reduce
Big Data is functional
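To illustrate why map reduce is functional, here is a small single-machine sketch of a word count written as a map step followed by a reduce step. It uses only the Python standard library and is not the API of Hadoop or Spark; the function names are assumptions.

```python
from collections import Counter
from functools import reduce


def map_phase(line):
    """Map: turn one line of text into its own word counts."""
    return Counter(line.lower().split())


def reduce_phase(left, right):
    """Reduce: merge two partial counts into one."""
    return left + right


def word_count(lines):
    """Each line is mapped independently, so the map step could be
    spread across many workers before the partial counts are merged."""
    return reduce(reduce_phase, map(map_phase, lines), Counter())


print(word_count(["big data", "Big compute"]))
```

Neither phase mutates shared state, which is what makes it safe to run the pieces on separate machines.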
What to do:
1. Learn some math tricks (linear algebra)
2. Learn how to optimize your code
3. Learn how to use cloud compute
4. Learn a Big Data Framework
Where have we been?
1. Defining “Big Data” (aka, you probably don’t have Big Data)
2. How to avoid Big Data (and associated problems)
3. Okay, I really have Big Data. What should I do?
Thank You!
Questions?
Editor's Notes

  1. Good evening! Tonight, I’m going to share a couple of practical tips on handling Big Data. I’m …
  2. I have been working in Big Data for the last couple of years. About one year ago I joined Galvanize. Galvanize is an education company that builds learning communities. GalvanizeU is its MSDS program. I teach NLP, Big Data, and Deep Learning.
  3. Many people think they have big data but
  4. Here is a popular quote. Does this sound reasonable?
  5. I’ll more precisely define what I mean by a single machine: compute (RAM) and storage (disk).
  6. You can load hundreds of megabytes into memory in an efficient vectorized format. Tell story: I was working at a SaaS company and my intern was fitting a random forest for churn, 1MM rows / 1K attributes. About: R (8 hours), Python (1 hour), Spark (10 minutes).
  7. I was working for a company doing competitive intelligence… 5 GB in a data frame on my laptop. Real time: <100 ms. Wes McKinney’s projects to scale out pandas: Ibis / Arrow. Single “computer”.
  8. redefine “machine”
  9. 2 TB of RAM (2,000 GB). In-memory DB: limited rollout / use, but it’s the future.
  10. bigger, cheaper, faster, easier
  11. [Walk through slowly] http://www.theregister.co.uk/2016/04/04/memory_and_storage_boundary_changes/
  12. Remember the competitive intelligence project: it took 5 minutes to load into RAM. “The difference between RAM and cache is its performance, cost, and proximity to the CPU. Cache is faster, more costly, and closest to the CPU. Due to the cost there is much less cache than RAM. The most basic computer is a CPU and storage for data. The structure we have these days is to give us the most bang for the buck. Generally faster is more expensive. For best performance the faster more expensive storage is closer to the CPU. The relation is like this: CPU - L1 cache - L2 cache - RAM - hard drive - backup media (tape). The CPU itself has its registers for storing data. The cost per bit of storage goes down from the CPU out.”
  13. Stay local or stay in the cloud. I was storing the data. Moore’s Law: the number of transistors in a dense integrated circuit doubles approximately every two years. 60% annual growth rate: like a printer with a smaller font, more information on each sheet. Kryder’s Law: a 2005 Scientific American article observed that magnetic disk areal storage density was then increasing very quickly, much faster than the two-year doubling time of semiconductor chip density posited by Moore’s Law. Nielsen’s Law of Internet bandwidth: a high-end user’s connection speed grows by 50% per year.
  14. These numbers are going to change, both in value and in relative tipping point. What is your preference?
  15. Alex Smola: Carnegie Mellon, now leading AWS machine learning offerings.
  16. http://www.math.cornell.edu/~mec/Winter2009/RalucaRemus/Lecture1/lecture1.html regression (GLM) PCA / eigenvalue
  17. Vanilla Python: very clear & logical, very slow.
  18. Functional programming is an API call: what, not how. Less code. Functional hides optimizations: we can swap out the underlying code.
  19. Optimization: distribute/parallelize by row. Send each row to a worker (core or cluster member).
  20. Power Law - The Internet 101. Chris Anderson. Movies: a few blockbusters, many in the middle of the pack; YouTube/Vimeo have enabled MANY amateur cinematographers.
  21. Power Law - The Internet 101. Head, Torso, Tail for recommenders. Keep: Head in cache, Torso in RAM, Tail on disk.
  22. Learn Spark first, then go back to Hadoop. Spark just works better and is easier to understand. Beyond the scope of the talk: Databricks Cloud.
  23. Get out of the data center as quickly as possible. Simple ETL into aggregates. Competitive intelligence project: I would ETL in the cloud and fit the aggregate data locally.
  24. Inputs and outputs. Hadoop / MapReduce / Spark extends this but is still functional. Practice on simple problems, then extend to data.
  25. Keep The Goal, The Goal. I love to delight people, especially customers. What are you trying to do with your data? If properly spec’d, then it is not Big Data. Data density.
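
Note 13 mixes doubling times and annual percentages. As a rough aid for comparing them, the sketch below converts between the two under the assumption of steady compound growth; the function names are hypothetical.

```python
import math


def annual_growth_from_doubling(years_to_double):
    """Annual growth rate implied by a fixed doubling time."""
    return 2 ** (1 / years_to_double) - 1


def doubling_time_from_annual_growth(rate):
    """Years needed to double at a fixed annual growth rate."""
    return math.log(2) / math.log(1 + rate)


print(f"doubling every 2 years: {annual_growth_from_doubling(2):.0%} per year")
print(f"50% per year doubles in {doubling_time_from_annual_growth(0.5):.2f} years")
print(f"60% per year doubles in {doubling_time_from_annual_growth(0.6):.2f} years")
```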