Concepts, use cases and principles to build big data systems
http://www.bigdatavietnam.org
1) Introduction to the key Big Data concepts
1.1 The Origins of Big Data
1.2 What is Big Data?
1.3 Why is Big Data So Important?
1.4 How Is Big Data Used In Practice?

2) Introduction to the key principles of Big Data Systems
2.1 How to design a Data Pipeline in 6 steps
2.2 Using Lambda Architecture for big data processing

3) Practical case study: Chat bot with Video Recommendation Engine

4) FAQ for students

  1. Concepts, use cases and principles to build big data systems. http://www.bigdatavietnam.org https://www.facebook.com/bigdatavn Compiled by Nguyễn Tấn Triều
  2. Key Contents
      1. Introduction to the key Big Data concepts
         ○ The Origins of Big Data
         ○ What is Big Data?
         ○ Why is Big Data So Important?
         ○ How Is Big Data Used In Practice?
      2. Introduction to the key principles of Big Data Systems
         ○ How to design a Data Pipeline in 6 steps
         ○ Using Lambda Architecture for big data processing
      3. Practical case study
         ○ Chat bot with Video Recommendation Engine
      4. FAQ for students
  3. Introduction to the key Big Data concepts ○ The Origins of Big Data ○ What is Big Data? ○ Why is Big Data So Important? ○ How Is Big Data Used In Practice?
  4. The Origins of Big Data
  5. https://www.kdnuggets.com/2017/02/origins-big-data.html
  6. What is Big Data?
  7. What is Big Data?
  8. What is Big Data?
  9. Why is Big Data So Important?
  10. Why is Big Data So Important?
  11. How Is Big Data Used In Practice? Source: https://internetofthingsagenda.techtarget.com/definition/Internet-of-Things-IoT
  12. How Is Big Data Used In Practice?
  13. Why is Big Data So Important?
  14. How Is Big Data Used In Practice? Device Analytics: which device is used most?
  15. How Is Big Data Used In Practice? Time-series Analytics: the peak hours of the system
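The peak-hours question from the time-series slide can be illustrated with a small sketch. The timestamps below are hypothetical sample data, not figures from the deck, and a production system would read them from collected logs rather than a list.

```python
from collections import Counter
from datetime import datetime

# Hypothetical request timestamps (ISO 8601); real data would come from logs.
timestamps = [
    "2018-11-07T09:15:00",
    "2018-11-07T09:47:00",
    "2018-11-07T20:05:00",
    "2018-11-07T20:30:00",
    "2018-11-07T20:59:00",
    "2018-11-08T20:10:00",
]


def requests_per_hour(stamps):
    """Count events by hour of day (0-23), aggregated across all days."""
    return Counter(datetime.fromisoformat(s).hour for s in stamps)


per_hour = requests_per_hour(timestamps)
peak_hour, peak_count = per_hour.most_common(1)[0]
print(f"Peak hour: {peak_hour}:00 with {peak_count} requests")
```

The same grouping idea applies to the device analytics slide: replace the hour with a device field and count occurrences per device.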
  16. How Is Big Data Used In Practice? GeoLocation Heatmap Analytics
  17. Introduction to the key principles of Big Data Systems ○ How to design a Data Pipeline in 6 steps ○ Using Lambda Architecture for big data processing
  18. How to design Data Pipeline Systems
      Collecting → Storing → Processing → Analyzing → Learning → Visualizing
      Data engineering process: 3 tasks
      1. Collecting: a. Concepts, b. Technology
      2. Storing: a. Big Data Storage Concepts, b. Big Data Storage Technology
      3. Processing: a. Big Data Processing Concepts, b. Big Data Processing Technology
      Data Science/Machine Learning process: 3 tasks
      4. Analyzing → 5. Learning → 6. Visualizing
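As a rough illustration of the six stages on this slide, the sketch below chains one toy function per stage. Everything in it is an assumption made for illustration: the event format, the function names, and the in-memory "storage" are hypothetical and stand in for real collectors, distributed storage, and processing engines.

```python
from collections import Counter

# Hypothetical raw events; in a real system these would arrive from a collector.
RAW_EVENTS = [
    "2018-11-07T09:15:00 android play",
    "2018-11-07T09:47:00 ios play",
    "bad line",
    "2018-11-07T21:05:00 android pause",
]


def collect():
    """Collecting: read raw records from a source."""
    return list(RAW_EVENTS)


def store(records):
    """Storing: keep raw records unchanged (here, an in-memory list)."""
    return list(records)


def process(stored):
    """Processing: parse records and drop malformed ones."""
    parsed = []
    for line in stored:
        parts = line.split()
        if len(parts) == 3:
            timestamp, device, action = parts
            parsed.append({"timestamp": timestamp, "device": device, "action": action})
    return parsed


def analyze(events):
    """Analyzing: aggregate events per device."""
    return Counter(event["device"] for event in events)


def learn(device_counts):
    """Learning: a trivial 'model' that predicts the most frequent device."""
    return device_counts.most_common(1)[0][0]


def visualize(device_counts):
    """Visualizing: render a text bar chart, one line per device."""
    return [f"{device:<8}{'#' * count}" for device, count in sorted(device_counts.items())]


counts = analyze(process(store(collect())))
prediction = learn(counts)
chart = visualize(counts)
print(prediction)
print("\n".join(chart))
```

The first three functions correspond to the data engineering tasks and the last three to the data science tasks, which is the split the slide draws.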
  19. Big Data Analytics Lifecycle. Data Engineer Tasks: Collecting, Storing, Processing. Data Analyst Tasks: Analyzing, Learning, Visualizing
  20. (Collecting) → Storing → Processing → Analyzing → Learning → Reacting
  21. Collecting
  22. Collecting tools
      Batch collecting: Apache Sqoop (from DBMS to Apache Hadoop)
      Real-time collecting: Log Collector with Apache Kafka
  23. Collecting → (Storing) → Processing → Analyzing → Learning → Reacting
  24. Storing Concepts
      ● Clusters
      ● Scale-Up vs Scale-Out
      ● File Systems and Distributed File Systems
      ● NoSQL
      ● Sharding
      ● Replication
      ● Sharding and Replication
      ● CAP Theorem
  25. Clusters
  26. Scale-Up vs Scale-Out
  27. Database in Big Data
  28. NoSQL
  29. NoSQL
  30. Sharding
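The slide names sharding without showing a mechanism, so the following is only one possible sketch: hash-based sharding, where a stable hash of the key selects the shard. The shard count, the `shard_for`, `put`, and `get` helpers, and the user keys are all hypothetical.

```python
import hashlib

NUM_SHARDS = 4  # assumed number of shards, chosen only for illustration


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard index using a stable hash.

    md5 is used instead of the built-in hash() because hash() of a str
    is salted per process and would not route keys consistently.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Each shard is modelled as a plain dict; a real shard would be a separate node.
shards = {index: {} for index in range(NUM_SHARDS)}


def put(key, value):
    """Write the record to the shard that owns the key."""
    shards[shard_for(key)][key] = value


def get(key):
    """Read the record from the shard that owns the key."""
    return shards[shard_for(key)].get(key)


for user_id in ["user-1", "user-2", "user-3", "user-4", "user-5"]:
    put(user_id, {"id": user_id})

print(get("user-3"))
```

With plain modulo hashing, changing the number of shards remaps most keys, which is why consistent hashing is a common alternative when shards are added or removed often.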
  31. Replication (Master-Slave)
  32. Replication (Peer-to-Peer)
  33. CAP Theorem
  34. Collecting → Storing → (Processing) → Analyzing → Learning → Reacting
  35. Processing concepts
      ● Parallel Data Processing
      ● Distributed Data Processing
      ● Hadoop
      ● Processing Workloads
      ● Cluster
      ● Processing in Batch Mode
      ● Processing in Realtime Mode
  36. Parallel Data Processing
  37. Distributed Data Processing
  38. Hadoop: Hadoop is a versatile framework that provides both processing and storage capabilities
  39. Batch processing (offline processing)
  40. Transactional processing
  41. Cluster
  42. Map and Reduce Tasks
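A word count is the usual way to illustrate map and reduce tasks. The sketch below runs in a single Python process and only mimics the three phases (map, shuffle, reduce); it does not use Hadoop, and the function names and input documents are hypothetical.

```python
from collections import defaultdict


def map_task(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(mapped_pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_task(key, values):
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)


documents = ["big data systems", "big data pipeline", "data"]

# On a cluster each document (split) would be mapped on a different node.
mapped = [pair for doc in documents for pair in map_task(doc)]
word_counts = dict(reduce_task(key, values) for key, values in shuffle(mapped).items())
print(word_counts)
```

Because each map call looks at one split and each reduce call looks at one key, both phases can be spread across the machines of a cluster.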
  43. Processing in Realtime Mode
  44. When a standard relational database (Oracle, MySQL, ...) is not good enough. Example: the "analytic system" MySQL database from a startup, tracking all actions in mobile games: iOS, Android, ...
  45. 3 common problems in Big Data Systems
      1. Size: the volume of the datasets is a critical factor.
      2. Complexity: the structure, behaviour and permutations of the datasets are a critical factor.
      3. Technologies: the tools and techniques used to process a sizable or complex dataset are a critical factor.
  46. Key ideas of Lambda Architecture in Big Data Systems
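The slide content itself is not preserved in the transcript, so the following sketch relies on the commonly described shape of the Lambda Architecture: a batch layer that recomputes views from an append-only master dataset, a speed layer that covers recent events, and a serving layer that merges both at query time. The `LambdaCounter` class and the page-view example are hypothetical.

```python
from collections import Counter


class LambdaCounter:
    """Toy page-view counter with a batch view and a real-time view."""

    def __init__(self):
        self.master_dataset = []        # append-only raw events
        self.batch_view = Counter()     # recomputed from the master dataset
        self.realtime_view = Counter()  # incremental counts since the last batch run

    def ingest(self, page):
        """New events go to both the master dataset and the speed layer."""
        self.master_dataset.append(page)
        self.realtime_view[page] += 1

    def run_batch(self):
        """Batch layer: recompute the view from all raw data, then reset the speed layer."""
        self.batch_view = Counter(self.master_dataset)
        self.realtime_view.clear()

    def query(self, page):
        """Serving layer: merge the batch and real-time views."""
        return self.batch_view[page] + self.realtime_view[page]


views = LambdaCounter()
for page in ["home", "home", "news"]:
    views.ingest(page)
views.run_batch()      # batch view now covers the first three events
views.ingest("home")   # arrives after the batch run, so only the speed layer sees it
print(views.query("home"))
```

Recomputing the batch view from raw data means an error in the speed layer is corrected at the next batch run, at the cost of maintaining two processing paths.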
  47. Practical case study: Chat bot with Video Recommendation Engine
  48. Problem
      ● A company wants to develop a chat bot for news recommendation
      ● They want to classify data into standard categories (26 categories) for user-friendly queries
      ● The engineering team has developed a data pipeline for the system
  49. Solution Diagram. Big Data is here. Author @tantrieuf31
  50. Problem: Topic Classification for News
  51. Solution Diagram
  52. FAQ for students: How to learn Big Data? Job Opportunity. Ref resources
  53. How to learn Big Data?
      1. Have lots of passion and curiosity for data
      2. Knowledge of data structures, statistics and basic maths
      3. Love to solve complex problems with a data-driven mindset
      4. Database knowledge: when to use NoSQL vs RDBMS
      5. Knowledge of distributed computing
      6. Linux / Open Source Tools
      7. Programming language: Python / Java / SQL / JavaScript
      8. English skills
  54. Big Data Job Market is really hot. https://www.class-central.com/subject/big-data
  55. Some good books for self-learning
      ● http://sachvui.com/ebook/du-lieu-lon-big-data.281.html
      ● https://drive.google.com/open?id=0B3dHGVpTXDOhQXJCR01PVkpQMGM
      ● https://drive.google.com/file/d/1rPvfio6EkaUvGtgfQoq9p9Fa2ljOMIn1/view?usp=sharing
      ● https://drive.google.com/open?id=0B3dHGVpTXDOhVTBKX09NUnlLcm8
  56. Free MOOC: https://www.class-central.com/subject/big-data