EDA Visualization
Orozco Hsu
2024-03-20
About me
• Education
• NCU (MIS), NCCU (CS)
• Experiences
• Telecom big data Innovation
• Retail Media Network (RMN)
• Customer Data Platform (CDP)
• Know-your-customer (KYC)
• Digital Transformation
• Research
• Data Ops (ML Ops)
• Business Data Analysis, AI
Tutorial Content
• Storytelling and visualization
• Exploratory Data Analysis and Visualization
• Homework
• What is data visualization?
Code
• Download materials:
• https://drive.google.com/drive/folders/1ibppjANnGy2RYe5CW805MwHrprm2nu5f?usp=sharing
Recommended book for learning Python
• 史上最強Python入門邁向頂尖高手之路王者歸來
https://www.books.com.tw/products/0010976050?sloc=main
Python visualization packages
https://jovian.com/aakashns/python-matplotlib-data-visualization
Getting Orange 3 ready
• Open-source machine learning and data visualization
• Version: 3.36.2
• https://orangedatamining.com/
Storytelling with Data (SWD)
• Always remember data comparison!
• Focus on simplicity and ease of interpretation.
• Highlight the takeaways!
https://www.storytellingwithdata.com
From touchdowns to takeaways
Sorting categories
A vertical bar chart can be a better choice if the data is ordinal.
Allow the labels to be written in a single, easily readable line.
Rainbow palette: overly distracting!
• If the goal is to observe the "fluctuation of commercials across categories over the five years", we could better achieve that by iterating to a different graph type.
• On the other hand, if we are meant simply to compare the overall category trends, "toning down the color" usage might be beneficial.
Color in only the year with the highest
number of commercials in each category
This results in visual chaos!
Data shown over time calls for a line graph
An overly complex visualization with numerous overlapping data series
Ordered by total number of commercials across all five years of data
The number of commercial advertisers in each category, in each year, is a countable quantity.
By using bar charts instead of line graphs, we can intentionally emphasize that aspect of our data.
The small-multiples area chart
A visualization like this could be used on social media.
It maintains visual interest while facilitating more straightforward comparisons across categories over several years.
A combination of line graphs with descriptive
captions to convey these insights more clearly
Conclusion
• There is no singularly correct approach to data visualization.
• The key is to consider the audience's needs, the context of the
presentation, and the intended message.
• Visualizing data is as much an art as it is a science, requiring
experimentation, iteration, and feedback, rather than adherence to a
strict set of rules.
• It is all about communication!
https://www.storytellingwithdata.com/blog
What is data visualization?
• Data visualization is the graphical representation of information and data, using visual elements like charts, graphs, and maps.
• It is a way to see and understand trends, outliers, and patterns in data.
https://www.tableau.com/learn/articles/data-visualization#advantages-disadvantages
The Pyramid of Data Needs (and why it matters for your career) | by Hugh Williams | Medium
Static chart
• There are generally three steps in drawing a chart: observe the data, determine the relationship, and select the chart.
• Identify what type of data it is and what content you want to express:
• Category
• Numeric
• Text
• Datetime
• After clarifying the content to be expressed, you can choose which chart to use to express it.
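As a rough illustration of the "what type of data is it" step, the sketch below guesses which of the four data types a column holds. The function name `infer_kind`, the ISO date format check, and the cutoff of 20 distinct values are assumptions made for this example and are not part of the course material. The sample values are made up.

```python
from datetime import datetime


def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def _is_iso_date(text):
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def infer_kind(values, max_categories=20):
    """Guess a column's data type from its non-empty string values."""
    present = [v.strip() for v in values if v is not None and v.strip()]
    if not present:
        return "Unknown"
    if all(_is_number(v) for v in present):
        return "Numeric"
    if all(_is_iso_date(v) for v in present):
        return "Datetime"
    # A few distinct values that repeat are treated as categories,
    # everything else as free text (an assumed heuristic).
    distinct = set(present)
    if len(distinct) <= max_categories and len(distinct) < len(present):
        return "Category"
    return "Text"


# Made-up sample columns, all read as strings the way a CSV reader would.
columns = {
    "Fare": ["7.25", "71.28", "", "8.05"],
    "Sex": ["male", "female", "female", "male"],
    "Name": ["Passenger A", "Passenger B", "Passenger C", "Passenger D"],
    "Date": ["2024-03-20", "2024-03-21", "2024-03-22", "2024-03-23"],
}
kinds = {name: infer_kind(vals) for name, vals in columns.items()}
print(kinds)
```

Once the type is known, the chart choice follows the slides after this one: categories point toward bar or pie charts, numeric values toward histograms, scatter plots, or box plots, and datetime values toward line charts.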
Pie chart
• You must have some kind of whole amount that is divided into a number of distinct parts.
• Your primary objective in a pie chart should be to compare each group’s contribution to the whole.
Line chart
• Line charts provide the clearest graphical representation of time-related variables and are the preferred mode for representing trends or variables over time.
Histogram
• It is used to summarize discrete or continuous data that are measured on an interval scale.
• It is often used to illustrate the major features of the distribution of the data in a convenient form.
Bar chart
• It shows data values as bars so that multiple categories or data sets can be compared side by side.
Differences between histogram and bar chart
• Usage. Bar chart: to compare different categories of data. Histogram: to display the distribution of a variable.
• Type of variable. Bar chart: categorical variables. Histogram: numeric variables.
• Rendering. Bar chart: each data point is rendered as a separate bar. Histogram: the data points are grouped and rendered based on the bin value; the entire range of data values is divided into a series of non-overlapping intervals.
• Space between bars. Bar chart: can have space. Histogram: no space.
• Reordering bars. Bar chart: can be reordered. Histogram: cannot be reordered.
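The histogram's binning rule can be sketched in a few lines of plain Python. This is a minimal illustration that assumes equal-width bins; the function name `histogram_counts` and the ages are made up for the example.

```python
def histogram_counts(values, bins):
    """Count values in `bins` equal-width, non-overlapping intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal, so no interval width can be computed")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The last interval is closed on the right so that max(values) is counted.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


ages = [2, 9, 14, 21, 22, 25, 28, 31, 35, 38, 40, 47, 54, 62]  # made-up ages
edges, counts = histogram_counts(ages, bins=4)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {'#' * count}")
```

Because the intervals follow the numeric order of the data, the bars cannot be reordered or separated the way bar-chart categories can.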
Scatter Plot
• It uses dots to represent values for two different numeric variables, making it possible to observe relationships between the variables.
Pearson Correlation
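The slide names the Pearson correlation without spelling it out. A minimal sketch of the standard coefficient (covariance divided by the product of the standard deviations) is given below. The function name `pearson_r` and the sample numbers are assumptions for illustration only.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


hours = [1, 2, 3, 4, 5]          # made-up values
rising = [2, 4, 6, 8, 10]        # moves with `hours`
falling = [10, 8, 6, 4, 2]       # moves against `hours`

print(pearson_r(hours, rising))
print(pearson_r(hours, falling))
```

Values near +1 or -1 correspond to dots that line up along a rising or falling straight line in the scatter plot; values near 0 indicate no linear relationship.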
Box plot
• Q1: The first quartile (25%) position.
• Q3: The third quartile (75%) position.
• Interquartile range (IQR): Q3 − Q1.
• Lower and upper 1.5 × IQR whiskers: These represent the boundaries beyond which observations are treated as outliers.
• Outliers: Defined as observations that fall below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR.
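The outlier rule above can be sketched with Python's standard library. This is an illustration under assumptions: `statistics.quantiles` uses its default "exclusive" method here, so the quartiles may differ slightly from those reported by other tools, and the fares are made up.

```python
import statistics


def iqr_outliers(values):
    """Return Q1, Q3, the two 1.5 x IQR fences, and the values outside them."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return q1, q3, lower, upper, outliers


fares = [7, 8, 8, 9, 10, 13, 15, 26, 30, 512]  # made-up fares
q1, q3, lower, upper, outliers = iqr_outliers(fares)
print(f"Q1={q1}, Q3={q3}, fences=({lower}, {upper}), outliers={outliers}")
```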
New workflow
Add some widgets: File and Data Table
Open the Orange workflow
• Double-click 01.ows
Modify your output file path
• Check each Python Script widget and change the old path to a path that exists on your machine.
Dataset description (titanic.csv)
• 12 columns in total.
• A training dataset for predicting whether passengers survived the Titanic disaster.
Data Summary
• Load titanic.csv
• Data description
• Look at Name, Type, Role, and Values in the table.
• Change the configuration of the columns.
Data Summary
• Missing values
• Use the Feature Statistics widget.
• What are the missing-value ratios?
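For comparison with what the widget reports, the missing ratio of a column is simply the number of empty cells divided by the number of rows. The sketch below computes it from a small made-up CSV held in memory; the function name `missing_ratios` and the sample rows are assumptions for illustration.

```python
import csv
import io

# Made-up sample in the spirit of titanic.csv; empty cells are missing values.
SAMPLE_CSV = """PassengerId,Age,Cabin,Embarked
1,22,,S
2,38,C85,C
3,,,S
4,35,C123,
"""


def missing_ratios(rows):
    """Return the fraction of empty cells for every column."""
    rows = list(rows)
    if not rows:
        return {}
    ratios = {}
    for column in rows[0].keys():
        missing = sum(1 for row in rows if not (row[column] or "").strip())
        ratios[column] = missing / len(rows)
    return ratios


ratios = missing_ratios(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for column, ratio in ratios.items():
    print(f"{column}: {ratio:.0%} missing")
```

A column with a high missing ratio is a candidate for removal, while a column with a low ratio is a candidate for imputation.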
Remove columns (data preprocessing)
• Use the Select Columns widget.
Impute columns (data preprocessing)
• Use the Impute widget.
• Choose a default method.
• Or choose a method for each column.
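To make the idea of an imputation method concrete, here is a minimal sketch of one common choice: fill numeric columns with the mean and other columns with the most frequent value. It is an assumption that this matches the method selected in the workflow; the function name `impute_column` and the sample values are made up.

```python
from collections import Counter
from statistics import mean


def impute_column(values):
    """Fill None entries: mean for numeric columns, most frequent value otherwise."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to learn a fill value from
    if all(isinstance(v, (int, float)) for v in present):
        fill = mean(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


ages = [22, None, 26, 30]        # made-up numeric column
ports = ["S", "C", None, "S"]    # made-up categorical column

print(impute_column(ages))
print(impute_column(ports))
```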
Pie chart
• Orange 3 has deprecated the Pie Chart widget.
• Use the Python Script widget instead.
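A script for this purpose could look roughly like the sketch below. It is not the script shipped in 01.ows; the function names, the output file name, and the class labels are made up. Inside the Python Script widget the labels would come from the widget's input table (exposed there as `in_data`), whereas this standalone version uses a plain list so it can run anywhere. The plotting helper imports matplotlib lazily, which is assumed to be installed only where the chart is actually drawn.

```python
from collections import Counter


def category_shares(labels):
    """Return each category's share of the whole as a fraction."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def draw_pie(shares, path):
    """Save a pie chart of the shares to `path` (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no window needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.pie(list(shares.values()), labels=list(shares.keys()), autopct="%1.1f%%")
    ax.set_title("Share of passengers by class")
    fig.savefig(path)
    plt.close(fig)


pclass = [1, 3, 3, 2, 3, 1, 3, 2]  # made-up class labels
shares = category_shares(pclass)
print(shares)
# draw_pie(shares, "pclass_pie.png")  # uncomment where matplotlib is installed
```

The shares always sum to 1, which is the "whole amount divided into distinct parts" that a pie chart requires.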
Line chart
• Use the Line Plot widget.
• Trend analysis charts are typically presented together with time-based data.
Distribution chart
• Use the Distributions widget to compare variables.
Scatter plot
• Use the Scatter Plot widget.
• It is used to observe the degree of correlation between features:
• positive correlation
• negative correlation
• no correlation
Box plot
• Use the Box Plot widget.
• Compare multiple features with each other.
Pivot Table
• Use the Pivot Table widget.
• It summarizes the data of a more extensive table into a table of statistics.
• The statistics can include sums, averages, counts, etc.
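What a pivot table computes can be sketched without any widget: group the rows by two keys and aggregate a third column. The example below uses the average as the statistic. The function name `pivot_mean`, the column names, and the rows are made up for illustration and are not taken from titanic.csv.

```python
from collections import defaultdict

# Made-up rows: one dictionary per passenger.
rows = [
    {"Sex": "female", "Pclass": 1, "Survived": 1},
    {"Sex": "female", "Pclass": 1, "Survived": 1},
    {"Sex": "female", "Pclass": 3, "Survived": 1},
    {"Sex": "female", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 1, "Survived": 1},
    {"Sex": "male", "Pclass": 1, "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Survived": 1},
]


def pivot_mean(rows, row_key, col_key, value_key):
    """Average `value_key` for every (row_key, col_key) combination."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        cell = (row[row_key], row[col_key])
        sums[cell] += row[value_key]
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


table = pivot_mean(rows, "Sex", "Pclass", "Survived")
for (sex, pclass), rate in sorted(table.items()):
    print(f"{sex:6} class {pclass}: survival rate {rate:.0%}")
```

With a 0/1 survival column, the average of each cell is the survival rate of that group, which is the kind of figure the exercises below ask for.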
1. Show me the top 10 data rows
• Hint: Use the Data Sampler widget
2. Show me the dataset info
• How many rows?
• How many features?
• Any other information like this!
3. Get a count of the number of survivors
4. Survival conclusions
• Based on the features SEX, PCLASS, SIBSP, PARCH, and EMBARKED:
• Women had a higher chance of survival than men.
• First-class passengers had a higher chance of survival.
• Passengers with siblings or spouses aboard had a higher chance of survival.
• Passengers with children or parents aboard had a higher chance of survival.
• Embarking at port S may be associated with a lower cabin class and a lower chance of survival.
5. Show me the survival rate by sex
6. Look at the survival rate by SEX and PCLASS
• Women in first class had a survival rate as high as 96.8%. In contrast, men in third class (PCLASS = 3) had only a 13.54% chance of survival.
7. Look at the survival rate by SEX, AGE, and PCLASS
• In this dataset, women in first or second class had about a 90% chance of survival regardless of age.
• On the other hand, a man in third class and older than 18 had only a 13.36% chance of survival.
• To summarize, girls and women had a higher chance of survival than boys and men.
• Additionally, the higher the class (such as first class), the higher the chance of survival.
8. The fare paid in each class
• Plot Pclass against Fare to visualize the data.
• Every class had passengers who boarded for free, while some spent over 500 pounds for a first-class ticket. It's quite an interesting observation!
9. Visualize the data and express your thoughts
• Using what was taught today and referring to Story_telling_with_data.pdf, please visualize and analyze this data (20240320_HW.csv) with the theme of sales.
• Based on your observations, explain the relationship between sales and the other variables in the dataset.
