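Collaboration with Statistician?
Matrix Visualization in Exploratory Data Analysis
Chun-houh Chen (陳君厚)
Institute of Statistical Science, Academia Sinica
International Conference Hall, Humanities and Social Sciences Building, Academia Sinica
August 30, 2014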
http://escience.washington.edu/blog/new-phd-tracks-big-data
Q: Chun-houh, Big Data is so hot right now. Should we add it to our statistics department curriculum?
A: Big Data? Can today's statistics graduates even handle Data?
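Main collaborators
Prof. 胡海國, Chair, Department of Psychiatry, College of Medicine, National Taiwan University
Prof. 楊泮池, Dean, College of Medicine, National Taiwan University
Research Fellow 陳章榮, toxicological research institute, US FDA
Prof. 李御賢, Core Laboratory, Chang Gung Memorial Hospital; Ming Chuan University
Research Fellow 白果能, Institute of Biomedical Sciences, Academia Sinica
Prof. 吳漢銘, Department of Mathematics, Tamkang University
Prof. 周正中, Department of Life Science, National Chung Cheng University
Prof. 楊永正, Institute of Biomedical Informatics, National Yang-Ming University
Research Fellow 林文昌, Institute of Biomedical Sciences, Academia Sinica
Prof. 黃奇英, Institute of Clinical Medicine, National Yang-Ming University
Research Fellow 楊欣洲, Institute of Biomedical Sciences, Academia Sinica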
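I. Collaboration with Statistician?
Academia Sinica Vice President 賴明昭 promoting interdisciplinary collaboration (November 10, 2004)
Collaboration is not necessarily data analysis
Collaborating with a statistician very likely does require data analysis
"Prof. Chen, my colleague used Factor Analysis and got into the same journal in only a third of the time."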
Luxury Research vs. Necessity Research 
Senior Researcher vs. Young Investigator 
(Established) vs. (Struggling) 
[Figure: correlation matrix maps with color scales (Corr., from -1 to 1)]
Chun-houh, can you create some powerful 
statistical/bioinformatics methods so we can 
get our experiments published in Nature/ 
Science? 
Sir, can you conduct some meaningful 
biological/medical experiments so we can 
get our methods published in Nature/ 
Science?
Mutual Trust 
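(XX, speaking at a YY meeting at Academia Sinica:)
"Chun-houh Chen of the Institute of Statistical Science says the biochips from your ZZ institute all have problems."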
Mutual Understanding 
You can remove Figure1 
together with my name 
from the paper 
[Figure: GAP matrix maps of the PANSS items (N1–N7, P1–P7, G1–G16, S1–S3). Panels: PANSS Score maps (scale 1 to 7), Correlation Coefficient maps (scale -1 to 1), and clustering trees on Average Euclidean Distance and Average Correlation. Item clusters: Negative Symptoms, Disorganized Thought, Hostility/Excitement, Delusion/Hallucination, Anxiety Symptoms. Group labels: G1–G4, GONEG, GWNEG; RMG (n=61), PDHG1 (n=14), MBG (n=50), PDHG2 (n=38).]
II. Matrix Visualization in Exploratory Data Analysis
Matrix Visualization:
Approaching Statistics and Statistical Approach
Lab 309 (???) for Information Visualization 
Dr. 田銀錦 
Postdoc. Fellow 
張勝傑 
張文宗 
陳柏旭 
鐘雅齡 
黃建勳 
林香誼 
劉勝宗 
曾聖澧 
葉紫君 
吳怡真 
林倩如 
歐陽智聞 
. . . 
Mr. 高君豪 
Ph.D. student 
Prof. 吳漢銘 
Dept. Math. 
Tamkang U. 
Prof. 須上英 
Dept. Stat. 
Nat’l Taipei U. 
Ms. 石佳鑫 
Research Assistant 
Dr. 何孟如 
Postdoc. Fellow
Data analysis
A process of
• inspecting data
• cleaning data
• transforming data
• modeling data
With the goal of
• discovering useful information
• suggesting conclusions
• supporting decision making
Understanding the data: Exploratory Data Analysis (EDA), data visualization
Exploratory Data Analysis
EDA, John Tukey (1915–2000), 1977
"It is important to understand what you CAN DO before you learn to measure how WELL you seem to have DONE it."
Allow the data to speak for themselves before standard assumptions or formal modeling.
"The greatest value of a picture is when it forces us to notice what we never expected to see."
Matrix Visualization as an EDA tool for assisting formal mathematical modeling
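John W. Tukey states at the very start of his book Exploratory Data Analysis (EDA):
"It is important to understand what you CAN DO before you learn to measure how WELL you seem to have DONE it."
Learning what you can do helps you get twice the result with half the effort during data analysis. The role of EDA is to obtain the message the data convey by "looking" at the data; it emphasizes simple arithmetic and easily constructed graphs and tables. Through EDA, one first recognizes and describes the patterns revealed in the graphs and tables, and then uses the human mind to analyze and judge the received information as a whole, in order to explore the information hidden in the data. The emphasis is on exploratory analysis rather than rigorous model confirmation.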
Does data visualization need statistics?
Graphics/Visualization for high dimensional data? (p = 5, 10, 100, 10000)
[Figure: conventional plots of the Iris data by species name (I. Setosa, I. Virginica, I. Versicolor): petal and sepal measurements on a 0 to 80 scale, a series plot of the data, and a normal-score plot of petal width]
Recent Review Articles for MV
Leland Wilkinson and Michael Friendly, "The History of the Cluster Heat Map", The American Statistician, May 2009, Vol. 63, No. 2, p. 179.
Innar Liiv, "Seriation and Matrix Reordering Methods: An Historical Overview" (review), Statistical Analysis and Data Mining 3: 70–91, 2010.
Figure 2. Shaded matrix display from Loua (1873), available online at http://books.google.com/books/. This was designed as a summary of 40 separate maps of Paris, showing the characteristics (e.g., national origin, professions, age, social classes) of 20 districts, using a color scale ranging from white (low) through yellow and blue to red (high).
Figure 3. Sorted shaded display from Brinton (1914). The data are ranks of U.S. states on each of 10 educational features assessed in 1910. The matrix has been sorted by the row-marginal ranks.
Figure 5. Sorted shaded display from Czekanowski (1909), reproduced in Hage and Harary (1995).
Figure 9. Cluster heat map from Wilkinson (1994). The data are social statistics (i.e., urbanization, literacy, life expectancy for females, GDP, health expenditures, educational expenditures, military expenditures, death rate, infant mortality, birth rate, and ratio of birth to death rate) from a United Nations survey of world countries. The variables were standardized before the hierarchical clustering was performed.
Matrix Visualization (MV): reorderable matrix, heatmap, color histogram, data image
Data Matrix
50-item psychiatric symptom rating scale (one column per item, one row per patient)
11000040000050000000000000000022233022200342203300 
32111010001030000002000000000000000000000000000002 
55500010000000000011110000000033333005315121444420 
55554550055515505500100003000030022100000000200000 
00000000010000000000000000000010000000000000000000 
20200220200002000002202010000032312000002132212220 
00000000030000000000000000022000000000000000000000 
31100020000131300002503300000020202003043331300031 
55100440000034404000550000000044443044414355413330 
00000000030000000000000000000000000000000000000000 
20200000030122032000000000000020101000000030200020 
50100050045000000013400000000020012000000320442311 
00050050000420000000200000200010001000000031410300 
00000030155033000000400000000000000000000000040333 
55500010000004403300304044000033323030000002332222 
00000000020000000000000000032100000000000000000000 
11100020000010000000503000000020022200200300302034 
55400055000044404002000200000020020000000320404433 
00000000000010000000000000000010002002222001211200 
42400030040030000010402200023022322002000330301222 
33300040000020000024404334002030230000000330400045 
23100030000030000004003300000000030000000022454302 
44400030000440030020233200000043433334433232222231 
32100010022023002000000000000020022033302042413333 
11100000055000000002403000000000000000000100010002 
00000000000000000000300000000000000000000000000020 
44400040044000000033243303300043333433300344424444 
11100000130011120000301122221010111212211001212221 
11111111111115000005001111111155555005511555555551 
33100000030050000001004344111010110111100010111211 
00000000040000000000002300044000000000000000000020 
00000030010030000022103202200032323322202233303321 
22110011000110000032103221200033223222323333313321 
44100340020020000000500000000000000000000020200020 
00000012000020000000000003000000000002200200000200 
44000040100000000001500100100001111021002032202300 
22000040000040000002100000000020000100100030202210 
40400000020055300000000002000000000000010031212101 
55550540000034444000500200000000000000010000000210 
00000000000000000000000000000011012000200130203210 
44140040000441140000100000000011101010000000223200 
22200050000040001000010000000032324000002222220400 
55000530040030000000500000000000000000000000000000 
43400030000040000000000000000044434033311032444422 
33300000004453000000003303000042333333302133333321 
10020000000002000000200000000020032000021032211210 
33200030000041330001210001000020102222211032323310 
33300020030030000001102202200031202222201032202222 
33300030040030000002102202000022202201100020202200 
55511150551115555522335544101142424433454455445545 
00000030020000000012003202300022212311200131322211 
40000000033033333300305555004320111202031231110020 
00000000000000000000002200043000000000000020000030 
11000030200330000023203203000043333333302333333332 
55000020000023301000003421004023332133444034311151 
21110150111155552201000000000021234031111021335500 
55500010100001000020002202004033342122122250552344 
55500000000030000002200003000010000000320000000311 
44400350001033332200402102211001000000100030202321 
33000330030330020000304330003002200200000200000032 
00000000050000000020303320143000021000000030000033 
00000000020020000000300000020000000000000000200100 
11102120122000000002215511115020102302411140411155 
55500500050033410002000403433000000000000000000010 
44400240040030033320505434204040022400000240414445 
50500050000000000000000000000033313433101111113402 
55500050000555500000000000000022221002100033312300 
30330000000050000045000000000031243313304323303410 
55100130031033000002500000000033322023123122411323 
00000000000000000000002100230100000000000000000012 
22100120000030000000200000000000000000000230210222 
20000000030000000003200000004300000000000000000043 
00000000030000000001300000020000000000000120000031 
33300320023210300202303323000020002200022043220022 
00000021010040100001003301100021001001301210103200 
50400450040050000000500000000000000000033000000000 
20000200000000000000100000000043434033202224012330 
51100030000003000503003320000033344043154514411412 
55000530013000000002000000000030002300000000355400 
00000000000000002002000000000022201001200022200211 
33000300044020000001422000021000000000000020100030 
10000013020000000043004324304044434033412244402420 
33300020030000200000003302044200000000000000000032 
33000000000030000022323200000032222322310230202211 
00000000131000000002212100033200100101100111100011 
00000000030000000023200000033000000000000000000030 
00000023020020000002300000022000000001100022120010 
00000000040000000023200000032000000200000210100011 
00000000020000000000100000022110000100000000220010 
44400030030030000033300300023020000200000120000002 
00000000030020000022202000032200000100000000000011 
00000003040000000033310000043000000000000230000032 
22200000000001000000002211000021221011101032000021 
41400240000130000002503002000022000002200032432223 
20200000031000000000400000243000000000000030000030 
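95 psychiatric patients
Data Map
Color scale: 5 (severe) to 0 (normal)
[Figure: data map of the 50-item psychiatric symptom scale for the 95 psychiatric patients]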
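The step from data matrix to data map can be sketched in a few lines. This is only an illustration, not the GAP software: the first three patient rows are copied from the data matrix above, while the character palette `SHADES` and the function name `data_map` are assumptions made for this example (a real data map uses a color scale).

```python
# First three patient rows copied from the data matrix above (scores 0-5).
rows = [
    "11000040000050000000000000000022233022200342203300",
    "32111010001030000002000000000000000000000000000002",
    "55500010000000000011110000000033333005315121444420",
]

# Hypothetical text palette: index 0 = normal (blank), index 5 = severe (darkest).
SHADES = " .:-=#"


def data_map(rows, shades=SHADES):
    """Replace every score digit with the shade character at that index."""
    return ["".join(shades[int(ch)] for ch in row) for row in rows]


lines = data_map(rows)
for line in lines:
    print("|" + line + "|")
```

Each printed line has the same width as its input row, so the output keeps the patients-by-items layout of the matrix.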
Generalized Association Plots (GAP) for MV of continuous data
Data Matrix types: continuous, ordinal, binary, nominal
GAP steps:
1. presentation
2. permutation
3. partition
4. summary
Some essential elements in a GAP MV procedure
1. Data Matrix (n × p), with color coding: continuous, ordinal, binary, or nominal
2. Proximity Matrix for Subjects (n × n): continuous, ordinal, binary, or nominal
3. Proximity Matrix for Variables (p × p): continuous, ordinal, binary, or nominal
4. Permutation (variables) and Permutation (subjects)
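A minimal sketch of elements 2 and 3 for continuous data, assuming Euclidean distance between subjects and Pearson correlation between variables (two common choices; the slides do not fix a particular measure here). The toy matrix `X` and all function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pearson(u, v):
    """Pearson correlation; assumes neither vector is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def subject_proximity(X):
    """n x n matrix of distances between rows (subjects)."""
    return [[euclidean(r, s) for s in X] for r in X]


def variable_proximity(X):
    """p x p matrix of correlations between columns (variables)."""
    cols = list(zip(*X))
    return [[pearson(c, d) for d in cols] for c in cols]


# Toy 4 x 3 data matrix (n = 4 subjects, p = 3 variables), made up for illustration.
X = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 2.0],
    [4.0, 8.0, 0.0],
]
D = subject_proximity(X)   # 4 x 4
R = variable_proximity(X)  # 3 x 3
```

Both proximity matrices can then be color coded and permuted in the same way as the data matrix itself.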
Statistical Approach
Identify Global Trend: Singular Value Decomposition
SVD (SVD1, SVD2): Alter O. et al. 2000, PNAS
Rank 2 Elliptical (R2E): Chen 2002, Statistica Sinica
[Figure: (a) Expression, color scale -8 to +8 with 1:1 at the center; (b), (c) Correlation, color scale -1 to +1; (d)]
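As a rough illustration of ordering rows by a global trend, the sketch below sorts subjects by their coordinate on the leading left singular vector of the column-centered data, found by power iteration. This is a simplified stand-in chosen for brevity, not the R2E algorithm of Chen (2002); the function names and the toy data are assumptions.

```python
import math


def center_columns(X):
    """Subtract each column mean so the ordering reflects variation, not level."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]


def leading_left_singular_vector(X, iters=200):
    """Power iteration on X X^T; returns a unit vector (its sign is arbitrary)."""
    n, p = len(X), len(X[0])
    u = [1.0 + 0.01 * i for i in range(n)]  # arbitrary non-zero start
    for _ in range(iters):
        v = [sum(X[i][j] * u[i] for i in range(n)) for j in range(p)]  # X^T u
        u = [sum(X[i][j] * v[j] for j in range(p)) for i in range(n)]  # X v
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
    return u


def svd1_order(X):
    """Row order along the first singular direction of the centered data."""
    u = leading_left_singular_vector(center_columns(X))
    return sorted(range(len(X)), key=lambda i: u[i])


# Toy data: four subjects lying along one trend, listed in shuffled order.
X = [[3.0, 3.1], [1.0, 0.9], [2.0, 2.0], [4.0, 4.2]]
order = svd1_order(X)
```

Because the sign of a singular vector is arbitrary, the recovered order is only defined up to reversal.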
Statistical Approach: Identify Local Clusters
Eisen et al. (1998)
Tree seriation: flipping of intermediate nodes
[Figure: panels (a), (b), (c) show the same tree over leaves A, B, C, D, E drawn with different leaf orders, e.g. A B C D E in (a) and C E D B A in (c); further panels show the result after 1 flip, 3 flips, 5 flips, and many flips]
2^(n-1) = 2^(5-1) = 16
Different Seriations (Orderings of Terminal Nodes or Leaves) Generated from Identical Tree Structure
Ideal model; external and internal references for guiding the flipping mechanism
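The count 2^(n-1) = 16 for n = 5 leaves can be checked by enumerating every leaf order reachable by flipping internal nodes. The tree shape below is a hypothetical example (the slide's exact tree is not recoverable from the text); any binary tree with five leaves gives the same count.

```python
from itertools import product


def leaf_orders(tree):
    """All leaf orderings obtainable by flipping internal nodes of a binary tree.

    A leaf is a string; an internal node is a pair (left, right).
    """
    if isinstance(tree, str):
        return [(tree,)]
    left, right = tree
    orders = []
    for l, r in product(leaf_orders(left), leaf_orders(right)):
        orders.append(l + r)  # node as drawn
        orders.append(r + l)  # node flipped
    return orders


# Hypothetical binary tree over the five leaves A..E (4 internal nodes).
tree = (("A", "B"), ("C", ("D", "E")))
orders = leaf_orders(tree)
print(len(orders), len(set(orders)))
```

A tree with n leaves has n-1 internal nodes, and each can be flipped independently, which is where the factor 2^(n-1) comes from.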
HCT + R2E = HCTR2E
[Figure: three seriations of the same data, each shown with (a) Expression (color scale -8 to +8, 1:1 at the center) and (b), (c) Correlation (color scale -1 to +1) maps: Hierarchical Tree Seriation; GAP Elliptical (R2E) Seriation; Tree guided by R2E]
GAP for Heritable (Genetic) Disease: Schizophrenia (National Taiwan University)
• Lin, Chen et al., Psychiatry Research (1998): Psychopathological Dimensions in Schizophrenia: A Correlational Approach to Items of the SANS and SAPS
• Hwu et al., Schizophrenia Research (2002), admission data: Symptom Patterns and Subgrouping of Schizophrenic Patients: Significance of Negative Symptoms Assessed on Admission
• Yeh et al., J. of the Formosan Med. Ass. (2008): Factors Related to Perceived Needs of Chief Caregivers of Patients with Schizophrenia
• Lin et al., Genes, Brain and Behavior (2009): Clustering by neurocognition for fine-mapping of the schizophrenia susceptibility loci on chromosome 6p
• Lai et al., PLoS ONE (2011): MicroRNA expression aberration as potential peripheral blood biomarkers for schizophrenia
• Liu et al., J. of the Formosan Med. Ass. (2012), 6-month data: Medium-term course and outcome of schizophrenia depicted by the sixth-month subtype after an acute episode
• Liu et al., Schizophrenia Research (2013): Development of a brief self-report questionnaire for screening putative pre-psychotic states
[Figure: GAP matrix maps from these studies. Panels include correlation maps (Corr., scale -1 to 1), an Absolute Random Error Coefficient map (scale 0 to 1), PANSS Score maps (scale 1 to 7) over items N1–N7, P1–P7, G1–G16, S1–S3, and clustering trees on Average Euclidean Distance and Average Correlation. Item clusters: Negative Symptoms, Disorganized Thought, Hostility/Excitement, Delusion/Hallucination, Anxiety Symptoms. Group labels: G1–G4, GONEG, GWNEG; RMG (n=61), PDHG1 (n=14), MBG (n=50), PDHG2 (n=38).]
Caregiver need items shown in the caregiver-needs map:
• comforting the aggravating patient
• assistance to the aggravating patient
• transport of the aggravating patient to service setting
• financial aid
• general psychological/practical support
• coping with medical team
• understanding diagnosis and treatment
• identifying early signs of relapse
• understanding mental health laws
• general social acceptance
• occupational therapy
• sheltered working facilities
• advice on intimate relationship for patient
• lifelong custodial care for patient
Need clusters: assistance with patient care; access to relevant information; societal support; burden release
GAP for Comparative Metabolome: Chinese Herbal Medicine
Drs. Ning-Sun Yang, Lie-Fen Shyur, Wen-Chin Yang
Agricultural Biotechnology Research Center (ABRC) of Academia Sinica
• Wang et al., BMC Genomics 9 (2008): Genomics and proteomics of immune modulatory effects of a butanol fraction of Echinacea purpurea in human dendritic cells
• Chien et al., Phytochemistry 70 (2009): Anti-diabetic properties of three common Bidens pilosa variants in Taiwan
• Hou et al., Journal of Nutritional Biochemistry 21 (2010): Comparative metabolomics approach coupled with cell- and gene-based assays for species classification and anti-inflammatory bioactivity validation of Echinacea plants
• Chen et al., BMC Complementary and Alternative Medicine 13 (2013): Morus alba and active compound oxyresveratrol exert anti-inflammatory activity via inhibition of leukocyte migration involving MEK/ERK signaling
Herbs studied: Echinacea (紫錐菊), Bidens pilosa (咸豐草), Morus alba, white mulberry (白桑)
GAP for Cancer Study: Non–Small Cell Lung Cancer (National Taiwan University)
• Chen J. J. et al., Journal of Clinical Oncology 23 (2005): Tumor-Associated Macrophages in Cancer Progression
• Sher Y. P. et al., Cancer Research 66 (2006): Non–Small Cell Lung Cancer with Tumor Cell Invasiveness
• Chen H. Y. et al., The New England Journal of Medicine 356 (2007): A Five-Gene Signature and Clinical Outcome in Non–Small-Cell Lung Cancer
• Kuo Y. L. et al. (Nat’l Yang-Ming Univ.), Open Access Scientific Reports 1 (2006): In silico Therapeutic Drug Screening for Reversing the Lung Adenocarcinoma Overexpressed Gene Signatures
GAP for Infectious Disease: SARS
• Lee Y. S. et al. (Chang Gung Hospital), BMC Genomics 6 (2005): Molecular signature of clinical severity in recovering patients with (SARS-CoV)
GAP for Protein-Protein Interaction: C-Y F. Huang, Nat’l Yang-Ming Univ.
• Liu et al., Molecular and Cellular Proteomics 12 (2013): An analysis of protein-protein interactions in cross-talk pathways reveals CRKL as a novel prognostic marker in hepatocellular carcinoma
[Figure: GAP matrix maps for the protein-protein interaction (PPI) study. Panels: (a) PPI to Pathway, with color legend Not on the pathway / Both Positive / Mahlavu Only / Huh7 Only; (b) Simple Match Between Pathways and (c) Simple Match Between PPIs, both on a 0 to 1 scale. One axis lists PPI pairs (F13A1,HSPB1 through CCNA2,CCNB1) with group labels M1, M2, B1, B2, H1, H2, A1, A2, P1–P5; the other lists pathways (Signalling to RAS through E2F mediated regulation of DNA replication).]
Visualization for Binary Data
Scatter-plot Matrix (SM), Parallel Coordinates Plot (PCP), Mosaic Plot
[Figure: binary variables x1, x2, x3, x4 with values 0 and 1 shown in each display]
Matrix Visualization for Binary Data
Essential elements in a GAP MV procedure?
Each data type needs the same three elements: 1. Data Matrix, 2. Subject Proximity, 3. Variable Proximity.
Continuous: Euclidean distance, Manhattan distance, correlation, covariance, polychoric correlation, . . .
Binary: ?
Commonly used similarity coefficients for binary data
Tzeng et al. (BMEI 2009) (IEEE Xplore Digital Library)
Binary GAP Example
CGMIM Online: http://www.bccrc.ca/ccr/CGMIM/
CGMIM performs automated text-mining of OMIM to identify genetically-related cancers.
Online Mendelian Inheritance in Man (OMIM) is a computerized database of information about genes and heritable traits in human populations.
OMIM is maintained on the Internet by the National Center for Biotechnology Information at the US National Institutes of Health.
CGMIM considers 21 anatomic sites based on the major cancers identified by the National Cancer Institute of Canada.
CGMIM compares each OMIM entry name and alternative name with a list of gene names assigned by HUGO (HUman Genome Organization).
CGMIM produces the number of genes for which an OMIM entry mentions each pair of cancers, as well as a ratio of the observed and expected number of genes for the combination.
CGMIM: related genes by 21 cancer sites; proximity measure Jaccard: a/(a+b+c)
[Figure: All Data (1948 genes × 21 sites), Original Order]
[Figure: All Data (1948 genes × 21 sites), Single_Tree_GrandPa_Guide order]
[Figure: 768 genes at least at 2 sites, Original Order]
[Figure: 768 genes at least at 2 sites, GAP_Elliptical_Order]
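The Jaccard coefficient a/(a+b+c) used on these maps can be written out directly, where a counts positions where both binary vectors are 1, and b and c count positions where only one of them is 1. The toy gene-by-site rows below are made up for illustration and are not CGMIM data; returning 0.0 for two all-zero vectors is a convention chosen here.

```python
def jaccard(x, y):
    """Jaccard similarity a / (a + b + c) between two 0/1 vectors."""
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)  # both present
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)  # only x
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)  # only y
    if a + b + c == 0:
        return 0.0  # convention for two all-zero vectors (an assumption)
    return a / (a + b + c)


# Hypothetical genes (rows) by cancer sites (columns); 1 = gene linked to site.
genes = {
    "gene1": [1, 1, 0, 0],
    "gene2": [1, 0, 1, 0],
    "gene3": [1, 1, 0, 0],
}
names = sorted(genes)
proximity = [[jaccard(genes[g], genes[h]) for h in names] for g in names]
```

Joint absences (0, 0) are ignored, which suits sparse matrices like gene-by-site data where most entries are 0.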
Matrix visualization of nominal data (GAP approach)
Example: Classification of Animals Data, Shizuhiko Nishisato (2006)
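Registration data: possible variables (registration order 1~701)

Ticket type                code  count
Early bird / general          0    351
Early bird / student          1    121
Invitation / general          2     48
Invitation / media            3     25
Invitation / VIP              4     22
Hackathon                     5     90
Invitation / speaker          6     32
Invitation / sponsor          7     12
Total                              701

Gender                     code  count
Female                        0    166
Male                          1    535
Total                              701

Industry                   code  count
Undetermined                  0     56
Information industry          1    203
Research institution          2     41
Technology                    3     33
Consulting                    4     13
Finance                       5     18
Telecommunications            6     21
Freelance                     7     11
Media                         8     17
Academia                      9    242
Government                   10     11
Medical / pharmaceutical     11      8
Electronics                  12      4
Nonprofit organization       13      8
Distributor                  14      5
Other                        15     10
Total                              701

Meal                       code  count
Vegetarian                    0     47
Non-vegetarian                1    654
Total                              701

Notification               code  count
No                            0     61
Yes                           1    640
Total                              701

Activity                   code  count
Hackathon                     0     91
Data analysis                 1    196
Talks program                 2    414
Total                              701

Email                      code  count
.edu                          0     49
.org                          1     15
gmail                         2    500
hotmail                       3     44
yahoo                         4     10
MSN                           5      3
.com                          6     64
Other                         7     16
Total                              701

701 people, 7 variables

Excerpt of the coded data (ticket type, industry, activity, notification, email):
…
0 1 1 1 1
0 9 1 1 2
0 1 1 0 6
0 1 2 1 2
0 9 2 1 2
1 9 2 1 2
1 9 2 1 2
1 9 1 1 2
1 9 1 1 2
0 3 2 1 3
1 9 2 1 3
0 3 2 1 6
0 11 2 1 2
1 9 1 1 2
1 9 2 1 2
0 1 2 0 2
1 9 2 1 2
1 9 2 1 2
0 3 2 1 6
0 2 2 1 1
0 1 2 1 2
…

Excerpt of the remaining variables (registration order, meal, gender):
…
139 1 1
140 1 1
141 0 0
142 1 1
143 1 1
144 1 0
145 1 1
146 1 1
147 1 0
148 1 1
149 1 1
150 1 1
151 1 1
152 1 1
155 1 1
157 1 1
158 1 0
159 0 1
160 1 0
162 1 1
163 1 0
…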
Essential elements in a GAP MV procedure?
Each data type needs the same three elements: 1. Data Matrix, 2. Subject Proximity, 3. Variable Proximity.
Continuous: Euclidean distance, Manhattan distance, correlation, covariance, polychoric correlation, . . .
Nominal: matching proportion? χ²-type measurements?
A typical nominal data set
Shizuhiko Nishisato, 2006, Classification of Animals
35 animals were sorted into piles of similar animals by 15 students
Animal/Subject S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 S15 
Alligator 8 1 6 9 6 1 3 4 1 4 3 4 3 6 4 
Bear 6 3 2 6 6 1 3 4 4 5 5 1 4 2 2 
Camel 4 3 9 3 1 4 3 5 4 2 5 1 7 7 8 
Cat 6 3 7 4 0 1 1 2 3 3 1 1 6 3 5 
Cheetah 3 3 7 4 0 1 3 5 4 3 6 1 6 2 2 
Chicken 7 2 4 1 2 5 7 1 5 1 1 3 8 4 1
Chimpanzee 5 3 5 7 5 2 4 4 2 2 6 4 3 6 6 
Cow 1 3 9 6 1 1 3 5 3 4 1 1 4 5 8 
Crane 7 2 4 5 2 5 5 1 5 1 2 3 8 4 1 
Crow 7 2 4 5 2 5 5 1 5 1 2 3 8 4 1 
Dog 6 3 7 10 0 2 1 2 3 3 1 1 4 3 5 
Duck 7 2 4 1 2 5 5 1 5 1 2 3 8 4 1 
Elephant 4 3 6 3 1 4 3 5 4 5 3 1 7 7 2 
Fox 6 3 7 4 0 1 6 2 3 3 3 1 4 3 5 
Frog 8 1 3 2 3 3 2 3 1 4 4 2 1 1 3 
Giraffe 1 3 8 3 1 4 3 5 4 2 5 1 7 7 8 
Goat 3 3 9 6 1 4 6 5 3 3 1 1 5 3 5 
Hawk 7 2 4 5 2 5 5 1 5 1 3 3 8 4 1 
Hippopotamus 4 3 6 6 6 4 3 3 4 4 5 1 7 7 2 
Horse 6 3 9 6 1 2 3 5 3 3 1 1 5 5 8 
Leopard 1 3 7 4 0 1 3 5 4 3 3 1 6 2 2 
Lion 5 3 7 4 6 1 3 5 4 3 3 1 7 2 2 
Lizard 2 1 3 2 3 3 2 3 1 4 4 2 2 1 3 
Monkey 6 3 5 7 5 2 4 4 2 2 6 4 3 6 6 
Ostrich 3 2 4 1 2 5 3 1 5 1 5 3 8 7 8 
Pig 1 3 9 6 1 1 6 5 3 3 1 1 5 5 5 
Pigeon 7 2 4 5 2 5 5 1 5 1 2 1 8 4 1 
Rabbit 6 3 1 6 0 4 6 2 3 3 1 1 5 3 5 
Racoon 6 3 7 10 4 1 6 2 3 3 3 1 4 3 5 
Rhinoceros 4 3 5 6 6 4 3 5 4 4 5 1 7 7 2 
Snake 8 1 3 9 6 3 2 3 1 4 4 2 2 1 3 
Sparrow 7 2 4 5 2 5 5 1 5 2 2 3 8 4 1 
Tiger 5 3 7 4 0 1 3 5 4 3 3 1 6 2 2 
Tortoise 8 1 3 9 3 3 2 3 1 5 4 2 1 1 3 
Turkey 7 2 4 1 2 5 7 1 5 1 1 3 8 4 1 
What about 3500 samples and 1500 variables?
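For sorting data like this, one natural proximity between two animals is the matching proportion: the fraction of students who put both animals in the same pile. The sketch below assumes that reading of "matching proportion"; the five rows are copied from the table above, and the function name is hypothetical.

```python
# Pile labels copied from the table above: one entry per student S1..S15.
piles = {
    "Crane":  [7, 2, 4, 5, 2, 5, 5, 1, 5, 1, 2, 3, 8, 4, 1],
    "Crow":   [7, 2, 4, 5, 2, 5, 5, 1, 5, 1, 2, 3, 8, 4, 1],
    "Duck":   [7, 2, 4, 1, 2, 5, 5, 1, 5, 1, 2, 3, 8, 4, 1],
    "Frog":   [8, 1, 3, 2, 3, 3, 2, 3, 1, 4, 4, 2, 1, 1, 3],
    "Lizard": [2, 1, 3, 2, 3, 3, 2, 3, 1, 4, 4, 2, 2, 1, 3],
}


def matching_proportion(u, v):
    """Fraction of students who placed both animals in the same pile.

    Pile numbers are only labels, so they are compared for equality
    within each student's column and never treated as magnitudes.
    """
    return sum(1 for a, b in zip(u, v) if a == b) / len(u)


names = list(piles)
proximity = {
    (g, h): matching_proportion(piles[g], piles[h]) for g in names for h in names
}
print(proximity[("Crane", "Duck")], proximity[("Crane", "Frog")])
```

Here Crane and Crow match for all 15 students, Crane and Duck for 14, and Crane and Frog for none, which already hints at the bird and reptile groups.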
[Figure: labels of the 35 animals, Alligator through Turkey plus Monkey]
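Uni-variate Display: Bar-Chart, Pie-Chart
[Figure: bar charts (counts on a 0 to 20 scale) and pie charts of the pile assignments made by students S12 and S2; piles correspond to Mammalia, Reptilia, Aves, and possibly Primates (?)]
Bi-variate Display: Mosaic Display
[Figure: mosaic display crossing two students' piles; one labeled 1. Reptile, 2. Bird, 3. Mammal, the other 1. Mammal, 2. Reptile, 3. Bird, 4. Primate?]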
Conventional multivariate visualization for this data
[Figure: 2D Mosaic Display, 5D Mosaic Display, Scatter-plot Matrix, Scatter-plot, Parallel Coordinate Plot]
GAP Categorical MV
Solution with 3 matrix maps (original orders)
[Figure: three matrix maps with the animals in the original order, Alligator through Turkey]
GAP Categorical MV
Solution with 3 matrix maps (R2E orders)
Student order in the R2E maps: 11, 6, 7, 15, 14, 12, 13, 9, 4, 3, 5, 8, 10, 2, 1
Animal order in the R2E maps: Ostrich, Turkey, Chicken, Pigeon, Hawk, Sparrow, Duck, Crow, Crane, Lizard, Frog, Tortoise, Snake, Alligator, Hippopotamus, Bear, Rhinoceros, Elephant, Lion, Tiger, Leopard, Cow, Fox, Racoon, Cheetah, Cat, Dog, Pig, Rabbit, Horse, Goat, Camel, Giraffe, Monkey, Chimpanzee
Groups marked along the animal order: Aves, Reptilia, Mammalia, Primates
[Figure: the 35 animals relabeled with codes A0 to A9, B0 to B9, C0 to C9, and D0 to D4 (for example A0_Dog, B0_Rabbit, C0_Turkey, D0_Lion)]
Categorical data on the conference attendees
Univariate frequency breakdown
Bivariate table
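Ticket type                                   code  count
Early bird / general public                      0    351
Early bird / student                             1    121
Invitation / general                             2     48
Invitation / media                               3     25
Invitation / VIP                                 4     22
g0v hackathon ticket                             5     90
Invitation / speaker                             6     32
Invitation / sponsor                             7     12
Total                                                 701

Gender                                        code  count
Female                                           0    166
Male                                             1    535
Total                                                 701

Industry                                      code  count
Missing / undetermined                           0     56
Information technology                           1    203
Research institute                               2     41
Technology                                       3     33
Consulting                                       4     13
Finance                                          5     18
Telecommunications                               6     21
Freelance                                        7     11
Media                                            8     17
Academia                                         9    242
Government                                      10     11
Medical / pharmaceutical                        11      8
Electronics                                     12      4
Non-profit organization                         13      8
Distribution channel                            14      5
Manufacturing, food, transport, services        15     10
Total                                                 701

Meal preference                               code  count
Vegetarian                                       0     47
Non-vegetarian                                   1    654
Total                                                 701

Activity attended                             code  count
g0v hackathon                                    0     91
Hands-on data analysis course                    1    196
Talk sessions                                    2    414
Total                                                 701

Notify in the future                          code  count
No                                               0     61
Yes                                              1    640
Total                                                 701

Email                                         code  count
.edu                                             0     49
.org                                             1     15
gmail                                            2    500
hotmail                                          3     44
yahoo                                            4     10
MSN                                              5      3
.com                                             6     64
Undetermined / other                             7     16
Total                                                 701

Another possible variable in the registration data: registration order, 1~701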
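A univariate frequency breakdown and a bivariate table both come down to counting codes. The following is a minimal sketch on a handful of made-up records (not the actual registration data), using the ticket-type and gender codes of the frequency tables above; the function names `univariate` and `bivariate` are hypothetical.

```python
from collections import Counter

# Hypothetical records: (ticket type code, gender code); not the real data.
records = [(0, 1), (0, 1), (1, 0), (0, 0), (1, 1), (5, 1)]

def univariate(records, col):
    """Frequency of each code in one column."""
    return Counter(r[col] for r in records)

def bivariate(records, row_col, col_col):
    """Cross-tabulation: count of each (row code, column code) pair."""
    return Counter((r[row_col], r[col_col]) for r in records)

ticket_freq = univariate(records, 0)
table = bivariate(records, 0, 1)

# One line per ticket type: counts for gender 0 and 1, then the row total.
for ticket in sorted(ticket_freq):
    print(ticket, [table[(ticket, g)] for g in (0, 1)], ticket_freq[ticket])
```

A `Counter` returns 0 for pairs that never occur, so empty cells of the bivariate table need no special handling.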
701 people, 5 variables, 3 covariates
Optimization

Data matrix excerpt, 5 variables (coded as in the frequency tables):
ticket  industry  activity  notify  email
  …        …         …        …       …
  0        1         1        1       1
  0        9         1        1       2
  0        1         1        0       6
  0        1         2        1       2
  0        9         2        1       2
  1        9         2        1       2
  1        9         2        1       2
  1        9         1        1       2
  1        9         1        1       2
  0        3         2        1       3
  1        9         2        1       3
  0        3         2        1       6
  0       11         2        1       2
  1        9         1        1       2
  1        9         2        1       2
  0        1         2        0       2
  1        9         2        1       2
  1        9         2        1       2
  0        3         2        1       6
  0        2         2        1       1
  0        1         2        1       2
  …        …         …        …       …

Covariate excerpt, 3 covariates:
meal  gender  registration order
  …     …       …
  1     1      139
  1     1      140
  0     0      141
  1     1      142
  1     1      143
  1     0      144
  1     1      145
  1     1      146
  1     0      147
  1     1      148
  1     1      149
  1     1      150
  1     1      151
  1     1      152
  1     1      155
  1     1      157
  1     0      158
  0     1      159
  1     0      160
  1     1      162
  1     0      163
  …     …       …
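Reordering the rows of a coded matrix like this one requires some proximity between attendees. The slide does not say which measure was used; as an assumption, the sketch below uses the simple matching dissimilarity (the share of variables on which two rows differ). The function name is hypothetical, and the three rows are taken from the excerpt on the slide.

```python
def mismatch_fraction(a, b):
    """Share of variables on which two coded rows differ (simple matching)."""
    if len(a) != len(b):
        raise ValueError("rows must have the same number of variables")
    return sum(x != y for x, y in zip(a, b)) / len(a)

# Three coded rows from the slide, five categorical variables each.
rows = [
    (0, 1, 1, 1, 1),
    (0, 9, 1, 1, 2),
    (1, 9, 2, 1, 2),
]

# Symmetric proximity matrix between the three attendees.
proximity = [[mismatch_fraction(a, b) for b in rows] for a in rows]
for line in proximity:
    print(line)
```

Because the codes are nominal, only equality is compared; numeric differences between codes (for example industry 1 versus 9) carry no meaning.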
701 people, 5 variables
[Figure: matrix map of the 701 attendees by 5 variables (ticket type, industry, activity, notification, email), with a color legend for the categories of each variable; annotation: 李育杰]
5 variables, 701 people: Original Orders (registration order)
[Figure: matrix map with rows in registration order and columns in the order ticket type, industry, activity, notification, email; side columns mark female and vegetarian attendees; color legends list the categories of each variable]
5 variables, 701 people: Random Orders
[Figure: the same matrix map with rows in random order and columns in the order ticket type, notification, activity, email, industry; side columns mark female and vegetarian attendees; color legends list the categories of each variable]
Sediment display (like a pie chart or bar chart)
(shows the proportion of each category within a single variable, but loses the links between variables)
[Figure: sediment display of the 701 attendees with columns for gender, meal, registration order, ticket type, industry, activity, notification, and email, each with its category legend; the label 166 marks the number of female attendees]
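A sediment display can be read as sorting every column on its own, so that equal codes settle together. The sketch below, on made-up coded rows (not the real data), shows why the links between variables are lost: the settled rows no longer correspond to any actual attendee. The function name `sediment` is hypothetical.

```python
# Hypothetical coded rows: (notification, activity) for five attendees.
rows = [(1, 2), (0, 1), (1, 0), (1, 2), (0, 2)]

def sediment(rows):
    """Sort every column independently, like sediment settling."""
    columns = [sorted(col) for col in zip(*rows)]
    return [tuple(vals) for vals in zip(*columns)]

settled = sediment(rows)
for row in settled:
    print(row)
```

Each column keeps its category proportions, which is what a bar chart or pie chart shows, but a settled row such as the first one may pair codes that never occurred together.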
5 variables, 701 people: Elliptical Seriations (R2E)
[Figure: matrix map with rows and columns reordered by elliptical seriation; column order: email, notification, activity, ticket type, industry; side columns mark female and vegetarian attendees and registration order; color legends list the categories of each variable]
5 variables, 701 people: Hierarchical Clustering Tree (HCT)
[Figure: matrix map with rows and columns ordered by hierarchical clustering trees; column order: industry, email, notification, activity, ticket type; side columns mark female and vegetarian attendees and registration order; color legends list the categories of each variable]
5 variables, 701 people: Hierarchical Clustering Tree (HCT)
[Figure: the same hierarchical clustering tree display with a different column order: email, notification, activity, ticket type, industry; side columns mark female and vegetarian attendees and registration order; color legends list the categories of each variable]
Hierarchical Clustering Tree with Flips Guided by Elliptical Seriation
Orders: HCT-R2E
[Figure: matrix map with rows and columns ordered by hierarchical clustering trees whose flips are guided by the elliptical seriation; column order: email, notification, activity, ticket type, industry; side columns mark female and vegetarian attendees and registration order; color legends list the categories of each variable]
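A binary clustering tree does not fix a single leaf order: each internal node can be flipped, so a tree with n leaves has 2^(n-1) possible orders. One simple way to pick among them is to flip each node so that its children follow an external seriation. The sketch below illustrates that idea on a hypothetical tree and a hypothetical external ranking; it is not the GAP implementation, and the function names are assumptions.

```python
def leaves(tree):
    """Left-to-right leaf order of a binary tree given as nested tuples."""
    if not isinstance(tree, tuple):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)

def mean_rank(tree, rank):
    """Average external rank of the leaves under a subtree."""
    names = leaves(tree)
    return sum(rank[n] for n in names) / len(names)

def flip_guided(tree, rank):
    """Flip children at each node so the subtree with the lower mean
    external rank comes first; the tree topology is unchanged."""
    if not isinstance(tree, tuple):
        return tree
    left, right = (flip_guided(child, rank) for child in tree)
    if mean_rank(left, rank) > mean_rank(right, rank):
        left, right = right, left
    return (left, right)

# Hypothetical clustering tree over the five variables.
tree = (("industry", "ticket"), ("activity", ("email", "notification")))
# Hypothetical external seriation used only to guide the flips.
rank = {"email": 0, "notification": 1, "activity": 2, "ticket": 3, "industry": 4}

ordered = leaves(flip_guided(tree, rank))
print(ordered)
```

The clusters themselves are untouched; only the left/right arrangement at each node changes, which is why two displays of the same tree can show different column orders.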
Approaching Statistics  Statistical Approach 
Matrix Visualization with cartography links
CIA Data: membership patterns in 160 international organizations (variables) for 230 countries/regions (subjects)
Coding: 0. non-member □  1. member ■  2. observer  3. associate member  4. guest  5. dialogue partner
Map: CIA Political Map of the World
http://www.faqs.org/docs/factbook/index.html
Draw one membership map for each organization (variable)?
[Figure: grid of maps numbered 1, 2, 3, … up to 160: 160 maps (?)]
Cartography Coloring Scheme with Categorical GAP (CartoGAP) - 2
Data: ranks of 5 candidates (扁宋連許李: Chen Shui-bian, James Soong, Lien Chan, Hsu Hsin-liang, Li Ao) in 360 townships, 2000 presidential election data
Is it possible to visualize the information structure for all 5 candidates in a single MAP?
[Figure: five maps A to E, one per candidate, combined into a single map (?); rank legend: 1, 2, 3, 4, 4.5, 5]
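The rank legend includes 4.5, which is the value two candidates receive when they tie for fourth and fifth place and share the average rank. Assuming ranks were assigned that way (the slide does not say), a minimal sketch with made-up vote counts looks like this; `average_ranks` is a hypothetical name.

```python
def average_ranks(votes):
    """Rank candidates by votes (1 = most votes); ties share the mean rank."""
    order = sorted(votes, key=votes.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of candidates tied with order[i].
        while j + 1 < len(order) and votes[order[j + 1]] == votes[order[i]]:
            j += 1
        shared = (i + 1 + j + 1) / 2  # mean of positions i+1 .. j+1
        for name in order[i:j + 1]:
            ranks[name] = shared
        i = j + 1
    return ranks

# Hypothetical vote counts for one township (not real election data).
township = {"A": 520, "B": 410, "C": 300, "D": 2, "E": 2}
ranks = average_ranks(township)
print(ranks)
```

Applied to every township, this yields a 360 by 5 matrix of ranks with values in the set shown in the legend.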
Cartography Coloring Scheme with GAP (CartoGAP) - 2
(B). CateGAP Color Map for Each Individual Variable: maps A to E, one per candidate (扁, 宋, 連, 許, 李)
(C). Final Single CateGAP Cartography Color MAP for Complete Information Visualization
From physical maps to conceptual maps
Chromosome Map
Macro Biodiversity
Semiconductor Wafer Quality Control
Micro Biodiversity
Matrix Visualization for Symbolic Data (Analysis) for Big Data?
1.1 Symbolic Data Analysis (SDA) and 1.2 Matrix Visualization (MV)
Fig. 1. Diagram for related conventional data matrix and symbolic (interval type) data table with their corresponding proximity matrices for samples/concepts and variables.
Example: Japan Minryoku 2010 Data (with Junji Nakano, ISM)
Level 1: Region (10)
Level 2: Area (151)
Level 3: District (821)
Level 4: City (1899)
58 variables
1899 Level 4 cities (市區町村) by 58 variables, continuous data
↓ rank data (1~1899)
↓ merged (interval of ranks) data
151 Level 2 areas (地域) by 58 variables (interval)
Covariate: 10 Level 1 regions
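The pipeline on this slide (continuous values, then ranks, then one interval of ranks per area) can be sketched for a single variable as follows. The city values are made up, rank 1 is assumed to go to the smallest value, ties are ignored, and `rank_intervals` is a hypothetical name.

```python
from collections import defaultdict

# Hypothetical values of one variable for six cities in two areas.
cities = [
    ("city1", "areaA", 12.0), ("city2", "areaA", 30.5), ("city3", "areaA", 7.1),
    ("city4", "areaB", 18.2), ("city5", "areaB", 25.0), ("city6", "areaB", 9.9),
]

def rank_intervals(cities):
    """Rank cities by value (1 = smallest), then merge the ranks of each
    area into an interval (min rank, max rank)."""
    ordered = sorted(cities, key=lambda c: c[2])
    rank = {name: i + 1 for i, (name, _, _) in enumerate(ordered)}
    by_area = defaultdict(list)
    for name, area, _ in cities:
        by_area[area].append(rank[name])
    return {area: (min(r), max(r)) for area, r in by_area.items()}

intervals = rank_intervals(cities)
print(intervals)
```

Repeating this for each of the 58 variables would give an area by variable table whose cells are intervals rather than single numbers.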
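Japan Minryoku 2010 Data
12 displaying modes for MV of interval data
58 interval variables, 151 regions (concepts)
Interval summaries displayed: Min, Mid, Max, Length
[Figure: matrix maps under row and column conditions on interval length and midpoint (thresholds shown: length 949, mid 949, length 1746, mid between 900 and 1000), in Sufficient and Sediment displays]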
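The four summaries named on this slide (Min, Mid, Max, Length) are simple functions of an interval's endpoints, and a display condition is a filter on one of them. A minimal sketch with made-up rank intervals follows; the threshold in the example condition is chosen for illustration and the names are hypothetical.

```python
def summarize(interval):
    """Min, mid, max and length of an interval (lo, hi)."""
    lo, hi = interval
    return {"min": lo, "mid": (lo + hi) / 2, "max": hi, "length": hi - lo}

# Hypothetical rank intervals for three areas on one variable.
intervals = {"area1": (100, 400), "area2": (850, 1050), "area3": (20, 1800)}
summary = {name: summarize(iv) for name, iv in intervals.items()}

# Example condition (illustrative threshold): keep only the areas
# whose midpoint lies between 900 and 1000.
selected = [n for n, s in summary.items() if 900 <= s["mid"] <= 1000]
print(selected)
```

Two areas can share a midpoint while differing greatly in length, which is one reason to look at more than one summary of the same interval table.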
Statisticians, Data Analysts, and Bioinformaticists
A statistician is someone who wants to get exactly the right 
answer, even if it’s the answer to the wrong question. 
A data analyst is someone who is willing to settle for an 
approximate answer, as long as it’s the answer to the right 
question. 
A bioinformaticist is someone who is willing to settle for 
answers of unknown accuracy, to questions that have not 
been clearly articulated, as long as the results can be 
graphed in color. 
David B. Allison, Ph.D. 
Department of Biostatistics 
University of Alabama at Birmingham
12. MV for Color Blind People
Types of color blindness:
Monochromacy
Dichromacy
Protanopia and deuteranopia
Hereditary tritanopia
Anomalous Trichromacy
http://www.vischeck.com/examples/
Two options:
Act passively, by avoiding color systems that are difficult for color blind people to understand; or
work actively, by helping people with visual impairments get better visualization of data/information.
“I believe there are more mathematics/statistics blind people than color blind people”

DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 

Collaboration with Statistician? Matrix Visualization in Exploratory Data Analysis

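  • 1. Collaboration with Statistician? Matrix Visualization in Exploratory Data Analysis. 陳君厚 (Chun-houh Chen), Institute of Statistical Science, Academia Sinica. International Conference Hall, Humanities and Social Sciences Building, Academia Sinica. August 30, 2014.
  • 3. Q: Chun-houh, Big Data is so hot right now; should we add it to the curriculum of our statistics departments? A: Big Data? Can today's statistics graduates even handle Data?
  • 4. Main collaborators: Prof. 胡海國, Chair of Psychiatry, College of Medicine, National Taiwan University; Prof. 楊泮池, Dean, College of Medicine, National Taiwan University; 陳章榮, Researcher, toxicology research institute, US FDA; Prof. 李御賢, Core Laboratory, Chang Gung Memorial Hospital, and Ming Chuan University; 白果能, Research Fellow, Institute of Biomedical Sciences, Academia Sinica; Prof. 吳漢銘, Department of Mathematics, Tamkang University; Prof. 周正中, Department of Life Science, National Chung Cheng University; Prof. 楊永正, Institute of Biomedical Informatics, National Yang-Ming University; 林文昌, Research Fellow, Institute of Biomedical Sciences, Academia Sinica; Prof. 黃奇英, Institute of Clinical Medicine, National Yang-Ming University; 楊欣洲, Research Fellow, Institute of Biomedical Sciences, Academia Sinica.
  • 5. I. Collaboration with Statistician? Academia Sinica Vice President 賴明昭 promotes interdisciplinary collaboration (November 10, 2004). Collaboration is not necessarily data analysis, but collaborating with a statistician most likely means that data analysis is needed.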
  • 6. "Prof. Chen, my colleague used Factor Analysis and got into the same journal in only a third of the time." Luxury Research vs. Necessity Research. Senior Researcher (Established) vs. Young Investigator (Struggling). [Figure: correlation maps with color scales for Corr. from -1 to 1.]
  • 7. "Chun-houh, can you create some powerful statistical/bioinformatics methods so we can get our experiments published in Nature/Science?" "Sir, can you conduct some meaningful biological/medical experiments so we can get our methods published in Nature/Science?"
  • 8. Mutual Trust: (someone XX said at Academia Sinica meeting YY) "陳君厚 of the Institute of Statistical Science says that the biochips from your Institute ZZ all have problems." Mutual Understanding: "You can remove Figure 1 together with my name from the paper." [Figure: correlation and average Euclidean distance maps of PANSS item scores (1 to 7), with items grouped into Negative Symptoms, Disorganized Thought, Hostility/Excitement, Delusion/Hallucination, and Anxiety Symptoms, and patient groups RMG (n=61), PDHG1 (n=14), MBG (n=50), PDHG2 (n=38).]
  • 9. II. Matrix Visualization in Exploratory Data Analysis. Matrix Visualization: Approaching Statistics and Statistical Approach.
  • 10. Lab 309 (???) for Information Visualization: Dr. 田銀錦 (Postdoc. Fellow); Dr. 何孟如 (Postdoc. Fellow); Mr. 高君豪 (Ph.D. student); Ms. 石佳鑫 (Research Assistant); Prof. 吳漢銘 (Dept. Math., Tamkang U.); Prof. 須上英 (Dept. Stat., Nat'l Taipei U.); 張勝傑, 張文宗, 陳柏旭, 鐘雅齡, 黃建勳, 林香誼, 劉勝宗, 曾聖澧, 葉紫君, 吳怡真, 林倩如, 歐陽智聞, ...
  • 11. Data analysis: a process of inspecting data, cleaning data, transforming data, and modeling data, with the goal of discovering useful information, suggesting conclusions, and supporting decision making. Understanding the data: Exploratory Data Analysis and data visualization.
  • 12. Exploratory Data Analysis (EDA), John Tukey (1977); Tukey, 1915~2000. "It is important to understand what you CAN DO before you learn to measure how WELL you seem to have DONE it." Allow the data to speak for themselves before standard assumptions or formal modeling. "The greatest value of a picture is when it forces us to notice what we never expected to see." Matrix Visualization as an EDA tool for assisting formal mathematical modeling.
  • 13. John W. Tukey states at the very beginning of his book Exploratory Data Analysis (EDA): "It is important to understand what you CAN DO before you learn to measure how WELL you seem to have DONE it." Learning what you can do helps you get twice the result with half the effort during data analysis. The role of EDA is to obtain the message the data convey by "looking" at the data; its emphasis is on simple arithmetic and on graphs and tables that are easy to construct. Through EDA, one first recognizes and describes the patterns revealed in the graphs and tables, and then uses the human mind to analyze and judge the received information as a whole, in order to explore the information hidden in the data. The emphasis is on exploratory analysis rather than rigorous model confirmation.
  • 15. Graphics/Visualization for high dimensional data? p = 5, p = 10, p = 100, p = 10000. [Figure: plots of the Iris data (sepal and petal measurements) by species name: I. Setosa, I. Virginica, I. Versicolor.]
  • 16. Recent Review Articles for MV. (1) Leland Wilkinson and Michael Friendly, "The History of the Cluster Heat Map", The American Statistician, May 2009, Vol. 63, No. 2, 179. (2) Innar Liiv, "Seriation and Matrix Reordering Methods: An Historical Overview", Statistical Analysis and Data Mining 3: 70–91, 2010. Figures reproduced from the first review: Figure 2. Shaded matrix display from Loua (1873), available online at http://books.google.com/books/. This was designed as a summary of 40 separate maps of Paris, showing the characteristics (e.g., national origin, professions, age, social classes) of 20 districts, using a color scale ranging from white (low) through yellow and blue to red (high). Figure 3. Sorted shaded display from Brinton (1914). The data are ranks of U.S. states on each of 10 educational features assessed in 1910. The matrix has been sorted by the row-marginal ranks. Figure 5. Sorted shaded display from Czekanowski (1909), reproduced in Hage and Harary (1995). Figure 9. Cluster heat map from Wilkinson (1994). The data are social statistics (i.e., urbanization, literacy, life expectancy for females, GDP, health expenditures, educational expenditures, military expenditures, death rate, infant mortality, birth rate, and ratio of birth to death rate) from a United Nations survey of world countries. The variables were standardized before the hierarchical clustering was performed. Matrix Visualization (MV) is also known as: reorderable matrix, heatmap, color histogram, data image.
  • 17. Data Matrix and Data Map: 95 psychiatric patients (rows) by 50 items of a psychiatric symptom scale (columns); each entry is a score from 0 (normal) to 5 (severe). [Figure: the 95 × 50 matrix shown as raw digits and as a color-coded Data Map.]
  • 18. Generalized Association Plots (GAP) for MV of continuous data. GAP steps: 1. presentation, 2. permutation, 3. partition, 4. summary. Data matrix types: continuous, ordinal, binary, nominal. Approaching Statistics / Statistical Approach. [Figure: the 95 × 50 symptom score matrix from the previous slide.]
  • 19. Some essential elements in a GAP MV procedure: 1. Data Matrix (n × p), with color coding; 2. Proximity Matrix for Subjects (n × n); 3. Proximity Matrix for Variables (p × p); 4. Permutation (of variables and of subjects). Each matrix may hold continuous, ordinal, binary, or nominal values.
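The elements listed on this slide can be sketched in a few lines of Python. This is only an illustration, not the GAP software: the toy matrix, the function names, and the choice of Euclidean distance for subjects and Pearson correlation for variables are all assumptions made for the example.

```python
from math import sqrt

def euclidean(u, v):
    """Euclidean distance between two rows (subjects)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pearson(u, v):
    """Pearson correlation between two columns (variables)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def subject_proximity(X):
    """n x n proximity matrix between the rows of X."""
    return [[euclidean(r, s) for s in X] for r in X]

def variable_proximity(X):
    """p x p proximity matrix between the columns of X."""
    cols = list(zip(*X))
    return [[pearson(c, d) for d in cols] for c in cols]

def permute(M, row_order, col_order):
    """Reorder the rows and columns of M by the given index lists."""
    return [[M[i][j] for j in col_order] for i in row_order]

# Hypothetical toy data: 4 subjects (rows) x 3 variables (columns).
X = [[1.0, 2.0, 5.0],
     [2.0, 4.0, 3.0],
     [3.0, 6.0, 4.0],
     [4.0, 8.0, 0.0]]

D = subject_proximity(X)    # element 2: n x n
R = variable_proximity(X)   # element 3: p x p
X_perm = permute(X, [3, 2, 1, 0], [2, 0, 1])  # element 4: permutation
```

In a real analysis the row and column orders passed to `permute` would come from a seriation method rather than being typed in by hand.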
  • 20. Statistical Approach: Identify Global Trend. Singular Value Decomposition (SVD): Alter O. et al. 2000, PNAS. Rank 2 Elliptical (R2E): Chen 2002, Statistica Sinica. [Figure: panels (a) Expression (scale -8, 1:1, +8), (b) and (c) Correlation (scale -1, 0, +1), and (d), for the orderings SVD1, SVD2, and R2E.]
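As a rough illustration of ordering rows by a global trend, the sketch below sorts the rows of a matrix by their coordinates on the first left singular vector, computed by power iteration. It is a simplification and does not implement the R2E seriation of Chen (2002); the function names, the column centering, the iteration count, and the toy data are assumptions made for the example.

```python
from math import sqrt

def center_columns(X):
    """Subtract each column mean so the leading direction reflects variation."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]

def first_left_singular_vector(X, iters=100):
    """Power iteration on X X^T; returns a unit vector with one entry per row."""
    n, p = len(X), len(X[0])
    u = [float(i + 1) for i in range(n)]  # arbitrary non-zero starting vector
    for _ in range(iters):
        v = [sum(X[i][j] * u[i] for i in range(n)) for j in range(p)]  # v = X^T u
        u = [sum(X[i][j] * v[j] for j in range(p)) for i in range(n)]  # u = X v
        norm = sqrt(sum(a * a for a in u))
        u = [a / norm for a in u]
    return u

def svd1_order(X):
    """Row indices sorted by their coordinate on the first singular vector."""
    u = first_left_singular_vector(center_columns(X))
    return sorted(range(len(X)), key=lambda i: u[i])

# Hypothetical rank-one toy data: row i is t[i] times a common profile w.
t = [3.0, -1.0, 0.5, -2.0, 1.0]
w = [1.0, 2.0, -1.0]
X = [[ti * wj for wj in w] for ti in t]

order = svd1_order(X)  # rows sorted along the single underlying trend t
```

The sign of a singular vector is arbitrary, so the resulting order may come out reversed; for a seriation this is the same arrangement read from the other end.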
  • 21. Statistical Approach: Identify Local Clusters. Eisen et al. (1998). Tree seriation: flipping of intermediate nodes. 2^(n-1) = 2^(5-1) = 16 different seriations (orderings of terminal nodes, or leaves) can be generated from an identical tree structure. An ideal model, together with external and internal references, can guide the flipping mechanism. [Figure: one tree over the leaves A to E drawn with different leaf orders, e.g. (a) A B C D E and (c) C E D B A, after 1 flip, 3 flips, 5 flips, and many flips.]
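The count of 2^(n-1) seriations can be checked by brute force. The sketch below enumerates every leaf order obtainable by flipping the internal nodes of a binary tree; the nested-tuple tree is a hypothetical example over five leaves, not the tree drawn on the slide.

```python
from itertools import product

def leaf_orders(tree):
    """Return every leaf ordering obtainable by flipping internal nodes.

    A leaf is a string; an internal node is a (left, right) pair.
    """
    if isinstance(tree, str):
        return [[tree]]
    left, right = tree
    orders = []
    for l, r in product(leaf_orders(left), leaf_orders(right)):
        orders.append(l + r)  # this node left as drawn
        orders.append(r + l)  # this node flipped
    return orders

# Hypothetical binary tree with n = 5 leaves, hence n - 1 = 4 internal nodes.
tree = ((("A", "B"), "C"), ("D", "E"))
orders = leaf_orders(tree)
n_leaves = 5
expected = 2 ** (n_leaves - 1)  # each internal node is either flipped or not
```

Because every internal node can be flipped independently, the number of orderings doubles with each node, which is why a reference (such as an R2E ordering) is useful for choosing among them.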
  • 22. HCT + R2E = HCTR2E: Hierarchical Tree Seriation; GAP Elliptical (R2E) Seriation; Tree guided by R2E. [Figure: for each of the three seriations, panels (a) Expression (scale -8, 1:1, +8), (b) and (c) Correlation (scale -1, 0, +1), (d), and (e).]
  • 23. GAP for Heritable (Genetic) Disease: Schizophrenia (National Taiwan University). Lin, Chen et al., Psychiatry Research (1998): Psychopathological Dimensions in Schizophrenia: A Correlational Approach to Items of the SANS and SAPS. Hwu et al., Schizophrenia Research (2002): Symptom Patterns and Subgrouping of Schizophrenic Patients: Significance of Negative Symptoms Assessed on Admission. Yeh et al., J. of the Formosan Med. Ass. (2008): Factors Related to Perceived Needs of Chief Caregivers of Patients with Schizophrenia. Lin et al., Genes, Brain and Behavior (2009): Clustering by neurocognition for fine-mapping of the schizophrenia susceptibility loci on chromosome 6p. Lai et al., PLoS ONE (2011): MicroRNA expression aberration as potential peripheral blood biomarkers for schizophrenia. Liu et al., J. of the Formosan Med. Ass. (2012): Medium-term course and outcome of schizophrenia depicted by the sixth-month subtype after an acute episode. Liu et al., Schizophrenia Research (2013): Development of a brief self-report questionnaire for screening putative pre-psychotic states. [Figures: correlation maps of symptom items at admission and at 6 months (Corr. from -1 to 1; absolute random error coefficient from 0 to 1); PANSS score maps (1 to 7) with items grouped into Negative Symptoms, Disorganized Thought, Hostility/Excitement, Delusion/Hallucination, and Anxiety Symptoms, for patient groups RMG (n=61), PDHG1 (n=14), MBG (n=50), PDHG2 (n=38); and a map of caregiver needs (comforting the aggravating patient, assistant to the aggravating patient, transport of the aggravating patient to service setting, financial aid, general psychological/practical support, coping with medical team, understanding diagnosis and treatment, identifying early signs of relapse, understanding mental health laws, general social acceptance, occupational therapy, sheltered working facilities, advice on intimate relationship for patient, lifelong custodial care for patient) grouped into need clusters for assistant to patient care, accessing to relevant information, societal support, and burden release.]
  • 24. GAP for Comparative Metabolome: Chinese Herbal Medicine. Drs. Ning-Sun Yang, Lie-Fen Shyur, Wen-Chin Yang, Agricultural Biotechnology Research Center (ABRC) of Academia Sinica. Plants studied: Echinacea (紫錐菊), Bidens pilosa (咸豐草), Morus alba (白桑). Wang et al., BMC Genomics 9 (2008): Genomics and proteomics of immune modulatory effects of a butanol fraction of Echinacea purpurea in human dendritic cells. Chien et al., Phytochemistry 70 (2009): Anti-diabetic properties of three common Bidens pilosa variants in Taiwan. Hou et al., Journal of Nutritional Biochemistry 21 (2010): Comparative metabolomics approach coupled with cell- and gene-based assays for species classification and anti-inflammatory bioactivity validation of Echinacea plants. Chen et al., BMC Complementary and Alternative Medicine 13 (2013): Morus alba and active compound oxyresveratrol exert anti-inflammatory activity via inhibition of leukocyte migration involving MEK/ERK signaling.
  • 25. GAP for Cancer Study: Non–Small Cell Lung Cancer (National Taiwan University). Chen J. J. et al., Journal of Clinical Oncology 23 (2005): Tumor-Associated Macrophages in Cancer Progression. Sher Y. P. et al., Cancer Research 66 (2006): Non–Small Cell Lung Cancer with Tumor Cell Invasiveness. Chen H. Y. et al., The New England Journal of Medicine 356 (2007): A Five-Gene Signature and Clinical Outcome in Non–Small-Cell Lung Cancer. Kuo Y. L. et al. (Nat'l Yang-Ming Univ.), Open Access Scientific Reports 1 (2006): In silico Therapeutic Drug Screening for Reversing the Lung Adenocarcinoma Overexpressed Gene Signatures. GAP for Infectious Disease: SARS. Lee Y. S. et al. (Chang Gung Hospital), BMC Genomics 6 (2005): Molecular signature of clinical severity in recovering patients with (SARS-CoV). GAP for Protein-Protein Interaction (PPI) to Pathway: C-Y F. Huang, Nat'l Yang-Ming Univ. Liu et al., Molecular and Cellular Proteomics 12 (2013): An analysis of protein-protein interactions in cross-talk pathways reveals CRKL as a novel prognostic marker in hepatocellular carcinoma. [Figure: (a) matrix of PPIs (e.g. CRKL,MAPK1; CRKL,RAF1; PDGFRB,CRKL; AKT1,PDPK1; PLK1,CCNB1) by signaling pathways (e.g. Signaling by EGFR, PDGFR-beta signaling pathway, PI3K/AKT signalling, mTOR signaling pathway, Canonical Wnt signaling pathway, E2F mediated regulation of DNA replication), with color legend: Not on the pathway, Both Positive, Mahlavu Only, Huh7 Only; (b) Simple Match Between Pathways and (c) Simple Match Between PPIs, on a scale from 0 to 1.]
  • 26. Visualization for Binary Data: Scatter-plot Matrix (SM), Parallel Coordinates Plot (PCP), Mosaic Plot. [Figure: binary variables x1 to x4 taking the values 0 and 1.]
  • 27. Matrix Visualization for Binary Data: essential elements in a GAP MV procedure? For both continuous and binary data: 1. Data Matrix, 2. Subject Proximity, 3. Variable Proximity. For continuous data, proximity choices include correlation, covariance, polychoric correlation, ..., and Euclidean distance, Manhattan distance, correlation, ... For binary data: ?
  • 28. Commonly used similarity coefficients for binary data: Tzeng et al. (BMEI 2009), IEEE Xplore Digital Library.
  • 29. Binary GAP Example: CGMIM Online, http://www.bccrc.ca/ccr/CGMIM/. CGMIM performs automated text-mining of OMIM to identify genetically-related cancers. Online Mendelian Inheritance in Man (OMIM) is a computerized database of information about genes and heritable traits in human populations. OMIM is maintained on the Internet by the National Center for Biotechnology Information at the US National Institutes of Health. CGMIM considers 21 anatomic sites based on the major cancers identified by the National Cancer Institute of Canada. CGMIM compares each OMIM entry name and alternative name with a list of gene names assigned by HUGO (HUman Genome Organization). CGMIM produces the number of genes for which an OMIM entry mentions each pair of cancers, as well as a ratio of the observed and expected number of genes for the combination.
  • 30. CGMIM All Data (1948 genes × 21 sites), Original Order. Rows: 1948 related genes; columns: 21 cancer sites. Proximity: Jaccard, a/(a+b+c).
  • 31. CGMIM All Data (1948 genes × 21 sites), Single_Tree_GrandPa_Guide order. Rows: 1948 related genes; columns: 21 cancer sites. Proximity: Jaccard, a/(a+b+c).
  • 32. CGMIM, 768 genes found at 2 or more sites, Original Order. Rows: 768 related genes; columns: 21 cancer sites. Proximity: Jaccard, a/(a+b+c).
  • 33. CGMIM, 768 genes found at 2 or more sites, GAP_Elliptical_Order. Rows: 768 related genes; columns: 21 cancer sites. Proximity: Jaccard, a/(a+b+c).
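The Jaccard coefficient a/(a+b+c) used on these slides can be written out as a minimal sketch. It assumes the usual 2 × 2 convention, which the slides do not spell out: a counts positions where both binary profiles are 1, and b and c count positions where exactly one of them is 1 (joint absences are ignored). The two gene profiles are invented for illustration and are not CGMIM data.

```python
def jaccard(x, y):
    """Jaccard similarity a / (a + b + c) between two 0/1 vectors."""
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)  # both present
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)  # only in x
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)  # only in y
    if a + b + c == 0:
        return 0.0  # convention chosen here for two all-zero profiles
    return a / (a + b + c)

# Hypothetical genes, each marked at six cancer sites (1 = mentioned).
gene1 = [1, 1, 0, 0, 1, 0]
gene2 = [1, 0, 0, 0, 1, 1]

similarity = jaccard(gene1, gene2)  # a = 2, b = 1, c = 1
```

Ignoring joint absences suits sparse data such as a gene-by-site table, where most genes are absent from most sites and shared zeros carry little information.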
  • 34. Matrix visualization of nominal data (GAP approach). Example: Classification of Animals data, Shizuhiko Nishisato (2006).
  • 35. Registration data and possible variables: 701 people, 7 variables, in registration order 1~701. Each variable is listed as category (code): count. Ticket type: early bird/general public (0): 351; early bird/student (1): 121; invited/general (2): 48; invited/media (3): 25; invited/VIP (4): 22; hackathon (5): 90; invited/speaker (6): 32; invited/sponsor (7): 12; total 701. Gender: female (0): 166; male (1): 535; total 701. Industry: undetermined (0): 56; information technology (1): 203; research institute (2): 41; technology (3): 33; consulting (4): 13; finance (5): 18; telecommunications (6): 21; freelance (7): 11; media (8): 17; academia (9): 242; government (10): 11; medical and pharmaceutical (11): 8; electronics (12): 4; nonprofit (13): 8; distributor (14): 5; other (15): 10; total 701. Meal: vegetarian (0): 47; non-vegetarian (1): 654; total 701. Notification: no (0): 61; yes (1): 640; total 701. Activity: hackathon (0): 91; data analysis (1): 196; talks (2): 414; total 701. Email: .edu (0): 49; .org (1): 15; gmail (2): 500; hotmail (3): 44; yahoo (4): 10; MSN (5): 3; .com (6): 64; other (7): 16; total 701. [Table excerpt: sample coded rows for ticket type, industry, notification, activity, and email, and for registration number (139 to 163), meal, and gender.]
  • 36. Essential elements in a GAP MV procedure? For both continuous and nominal data (type 2 measurements): 1. Data Matrix, 2. Subject Proximity, 3. Variable Proximity. For continuous data, proximity choices include correlation, covariance, polychoric correlation, ..., and Euclidean distance, Manhattan distance, correlation, ... For nominal data: matching proportion? χ²? ?
  • 37. A typical nominal data set (Shizuhiko Nishisato, 2006): classification of animals. 35 animals were sorted into piles of similar animals by 15 students. Each entry is the pile label a student (S1 to S15) gave to an animal.
      Animal S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 S15
      Alligator 8 1 6 9 6 1 3 4 1 4 3 4 3 6 4
      Bear 6 3 2 6 6 1 3 4 4 5 5 1 4 2 2
      Camel 4 3 9 3 1 4 3 5 4 2 5 1 7 7 8
      Cat 6 3 7 4 0 1 1 2 3 3 1 1 6 3 5
      Cheetah 3 3 7 4 0 1 3 5 4 3 6 1 6 2 2
      Chicken 7 2 4 1 2 5 7 1 5 1 1 3 8 4 1
      Chimpanzee 5 3 5 7 5 2 4 4 2 2 6 4 3 6 6
      Cow 1 3 9 6 1 1 3 5 3 4 1 1 4 5 8
      Crane 7 2 4 5 2 5 5 1 5 1 2 3 8 4 1
      Crow 7 2 4 5 2 5 5 1 5 1 2 3 8 4 1
      Dog 6 3 7 10 0 2 1 2 3 3 1 1 4 3 5
      Duck 7 2 4 1 2 5 5 1 5 1 2 3 8 4 1
      Elephant 4 3 6 3 1 4 3 5 4 5 3 1 7 7 2
      Fox 6 3 7 4 0 1 6 2 3 3 3 1 4 3 5
      Frog 8 1 3 2 3 3 2 3 1 4 4 2 1 1 3
      Giraffe 1 3 8 3 1 4 3 5 4 2 5 1 7 7 8
      Goat 3 3 9 6 1 4 6 5 3 3 1 1 5 3 5
      Hawk 7 2 4 5 2 5 5 1 5 1 3 3 8 4 1
      Hippopotamus 4 3 6 6 6 4 3 3 4 4 5 1 7 7 2
      Horse 6 3 9 6 1 2 3 5 3 3 1 1 5 5 8
      Leopard 1 3 7 4 0 1 3 5 4 3 3 1 6 2 2
      Lion 5 3 7 4 6 1 3 5 4 3 3 1 7 2 2
      Lizard 2 1 3 2 3 3 2 3 1 4 4 2 2 1 3
      Monkey 6 3 5 7 5 2 4 4 2 2 6 4 3 6 6
      Ostrich 3 2 4 1 2 5 3 1 5 1 5 3 8 7 8
      Pig 1 3 9 6 1 1 6 5 3 3 1 1 5 5 5
      Pigeon 7 2 4 5 2 5 5 1 5 1 2 1 8 4 1
      Rabbit 6 3 1 6 0 4 6 2 3 3 1 1 5 3 5
      Racoon 6 3 7 10 4 1 6 2 3 3 3 1 4 3 5
      Rhinoceros 4 3 5 6 6 4 3 5 4 4 5 1 7 7 2
      Snake 8 1 3 9 6 3 2 3 1 4 4 2 2 1 3
      Sparrow 7 2 4 5 2 5 5 1 5 2 2 3 8 4 1
      Tiger 5 3 7 4 0 1 3 5 4 3 3 1 6 2 2
      Tortoise 8 1 3 9 3 3 2 3 1 5 4 2 1 1 3
      Turkey 7 2 4 1 2 5 7 1 5 1 1 3 8 4 1
      What about 3500 samples and 1500 variables?
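Because pile labels are nominal, a distance such as Euclidean makes little sense for them; the matching proportion listed on slide 36 is one candidate for subject proximity. The sketch below is a hedged illustration rather than the GAP implementation: the function name `matching_proportion` is hypothetical, and the four rows are copied from the animals table on slide 37.

```python
def matching_proportion(row_a, row_b):
    """Share of variables on which two subjects carry the same nominal label."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must have the same length")
    matches = sum(1 for u, v in zip(row_a, row_b) if u == v)
    return matches / len(row_a)


# Pile labels given by students S1..S15 (rows of the slide 37 table).
crane = [7, 2, 4, 5, 2, 5, 5, 1, 5, 1, 2, 3, 8, 4, 1]
crow = [7, 2, 4, 5, 2, 5, 5, 1, 5, 1, 2, 3, 8, 4, 1]
duck = [7, 2, 4, 1, 2, 5, 5, 1, 5, 1, 2, 3, 8, 4, 1]
frog = [8, 1, 3, 2, 3, 3, 2, 3, 1, 4, 4, 2, 1, 1, 3]

print(matching_proportion(crane, crow))  # identical rows: 1.0
print(matching_proportion(crane, duck))  # differ only at S4: 14/15
print(matching_proportion(crane, frog))  # no student piled them together: 0.0
```

Applying the same function to every pair of animals gives a 35 × 35 proximity matrix; applying it to pairs of students would not work directly, since each student uses private pile labels, which is presumably why the variable-proximity cell on slide 36 is left as a question mark.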
  • 38. Figure: pictures of the 35 animals (Alligator, Bear, Camel, Cat, Cheetah, Chicken, Cow, Crane, Chimpanzee, Crow, Dog, Duck, Elephant, Fox, Frog, Giraffe, Goat, Hawk, Hippopotamus, Horse, Leopard, Lion, Lizard, Ostrich, Pig, Pigeon, Rabbit, Racoon, Rhinoceros, Snake, Sparrow, Tiger, Tortoise, Turkey, Monkey).
  • 39. Univariate display: bar chart and pie chart for students S12 and S2. Pile labels shown: Mammalia, Reptilia, Aves, and Primates (?).
  • 40.
  • 41. Bivariate display: mosaic display. Piles of one student: 1. Reptile, 2. Bird, 3. Mammal. Piles of the other: 1. Mammal, 2. Reptile, 3. Bird, 4. Primate?
  • 42. Conventional multivariate visualizations for these data: 2D mosaic display, 5D mosaic display, scatter plot, scatter-plot matrix, parallel coordinate plot.
  • 43. GAP categorical MV: solution with 3 matrix maps (original orders). Figure rows: the 35 animals in the original alphabetical order.
  • 44. GAP categorical MV: solution with 3 matrix maps (R2E orders).
  • 45. Figure: sorted matrix map. Column order: 11, 6, 7, 15, 14, 12, 13, 9, 4, 3, 5, 8, 10, 2, 1. Row order: Ostrich, Turkey, Chicken, Pigeon, Hawk, Sparrow, Duck, Crow, Crane, Lizard, Frog, Tortoise, Snake, Alligator, Hippopotamus, Bear, Rhinoceros, Elephant, Lion, Tiger, Leopard, Cow, Fox, Racoon, Cheetah, Cat, Dog, Pig, Rabbit, Horse, Goat, Camel, Giraffe, Monkey, Chimpanzee. Groups marked: Aves, Reptilia, Mammalia, Primates.
  • 46. Figure labels: D0_Lion, D1_Elephant, D2_Camel, D3_Hawk, D4_Fox, A0_Dog, A1_Alligator, A2_Chimpanzee, A3_Cow, A4_Crow, A5_Pigeon, A6_Cheetah, A7_Chiken, A8_Bear, A9_Cat, B0_Rabbit, B1_Frog, B2_Goat, B3_Tiger, B4_Rhinoceros, B5_Giraffe, B6_Duck, B7_Sparrow, B8_Hippopotamus, B9_Monkey, C0_Turkey, C1_Pig, C2_Crane, C3_Leopard, C4_Ostrich, C5_Lizard, C6_Horse, C7_Racoon, C8_Tortoise, C9_Snake.
  • 47. Figure: the 35 animal pictures of slide 38 again, ending with a question mark.
  • 48. Categorical data on the attendees: univariate frequency breakdown; bivariate table.
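A univariate frequency breakdown and a bivariate table can both be produced by simple counting. The sketch below assumes the gender and meal codings given in the deck (gender: 0 = female, 1 = male; meal: 0 = vegetarian, 1 = non-vegetarian); the six records are hypothetical and are not the real registration data.

```python
from collections import Counter

# Hypothetical (gender, meal) records; NOT the actual 701 registrations.
records = [(1, 1), (1, 1), (0, 0), (1, 1), (0, 1), (1, 0)]

# Univariate frequency breakdowns: one Counter per variable.
gender_counts = Counter(gender for gender, _ in records)
meal_counts = Counter(meal for _, meal in records)

# Bivariate table: count each (gender, meal) combination.
crosstab = Counter(records)

print("gender:", dict(gender_counts))
print("meal:", dict(meal_counts))
for gender in (0, 1):
    # One table row per gender code, one column per meal code.
    print(gender, [crosstab[(gender, meal)] for meal in (0, 1)])
```

A `Counter` returns 0 for combinations that never occur, so the table loop needs no special case for empty cells.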
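  • 49. Registration data: candidate variables (registration order runs from 1 to 701). Each entry reads category (code): count.
      Ticket type: early-bird ticket/working professional (0): 351; early-bird ticket/student (1): 121; invitation ticket/general (2): 48; invitation ticket/media (3): 25; invitation ticket/VIP (4): 22; g0v hackathon ticket (5): 90; invitation ticket/speaker (6): 32; invitation ticket/sponsor (7): 12; total 701.
      Gender: female (0): 166; male (1): 535; total 701.
      Industry: missing or undetermined (0): 56; information technology (1): 203; research institute (2): 41; technology (3): 33; consulting (4): 13; finance (5): 18; telecommunications (6): 21; freelance (7): 11; media (8): 17; academia (9): 242; government (10): 11; medical/pharmaceutical (11): 8; electronics (12): 4; nonprofit (13): 8; distributor (14): 5; manufacturing, food, transportation, and services (15): 10; total 701.
      Meal requirement: vegetarian (0): 47; non-vegetarian (1): 654; total 701.
      Activity attended: g0v hackathon (0): 91; hands-on data analysis course (1): 196; talk sessions (2): 414; total 701.
      Future notifications: no (0): 61; yes (1): 640; total 701.
      Email: .edu (0): 49; .org (1): 15; gmail (2): 500; hotmail (3): 44; yahoo (4): 10; MSN (5): 3; .com (6): 64; undetermined or other (7): 16; total 701.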
  • 50. Optimization: 701 people × 5 variables, plus 3 covariates.
      Excerpt of 21 rows of the 5 variables, in the column order implied by the code ranges (ticket type, industry, activity, notification, email):
      0 1 1 1 1; 0 9 1 1 2; 0 1 1 0 6; 0 1 2 1 2; 0 9 2 1 2; 1 9 2 1 2; 1 9 2 1 2; 1 9 1 1 2; 1 9 1 1 2; 0 3 2 1 3; 1 9 2 1 3; 0 3 2 1 6; 0 11 2 1 2; 1 9 1 1 2; 1 9 2 1 2; 0 1 2 0 2; 1 9 2 1 2; 1 9 2 1 2; 0 3 2 1 6; 0 2 2 1 1; 0 1 2 1 2
      Excerpt of the 3 covariates (meal, gender, registration order):
      1 1 139; 1 1 140; 0 0 141; 1 1 142; 1 1 143; 1 0 144; 1 1 145; 1 1 146; 1 0 147; 1 1 148; 1 1 149; 1 1 150; 1 1 151; 1 1 152; 1 1 155; 1 1 157; 1 0 158; 0 1 159; 1 0 160; 1 1 162; 1 0 163
  • 51. Figure: matrix map of 701 people × 5 variables (ticket type, industry, activity, notification, email), with a color legend listing the categories of each variable. Label in the figure: 李育杰.
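  • 52. Original orders (registration order): matrix map of 701 people × 5 variables (ticket type, industry, activity, notification, email), with side columns for registration order, female, and vegetarian.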
  • 53. Original orders (registration order), 5 variables, 701 people: the matrix map with category legends for ticket type, industry, email, notification, and activity, and side columns for registration order, female, and vegetarian.
  • 54. Random orders, 5 variables, 701 people. Variable order: ticket type, notification, activity, email, industry. Same category legends and side columns (registration order, female, vegetarian).
  • 55. Sediment display (pie chart, bar chart): shows the proportion of each category within a single variable, but loses the links between variables. 701 people; columns: female/male, vegetarian/non-vegetarian, registration order, ticket type, industry, activity, notification, email, with category legends.
  • 56. Elliptical Seriations (R2E), 5 variables, 701 people. Variable order: email, notification, activity, ticket type, industry. Same category legends and side columns (registration order, female, vegetarian).
  • 57. Hierarchical Clustering Tree (HCT), 5 variables, 701 people. Variable order: industry, email, notification, activity, ticket type. Same category legends and side columns (registration order, female, vegetarian).
  • 58. Hierarchical Clustering Tree (HCT), 5 variables, 701 people. Variable order: email, notification, activity, ticket type, industry. Same category legends and side columns (registration order, female, vegetarian).
  • 59. Hierarchical Clustering Tree with Flips Guided by Elliptical Seriation (orders: HCT-R2E). Columns: email, notification, activity, ticket type, industry, registration order, plus female and vegetarian side columns, with category legends.
  • 60. Matrix visualization with cartography links. CIA data: membership patterns in 160 international organizations (variables) for 230 countries/regions (subjects). Codes: 0 non-member (□), 1 member (■), 2 observer, 3 associate member, 4 guest, 5 dialogue partner. Source: CIA Political Map of the World, http://www.faqs.org/docs/factbook/index.html
  • 61. Draw one membership map for each organization (variable)? That would mean 160 maps (?): maps 1, 2, 3, ..., 158, 159, 160.
  • 62. Cartography Coloring Scheme with Categorical GAP (CartoGAP) - 2. Data: ranks of 5 candidates (扁, 宋, 連, 許, 李) in 360 townships, from the 2000 presidential election. Is it possible to visualize the information structure for all 5 candidates in a single map? Figure: maps A to E, one per candidate, leading to a single map marked "?". Rank legend: 1, 2, 3, 4, 4.5, 5.
  • 63. Figure: maps A, B, C, D, E for the five candidates (扁, 宋, 連, 許, 李).
  • 64. Cartography Coloring Scheme with GAP (CartoGAP) - 2. (B) CateGAP color map for each individual variable: maps A to E, one per candidate (扁, 宋, 連, 許, 李). (C) Final single CateGAP cartography color map for complete information visualization.
  • 65. From physical maps to conceptual maps: chromosome map, macro biodiversity, micro biodiversity, semiconductor wafer quality control.
  • 66. Matrix Visualization for Symbolic Data (Analysis) for Big Data?
  • 67. 1.1 Symbolic Data Analysis (SDA) and 1.2 Matrix Visualization (MV). Fig. 1. Diagram relating a conventional data matrix and a symbolic (interval type) data table to their corresponding proximity matrices for samples/concepts and variables.
  • 68. Example: Japan Minryoku 2010 data (with Junji Nakano, ISM). Hierarchy: Level 1, Region (10); Level 2, Area (151); Level 3, District (821); Level 4, City (1899). Continuous data on 58 variables for the 1899 Level 4 cities (市區町村) are converted to rank data (1~1899), then merged into interval-of-ranks data: 58 interval variables for the 151 Level 2 areas (地域). The 10 Level 1 regions serve as a covariate.
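The pipeline on slide 68 (continuous values, then ranks, then one interval of ranks per area) can be sketched for a single variable as follows. This is an illustration under stated assumptions, not the authors' procedure: the slide does not say how ties are ranked or how the interval is formed, so the sketch assumes average ranks for ties and the interval [smallest rank, largest rank] within each area. The names `ranks` and `rank_intervals` and the toy data are hypothetical.

```python
from collections import defaultdict


def ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def rank_intervals(values, groups):
    """Merge city-level ranks into one (min rank, max rank) interval per group."""
    by_group = defaultdict(list)
    for group, rank in zip(groups, ranks(values)):
        by_group[group].append(rank)
    return {group: (min(r), max(r)) for group, r in by_group.items()}


# Hypothetical variable observed in six cities belonging to three areas.
city_values = [50, 10, 30, 20, 40, 60]
city_areas = ["A", "A", "B", "B", "C", "C"]
print(rank_intervals(city_values, city_areas))
# {'A': (1.0, 5.0), 'B': (2.0, 3.0), 'C': (4.0, 6.0)}
```

Working on ranks rather than raw values puts all 58 variables on the common scale 1~1899 before the intervals are formed.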
  • 70. 12 display modes for MV of interval data (58 interval variables × 151 regions (concepts)). Interval summaries shown: Min, Mid, Max, Length. Display options listed: Sufficient, Sediment, Row Condition, Col Condition.
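The four interval summaries named on slide 70 can be computed from the endpoints of an interval. The sketch assumes that Mid means the midpoint and Length means upper minus lower; the slide does not define them, and the function name `interval_summaries` and the example endpoints are hypothetical.

```python
def interval_summaries(lower, upper):
    """Return the Min, Mid, Max and Length summaries of an interval [lower, upper]."""
    if lower > upper:
        raise ValueError("lower endpoint must not exceed upper endpoint")
    return {
        "min": lower,
        "mid": (lower + upper) / 2,  # assumed: midpoint of the interval
        "max": upper,
        "length": upper - lower,     # assumed: width of the interval
    }


# Hypothetical interval of ranks for one area on one variable.
print(interval_summaries(900, 1000))
# {'min': 900, 'mid': 950.0, 'max': 1000, 'length': 100}
```

Each summary turns the interval table into an ordinary numeric matrix, which can then be colored and seriated like any other matrix map.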
  • 71. Statisticians, Data Analysts, Bioinformaticists. "A statistician is someone who wants to get exactly the right answer, even if it's the answer to the wrong question. A data analyst is someone who is willing to settle for an approximate answer, as long as it's the answer to the right question. A bioinformaticist is someone who is willing to settle for answers of unknown accuracy, to questions that have not been clearly articulated, as long as the results can be graphed in color." David B. Allison, Ph.D., Department of Biostatistics, University of Alabama at Birmingham.
  • 72. 12. MV for color-blind people. Types of color blindness: monochromacy; dichromacy (protanopia and deuteranopia; hereditary tritanopia); anomalous trichromacy. See http://www.vischeck.com/examples/. Two options: act passively, by avoiding color systems that are difficult for color-blind people to read; or work actively to help people with visual impairments visualize data and information better. "I believe there are more mathematics/statistics blind people than color blind people."