Mining Temporal Pattern
and Related Applications
Yi-Cheng Chen 陳以錚
1
Curriculum Vitae
 Basic Information
 Birthday – Aug. 31, 1978

 Education
 Depart. of CSE, YZU (B. S. 2000)
 Depart. of CS, NTUST (M. S. 2002)
 Depart. of CSIE, NCTU (Ph. D. 2012)

Advisors: Prof. Suh-Yin Lee (李素瑛) and Prof. Wen-Chih Peng (彭文志)
 Ph. D. Dissertation:
A Study on Time Interval-based Sequential Patterns Mining


2
Outline
Current Research
Temporal Pattern Mining
Social Network Analysis
Smart Home Application
Cloud Computing

3
Why Data Mining?
Commercial Viewpoint
 Lots of data is being collected
 Web data, e-commerce
 Purchases at department stores
 Bank/credit card transactions
 Computers have become cheaper and more powerful
 Competitive pressure is strong
 Provide better, customized services for an edge (e.g. in Customer Relationship Management)

4
Why Data Mining?
Scientific Viewpoint
 Data collected and stored at enormous speeds (GB/hour)
 Remote sensors on a satellite
 Telescopes scanning the skies
 Microarrays generating gene expression data
 Scientific simulations generating terabytes of data
 Traditional techniques infeasible for raw data
 Data mining may help scientists
 in classifying and analyzing data
 in hypothesis formation

5
Data Mining
 We are buried in data, but looking for knowledge
 Data mining
 Knowledge discovery in databases
 Extraction of interesting knowledge (rules, regularities, patterns) from data in large databases

6
Temporal Pattern
Mining

7
Sequential Pattern Mining
 Point-based sequential pattern mining
 Customer analysis, network intrusion detection, finding tandem repeats in DNA sequences, ...
 Simple relations between time points
 Three relations (before, equal, after)
[Figure: time point-based example with the events beer, milk and diaper]
 With min_sup = 2, 〈(ab)dc〉 is a frequent sequential pattern
8
Interval Data Everywhere !!
Interval data
Data that has a duration
Clinical data, library data, appliance usage data

Applications
Diagnosis system, recommendation system, smart home

[Figure: DB, Diagnosis System, Recommendation, Smart Home]

9
Temporal Pattern Mining
 Interval-based sequential pattern mining
 Library reader analysis, patient disease analysis, stock fluctuation, ...
 Complex relations
 Allen’s 13 temporal relations
[Figure: time interval-based example with the events cough, chest pain and fever]
 With min_sup = 4, the pattern shown in the figure is a frequent temporal pattern
10
Allen Relationship
 Allen’s 13 temporal relations describe the relationship between any two event intervals (a binary relation) [ACM 1983]
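To make the binary nature of these relations concrete, here is a minimal sketch that classifies a pair of intervals into one of the 13 relations. The function name `allen_relation` and the `(start, finish)` tuple layout are assumptions for illustration; the relation names are the usual ones from Allen's interval algebra.

```python
def allen_relation(a, b):
    """Return the Allen relation holding from interval a to interval b.

    Each interval is a (start, finish) pair with start < finish.
    """
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equal"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # Only the two partial-overlap cases remain.
    return "overlaps" if a1 < b1 else "overlapped-by"


print(allen_relation((1, 4), (2, 5)))  # overlaps
print(allen_relation((2, 8), (3, 5)))  # contains
```

Because the function only ever looks at two intervals, describing three or more intervals requires one relation per pair, which is the representation issue raised on the next slides.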

11
Real example
 Some temporal patterns generated from NCTU library

12
Motivation


Representation
 Allen’s relations are binary relations
 Expressing the relations among more than 3 intervals
 Ambiguity problem
 Space usage

Efficient algorithms
 Mining temporal patterns *
 Mining closed temporal patterns
 Incrementally maintaining discovered temporal patterns and closed temporal patterns

Related applications
 Social network
 Smart home

13
Proposed Method
 Coincidence representation
 Segment intervals into disjoint slices
 Nonambiguous and compact representation

 Endpoint representation
 Global information of a sequence
 Nonambiguous and compact representation
 TPMiner (Temporal Pattern Miner)
 Pattern-growth approach
Without candidate generation and test
 Two components
 RPrefixSpan
 Pruning strategies


14
Coincidence representation
 Segment intervals into disjoint slices
 Four kinds of event slice
 Start slice (+), intermediate slice (*), finish slice (−) and intact slice (no sign)
 Coincidence
 Slices occurring simultaneously
 Space usage (for a k-pattern)
 Best: k, Worst: 2k space

[Figure: event intervals A to E and their coincidences]
Intervals A and B: (A+) (A−B+) (B−)
Intervals C and D: (C+) (C*D) (C−)
Interval E: (E)

coincidence representation:
(A+) (A−B+) (B−) (C+) (C*D) (C−) (E)
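The slicing can be sketched in a few lines. This is a minimal illustration, assuming each interval is a `(symbol, start, finish)` tuple with `start < finish` and each symbol occurring once; the timestamps in the usage example are invented so that the output matches the slide's example, and ASCII `-` stands in for the finish sign.

```python
def coincidence_representation(intervals):
    """Cut (symbol, start, finish) intervals into slices, grouped by coincidence."""
    # Every distinct endpoint is a cut point.
    points = sorted({t for _, s, f in intervals for t in (s, f)})
    result = []
    for left, right in zip(points, points[1:]):
        slices = []
        for sym, s, f in sorted(intervals):
            if s <= left and right <= f:          # interval covers this segment
                if s == left and f == right:
                    slices.append(sym)            # intact slice
                elif s == left:
                    slices.append(sym + "+")      # start slice
                elif f == right:
                    slices.append(sym + "-")      # finish slice
                else:
                    slices.append(sym + "*")      # intermediate slice
        if slices:                                # skip gaps where nothing is active
            result.append(slices)
    return result


example = [("A", 1, 4), ("B", 2, 6), ("C", 8, 12), ("D", 9, 10), ("E", 14, 15)]
print(coincidence_representation(example))
# [['A+'], ['A-', 'B+'], ['B-'], ['C+'], ['C*', 'D'], ['C-'], ['E']]
```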

15
Incision strategy
 A data structure, endtime_list
 Sort and merge
 Trace endtime_list one-by-one

Event intervals (symbol, start time, finish time):
(A, 1, 4), (B, 2, 5), (C, 2, 8), (D, 3, 5), (E, 5, 7)

endtime_list (before sort and merge)
symbol  time  type
A       1     s
A       4     f
B       2     s
B       5     f
C       2     s
C       8     f
D       3     s
D       5     f
E       5     s
E       7     f

endtime_list (after sort and merge)
symbol  time  type
A       1     s
BC      2     s
D       3     s
A       4     f
BD      5     f
E       5     s
E       7     f
C       8     f

Trace one-by-one to build the coincidence representation:
(A+) (B+ C+) (A− D+ ) (B− D− ) @ (E) (C− ) …
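The sort-and-merge step on this slide can be reproduced with a short sketch. The function name `build_endtime_list` and the tuple layouts are assumptions for illustration; placing finish entries before start entries at the same time is inferred from the slide's sorted table, where `BD 5 f` precedes `E 5 s`.

```python
from itertools import groupby


def build_endtime_list(intervals):
    """Build the sorted, merged endtime_list from (symbol, start, finish) intervals."""
    endpoints = []
    for sym, start, finish in intervals:
        endpoints.append((start, "s", sym))
        endpoints.append((finish, "f", sym))
    # Tuples sort by time, then type ("f" < "s"), then symbol.
    endpoints.sort()
    merged = []
    # Merge entries that share the same time and the same type.
    for (time, kind), group in groupby(endpoints, key=lambda e: e[:2]):
        symbols = "".join(sym for _, _, sym in group)
        merged.append((symbols, time, kind))
    return merged


intervals = [("A", 1, 4), ("B", 2, 5), ("C", 2, 8), ("D", 3, 5), ("E", 5, 7)]
for row in build_endtime_list(intervals):
    print(row)
```

Run on the five intervals from the slide, this yields the eight merged rows of the sorted table, starting with `('A', 1, 's')` and ending with `('C', 8, 'f')`.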

16
Endpoint Representation
[Figure: time points of events A, B, C and D]
endpoint representation: A+ ( B+ C+ ) A− ( B− C− D+ ) D−

 Sequence of ordered time points
 +: start time, −: finish time

 Nonambiguous
 Space usage (for a k-pattern)
 2k space
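As a minimal sketch of the endpoint representation, the function below groups the endpoints that share a time point. The name `endpoint_representation` and the timestamps in the example are assumptions, chosen so that the output matches the sequence on the slide; ASCII `-` stands in for the finish sign.

```python
from itertools import groupby


def endpoint_representation(intervals):
    """Turn (symbol, start, finish) intervals into an ordered list of endpoint groups."""
    endpoints = []
    for sym, start, finish in intervals:
        endpoints.append((start, sym + "+"))   # start time
        endpoints.append((finish, sym + "-"))  # finish time
    endpoints.sort()
    # Endpoints that occur at the same time point form one group.
    return [[label for _, label in group]
            for _, group in groupby(endpoints, key=lambda e: e[0])]


example = [("A", 1, 3), ("B", 2, 4), ("C", 2, 4), ("D", 4, 5)]
print(endpoint_representation(example))
# [['A+'], ['B+', 'C+'], ['A-'], ['B-', 'C-', 'D+'], ['D-']]
```

Every interval contributes exactly two endpoints, which is where the 2k space figure for a k-pattern comes from.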
17
Example Database

18
TPMiner – RPrefixSpan (1/2)
 Every item is disjoint
 The relations among slices are simple
 Before, equal and after (like time-point data)

 RPrefixSpan
 Borrow the idea of PrefixSpan

 Scan local database to find frequent slices
 Append and extend the pattern
 Project database

 Pruning strategy
 Reduce search space
 Pre-pruning and post-pruning
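The scan, append and project loop borrowed from PrefixSpan can be sketched as follows. This is a deliberately simplified illustration, not RPrefixSpan itself: it assumes every sequence is a flat list of slices (one slice per position, so only sequence extensions are considered) and it applies no pruning. The function name `prefixspan` and the toy database are hypothetical.

```python
from collections import Counter


def prefixspan(db, min_sup, prefix=()):
    """Return {pattern: support} for a database of flat slice sequences."""
    patterns = {}
    # Scan the (projected) database; count each slice once per sequence.
    counts = Counter()
    for seq in db:
        counts.update(set(seq))
    for item, sup in sorted(counts.items()):
        if sup < min_sup:
            continue
        # Append the frequent slice to extend the current pattern.
        new_prefix = prefix + (item,)
        patterns[new_prefix] = sup
        # Project: keep what follows the first occurrence of the slice.
        projected = [seq[seq.index(item) + 1:] for seq in db if item in seq]
        patterns.update(prefixspan(projected, min_sup, new_prefix))
    return patterns


db = [
    ["A+", "B+", "A-", "B-"],
    ["A+", "A-", "B+", "B-"],
    ["A+", "B+", "B-", "A-"],
]
found = prefixspan(db, min_sup=3)
print(found[("A+", "B+", "B-")])  # 3
```

No candidate patterns are generated and tested against the whole database; each recursive call only looks at the projected sequences for its prefix.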
19
TPMiner – RPrefixSpan (2/2)
[Figure: recursive projection of the database D]
 Scan database D to find the frequent items e1, e2, ..., ei, ..., en
 Transform sequences and project database: D|e1, D|e2, ..., D|ei, ..., D|en
 Recursively project database and append & extend pattern: D|e1..., D|e2..., ..., D|ei..., ..., D|en...
 Collect all mining patterns: frequent temporal patterns
20
Pruning Strategy – Pre-pruning
 Utilize the concept of slice and coincidence
 Start slices and finish slices occur in pairs
 Only require projecting the frequent finish slices which have the corresponding start slices in their prefixes

[Figure: extending the prefix 〈A+〉]
Scan database D|〈A+〉; frequent local slices: A−, B+, B−, C
〈A+ A−〉: project D|〈A+ A−〉
〈A+ B+〉: project D|〈A+ B+〉
〈A+ B−〉: non-qualified pattern; the non-promising projection D|〈A+ B−〉 can be pre-pruned
〈A+ C〉: project D|〈A+ C〉
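The pre-pruning rule can be expressed as a small filter over the frequent local slices. This is a sketch under assumptions: the prefix is a list of coincidences, slices are strings ending in `+` or `-` (ASCII `-` for the finish sign), and the function name `promising_extensions` is hypothetical.

```python
def promising_extensions(prefix, frequent_slices):
    """Keep only the slices worth projecting for the given prefix."""
    # Symbols whose start slice already appears in the prefix.
    started = {s[:-1] for coincidence in prefix for s in coincidence if s.endswith("+")}
    # A finish slice is promising only if its start slice is in the prefix.
    return [s for s in frequent_slices
            if not s.endswith("-") or s[:-1] in started]


print(promising_extensions([["A+"]], ["A-", "B+", "B-", "C"]))
# ['A-', 'B+', 'C']
```

With the prefix 〈A+〉 from the slide, B− is dropped before any projection is built, so D|〈A+ B−〉 is never computed.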
21
Pruning Strategy – Post-pruning
 Utilize the concept of slice and coincidence
 A start slice always appears before its finish slice
 Only collect the significant postfixes
 With respect to a prefix α, all finish slices in a postfix have corresponding start slices in α

A coincidence database D
S1: 〈(B+)(D+)(E)(D−)(B−)〉
S2: 〈(B+)(B− D+)(E)(D−)〉
S3: 〈(B)(A)(D+)(E)(D−)〉

Projected database D|〈E〉 (insignificant sequences; can be post-pruned)
S1: 〈(D−)(B−)〉
S2: 〈(D−)〉
S3: 〈(D−)〉
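One possible reading of post-pruning is sketched below: finish slices whose start slice is neither in the prefix α nor earlier in the postfix are dropped, and a postfix that becomes empty is treated as insignificant. This interpretation, the function name `significant_postfix` and the list-of-coincidences layout are assumptions rather than something the slide spells out.

```python
def significant_postfix(prefix, postfix):
    """Drop finish slices that can never be matched with a start slice."""
    started = {s[:-1] for coincidence in prefix for s in coincidence if s.endswith("+")}
    kept = []
    for coincidence in postfix:
        slices = []
        for s in coincidence:
            if s.endswith("-") and s[:-1] not in started:
                continue                  # unmatched finish slice: prune it
            if s.endswith("+"):
                started.add(s[:-1])       # a later finish slice may match this start
            slices.append(s)
        if slices:
            kept.append(slices)
    return kept


# Postfixes of S1, S2, S3 with respect to the prefix <E> from the slide.
for postfix in ([["D-"], ["B-"]], [["D-"]], [["D-"]]):
    print(significant_postfix([["E"]], postfix))  # [] each time
```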
22
Experimental Results (1/2)
[Figure: H-DFS, ARMADA, TPrefixSpan, IEMiner, TPMiner-CR and TPMiner-ER compared as the minimum support (%) goes from 1 to 0.5; dataset labels D200k – C40 – N10k and N10k – C20 – N10k]
(a) The performance of six algorithms: execution time (sec)
(b) The number of temporal patterns: number of generated patterns
Third panel: memory usage (MB)

23
Experimental Results (2/2)
[Figure: execution time (sec) of TPMiner-CR versus minimum support (%), from 1 to 0.5, with and without each pruning strategy]
(a) The performance test of influence on pre-pruning strategies: TPMiner-CR vs. TPMiner-CR without pre-pruning strategy
(b) The performance test of influence on post-pruning strategies: TPMiner-CR vs. TPMiner-CR without post-pruning strategy
(c) The performance test of influence on subset-pruning strategies: TPMiner-CR vs. TPMiner-CR without subset-pruning strategy
(d) The performance test of influence on all proposed pruning strategies: TPMiner-CR vs. TPMiner-CR without any pruning strategy

24
Related Applications

25
Smart Home Application
[Figure: system architecture with the home (light, air conditioner, alarm, D-Link controller), a home server and a cloud database]
(1) Sensor data log
(2) Pattern Mining: usage patterns P1, P2, P3, ... over appliance on/off intervals (light, air conditioner)
(3) Behavior Detection: current behavior
(4) Abnormal Detection
(5) System Alarm & Remote Control
26
Dynamic Social Network (1/2)
Dynamic social network
 A sequence of interaction graphs
 Nodes and edges vary with time

A lossless transformation
 Graph sequence → interval sequence

[Figure: interaction graphs G1, G2, G3, G4, ... over the nodes A to E at times t1, t2, t3, ..., and the resulting event sequence]

[Table columns: SID, event symbol, start time, finish time]
event symbol  start time  finish time
B             1           3
D             2           4
E             4           6
A             1           3
C             1           3
27
Dynamic Social Network (2/2)
Reduce the complexity of the graph
 Avoid isomorphism testing

Dynamic Social Network Analysis
 Pattern mining
 Classification
 Recommendation system
 Network sampling
 Clustering

28
Social Network
Analysis

29
Social Network Analysis
 A graph representation
 Nodes and edges

30
Influence
Maximization

31
Advertisement Budget
 According to [source not shown], advertisement spending on worldwide social networking sites
 2008: $23.3 million
 2010: $23.6 billion
 2011: almost $25.5 billion

[Figure: advertisement spending]

32
Influence Maximization
 Word-of-mouth effect in social network
 Influence maximization problem
 Select initial users (seeds) so that the number of users that adopt the product or innovation is maximized

[Figure: a social network before and after seed selection]

33
Motivation

Characteristic of social network
 Community structure

[Figure: a 12-node example network and its communities]

Community and degree heuristic (CDH)
 Utilize community information
 Avoid influence overlapping

34
Proposed Algorithm – CDH
 Framework of CDH

35
CDH – Adjust Step
 Adjust selected fundamental nodes
 Seeds selected from a large community may activate more inactive nodes than seeds from a small community
 Replace the fundamental node in a small community if we can activate more inactive nodes
 Finally, output the result as selected seed nodes

[Figure: communities C1, C2, C3, ..., Ck; the second largest degree node in C1 replaces the largest degree node in Ck, which is deleted]
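The adjust step can be illustrated with a rough sketch. Everything beyond what the slide states is an assumption: the fundamental nodes are taken to be the highest-degree node of each community, only one replacement is tried (the smallest community's node against the second largest degree node of the largest community), and `spread_proxy` is a hypothetical stand-in for an influence estimate that simply counts the seeds and their direct neighbours.

```python
def spread_proxy(graph, seeds):
    """Hypothetical influence estimate: the seeds plus their direct neighbours."""
    reached = set(seeds)
    for s in seeds:
        reached |= graph[s]
    return len(reached)


def cdh_seeds(graph, communities, spread=spread_proxy):
    """Pick one seed per community by degree, then try one replacement."""
    def by_degree(nodes):
        return sorted(nodes, key=lambda n: (-len(graph[n]), n))

    # Communities ordered from largest to smallest, nodes ordered by degree.
    ranked = sorted((by_degree(c) for c in communities), key=len, reverse=True)
    seeds = [nodes[0] for nodes in ranked]          # fundamental nodes
    if len(ranked) > 1 and len(ranked[0]) > 1:
        # Swap the smallest community's seed for the runner-up of the largest.
        candidate = seeds[:-1] + [ranked[0][1]]
        if spread(graph, candidate) > spread(graph, seeds):
            seeds = candidate
    return seeds


graph = {1: {2, 3, 4}, 2: {1}, 3: {1}, 4: {1, 5, 6}, 5: {4}, 6: {4}, 7: set()}
print(cdh_seeds(graph, [{1, 2, 3, 4, 5, 6}, {7}]))  # [1, 4]
```

In the toy graph the isolated node 7 activates nobody else, so replacing it with node 4 from the large community raises the estimated spread from 5 to 6 nodes.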

36
Experimental Results - Facebook

37
Dynamic
Recommendation

38
Recommendation System
 Predict ratings or preferences
 using a model built from the characteristics

[Figure: recommendation examples from (a) amazon.com and (b) youtube.com]
39
Collaborative Filtering (CF)
1. Calculate the similarity between the active user and the other users
• Pearson’s correlation, cosine similarity, conditional probability, etc.
2. Predict the rating of items that have not been rated by the active user
3. Output the top-k items by the predicted results

40
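[Table: example ratings of users A, B and C on items i1 to i4, with the average rating of each user; the cell layout is not recoverable]

$w_{a,b} = \frac{(1 - 3)(2 - 3)}{\text{normalize}}$

$w_{a,c} = \frac{(1 - 3)(3 - 2) + (4 - 3)(3 - 2)}{\text{normalize}}$

$p_{a,i} = 3 + \frac{(4 - 3) \cdot w_{a,b} + (2 - 2) \cdot w_{a,c}}{\text{normalize}}$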
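The three collaborative filtering steps can be sketched as follows. The slide only writes "normalize" for the denominators, so this sketch assumes the common choices: Pearson's correlation over co-rated items for the weights, and the sum of absolute weights for the prediction. The ratings dictionary is invented for illustration and is not the table from the slide.

```python
from math import sqrt


def mean(user_ratings):
    return sum(user_ratings.values()) / len(user_ratings)


def pearson(ratings, a, b):
    """Pearson's correlation over the items rated by both users."""
    common = set(ratings[a]) & set(ratings[b])
    ma, mb = mean(ratings[a]), mean(ratings[b])
    num = sum((ratings[a][i] - ma) * (ratings[b][i] - mb) for i in common)
    den = (sqrt(sum((ratings[a][i] - ma) ** 2 for i in common))
           * sqrt(sum((ratings[b][i] - mb) ** 2 for i in common)))
    return num / den if den else 0.0


def predict(ratings, a, item):
    """Predict user a's rating of item from the other users who rated it."""
    num = den = 0.0
    for j in ratings:
        if j == a or item not in ratings[j]:
            continue
        w = pearson(ratings, a, j)
        num += (ratings[j][item] - mean(ratings[j])) * w
        den += abs(w)
    return mean(ratings[a]) + (num / den if den else 0.0)


# Hypothetical ratings; user "a" has not rated item "z".
ratings = {
    "a": {"x": 5, "y": 1},
    "b": {"x": 4, "y": 2, "z": 6},
    "c": {"x": 2, "y": 4, "z": 3},
}
print(round(predict(ratings, "a", "z"), 3))  # 3.828
```

Ranking the unrated items of the active user by this predicted value and keeping the first k gives the top-k output of step 3.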

41
Motivation
 Dynamic! Dynamic! Dynamic!
 Why do we need dynamic?
 All things vary with time

 Dynamic Collaborative Filtering
 Consider the influence of time in the calculation

 Without considering time
 the prediction results might be out of date

42
Dynamic Similarity based on
Collaborative Filtering (DSCF)
[Figure: two rating logs, each yielding a similarity matrix Msim; one matrix is weighted by α and the other by (1 − α)]

$DB_{t_0 - 1}$ ( user->item : rating (time) )
1 -> 1193 : 5 (2012.5.18)
5 -> 661 : 3 (2012.3.5)
3 -> 914 : 3 (2012.6.27)
1 -> 3408 : 4 (2012.3.18)
……

$DB_{t_{t-1}}$ ( user->item : rating (time) )
9 -> 6610 : 5 (2012.7.8)
2 -> 6610 : 3 (2012.7.15)
……

$Msim_{t_0} = \alpha \cdot Msim_{t_0 - 1} + (1 - \alpha) \cdot Msim_{t_{t-1}}$
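The weighted combination of the two similarity matrices is an element-wise blend. A minimal sketch, assuming both matrices are nested lists of the same shape; the function name `blend_similarity` and the sample values are hypothetical.

```python
def blend_similarity(old_sim, new_sim, alpha):
    """Element-wise alpha * old + (1 - alpha) * new for two equally shaped matrices."""
    return [[alpha * o + (1 - alpha) * n for o, n in zip(old_row, new_row)]
            for old_row, new_row in zip(old_sim, new_sim)]


old_sim = [[1.0, 0.8], [0.8, 1.0]]   # similarities from the earlier rating log
new_sim = [[1.0, 0.4], [0.4, 1.0]]   # similarities from the later rating log
blended = blend_similarity(old_sim, new_sim, alpha=0.25)
print(round(blended[0][1], 3))  # 0.5
```

A larger α keeps more of the earlier similarities; a smaller α lets the later rating log dominate.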

43
Advanced DSCF
 α (similarity decay value, SDV) might not be consistent over time
 Each user might have his/her own SDV at different time points
 Feedback: compare predicted values with actual values

44
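[Figure: feedback loop for the active user: Predict $p_{a,i}$, Recommend, observe the actual value $A_{a,i}$, Feedback]

Predict:
$p_{a,i} = \bar{r}_a + \frac{\sum_{j=1}^{k} (r_{j,i} - \bar{r}_j) \cdot [\alpha \cdot sim'_{a,j} + (1 - \alpha) \cdot sim''_{a,j}]}{\sum_{j=1}^{k} [\alpha \cdot sim'_{a,j} + (1 - \alpha) \cdot sim''_{a,j}]}$

Feedback:
$A_{a,i} = \bar{r}_a + \frac{\sum_{j=1}^{k} (r_{j,i} - \bar{r}_j) \cdot [\alpha \cdot sim'_{a,j} + (1 - \alpha) \cdot sim''_{a,j}]}{\sum_{j=1}^{k} [\alpha \cdot sim'_{a,j} + (1 - \alpha) \cdot sim''_{a,j}]}$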
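The prediction formula with a blended similarity can be written out directly. This is a sketch under assumptions: each neighbour is passed as a tuple `(r_ji, mean_j, sim1, sim2)`, where `sim1` and `sim2` stand for the two similarity values the slide writes as sim′ and sim″, and the function name `predict_blended` is hypothetical. As on the slide, the denominator is the plain sum of the blended weights.

```python
def predict_blended(mean_a, neighbours, alpha):
    """Predict a rating from k neighbours using alpha-blended similarities."""
    num = den = 0.0
    for r_ji, mean_j, sim1, sim2 in neighbours:
        w = alpha * sim1 + (1 - alpha) * sim2   # blended similarity for neighbour j
        num += (r_ji - mean_j) * w
        den += w
    if den == 0:
        return mean_a                           # no usable neighbour information
    return mean_a + num / den


neighbours = [(4, 3, 0.8, 0.4), (2, 3, 0.2, 0.2)]
print(round(predict_blended(3.0, neighbours, alpha=0.5), 3))  # 3.5
```

Since each user may have an individual α, the same function can be called with a per-user value once the actual rating has been observed and compared with the prediction.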
45
Experimental Results

46
47

DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 

The study on mining temporal patterns and related applications in dynamic social network(20120928晚)

  • 1. Mining Temporal Pattern and Related Applications. Yi-Cheng Chen 陳以錚
  • 2. Curriculum Vitae. Basic information: birthday, Aug. 31, 1978. Education: Dept. of CSE, YZU (B.S. 2000); Dept. of CS, NTUST (M.S. 2002); Dept. of CSIE, NCTU (Ph.D. 2012). Advisors: Prof. Suh-Yin Lee (李素瑛), Prof. Wen-Chih Peng (彭文志). Ph.D. dissertation: A Study on Time Interval-based Sequential Patterns Mining.
  • 3. Outline: Current Research; Temporal Pattern Mining; Social Network Analysis; Smart Home Application; Cloud Computing.
  • 4. Why Data Mining? Commercial viewpoint. Lots of data is being collected: web data, e-commerce; purchases at department stores; bank/credit card transactions. Computers have become cheaper and more powerful. Competitive pressure is strong: provide better, customized services for an edge (e.g. in Customer Relationship Management).
  • 5. Why Data Mining? Scientific viewpoint. Data is collected and stored at enormous speeds (GB/hour): remote sensors on a satellite; telescopes scanning the skies; microarrays generating gene expression data; scientific simulations generating terabytes of data. Traditional techniques are infeasible for raw data. Data mining may help scientists in classifying and analyzing data and in hypothesis formation.
  • 6. Data Mining. We are buried in data, but looking for knowledge. Data mining: knowledge discovery in databases; extraction of interesting knowledge (rules, regularities, patterns) from data in large databases.
  • 8. Sequential Pattern Mining. Point-based sequential pattern mining: customer analysis, network intrusion detection, finding tandem repeats in DNA sequences, … Simple relations between time points: three relations (before, equal, after). [Figure: time point-based example with events beer, milk, diaper.] With min_sup = 2, 〈(ab)dc〉 is a frequent sequential pattern.
  • 9. Interval Data Everywhere! Interval data: data has a duration; clinical data, library data, appliance usage data. Applications: diagnosis system, recommendation system, smart home. [Figure: a DB feeding a diagnosis system, recommendation, and a smart home.]
  • 10. Temporal Pattern Mining. Interval-based sequential pattern mining: library reader analysis, patient disease analysis, stock fluctuation, ... Complex relations: Allen's 13 temporal relations. [Figure: time interval-based example with events cough, chest pain, fever.] With min_sup = 4, the pattern shown in the figure is a frequent temporal pattern.
  • 11. Allen Relationship. Allen's 13 temporal relations describe the relationship between any two events (binary relation) [ACM 1983].
  • 12. Real example. Some temporal patterns generated from the NCTU library.
  • 13. Motivation. Representation: Allen's relations are binary relations; expressing the relation among more than 3 intervals raises an ambiguity problem and a space usage problem. Efficient algorithms: mining temporal patterns (*); mining closed temporal patterns; incrementally maintaining discovered temporal patterns and closed temporal patterns. Related applications: social network, smart home.
  • 14. Proposed Method. Coincidence representation: segment intervals into disjoint slices; nonambiguous and compact representation. Endpoint representation: global information of a sequence; nonambiguous and compact representation. TPMiner (Temporal Pattern Miner): pattern-growth approach without candidate generation and test; two components, RPrefixSpan and pruning strategies.
  • 15. Coincidence representation. Segment intervals into disjoint slices. Four kinds of event slice: start slice (+), intermediate slice (*), finish slice (−) and intact slice (no mark). Coincidence: slices occurring simultaneously. Space usage (for a k-pattern): best k, worst 2k. Example: event intervals A and B give the coincidences (A+) (A−B+) (B−); C and D give (C+) (C*D) (C−); E gives (E). Coincidence representation: (A+) (A−B+) (B−) (C+) (C*D) (C−) (E).
  • 16. Incision strategy. A data structure, endtime_list. Sort and merge. Trace endtime_list one by one. Example intervals (symbol, start, finish): (A, 1, 4), (B, 2, 5), (C, 2, 8), (D, 3, 5), (E, 5, 7). endtime_list before sorting (symbol, time, type): A 1 s; A 4 f; B 2 s; B 5 f; C 2 s; C 8 f; D 3 s; D 5 f; E 5 s; E 7 f. endtime_list after sort and merge: A 1 s; B, C 2 s; D 3 s; A 4 f; B, D 5 f; E 5 s; E 7 f; C 8 f. Coincidence representation as printed on the slide: (A+) (B+ C+) (A− D+) (B− D−) @ (E) (C−) …
  • 17. Endpoint Representation. Sequence of ordered time points; +: start time, −: finish time. Nonambiguous. Space usage (for a k-pattern): 2k. Example: the time points of events A, B, C, D give A+ (B+ C+) A− (B− C− D+) D−.
  • 19. TPMiner – RPrefixSpan (1/2). Every item is disjoint. The relations among slices are simple: before, equal and after (like time-point data). RPrefixSpan borrows the idea of PrefixSpan: scan the local database to find frequent slices; append and extend the pattern; project the database. Pruning strategies reduce the search space: pre-pruning and post-pruning.
  • 20. TPMiner – RPrefixSpan (2/2). [Flow diagram: scan database D to find the frequent items e1, e2, ..., ei, ..., en; transform sequences and project the database into D|e1, D|e2, …, D|ei, …, D|en; recursively project the database and append & extend the pattern; collect all mined patterns to obtain the frequent temporal patterns.]
  • 21. Pruning Strategy – Pre-pruning. Utilize the concept of slice and coincidence. Start slices and finish slices occur in pairs. Only the frequent finish slices that have corresponding start slices in their prefixes need to be projected. [Figure: for prefix 〈A+〉 with projected database D|〈A+〉, scanning the database gives the frequent local slices A−, B+, B−, C; the candidate extensions are 〈A+ A−〉, 〈A+ B+〉, 〈A+ B−〉 and 〈A+ C〉; 〈A+ B−〉 is a non-qualified pattern, so the non-promising projection D|〈A+ B−〉 can be pre-pruned.]
  • 22. Pruning Strategy – Post-pruning. Utilize the concept of slice and coincidence. A start slice always appears before its finish slice. Only collect the significant postfixes: with respect to a prefix α, all finish slices in the postfix have corresponding start slices in α. Example coincidence database D: S1: 〈(B+)(D+)(E)(D−)(B−)〉; S2: 〈(B+)(B− D+)(E)(D−)〉; S3: 〈(B)(A)(D+)(E)(D−)〉. Projected database D|〈E〉: S1: 〈(D−)(B−)〉; S2: 〈(D−)〉; S3: 〈(D−)〉. These are insignificant sequences, so the projected database can be post-pruned.
  • 23. Experimental Results (1/2). [Figures, each plotted against minimum support (%) from 1 down to 0.5: (a) execution time (sec), captioned "The performance of six algorithms"; (b) the number of generated temporal patterns; and memory usage (MB). Dataset labels on the plots: D200k – C40 – N10k (twice) and N10k – C20 – N10k. Algorithms compared: H-DFS, ARMADA, TPrefixSpan, IEMiner, TPMiner-CR, TPMiner-ER.]
  • 24. Experimental Results (2/2). [Figures: execution time (sec) vs. minimum support (%), comparing TPMiner-CR against TPMiner-CR (a) without the pre-pruning strategy, (b) without the post-pruning strategy, (c) without the subset-pruning strategy, and, in a fourth panel that the slide also labels (b), without any pruning strategy.]
  • 26. Smart Home Application. [System diagram: home appliances (light, air conditioner, …) and a D-Link controller connect to a home server and a cloud database. Steps: (1) sensor data log of on/off usage per appliance ID; (2) pattern mining to obtain usage patterns P1, P2, P3; (3) behavior detection of the current behavior; (4) abnormal detection; (5) system alarm & remote control.]
  • 27. Dynamic Social Network (1/2). Dynamic social network: a sequence of interaction graphs; nodes and edges vary with time. A lossless transformation: graph sequence → interval sequence. [Figure: graphs G1, G2, G3, G4, … over nodes A to E at times t1, t2, t3 are transformed into an event sequence table with columns SID, event symbol, start time, finish time. Rows as extracted (event symbol, start, finish): B 1 3; D 2 4; E 4 6; A 1 3; C 1 3.]
  • 28. Dynamic Social Network (2/2). Reduce the complexity of graph; avoid isomorphism testing. Dynamic social network analysis: pattern mining, classification, recommending system, network sampling, clustering.
  • 30. Social Network Analysis. A graph representation: nodes and edges.
  • 32. Advertisement Budget. According to eMarketer, advertisement spending on worldwide social networking sites: 2008, $23.3 million; 2010, $23.6 billion; 2011, almost $25.5 billion. [Figure: advertisement spending.]
  • 33. Influence Maximization. Word-of-mouth effect in social networks. Influence maximization problem: select initial users (seeds) so that the number of users that adopt the product or innovation is maximized. [Figure: selecting seeds in a social network.]
  • 34. Motivation. Characteristic of social networks: community structure. [Figure: an example network with nodes 1 to 12, shown with and without its communities.] Community and degree heuristic (CDH): utilize community information; avoid influence overlapping.
  • 35. Proposed Algorithm – CDH. Framework of CDH. [Figure]
  • 36. CDH – Adjust Step. Adjust the selected fundamental nodes: seeds selected from a large community may activate more inactive nodes than seeds from a small community, so replace the fundamental node of a small community if doing so activates more inactive nodes. Finally, output the result as the selected seed nodes. [Figure: over communities C1, C2, C3, …, Ck, the largest-degree node in Ck is deleted and replaced by the second-largest-degree node in C1.]
  • 37. Experimental Results - Facebook
  • 39. Recommendation System. Predict the ratings or preferences using a model built from the characteristics. [Figures: (a) amazon.com, (b) youtube.com.]
  • 40. Collaborative Filtering (CF). 1. Calculate the similarity between the active user and the other users (Pearson's correlation, cosine similarity, conditional probability, etc.). 2. Predict the rating of items that have not been rated by the active user. 3. Output the top-k items according to the predicted results.
  • 41. [Rating table for users A, B, C over items i1 to i4 with a per-user average column; the cell layout was lost in extraction.] $w_{a,b} = \frac{(1-3)(2-3)}{\text{normalize}}$, $w_{a,c} = \frac{(1-3)(3-2) + (4-3)(3-2)}{\text{normalize}}$, $p_{a,i} = 3 + \frac{(4-3)\, w_{a,b} + (2-2)\, w_{a,c}}{\text{normalize}}$
  • 42. Motivation. Dynamic! Dynamic! Dynamic! Why do we need dynamic? All things vary with time. Dynamic collaborative filtering considers the influence of time in the calculation. Without considering time, the prediction results might be out of date.
  • 43. Dynamic Similarity based on Collaborative Filtering (DSCF). Rating records have the form (user -> item : rating (time)). One database: 1 -> 1193 : 5 (2012.5.18); 5 -> 661 : 3 (2012.3.5); 3 -> 914 : 3 (2012.6.27); 1 -> 3408 : 4 (2012.3.18); … Another database: 9 -> 6610 : 5 (2012.7.8); 2 -> 6610 : 3 (2012.7.15); … A similarity matrix Msim is computed from each database (Msim_{t0−1} from DB_{t0−1}, Msim_{tt−1} from DB_{tt−1}) and the two are combined with weights α and (1 − α): Msim_{t0} = α · Msim_{t0−1} + (1 − α) · Msim_{tt−1}
  • 44. Advanced DSCF. α (similarity decay value, SDV) might not be consistent for all time. Each user might have his/her own SDV at different time points. Feed back the predicted values against the actual values.
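  • 45. [Diagram: predict for the active user, recommend, then feed back.] Predict: $p_{a,i} = \bar{r}_a + \frac{\sum_{j=1}^{k} (r_{j,i} - \bar{r}_j)\,[\alpha \cdot sim'_{a,j} + (1-\alpha)\, sim''_{a,j}]}{\sum_{j=1}^{k} [\alpha \cdot sim'_{a,j} + (1-\alpha)\, sim''_{a,j}]}$. Recommend $A_{a,i}$ to the active user. Feedback: $A_{a,i} = \bar{r}_a + \frac{\sum_{j=1}^{k} (r_{j,i} - \bar{r}_j)\,[\alpha \cdot sim'_{a,j} + (1-\alpha)\, sim''_{a,j}]}{\sum_{j=1}^{k} [\alpha \cdot sim'_{a,j} + (1-\alpha)\, sim''_{a,j}]}$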
  • 47. 47
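
The two representations from slides 15 and 17 can be illustrated with a short Python sketch. This is only one reading of the slides, not the authors' incision strategy: the function names, the tuple layout `(symbol, start, finish)`, and the choice to cut at every endpoint are assumptions made for illustration, and the example interval times are invented so that the output matches the sequences printed on the slides.

```python
from collections import defaultdict


def endpoint_representation(intervals):
    """Group interval endpoints by time; '+' marks a start, '-' a finish.

    intervals: list of (symbol, start, finish) with start < finish.
    Within one time point, finish endpoints are listed before start
    endpoints (an assumed ordering that mirrors the slide example).
    """
    groups = defaultdict(list)
    for symbol, start, finish in intervals:
        groups[start].append(symbol + "+")
        groups[finish].append(symbol + "-")
    return [
        sorted(groups[t], key=lambda p: (p.endswith("+"), p))
        for t in sorted(groups)
    ]


def coincidence_representation(intervals):
    """Cut the timeline at every endpoint and label each interval's slices.

    '+' = start slice, '*' = intermediate slice, '-' = finish slice,
    no mark = intact slice (the interval was never cut).
    """
    points = sorted({t for _, s, f in intervals for t in (s, f)})
    result = []
    for left, right in zip(points, points[1:]):
        coincidence = []
        for symbol, start, finish in sorted(intervals):
            if start <= left and right <= finish:  # interval covers this segment
                is_first = start == left
                is_last = finish == right
                if is_first and is_last:
                    mark = ""
                elif is_first:
                    mark = "+"
                elif is_last:
                    mark = "-"
                else:
                    mark = "*"
                coincidence.append(symbol + mark)
        if coincidence:  # skip gaps where no interval is active
            result.append(coincidence)
    return result


# Hypothetical times chosen to reproduce the slide 15 example.
slide15 = [("A", 1, 4), ("B", 3, 6), ("C", 7, 12), ("D", 8, 10), ("E", 13, 15)]
print(coincidence_representation(slide15))
# [['A+'], ['A-', 'B+'], ['B-'], ['C+'], ['C*', 'D'], ['C-'], ['E']]

# Hypothetical times chosen to reproduce the slide 17 example.
slide17 = [("A", 1, 3), ("B", 2, 4), ("C", 2, 4), ("D", 4, 5)]
print(endpoint_representation(slide17))
# [['A+'], ['B+', 'C+'], ['A-'], ['B-', 'C-', 'D+'], ['D-']]
```

Cutting at every endpoint can produce several intermediate slices for one interval, so this sketch does not necessarily respect the 2k worst-case space bound stated on slide 15.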

Editor's Notes

  1. Here are Allen's 13 temporal relations. That is …… For two intervals, we can describe their relationship by their start times and finish times. For example, interval A meets interval B if A's finish time is equal to B's start time.
  2. By our observation, there are three important issues in mining closed temporal patterns. The first is the complex relationship: since an interval has a duration, the relation between any two intervals is complicated. The second is representation: all of Allen's temporal relations are binary relations, so they can easily describe the relation between any two intervals. However, when we want to express the relation among more than 3 intervals, a problem arises. How can we express a pattern non-ambiguously? And how much space is needed to describe a pattern correctly? The last issue is how to design an efficient algorithm. Can we avoid candidate generation? And how can we reduce the number of database scans? In this paper, we propose a non-ambiguous and compact representation to express a temporal pattern. We also propose an efficient method that simplifies the processing of complex relations and avoids candidate generation.
  4. We propose a new representation named coincidence representation. We use the global information of a sequence to segment intervals into disjoint slices. It is a nonambiguous and compact representation. Based on the coincidence representation, we propose the CTMiner algorithm. It is a pattern-growth approach and does not need candidate generation and test. CTMiner can be decomposed into two components, the incision strategy and CprefixSpan. The incision strategy is used to transform a sequence into its coincidence representation. CprefixSpan is used to mine all frequent temporal patterns.
  5. Coincidence representation segments intervals into disjoint slices according to the arrangement of all end time points in the sequence. There are four kinds of event slices. Start slice: this is the start slice of A, the start slice of B, and the start slice of C. Intermediate slice: this is the intermediate slice of C. Finish slice: this is the finish slice of A, the finish slice of B, and the finish slice of C. Intact slice: an interval that is not cut, like the intact slice of D and the intact slice of E. A coincidence is the group of event slices that occur simultaneously, so these are coincidences. Concatenating all coincidences forms the coincidence representation of a sequence.
  6. CTMiner has two main components, the incision strategy and CprefixSpan. The incision strategy is used to transform a sequence into its coincidence representation. We first put all end time points of all intervals into endtime_list. Then we sort it by time in increasing order and merge two symbols if their time and type are the same, like B's start time and C's start time. Then, tracing endtime_list one by one transforms the sequence into its coincidence representation, as in this example.
  7. Temporal representation uses a sequence of ordered time points to express a temporal pattern. Plus represents the start time and minus represents the finish time. In this example, pattern ABCD is expressed as this sequence. The start time of A is smaller than the start time of B, so here A is smaller than B. The start time of B is equal to the start time of C, so here B equals C. Temporal representation has no ambiguity problem. It uses 4k − 1 space to describe a k-pattern (2k for the event indices and 2k − 1 for the relation descriptors). TprefixSpan adopts this representation.
  8. Mining closed patterns actually requires a complicated process. It usually needs a lot of closure checking, that is, checking whether the pattern being mined is a sub-pattern of an existing pattern, or whether a previously mined pattern is a sub-pattern of the current one. We propose an endpoint representation, which can be obtained quickly from interval data. Based on this simple representation, we can do the closure checking easily. It is also a nonambiguous and compact representation. So we can transform the example database into an endpoint database.
  9. Every coincidence in the coincidence sequence is disjoint, so the relations among slices are simple: just before, equal and after. We borrow the idea of PrefixSpan, an efficient algorithm for time-point data. CprefixSpan has three steps: scan the local database to find frequent slices; append and extend the pattern; project the database. By utilizing the concept of slice and coincidence, we propose two pruning strategies to reduce the search space: pre-pruning and post-pruning.
  10. This is the overview of CprefixSpan. We first scan the database to get all frequent intervals, then transform the sequences and project the database by these frequent intervals. Then we recursively project the database and append & extend the pattern. Finally, collecting all mining results gives all frequent temporal patterns.
  11. The pre-pruning strategy is based on the concept of slice and coincidence. Since start slices and finish slices always occur in pairs, we only need to project the frequent finish slices that have corresponding start slices in their prefixes. In this example, for the projected database of A plus, suppose that after scanning the database we have four frequent slices: A−, B+, B− and C. Then we append and extend the pattern and build four projected databases. However, B minus does not have a corresponding B plus in the prefix pattern. So with the pre-pruning strategy, we can avoid a lot of non-promising projections.
  12. The post-pruning strategy is based on the concept of slice and coincidence. Start slices and finish slices always occur in pairs, and a start slice always appears before its finish slice. So when we build a projected database, we only collect the significant postfixes. A significant postfix means that, with respect to a prefix α, all finish slices in the postfix have corresponding start slices in α. For example, given a coincidence database, when we build the projected database with E, all three sequences are insignificant. So the projected database with E can be post-pruned.
  15. Based on the endpoint representation, we propose the CEMiner algorithm. It is a pattern-growth approach and does not need candidate generation and test. CEMiner can be decomposed into two components, closure checking and the pruning strategy.
  16. Sequential pattern mining is an important and basic research topic in the data mining domain because it is useful in many applications, such as customer analysis, network intrusion detection, and finding tandem repeats in DNA sequences, to name a few. In this example, we can find some behaviors and common patterns in customers' buying records, like: buy milk, then buy beer and diapers together. However, in some real-world applications, an event is usually a time interval instead of a time point. It usually has a duration, as in library reader analysis, patient disease analysis, and stock fluctuation, to name a few. In this example, we can find the correlation between diseases from patients' records. The relation between time points is simple: just before, equal and after. But the relation between time intervals is quite different and more complicated. We usually use Allen's 13 temporal relations to describe the relationship between any two time intervals.
  17. According to eMarketer, advertisement spending on social networking grows rapidly every year. Looking at this figure, in 2008 the total expense was about $23.3 billion. In 2010, it was about $23.6 billion. But in 2011, it almost reached $26 billion. So, obviously, research and related technology on social network marketing are very important.
  19. This is the framework of the CDH algorithm. Given a social network represented as a graph, we first detect communities. Then we use the community information to construct the potential pool and find the fundamental nodes from the pool. Finally, we adjust the fundamental nodes to select the seed nodes.
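
The first note describes Allen's relations in terms of start and finish times, for example "A meets B if A's finish time is equal to B's start time". A minimal Python sketch of that idea follows. It is not taken from the slides: the function name `allen_relation`, the `(start, finish)` tuple layout, and the relation labels for the inverse relations are assumptions, and every interval is assumed to satisfy start < finish.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a with respect to interval b.

    a, b: (start, finish) tuples with start < finish (assumed).
    Exactly one of the 13 relation names is returned.
    """
    a_start, a_finish = a
    b_start, b_finish = b

    if a_finish < b_start:
        return "before"
    if a_finish == b_start:
        return "meets"
    if a_start == b_start and a_finish == b_finish:
        return "equals"
    if a_start == b_start:
        return "starts" if a_finish < b_finish else "started-by"
    if a_finish == b_finish:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_finish < b_finish:
        return "during"
    if a_start < b_start and b_finish < a_finish:
        return "contains"
    if a_start < b_start:  # here b_start < a_finish < b_finish
        return "overlaps"
    # Remaining cases: b starts first and finishes first.
    if b_finish < a_start:
        return "after"
    if b_finish == a_start:
        return "met-by"
    return "overlapped-by"


print(allen_relation((1, 4), (4, 6)))  # meets
print(allen_relation((1, 3), (2, 5)))  # overlaps
print(allen_relation((2, 3), (1, 5)))  # during
```

Because each call only describes a pair of intervals, a pattern over several intervals needs one relation per pair, which is the representation problem that the coincidence and endpoint representations in the notes are meant to avoid.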