2. Curriculum Vitae
Basic Information
Birthday – Aug. 31, 1978
Education
Dept. of CSE, YZU (B.S. 2000)
Dept. of CS, NTUST (M.S. 2002)
Dept. of CSIE, NCTU (Ph.D. 2012)
Advisors: Prof. Suh-Yin Lee (李素瑛教授) and Prof. Wen-Chih Peng (彭文志教授)
Ph.D. Dissertation: A Study on Time Interval-based Sequential Pattern Mining
4. Why Data Mining?
Commercial Viewpoint
Lots of data is being collected
Web data, e-commerce
Purchases at department stores
Bank/credit card transactions
Computers have become cheaper and more powerful
Competitive pressure is strong
Provide better, customized services for an edge (e.g., in Customer Relationship Management)
5. Why Data Mining?
Scientific Viewpoint
Data collected and stored at enormous speeds (GB/hour)
Remote sensors on satellites
Telescopes scanning the skies
Microarrays generating gene expression data
Scientific simulations generating terabytes of data
Traditional techniques are infeasible for raw data
Data mining may help scientists
in classifying and analyzing data
in hypothesis formation
6. Data Mining
We are buried in data, but looking for knowledge
Data mining
Knowledge discovery in databases
Extraction of interesting knowledge (rules, regularities, patterns) from data in large databases
8. Sequential Pattern Mining
Point-based sequential pattern mining
Customer analysis, network intrusion detection, finding tandem repeats in DNA sequences, …
Simple relations between time points
Three relations (before, equal, after)
[Example: point-based transactions of beer, milk, and diaper purchases; with min_sup = 2, 〈(ab)dc〉 is a frequent sequential pattern]
9. Interval Data Everywhere!!
Interval data
Data has a duration time
Clinical data, library data, appliance usage data
Applications
Diagnosis systems, recommendation systems, smart homes
[Figure: a database feeding a diagnosis system, a recommendation system, and a smart home]
13. Motivation
Representation
Allen's relations are binary relations
How to express the relations among more than 3 intervals?
Efficient algorithms
Ambiguity problem
Space usage
Mining temporal patterns *
Mining closed temporal patterns
Incrementally maintaining discovered temporal patterns and closed temporal patterns
Related applications
Social networks
Smart homes
14. Proposed Method
Coincidence representation
Segments intervals into disjoint slices
A nonambiguous and compact representation
Endpoint representation
Global information of a sequence
A nonambiguous and compact representation
TPMiner (Temporal Pattern Miner)
A pattern-growth approach
Without candidate generation and testing
Two components
RPrefixSpan
Pruning strategies
15. Coincidence Representation
Segment intervals into disjoint slices
Four kinds of event slice
Start slice (+), intermediate slice (*), finish slice (−), and intact slice (no mark)
Coincidence
Slices occurring simultaneously
Space usage (for a k-pattern)
Best: k, worst: 2k space
[Example: intervals A and B have the coincidence representation (A+)(A−B+)(B−); intervals C, D, and E have (C+)(C*D)(C−)(E)]
16. Incision Strategy
A data structure, endtime_list
Sort and merge
Trace the endtime_list one by one
[Example: intervals (A, 1, 4), (B, 2, 5), (C, 2, 8), (D, 3, 5), (E, 5, 7). Each endpoint is put into the endtime_list as a (symbol, time, type) entry with type s (start) or f (finish); the list is sorted by time, and entries with the same time and type are merged (B and C share start time 2, and B and D share finish time 5). Tracing the merged list one by one yields the coincidence representation: (A+) (B+C+) (A−D+) (B−D−) (E) (C−)]
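The sort-and-merge step above can be sketched in Python. The function name, the tuple layout, and the ordering rule are illustrative assumptions, not the authors' implementation:

```python
def build_endtime_list(intervals):
    """Sketch of the incision strategy's endtime_list: every interval
    (symbol, start, finish) contributes a start ('s') and a finish ('f')
    endpoint; entries are sorted by time (finishes before starts at equal
    times) and entries sharing the same time and type are merged."""
    endtime_list = [(t, kind, sym)
                    for sym, start, finish in intervals
                    for t, kind in ((start, 's'), (finish, 'f'))]
    endtime_list.sort()  # by time, then type ('f' < 's'), then symbol
    merged = []
    for t, kind, sym in endtime_list:
        if merged and merged[-1][0] == t and merged[-1][1] == kind:
            merged[-1] = (t, kind, merged[-1][2] + sym)  # merge symbols
        else:
            merged.append((t, kind, sym))
    return merged

# The slide's example intervals yield the merged entries
# A/BC/D/A/BD/E/E/C at times 1, 2, 3, 4, 5, 5, 7, 8:
# build_endtime_list([('A', 1, 4), ('B', 2, 5), ('C', 2, 8),
#                     ('D', 3, 5), ('E', 5, 7)])
```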
17. Endpoint Representation
A sequence of ordered time points of events
+: start time, −: finish time
Nonambiguous
Space usage (for a k-pattern)
2k space
[Example: intervals A, B, C, and D are expressed as A+ (B+C+) A− (B−C−D+) D−]
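A minimal sketch of the conversion to the endpoint representation; the interval values below are hypothetical, chosen to reproduce the slide's example:

```python
def endpoint_representation(intervals):
    """Express event intervals as a sequence of ordered endpoints:
    'X+' at X's start time, 'X-' at its finish time; endpoints sharing
    the same time point are grouped in parentheses."""
    points = sorted((t, sym + mark)
                    for sym, start, finish in intervals
                    for t, mark in ((start, '+'), (finish, '-')))
    groups = []
    for t, label in points:
        if groups and groups[-1][0] == t:
            groups[-1][1].append(label)  # same time point -> same group
        else:
            groups.append((t, [label]))
    return ' '.join(labels[0] if len(labels) == 1
                    else '(' + ' '.join(labels) + ')'
                    for _, labels in groups)

# With A=[1,3], B=[2,4], C=[2,4], D=[4,5] this produces the slide's
# pattern: endpoint_representation([('A', 1, 3), ('B', 2, 4),
#                                   ('C', 2, 4), ('D', 4, 5)])
```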
19. TPMiner – RPrefixSpan (1/2)
Every item is disjoint
The relations among slices are simple
Before, equal, and after (like time-point data)
RPrefixSpan
Borrows the idea of PrefixSpan
Scan the local database to find frequent slices
Append and extend the pattern
Project the database
Pruning strategies
Reduce the search space
Pre-pruning and post-pruning
20. TPMiner – RPrefixSpan (2/2)
[Flow diagram: scan the database D to find the frequent items e1, e2, …, ei, …, en; transform the sequences and project the database into D|e1, …, D|ei, …, D|en; recursively project each database and append & extend the pattern; finally, collect all mined patterns to obtain the frequent temporal patterns]
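The recursive projection flow above can be sketched with a minimal PrefixSpan-style miner over flat item sequences. This is a hypothetical simplification: itemset handling and the slice semantics are omitted, and all names are illustrative.

```python
from collections import Counter

def prefix_span(db, min_sup, prefix=()):
    """Minimal PrefixSpan-style pattern growth: scan the (projected)
    database for locally frequent items, append each to the prefix,
    and recurse on the corresponding projected database -- no candidate
    generation and test."""
    patterns = []
    counts = Counter()
    for seq in db:
        counts.update(set(seq))  # count each item once per sequence
    for item in sorted(counts):
        sup = counts[item]
        if sup < min_sup:
            continue
        new_prefix = prefix + (item,)
        patterns.append((new_prefix, sup))
        # project: keep the suffix after the first occurrence of item
        projected = [seq[seq.index(item) + 1:] for seq in db if item in seq]
        patterns.extend(prefix_span(projected, min_sup, new_prefix))
    return patterns
```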
21. Pruning Strategy – Pre-pruning
Utilizes the concept of slices and coincidences
Start slices and finish slices occur in pairs
Only the frequent finish slices that have corresponding start slices in their prefixes require projection
[Example: after scanning the database for the prefix 〈A+〉, the frequent local slices are A−, B+, B−, and C. B− has no corresponding B+ in the prefix, so the projection D|〈A+ B−〉 is a non-promising projection and can be pre-pruned; the projections D|〈A+ A−〉, D|〈A+ B+〉, and D|〈A+ C〉 remain]
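The pre-pruning test can be sketched as a small filter. The string encoding of slices ('A+', 'B-', 'C') and the function name are assumptions made for illustration:

```python
def promising_extensions(prefix, frequent_slices):
    """Pre-pruning sketch: a finish slice ('X-') is a promising extension
    only if a matching start slice ('X+') is still open in the prefix;
    start slices and intact slices always pass."""
    open_events = set()
    for s in prefix:
        if s.endswith('+'):
            open_events.add(s[:-1])
        elif s.endswith('-'):
            open_events.discard(s[:-1])  # the pair is closed again
    kept = []
    for s in frequent_slices:
        if s.endswith('-') and s[:-1] not in open_events:
            continue  # non-promising projection: pre-pruned
        kept.append(s)
    return kept
```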
22. Pruning Strategy – Post-pruning
Utilizes the concept of slices and coincidences
A start slice always appears before its finish slice
Only collect the significant postfixes
With respect to a prefix α, all finish slices in a postfix must have corresponding start slices in α
[Example: given a coincidence database D with S1: 〈(B+)(D+)(E)(D−)(B−)〉, S2: 〈(B+)(B−D+)(E)(D−)〉, and S3: 〈(B)(A)(D+)(E)(D−)〉, the projected database D|〈E〉 contains only insignificant sequences (S1: 〈(D−)(B−)〉, S2: 〈(D−)〉, S3: 〈(D−)〉), so it can be post-pruned]
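The significance check behind post-pruning can be sketched as follows; coincidences are modeled as lists of slice strings such as 'B+', 'D-', or 'E', which is an assumed encoding rather than the authors' data structure:

```python
def significant_postfix(alpha, postfix):
    """Post-pruning sketch: a postfix is significant with respect to the
    prefix alpha only if every finish slice in it has a corresponding
    start slice somewhere in alpha."""
    started = {s[:-1] for coin in alpha for s in coin if s.endswith('+')}
    return all(s[:-1] in started
               for coin in postfix for s in coin if s.endswith('-'))
```

For the slide's example, every postfix in the projection with prefix 〈(E)〉 contains (D−) without any D+ in the prefix, so none is significant and the whole projected database can be skipped.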
23. Experimental Results (1/2)
[Figures, over minimum support 1%–0.5%: on dataset D200k–C40–N10k, (a) the execution time (sec) of the six algorithms (H-DFS, ARMADA, TPrefixSpan, IEMiner, TPMiner-CR, and TPMiner-ER) and (b) the number of generated temporal patterns; a further plot reports the memory usage (MB) on dataset N10k–C20–N10k]
24. Experimental Results (2/2)
[Figures: execution time (sec) over minimum support 1%–0.5%, comparing TPMiner-CR against (a) TPMiner-CR without the pre-pruning strategy, (b) TPMiner-CR without the post-pruning strategy, (c) TPMiner-CR without the subset-pruning strategy, and (d) TPMiner-CR without any pruning strategy]
26. Smart Home Application
[System diagram: home appliances (lights, air conditioner) report to a home server and a cloud database through a D-Link controller. (1) Sensor data logs record the on/off intervals of each appliance ID; (2) pattern mining extracts usage patterns P1, P2, P3, …; (3) behavior detection compares the current behavior against the usage patterns; (4) abnormal detection triggers (5) a system alarm and remote control]
27. Dynamic Social Network (1/2)
Dynamic social network
A sequence of interaction graphs
Nodes and edges vary with time
A lossless transformation
Graph sequence → interval sequence
[Example: graphs G1–G4 at times t1, t2, t3, … over the nodes A–E are transformed into an event sequence; each table row records an event symbol with its SID, start time, and finish time, e.g., A: 1–3, B: 1–3, C: 1–3, D: 2–4, E: 4–6]
28. Dynamic Social Network (2/2)
Reduces the complexity of graphs
Avoids isomorphism testing
Dynamic Social Network Analysis
Pattern mining
Classification
Recommender systems
Network sampling
Clustering
32. Advertisement Budget
According to eMarketer, advertisement spending on worldwide social networking sites:
2008: $23.3 million
2010: $23.6 billion
2011: almost $25.5 billion
[Figure: advertisement spending]
33. Influence Maximization
Word-of-mouth effect in social networks
Influence maximization problem
Select the initial users (seeds) so that the number of users that adopt the product or innovation is maximized
[Figure: selecting seeds in a social network]
34. Motivation
Characteristic of social networks
Community structure
[Figure: a 12-node network partitioned into communities]
Community and degree heuristic (CDH)
Utilize community information
Avoid influence overlapping
36. CDH – Adjust Step
Adjust the selected fundamental nodes
Seeds selected from a large community may activate more inactive nodes than those from a small community
Replace the fundamental node in a small community if we can activate more inactive nodes
Finally, output the result as the selected seed nodes
[Figure: among communities C1, C2, C3, …, Ck, the largest-degree node in Ck is deleted and replaced by the second-largest-degree node in C1]
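The select-then-adjust idea can be sketched as follows. This is a hypothetical simplification: node degree stands in for "can activate more inactive nodes", and the function name and data layout are assumptions, not the CDH algorithm's actual scoring.

```python
def cdh_seeds(communities, degree, k):
    """CDH-style sketch: pick a fundamental node (the highest-degree
    node) from each of the k largest communities, then try replacing the
    pick from the smallest chosen community with the runner-up of a
    larger community when the runner-up looks more promising."""
    communities = sorted(communities, key=len, reverse=True)
    seeds = [max(c, key=lambda n: degree[n]) for c in communities[:k]]
    for big in communities[:max(k - 1, 0)]:
        runners = sorted(big, key=lambda n: degree[n], reverse=True)
        if len(runners) > 1 and degree[runners[1]] > degree[seeds[-1]]:
            seeds[-1] = runners[1]  # replace the weakest fundamental node
            break
    return seeds
```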
39. Recommendation System
Predict the ratings or preferences of users
Using a model built from user and item characteristics
[Screenshots: (a) amazon.com, (b) youtube.com]
40. Collaborative Filtering (CF)
1. Calculate the similarity between the active user and the other users
Pearson's correlation, cosine similarity, conditional probability, etc.
2. Predict the ratings of items that have not been rated by the active user
3. Output the top-k items by the predicted results
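The three steps can be sketched for user-based CF with Pearson's correlation. Ratings are dicts from item to rating; all names are illustrative rather than a specific system's API:

```python
import math

def pearson(ra, rb):
    """Pearson's correlation over the items co-rated by two users."""
    common = set(ra) & set(rb)
    if not common:
        return 0.0
    ma = sum(ra[i] for i in common) / len(common)
    mb = sum(rb[i] for i in common) / len(common)
    num = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
    den = math.sqrt(sum((ra[i] - ma) ** 2 for i in common)
                    * sum((rb[i] - mb) ** 2 for i in common))
    return num / den if den else 0.0

def predict(active, others, item):
    """Predict the active user's rating of an unrated item as the active
    user's mean rating plus the similarity-weighted, mean-centered
    ratings of the other users who rated the item."""
    mean_a = sum(active.values()) / len(active)
    num = den = 0.0
    for r in others:
        if item in r:
            s = pearson(active, r)
            mean_r = sum(r.values()) / len(r)
            num += s * (r[item] - mean_r)
            den += abs(s)
    return mean_a + num / den if den else mean_a
```

Ranking the unrated items by their predicted value and keeping the top k completes step 3.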
42. Motivation
Dynamic! Dynamic! Dynamic!
Why do we need dynamics?
All things vary with time
Dynamic Collaborative Filtering
Considers the influence of time in the calculation
Without considering time, the prediction results might be out of date
44. Advanced DSCF
α (the similarity decay value, SDV) might not be consistent over all time
Each user might have his/her own SDV at different time points
Feed back the predicted values against the actual values
45. Advanced DSCF
[Framework: the active user's ratings are predicted, the top items are recommended, and the actual values A_{a,i} are fed back. Both the prediction and the feedback estimate take the form

p_{a,i} = \bar{r}_a + \frac{\sum_{j=1}^{k} (r_{j,i} - \bar{r}_j)\,[\alpha \cdot sim'_{a,j} + (1-\alpha)\, sim''_{a,j}]}{\sum_{j=1}^{k} [\alpha \cdot sim'_{a,j} + (1-\alpha)\, sim''_{a,j}]}

where sim'_{a,j} and sim''_{a,j} are the two similarity measures blended by the decay value α]
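The blended-similarity prediction above can be sketched numerically. The dict keys (`sim1`, `sim2`, standing in for the slide's sim′ and sim″) and the function shape are assumptions for illustration:

```python
def dscf_predict(mean_a, neighbors, alpha):
    """DSCF-style prediction sketch: each neighbor j contributes its
    mean-centered rating of the item, weighted by the blend
    alpha * sim1 + (1 - alpha) * sim2 of two similarity values."""
    num = den = 0.0
    for j in neighbors:
        w = alpha * j['sim1'] + (1 - alpha) * j['sim2']
        num += w * (j['rating'] - j['mean'])
        den += w
    return mean_a + num / den if den else mean_a
```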
Here are Allen's 13 temporal relations.
For two intervals, we can describe their relationship by their start times and finish times.
For example, interval A meets interval B if A's finish time is equal to B's start time.
From our observation, there are three important issues in mining closed temporal patterns.
Complex relationships: since an interval has a duration, the relation between any two intervals is complicated.
Representation: all of Allen's temporal relations are binary relations, which means they can easily describe the relation between any two intervals. However, when we want to express the relations among more than 3 intervals, a problem arises: how can we express a pattern unambiguously, and what is the space usage needed to describe a pattern correctly?
The last issue is how to design an efficient algorithm: can we avoid candidate generation, and how can we reduce the number of database scans?
In this paper, we propose a nonambiguous and compact representation to express a temporal pattern, and also an efficient method that simplifies the processing of complex relations and avoids candidate generation.
We propose a new representation named the coincidence representation.
We use the global information of a sequence to segment intervals into disjoint slices.
It is a nonambiguous and compact representation.
Based on the coincidence representation, we propose the CTMiner algorithm.
It is a pattern-growth approach and does not need candidate generation and testing.
CTMiner can be decomposed into two components: the incision strategy and CPrefixSpan.
The incision strategy transforms sequences into the coincidence representation; CPrefixSpan mines all frequent temporal patterns.
The coincidence representation segments intervals into disjoint slices according to the arrangement of all endpoint times in the sequence.
There are four kinds of event slices:
start slices, such as the start slices of A, B, and C;
intermediate slices, such as the intermediate slice of C;
finish slices, such as the finish slices of A, B, and C;
and intact slices, which are not cut at all, such as the intact slices of D and E.
A coincidence is a group of event slices that occur simultaneously.
Concatenating all coincidences forms the coincidence representation of a sequence.
CTMiner has two main components: the incision strategy and CPrefixSpan.
The incision strategy transforms a sequence into the coincidence representation.
We first put all endpoint times of all intervals into the endtime_list, then sort them by time in increasing order, and merge two symbols if their time and type are the same, like B's start time and C's start time.
Then tracing the endtime_list one by one transforms the sequence into the coincidence representation, as in this example.
Temporal representation.
It uses a sequence of ordered time points to express a temporal pattern: plus represents a start time and minus represents a finish time.
In this example, the pattern ABCD is expressed as this sequence.
The start time of A is smaller than the start time of B, so here A is smaller than B; the start time of B is equal to the start time of C, so here B equals C.
The temporal representation has no ambiguity problem.
It uses 4k − 1 space to describe a k-pattern (2k for the event indices, and 2k − 1 for the relation describers).
TPrefixSpan adopts this representation.
Mining closed patterns actually requires a complicated process: it usually needs a lot of closure checking, that is, checking whether the mined pattern is a sub-pattern of an existing pattern, or whether a previously mined pattern is a sub-pattern of the current one.
We propose an endpoint representation, which can be transformed quickly from interval data.
Based on this simple representation, we can do the closure checking easily.
It is also a nonambiguous and compact representation, so we can transform the example database into an endpoint database.
Every coincidence in a coincidence sequence is disjoint, so the relations among slices are simple: just before, equal, and after.
We borrow the idea of PrefixSpan, an efficient algorithm for time-point data.
CPrefixSpan has three steps: scan the local database to find frequent slices, append and extend the pattern, and project the database.
By utilizing the concept of slices and coincidences, we propose two pruning strategies, pre-pruning and post-pruning, to reduce the search space.
This is the overview of CPrefixSpan.
We first scan the database to get all frequent intervals, then transform the sequences and project the database by these frequent intervals.
Then we project the database and append & extend the pattern recursively.
Finally, collecting all mining results gives all frequent temporal patterns.
The pre-pruning strategy is based on the concept of slices and coincidences.
Since start slices and finish slices always occur in pairs, we only need to project the frequent finish slices that have corresponding start slices in their prefixes.
In this example, for the projected database of A+, suppose that after scanning the database we have four frequent slices: A−, B+, B−, and C.
We then append and extend the pattern and build four projected databases.
However, B− does not have a corresponding B+ in the prefix pattern.
So by the pre-pruning strategy, we can avoid a lot of non-promising projections.
The post-pruning strategy is based on the concept of slices and coincidences.
Since start slices and finish slices always occur in pairs, and a start slice always appears before its finish slice, when building a projected database we only collect the significant postfixes.
A significant postfix means that, with respect to a prefix α, all finish slices in the postfix have corresponding start slices in α.
For example, given a coincidence database, when we build the projected database with E, all three sequences are insignificant, so the projected database with E can be post-pruned.
Based on the endpoint representation, we propose the CEMiner algorithm.
It is a pattern-growth approach and does not need candidate generation and testing.
CEMiner can be decomposed into two components: closure checking and a pruning strategy.
Sequential pattern mining is an important and basic research topic in the data mining domain because it is useful in many applications, such as customer analysis, network intrusion detection, and finding tandem repeats in DNA sequences, to name a few.
In this example, we can find behaviors and common patterns from customers' buying records, like: buy milk, then buy beer and diapers together.
However, in some real-world applications, an event is usually a time interval instead of a time point; it usually has a duration.
Examples include library reader analysis, patient disease analysis, and stock fluctuations, to name a few.
In this example, we can find the correlations between diseases from patients' records.
The relation between time points is simple: just before, equal, and after.
But the relation between time intervals is quite different; it is more complicated.
We usually use Allen's 13 temporal relations to describe the relationship between any two time intervals.
According to eMarketer, advertisement spending on social networking grows rapidly every year.
Looking at this figure: in 2008, the total expense was about $23.3 billion; in 2010, it was about $23.6 billion; but in 2011, it almost reached $26 billion.
So, obviously, research and the related technology on social network marketing are very important.
This is the framework of the CDH algorithm.
Given a social network represented as a graph, we first detect communities.
Then we use the community information to construct the potential pool and find the fundamental nodes from the pool.
Finally, we adjust the fundamental nodes to select the seed nodes.