TENSOR DECOMPOSITION WITH PYTHON
LEARNING STRUCTURES FROM MULTIDIMENSIONAL DATA
ANDRÉ PANISSON
@apanisson
ISI Foundation, Torino & New York City
WHAT IS DATA DECOMPOSITION?
DECOMPOSITION == FACTORIZATION
Representing a dataset as a sum of (interpretable) parts
▸ Represent data as the combination of many components / factors
▸ Dimensionality reduction: each new dimension represents a latent variable:
▸ text corpus => topics
▸ shopping behaviour => segments (user segmentation)
▸ social network => groups, communities
▸ psychology surveys => personality traits
▸ electronic medical records => health conditions
▸ chemical solutions => chemical ingredients
[Figure: X ≈ W H]
DATA DECOMPOSITION
▸ Decomposition of data represented in two dimensions: MATRIX FACTORIZATION
▸ text: documents X terms
▸ surveys: subjects X questions
▸ electronic medical records: patients X diagnosis/drugs
▸ Decomposition of data represented in more dimensions: TENSOR FACTORIZATION
▸ social networks: user X user (adjacency matrix) X time
▸ text: authors X terms X time
▸ spectroscopy: solution sample X wavelength (emission) X wavelength (excitation)
WHY TENSOR FACTORIZATION + PYTHON?
▸ Matrix Factorization is already used in many fields
▸ Tensor Factorization is becoming very popular for multiway data analysis
▸ TF is very useful to explore time-varying network data
▸ But still, the most used tool is Matlab
▸ There’s room for improvement in the Python libraries for TF
MATRIX DECOMPOSITION
FACTOR ANALYSIS
Spearman ~1900
X≈WH
X_{tests × subjects} ≈ W_{tests × intelligences} H_{intelligences × subjects}
Spearman, 1927: The abilities of man.
[Diagram: the tests × subjects matrix X factored into W (tests × intelligence factors) and H (intelligence factors × subjects)]
TOPIC MODELING / LATENT SEMANTIC ANALYSIS
Blei, David M. "Probabilistic topic models." Communications of the ACM 55.4 (2012): 77-84.
[Figure from Blei (2012): topics as word distributions (e.g. "gene, dna, genetic", "life, evolve, organism", "brain, neuron, nerve", "data, number, computer", each with probabilities such as 0.04, 0.02, 0.01), a corpus of documents, and per-document topic proportions and assignments]
TOPIC MODELING / LATENT SEMANTIC ANALYSIS
X≈WH
Non-negative Matrix Factorization (NMF):
(~1970 Lawson, ~1995 Paatero, ~2000 Lee & Seung)
2005 Gaussier et al. "Relation between PLSA and NMF and implications."
arg min_{W,H} ‖X − WH‖  s.t.  W, H ≥ 0
[Diagram: a sparse documents × terms matrix X factored into W and H, with topics as the latent dimension]
NON-NEGATIVE MATRIX FACTORIZATION (NMF)
NMF gives a parts-based representation (Lee & Seung, Nature 1999)
[Figure: an original image reconstructed as the product of components and weights, comparing NMF and PCA]
NMF is similar to Spectral Clustering (Ding et al., SDM 2005)
arg min_{W,H} ‖X − WH‖  s.t.  W, H ≥ 0
W ← W • (X H^T) / (W H H^T)
H ← H • (W^T X) / (W^T W H)
NMF brings interpretation!
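As a sketch of the multiplicative updates above (distinct from the scikit-learn example that follows; the random initialization, iteration count and epsilon guard are assumptions):

import numpy as np

def nmf_multiplicative(X, n_components, n_iter=200, eps=1e-9):
    """Minimal Lee & Seung multiplicative updates for min ||X - WH||, W, H >= 0."""
    rng = np.random.default_rng(0)
    W = rng.random((X.shape[0], n_components))
    H = rng.random((n_components, X.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # H <- H * (W^T X) / (W^T W H)
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # W <- W * (X H^T) / (W H H^T)
    return W, H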
import matplotlib.pyplot as plt
from sklearn import datasets, decomposition, utils

# fetch_mldata was removed in newer scikit-learn versions;
# datasets.fetch_openml('mnist_784') is the usual replacement
digits = datasets.fetch_mldata('MNIST original')
A = utils.shuffle(digits.data)

nmf = decomposition.NMF(n_components=20)
W = nmf.fit_transform(A)
H = nmf.components_

plt.rc("image", cmap="binary")
plt.figure(figsize=(8, 4))
for i in range(20):
    plt.subplot(4, 5, i + 1)   # 4 x 5 grid for the 20 components
    plt.imshow(H[i].reshape(28, 28))
    plt.xticks(())
    plt.yticks(())
plt.tight_layout()
TENSORS AND TENSOR DECOMPOSITION
BEYOND MATRICES: HIGH DIMENSIONAL DATASETS
Cichocki et al. Nonnegative Matrix and Tensor Factorizations
Environmental analysis
▸ Measurement as a function of (Location, Time, Variable)
Sensory analysis
▸ Score as a function of (Wine sample, Judge, Attribute)
Process analysis
▸ Measurement as a function of (Batch, Variable, Time)
Spectroscopy
▸ Intensity as a function of (Wavelength, Retention, Sample, Time,
Location, …)
…
MULTIWAY DATA ANALYSIS
DIGITAL TRACES FROM SENSORS AND IOT
USER × POSITION × TIME × …
TENSORS
WHAT IS A TENSOR?
A tensor is a multidimensional array. E.g., a three-way tensor:
[Figure: a three-way tensor with axes Mode-1, Mode-2, Mode-3]
FIBERS AND SLICES
Cichocki et al. Nonnegative Matrix and Tensor Factorizations
Column (Mode-1) Fibers: A[:, 4, 1]   Row (Mode-2) Fibers: A[1, :, 4]   Tube (Mode-3) Fibers: A[1, 3, :]
Horizontal Slices: A[1, :, :]   Lateral Slices: A[:, 1, :]   Frontal Slices: A[:, :, 1]
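To make the indexing concrete, a small NumPy sketch (the 5 × 6 × 7 shape is hypothetical, chosen only so the indices above are valid):

import numpy as np

A = np.random.rand(5, 6, 7)   # a three-way array

A[:, 4, 1]   # column (mode-1) fiber, length 5
A[1, :, 4]   # row (mode-2) fiber, length 6
A[1, 3, :]   # tube (mode-3) fiber, length 7

A[1, :, :]   # horizontal slice, shape (6, 7)
A[:, 1, :]   # lateral slice, shape (5, 7)
A[:, :, 1]   # frontal slice, shape (5, 6)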
TENSOR UNFOLDINGS: MATRICIZATION AND VECTORIZATION
Matricization: convert a tensor to a matrix
Vectorization: convert a tensor to a vector
>>> T = np.arange(0, 24).reshape((3, 4, 2))
>>> T
array([[[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7]],
[[ 8, 9],
[10, 11],
[12, 13],
[14, 15]],
[[16, 17],
[18, 19],
[20, 21],
[22, 23]]])
OK for dense tensors: use a combination of transpose() and reshape()
Not simple for sparse datasets (e.g.: <authors, terms, time>)
# print the mode-1 (column) fibers
for k in range(T.shape[2]):
    for j in range(T.shape[1]):
        print(T[:, j, k])
[ 0 8 16]
[ 2 10 18]
[ 4 12 20]
[ 6 14 22]
[ 1 9 17]
[ 3 11 19]
[ 5 13 21]
[ 7 15 23]
# supposing the existence of unfold
>>> T.unfold(0)
array([[ 0, 2, 4, 6, 1, 3, 5, 7],
[ 8, 10, 12, 14, 9, 11, 13, 15],
[16, 18, 20, 22, 17, 19, 21, 23]])
>>> T.unfold(1)
array([[ 0, 8, 16, 1, 9, 17],
[ 2, 10, 18, 3, 11, 19],
[ 4, 12, 20, 5, 13, 21],
[ 6, 14, 22, 7, 15, 23]])
>>> T.unfold(2)
array([[ 0, 8, 16, 2, 10, 18, 4, 12, 20, 6, 14, 22],
[ 1, 9, 17, 3, 11, 19, 5, 13, 21, 7, 15, 23]])
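For dense arrays, a mode-n unfold that reproduces the outputs above can be written with moveaxis and a Fortran-order reshape (a sketch; the unfold method shown above is hypothetical):

import numpy as np

def unfold(T, mode):
    """Mode-n matricization: the mode-n fibers become the columns of the result."""
    return np.reshape(np.moveaxis(T, mode, 0), (T.shape[mode], -1), order='F')

T = np.arange(24).reshape((3, 4, 2))
unfold(T, 0)   # shape (3, 8)  -- same as T.unfold(0) above
unfold(T, 1)   # shape (4, 6)
unfold(T, 2)   # shape (2, 12)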
RANK-1 TENSOR
The outer product of N vectors results in a rank-1 tensor
array([[[ 1., 2.],
[ 2., 4.],
[ 3., 6.],
[ 4., 8.]],
[[ 2., 4.],
[ 4., 8.],
[ 6., 12.],
[ 8., 16.]],
[[ 3., 6.],
[ 6., 12.],
[ 9., 18.],
[ 12., 24.]]])
a = np.array([1, 2, 3])
b = np.array([1, 2, 3, 4])
c = np.array([1, 2])

T = np.zeros((a.shape[0], b.shape[0], c.shape[0]))
for i in range(a.shape[0]):
    for j in range(b.shape[0]):
        for k in range(c.shape[0]):
            T[i, j, k] = a[i] * b[j] * c[k]
T = a^(1) ∘ a^(2) ∘ · · · ∘ a^(N)
(three-way case: T = a ∘ b ∘ c, i.e. T_{i,j,k} = a^(1)_i a^(2)_j a^(3)_k)
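The same rank-1 tensor can be built without loops, for example with np.einsum (a sketch using the vectors a, b, c defined above):

T = np.einsum('i,j,k->ijk', a, b, c)   # outer product a ∘ b ∘ c
# equivalently: a[:, None, None] * b[None, :, None] * c[None, None, :]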
TENSOR RANK
▸ Every tensor can be written as a sum of rank-1 tensors
[Figure: T = a_1 ∘ b_1 ∘ c_1 + · · · + a_J ∘ b_J ∘ c_J]
▸ Tensor rank: the smallest number of rank-1 tensors that can generate it by summing up
X ≈ Σ_{r=1}^{R} a^(1)_r ∘ a^(2)_r ∘ · · · ∘ a^(N)_r ≡ ⟦A^(1), A^(2), · · · , A^(N)⟧
T ≈ Σ_{r=1}^{R} a_r ∘ b_r ∘ c_r ≡ ⟦A, B, C⟧
array([[[ 61., 82.],
[ 74., 100.],
[ 87., 118.],
[ 100., 136.]],
[[ 77., 104.],
[ 94., 128.],
[ 111., 152.],
[ 128., 176.]],
[[ 93., 126.],
[ 114., 156.],
[ 135., 186.],
[ 156., 216.]]])
A = np.array([[1, 2, 3],
              [4, 5, 6]]).T
B = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]]).T
C = np.array([[1, 2],
              [3, 4]]).T

T = np.zeros((A.shape[0], B.shape[0], C.shape[0]))
for i in range(A.shape[0]):
    for j in range(B.shape[0]):
        for k in range(C.shape[0]):
            for r in range(A.shape[1]):
                T[i, j, k] += A[i, r] * B[j, r] * C[k, r]
T = np.einsum('ir,jr,kr->ijk', A, B, C)
⟦A, B, C⟧ : Kruskal tensor
TENSOR FACTORIZATION
▸ CANDECOMP/PARAFAC factorization (CP)
▸ extensions of SVD / PCA / NMF to tensors
NON-NEGATIVE TENSOR FACTORIZATION
▸ Decompose a non-negative tensor into a sum of R non-negative rank-1 tensors
arg min_{A,B,C} ‖T − ⟦A, B, C⟧‖
with ⟦A, B, C⟧ ≡ Σ_{r=1}^{R} a_r ∘ b_r ∘ c_r
subject to A ≥ 0, B ≥ 0, C ≥ 0
TENSOR FACTORIZATION: HOW TO
Alternating Least Squares (ALS): fix all but one factor matrix and solve a least-squares problem for the remaining one
min_{A ≥ 0} ‖T_(1) − A (C ⊙ B)^T‖
min_{B ≥ 0} ‖T_(2) − B (C ⊙ A)^T‖
min_{C ≥ 0} ‖T_(3) − C (B ⊙ A)^T‖
⊙ denotes the Khatri-Rao product, which is a column-wise Kronecker product, i.e., C ⊙ B = [c_1 ⊗ b_1, c_2 ⊗ b_2, . . . , c_r ⊗ b_r]
T_(1) = Â (Ĉ ⊙ B̂)^T
T_(2) = B̂ (Ĉ ⊙ Â)^T
T_(3) = Ĉ (B̂ ⊙ Â)^T
T_(k) denotes the tensor unfolded on the k-th mode
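A minimal NumPy sketch of the Khatri-Rao product defined above (not the author's code; recent SciPy versions also ship an equivalent scipy.linalg.khatri_rao):

import numpy as np

def khatri_rao(C, B):
    """Column-wise Kronecker product: column r of the result is np.kron(C[:, r], B[:, r])."""
    assert C.shape[1] == B.shape[1]
    return np.einsum('ir,jr->ijr', C, B).reshape(C.shape[0] * B.shape[0], C.shape[1])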
# n, m, o: tensor dimensions; r: rank (number of components)
# factor matrices initialized with non-negative random values
F = [np.random.rand(n, r), np.random.rand(m, r), np.random.rand(o, r)]
# cache of the Gram matrices F[i].T F[i]
FF_init = np.array([f.T.dot(f) for f in F])

def iter_solver(T, F, FF_init):
    # Update each factor
    for k in range(len(F)):
        # Hadamard product of the Gram matrices of all the other factors
        FF = np.ones((r, r))
        for i in list(range(k)) + list(range(k + 1, len(F))):
            FF = FF * FF_init[i]
        # unfolded tensor times Khatri-Rao product of the other factors
        XF = T.uttkrp(F, k)
        # multiplicative update (alternative: non-negative least squares)
        F[k] = F[k] * XF / (F[k].dot(FF) + 1e-9)
        # F[k] = nnls(FF, XF.T).T
        FF_init[k] = F[k].T.dot(F[k])
    return F, FF_init
min_{A ≥ 0} ‖T_(1) − A (C ⊙ B)^T‖
min_{B ≥ 0} ‖T_(2) − B (C ⊙ A)^T‖
min_{C ≥ 0} ‖T_(3) − C (B ⊙ A)^T‖
arg min_{W,H} ‖X − WH‖  s.t.  W, H ≥ 0
J. Kim and H. Park. Fast Nonnegative Tensor Factorization with an Active-set-like Method. In High-Performance Scientific Computing: Algorithms and Applications, Springer, 2012, pp. 311-326.
W ← W • (X H^T) / (W H H^T), where X H^T corresponds to T_(1) (C ⊙ B) in the tensor case
HOW TO INTERPRET: USER X TERM X TIME
X is a 3-way tensor in which x_{nmt} is 1 if term m was used by user n at interval t, and 0 otherwise
A_{N×K} is the association of each user n to a factor k
B_{M×K} is the association of each term m to a factor k
C_{T×K} shows the time activity of each factor
[Diagram: X (N×M×T, users × terms × time) decomposed into A (N×K, users × factors), B (M×K, terms × factors), and C (T×K, time × factors)]
http://www.datainterfaces.org/2013/06/twitter-topic-explorer/
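A sketch of how such a binary tensor could be assembled from (user, term, time) events; the index arrays are hypothetical, and a dense array is used only for illustration (real data would usually call for a sparse representation):

import numpy as np

# hypothetical integer indices of each event: user, term, and time interval
users = np.array([0, 0, 1, 2])
terms = np.array([3, 1, 3, 0])
times = np.array([0, 1, 1, 2])

N, M, T = users.max() + 1, terms.max() + 1, times.max() + 1
X = np.zeros((N, M, T))
X[users, terms, times] = 1   # x_nmt = 1 if term m was used by user n at interval t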
TOOLS FOR TENSOR DECOMPOSITION
TOOLS FOR TENSOR FACTORIZATION
TOOLS: THE PYTHON WORLD
NumPy SciPy
Scikit-Tensor (under development):
github.com/mnick/scikit-tensor
NTF: gist.github.com/panisson/7719245
TENSOR DECOMPOSITION OF WEARABLE SENSOR DATA
direct proximity sensing
primary school, Lyon, France: 231 students, 10 teachers
TENSORS
0 1 0
1 0 1
0 1 0
FROM TEMPORAL GRAPHS TO 3-WAY TENSORS
[Diagram: temporal network → tensorial representation → tensor factorization → factors A, B (nodes × communities) and C (temporal activity of communities); factorization quality is used for tuning the complexity of the model]
[Figure: structures in temporal networks extracted from the school data; each component is shown by its node activations (school classes 1A–5B) and its activity over time intervals, together with per-component quality metrics]
L. Gauvin et al., PLoS ONE 9(1), e86028 (2014)
[Figure: components of the school network shown over the classes 1A–5B]
TENSOR DECOMPOSITION OF SCHOOL NETWORK
https://github.com/panisson/ntf-school
Laetitia Gauvin Ciro Cattuto Anna Sapienza
.fit().predict()
@apanisson
panisson@gmail.com
thank you
