Tighter Upper Bound of Real Log
Canonical Threshold of Non-negative
Matrix Factorization and its Application
to Bayesian Inference
Naoki Hayashi* (TokyoTech, Dept. of MCS)
Sumio Watanabe (TokyoTech, Dept. of MCS)
2017/11/28 IEEE SSCI 2017 FOCI, Hawaii
Slide
• This slide is available at
http://watanabe-www.math.dis.titech.ac.jp/~nhayashi/pdf/hayashi1039.pdf
Index
• Introduction
– Non-negative Matrix Factorization
– Real Log Canonical Threshold
– Research Goal
• Main Theorem
– Bayesian Framework of NMF
– Main Result
• Discussion
– Tightness
– Theoretical Application
– Numerical Experiment and Conjecture
• Conclusion
• (Appendix: Sketch of Proof)
1. INTRODUCTION
NMF has been applied
• Non-negative Matrix Factorization (NMF) has been applied to many fields, e.g.:
– Purchase basket data → consumer analysis
– Images, sounds, … → signal processing
– Text data → text mining
– Microarray data → bioinformatics
↑ Knowledge/structure discovery
NMF: data → knowledge
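As a concrete sketch of the factorization itself, the classical multiplicative-update rules of Lee and Seung fit W ≈ XY with non-negative factors; the matrix sizes, iteration count, and random data below are illustrative assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: factor a random non-negative 6x5 matrix with inner dimension 2.
V = rng.uniform(0, 1, (6, 5))
X = rng.uniform(0.1, 1, (6, 2))
Y = rng.uniform(0.1, 1, (2, 5))

eps = 1e-9  # avoids division by zero in the update denominators
for _ in range(500):
    # Lee-Seung multiplicative updates for the squared-error objective ||V - XY||^2;
    # the elementwise multiply/divide form keeps X and Y non-negative automatically.
    Y *= (X.T @ V) / (X.T @ X @ Y + eps)
    X *= (V @ Y.T) / (X @ Y @ Y.T + eps)

err = np.linalg.norm(V - X @ Y)
```

Each update monotonically decreases the squared error while preserving non-negativity, which is why this scheme is the standard baseline for NMF.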
Suffering
• NMF has a hierarchical structure.
• The likelihood cannot be approximated by a Gaussian function.
• Traditional statistics cannot be used (AIC and BIC do not apply).
The hierarchical structure causes non-identifiability:
XY = (XP)(P⁻¹Y) for some P ≠ I with X, Y, XP, P⁻¹Y ≥ 0.
Example: with
X = [[1, 3], [1, 3], [1, 4]], Y = [[1, 1, 4], [5, 1, 4]], P = [[2, -3], [1, 2]],
we have XP = [[5, 3], [5, 3], [6, 5]] and P⁻¹Y = (1/7)[[17, 5, 20], [9, 1, 4]],
and both XY and (XP)(P⁻¹Y) equal [[16, 4, 16], [16, 4, 16], [21, 5, 20]].
⇒ One matrix corresponds to (at least) multiple non-negative factorization pairs.
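This non-identifiability can be checked numerically; the sketch below reproduces the slide's example with NumPy.

```python
import numpy as np

# The factorization pair from the slide: XY and (XP)(P^{-1}Y) give the same matrix.
X = np.array([[1., 3.], [1., 3.], [1., 4.]])
Y = np.array([[1., 1., 4.], [5., 1., 4.]])
P = np.array([[2., -3.], [1., 2.]])
Pinv = np.linalg.inv(P)  # equals (1/7) * [[2, 3], [-1, 2]]

W1 = X @ Y
W2 = (X @ P) @ (Pinv @ Y)

# Both factors of the second pair stay non-negative, so the NMF is non-identifiable.
nonneg = (X @ P >= 0).all() and (Pinv @ Y >= 0).all()
```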
In addition:
• Numerical optimization strongly depends on the initial value.
• It suffers from many local minima; it seldom reaches the global minimum.
Learning Theory of NMF
• NMF has been used for ``data → knowledge''.
• Its mathematical properties are unknown:
– Learning theory has not yet been established.
– Prediction accuracy has not yet been clarified.
→ No guarantee for the correctness of numerical calculation.
→ No method for theoretical hyperparameter tuning.
⇒ Constructing its theory is an important problem.
• In general [Watanabe, 2001]:
– Let n be the sample size.
– The Bayesian generalization error G_n has the asymptotic behavior
E[G_n] = λ/n + o(1/n).
• The learning coefficient λ depends on the model.
• λ is called the real log canonical threshold (RLCT).
When does RLCT appear?
• In hierarchical structure models, the Bayesian λ is smaller than the frequentist one and the maximum-posterior one [Watanabe, 2001 and 2009]: the Bayesian error is much smaller than the frequentist error.
• Bayesian inference is effective for reducing the generalization error.
• We consider the Bayesian inference framework.
– Bayesian inference for NMF has been proposed [Cemgil, 2009]. (Remark: that formulation is only for the discrete case.)
RLCT of NMF is unknown
• NMF has been used for ``data → knowledge''.
• Its mathematical properties are unknown:
– Learning theory has not been established.
– Prediction accuracy has not been clarified.
↑ This means that the RLCT of NMF has not been clarified.
Def. RLCT
• The RLCT is characterized as a learning coefficient.
• It is defined by the largest pole of the following complex function:
ζ(z) = ∫ K(w)^z φ(w) dw,
where K is the KL divergence from the true distribution to the learning machine and φ is the prior.
• A statistical model selection method that uses RLCTs has been proposed [Drton et al., 2017], known as sBIC (singular BIC).
Research Goal
• Constructing a learning theory of NMF:
→ focus on the theoretical generalization error
→ focus on the RLCT of NMF.
• Recently, we derived an upper bound of the RLCT by an algebraic-geometrical method (resolution of singularities) [Hayashi et al., 2017].
• In this research, we newly derive the exact value of the RLCT of NMF in the case rank ≤ 2.
• Using this exact value, we make the upper bound tighter than the previous one.
2. MAIN THEOREM
Formalizing and Setting [Kohjima et al., 2016, modified]
• Data matrices: W^n = (W_1, …, W_n), each of size M × N.
– For generality, we treat not only n = 1 but also n > 1.
• True factorization: A (M × H_0), B (H_0 × N).
• Learner factorization: X (M × H), Y (H × N).
(Figure: the M × N matrix W is approximated by the product of A (M × H_0) and B (H_0 × N).)
• What is the Bayesian framework for this setting?
Formalizing and Setting
• Notation of probability density functions (PDFs):
– q(W): true distribution,
– p(W | X, Y): learning machine,
– p*(W): predictive distribution,
(the domain of the data W is a Euclidean space)
– φ(X, Y): prior distribution,
– p(X, Y | W^n): posterior distribution given data,
(the domain of the parameters (X, Y) is a compact subset of Euclidean space)
Formalizing and Setting
• Assume
q(W) ∝ exp(−(1/2)‖W − AB‖²),
p(W | X, Y) ∝ exp(−(1/2)‖W − XY‖²),
and that the prior φ is strictly positive and bounded in a neighborhood of (A, B).
• Remark: Poisson and exponential distributions can also be applied [Hayashi et al., 2017].
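Under this Gaussian assumption, a data matrix is simply the true product AB plus i.i.d. standard normal noise; a minimal sketch of generating such data (the sizes M, N, H_0 and n are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: M=4, N=5, true inner dimension H0=2.
M, N, H0 = 4, 5, 2
A = rng.uniform(0, 1, (M, H0))   # true non-negative factors
B = rng.uniform(0, 1, (H0, N))

n = 3  # number of observed matrices W_1, ..., W_n
# q(W) ∝ exp(-||W - AB||^2 / 2): each W_i is AB plus i.i.d. standard Gaussian noise
W = np.stack([A @ B + rng.normal(0, 1, (M, N)) for _ in range(n)])
```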
Bayesian Framework
• The posterior is defined by
p(X, Y | W^n) = (1/Z_n) ∏_{i=1}^{n} p(W_i | X, Y) φ(X, Y),
where Z_n is the normalizing constant.
• The predictive distribution is defined by
p*(W) = ∫ p(W | X, Y) p(X, Y | W^n) dX dY.
Bayesian Framework
• The Bayesian generalization error is defined as the KL divergence from the true distribution to the predictive distribution:
G_n = ∫ q(W) log( q(W) / p*(W) ) dW.
• G_n depends on the training data; thus it is a random variable.
• Its expectation over data sets has the asymptotic behavior
E[G_n] = λ/n + o(1/n).
Def. RLCT of NMF
• The RLCT of NMF is defined as minus the maximum pole of the following zeta function:
ζ(z) = ∬ ‖XY − AB‖^{2z} dX dY.
• ζ(z) can be analytically continued to the entire complex plane, and its poles are negative rational numbers.
• The largest pole of ζ(z) equals −λ; then λ is the RLCT of NMF.
(Figure: the poles of ζ(z) lie on the negative real axis of ℂ; the largest one is z = −λ.)
Main Theorem
• The RLCT of NMF λ satisfies the following inequality:
λ ≤ (1/2) [ (H − H_0) min{M, N} + H_0 (M + N − 2) + δ(H_0) ],
where
δ(H_0) = 1 if H_0 ≡ 1 (mod 2), and 0 otherwise.
The equality holds if H = H_0 = 1 or 2, or if H_0 = 0.
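The bound is easy to evaluate programmatically; the following sketch encodes it directly (the function name `rlct_upper_bound` is ours, not from the paper).

```python
def rlct_upper_bound(H, H0, M, N):
    """Upper bound of the RLCT of NMF from the Main Theorem.

    H: learner inner dimension, H0: true inner dimension (H >= H0),
    M x N: size of the data matrix.
    """
    delta = 1 if H0 % 2 == 1 else 0  # delta(H0) = 1 iff H0 is odd
    return 0.5 * ((H - H0) * min(M, N) + H0 * (M + N - 2) + delta)
```

For instance, `rlct_upper_bound(2, 0, 3, 3)` gives 3.0, and the theorem states the bound is attained with equality when H_0 = 0.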
3. DISCUSSION
Tighter than previous
• The Main Theorem gives an upper bound of the RLCT of NMF.
• We had derived another bound of it in previous research.
• How tight is the new bound?
Tighter than previous
• In previous work:
λ ≤ λ_prv = (1/2) [ (H − H_0) min{M, N} + H_0 (M + N − 1) ].
• In this paper:
λ ≤ λ_new = (1/2) [ (H − H_0) min{M, N} + H_0 (M + N − 2) + δ(H_0) ].
• Comparing them, the improvement of the bound is
λ_prv − λ_new = (1/2) (H_0 − δ(H_0)) ≥ 0.
• The more complex the true distribution (the larger H_0), the tighter the new bound becomes relative to the previous one.
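The improvement can be checked symbol-free by evaluating both bounds; the function names below are ours, and the sizes are an arbitrary illustration.

```python
# Sketch comparing the previous and new upper bounds of the RLCT of NMF.
def delta(H0):
    return 1 if H0 % 2 == 1 else 0  # 1 iff H0 is odd

def lam_prv(H, H0, M, N):
    # Previous bound: (1/2)[(H - H0) min{M,N} + H0 (M + N - 1)]
    return 0.5 * ((H - H0) * min(M, N) + H0 * (M + N - 1))

def lam_new(H, H0, M, N):
    # New bound: (1/2)[(H - H0) min{M,N} + H0 (M + N - 2) + delta(H0)]
    return 0.5 * ((H - H0) * min(M, N) + H0 * (M + N - 2) + delta(H0))

H, H0, M, N = 4, 3, 5, 6
gap = lam_prv(H, H0, M, N) - lam_new(H, H0, M, N)
# gap equals (H0 - delta(H0)) / 2 = (3 - 1) / 2 = 1.0
```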
Index
• Introduction
• Main Theorem
• Discussion
– Tightness
– Theoretical Application
– Numerical Experiment and Conjecture
• Conclusion
• (Appendix: Sketch of Proof)
2017/11/28 IEEE SSCI 2017 FOCI, Hawaii 41
Bound of Error
• The Main Theorem gives an upper bound of the Bayesian generalization error via
E[G_n] = λ/n + o(1/n).
• Concretely, we have
E[G_n] ≤ (1/(2n)) [ (H − H_0) min{M, N} + H_0 (M + N − 2) + δ(H_0) ] + o(1/n).
– This gives a guarantee of accuracy!
• For which distributions can we bound the error?
Robustness on Dist.
• The Main Theorem assumes that the data matrix is subject to a normal distribution centered at the product of the parameter matrices:
q(W) ∝ N(W | AB),
p(W | X, Y) ∝ N(W | XY).
• Can the Main Theorem be used even for other distributions?
Robustness on Dist.
• In prior work [Hayashi et al., 2017], we proved that the same zeta function applies to the Poisson and exponential distributions: even if
q(W) ∝ Poi(W | AB), p(W | X, Y) ∝ Poi(W | XY),
or
q(W) ∝ Expo(W | AB), p(W | X, Y) ∝ Expo(W | XY),
we can use
ζ(z) = ∬ ‖XY − AB‖^{2z} dX dY.
Robustness on Dist.
• The above result is derived from the fact that the I-divergence and the Itakura-Saito divergence have the same RLCT as the squared error:

Distribution: Normal | Poisson | Exponential
Similarity: squared error | I-divergence | IS-divergence

• Since the RLCTs are the same, we can use ζ(z) = ∬ ‖XY − AB‖^{2z} dX dY in every case, and the Main Theorem holds for all three distributions.
Numerical Experiment
• We carried out experiments to estimate the exact value of the RLCT,
– and (+) to compare it with the RLCT of reduced rank regression (RRR), i.e., non-restricted matrix factorization.
• The posterior cannot be derived analytically.
→ Markov chain Monte Carlo (MCMC); we used the Metropolis-Hastings method.
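A minimal sketch of such a sampler, assuming the Gaussian model above with a uniform prior on a compact non-negative box; the sizes, step size, box bound, and iteration counts are our illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small setting: M=N=3, true inner dimension H0=1, learner H=2.
M, N, H0, H = 3, 3, 1, 2
A = rng.uniform(0, 1, (M, H0))
B = rng.uniform(0, 1, (H0, N))
W = A @ B + rng.normal(0, 1, (M, N))  # one observed data matrix (n = 1)

def log_post(X, Y):
    """Unnormalized log-posterior: Gaussian likelihood, uniform prior on [0, 10]."""
    if (X < 0).any() or (Y < 0).any() or (X > 10).any() or (Y > 10).any():
        return -np.inf  # outside the compact non-negative parameter region
    return -0.5 * np.sum((W - X @ Y) ** 2)

# Random-walk Metropolis-Hastings over the parameter pair (X, Y).
X = rng.uniform(0, 1, (M, H))
Y = rng.uniform(0, 1, (H, N))
lp = log_post(X, Y)
samples, accepted = [], 0
n_iter, step = 5000, 0.1
for t in range(n_iter):
    Xp = X + step * rng.normal(size=X.shape)
    Yp = Y + step * rng.normal(size=Y.shape)
    lpp = log_post(Xp, Yp)
    if np.log(rng.uniform()) < lpp - lp:  # accept with prob min(1, ratio)
        X, Y, lp = Xp, Yp, lpp
        accepted += 1
    if t >= 1000 and t % 20 == 0:  # burn-in, then thinning by 20
        samples.append(X @ Y)
rate = accepted / n_iter
```

The retained `samples` of XY approximate the posterior over reconstructed matrices, from which quantities such as the predictive distribution can be estimated.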
Non-negative Rank
• We made artificial data for the following cases:
– 1. The exact value of the RLCT of NMF is known.
– 2. The exact value is unknown and rank = rank+.
– 3. The exact value is unknown and rank ≠ rank+.
• rank+: the minimal inner dimension of an exact NMF, called the non-negative rank.
– In general, rank+ ≥ rank holds.
– If min{rows, columns} ≤ 3 or rank ≤ 2, then rank = rank+.
– There exist non-negative matrices such that rank < rank+.
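As an illustration of the last point, a standard example from the literature (not from the slides) is the 4 × 4 matrix below, whose ordinary rank is 3 while its non-negative rank is known to be 4; the NumPy check verifies the ordinary rank only.

```python
import numpy as np

# A standard example of rank < rank+: ordinary rank 3, but no exact
# non-negative factorization with inner dimension 3 exists (rank+ = 4
# is a known result; it is not verified numerically here).
V = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
], dtype=float)

rank = np.linalg.matrix_rank(V)  # ordinary rank
```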
Condition of Experiments
• Sample size n = 200 (parameter dimension ≤ 50).
• Number of data sets D = 100.
→ We empirically calculated the RLCT using E[G_n] = λ/n + o(1/n):
λ ≈ n E[G_n] ≈ (n/D) Σ_{j=1}^{D} G_n^{(j)}.
• MCMC sample size K = 1,000.
– Burn-in = 20,000, thinning = 20; i.e., the number of sampling iterations is 40,000.
• For calculating G_n, we generated T = 20,000 test data from the true distribution.
• Total cost: 100 × (40,000 + 1,000 × 20,000) ≈ O(DKT).
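The empirical estimator λ ≈ (n/D) Σ_j G_n^(j) is simple to apply once per-dataset generalization errors are available; the sketch below uses synthetic G values scattered around a hypothetical RLCT in place of real MCMC output.

```python
import numpy as np

rng = np.random.default_rng(1)
n, D = 200, 100
lam_true = 2.5  # hypothetical RLCT, for illustration only
# Hypothetical per-dataset generalization errors fluctuating around lam_true / n
G = lam_true / n + rng.normal(0, 1e-4, D)
lam_hat = (n / D) * G.sum()  # empirical RLCT: n times the mean of G_n
```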
Numerical Result
• Table columns: λ_N (numerically calculated value), λ (exact value in NMF), λ_B (upper bound in NMF), λ_R (exact value in RRR); r denotes the true rank. (The table of numerical values appears in the original slides.)
• Where the exact value is known, the numerical results equal the theoretical value: λ_N = λ. The numerical calculation is correct!
• If rank = rank+, the numerical results equal the RRR case: λ_N = λ_R. It seems that if rank = rank+, then the RLCT of NMF λ = λ_R.
• If rank ≠ rank+, the numerical results are larger than in the RRR case: λ_N > λ_R. It appears that if rank ≠ rank+, then the RLCT of NMF is larger than in the RRR case.
⇒ Conjecture (from the paper): rank = rank+ ⇒ λ = λ_R; rank ≠ rank+ ⇒ λ > λ_R.
4. CONCLUSION
Conclusion
• (Main contribution) We mathematically improved the upper bound of the RLCT of NMF.
– This makes the bound of the generalization error in Bayesian NMF tighter.
• (Minor contribution) We carried out experiments and proposed a conjecture about the exact value of the RLCT:
– rank = rank+ ⇒ RLCT of NMF = RLCT of RRR;
– rank ≠ rank+ ⇒ RLCT of NMF > RLCT of RRR.
APPENDIX: SKETCH OF PROOF FOR MAIN THEOREM
Sketch of Proof
• We had already derived the exact value in the cases H_0 = 0 and H = H_0 = 1.
• We newly clarified the exact value in the case H = H_0 = 2 by considering the dimension of an algebraic subvariety in the parameter space.
• We bound the RLCT in the case H = H_0 by using the exact value clarified above.
• We bound the RLCT in the general case by using the above results.
Deep convolutional neural fields for depth estimation from a single image
 

More from Naoki Hayashi

【招待講演】パラメータ制約付き行列分解のベイズ汎化誤差解析【StatsML若手シンポ2020】
【招待講演】パラメータ制約付き行列分解のベイズ汎化誤差解析【StatsML若手シンポ2020】【招待講演】パラメータ制約付き行列分解のベイズ汎化誤差解析【StatsML若手シンポ2020】
【招待講演】パラメータ制約付き行列分解のベイズ汎化誤差解析【StatsML若手シンポ2020】Naoki Hayashi
 
【学会発表】LDAにおけるベイズ汎化誤差の厳密な漸近形【IBIS2020】
【学会発表】LDAにおけるベイズ汎化誤差の厳密な漸近形【IBIS2020】【学会発表】LDAにおけるベイズ汎化誤差の厳密な漸近形【IBIS2020】
【学会発表】LDAにおけるベイズ汎化誤差の厳密な漸近形【IBIS2020】Naoki Hayashi
 
ベイズ統計学の概論的紹介
ベイズ統計学の概論的紹介ベイズ統計学の概論的紹介
ベイズ統計学の概論的紹介Naoki Hayashi
 
ベイズ統計学の概論的紹介-old
ベイズ統計学の概論的紹介-oldベイズ統計学の概論的紹介-old
ベイズ統計学の概論的紹介-oldNaoki Hayashi
 
修士論文発表:「非負値行列分解における漸近的Bayes汎化誤差」
修士論文発表:「非負値行列分解における漸近的Bayes汎化誤差」修士論文発表:「非負値行列分解における漸近的Bayes汎化誤差」
修士論文発表:「非負値行列分解における漸近的Bayes汎化誤差」Naoki Hayashi
 
諸君,じゃんけんに負けたからといって落ち込むことはない.長津田にも飯はある.
諸君,じゃんけんに負けたからといって落ち込むことはない.長津田にも飯はある.諸君,じゃんけんに負けたからといって落ち込むことはない.長津田にも飯はある.
諸君,じゃんけんに負けたからといって落ち込むことはない.長津田にも飯はある.Naoki Hayashi
 
すずかけはいいぞ
すずかけはいいぞすずかけはいいぞ
すずかけはいいぞNaoki Hayashi
 
RPG世界の形状及び距離の幾何学的考察(#rogyconf61)
RPG世界の形状及び距離の幾何学的考察(#rogyconf61)RPG世界の形状及び距離の幾何学的考察(#rogyconf61)
RPG世界の形状及び距離の幾何学的考察(#rogyconf61)Naoki Hayashi
 
RPG世界の形状及び距離の幾何学的考察(rogyconf61)
RPG世界の形状及び距離の幾何学的考察(rogyconf61)RPG世界の形状及び距離の幾何学的考察(rogyconf61)
RPG世界の形状及び距離の幾何学的考察(rogyconf61)Naoki Hayashi
 
Rogyゼミ7thスライドpublic
Rogyゼミ7thスライドpublicRogyゼミ7thスライドpublic
Rogyゼミ7thスライドpublicNaoki Hayashi
 
Rogyゼミスライド6th
Rogyゼミスライド6thRogyゼミスライド6th
Rogyゼミスライド6thNaoki Hayashi
 
Rogy目覚まし(仮)+おまけ
Rogy目覚まし(仮)+おまけRogy目覚まし(仮)+おまけ
Rogy目覚まし(仮)+おまけNaoki Hayashi
 
ぼくのつくったこうだいさいてんじぶつ
ぼくのつくったこうだいさいてんじぶつぼくのつくったこうだいさいてんじぶつ
ぼくのつくったこうだいさいてんじぶつNaoki Hayashi
 
情報統計力学のすすめ
情報統計力学のすすめ情報統計力学のすすめ
情報統計力学のすすめNaoki Hayashi
 

More from Naoki Hayashi (19)

【招待講演】パラメータ制約付き行列分解のベイズ汎化誤差解析【StatsML若手シンポ2020】
【招待講演】パラメータ制約付き行列分解のベイズ汎化誤差解析【StatsML若手シンポ2020】【招待講演】パラメータ制約付き行列分解のベイズ汎化誤差解析【StatsML若手シンポ2020】
【招待講演】パラメータ制約付き行列分解のベイズ汎化誤差解析【StatsML若手シンポ2020】
 
【学会発表】LDAにおけるベイズ汎化誤差の厳密な漸近形【IBIS2020】
【学会発表】LDAにおけるベイズ汎化誤差の厳密な漸近形【IBIS2020】【学会発表】LDAにおけるベイズ汎化誤差の厳密な漸近形【IBIS2020】
【学会発表】LDAにおけるベイズ汎化誤差の厳密な漸近形【IBIS2020】
 
ベイズ統計学の概論的紹介
ベイズ統計学の概論的紹介ベイズ統計学の概論的紹介
ベイズ統計学の概論的紹介
 
ベイズ統計学の概論的紹介-old
ベイズ統計学の概論的紹介-oldベイズ統計学の概論的紹介-old
ベイズ統計学の概論的紹介-old
 
修士論文発表:「非負値行列分解における漸近的Bayes汎化誤差」
修士論文発表:「非負値行列分解における漸近的Bayes汎化誤差」修士論文発表:「非負値行列分解における漸近的Bayes汎化誤差」
修士論文発表:「非負値行列分解における漸近的Bayes汎化誤差」
 
諸君,じゃんけんに負けたからといって落ち込むことはない.長津田にも飯はある.
諸君,じゃんけんに負けたからといって落ち込むことはない.長津田にも飯はある.諸君,じゃんけんに負けたからといって落ち込むことはない.長津田にも飯はある.
諸君,じゃんけんに負けたからといって落ち込むことはない.長津田にも飯はある.
 
201803NC
201803NC201803NC
201803NC
 
201703NC
201703NC201703NC
201703NC
 
201709ibisml
201709ibisml201709ibisml
201709ibisml
 
すずかけはいいぞ
すずかけはいいぞすずかけはいいぞ
すずかけはいいぞ
 
RPG世界の形状及び距離の幾何学的考察(#rogyconf61)
RPG世界の形状及び距離の幾何学的考察(#rogyconf61)RPG世界の形状及び距離の幾何学的考察(#rogyconf61)
RPG世界の形状及び距離の幾何学的考察(#rogyconf61)
 
RPG世界の形状及び距離の幾何学的考察(rogyconf61)
RPG世界の形状及び距離の幾何学的考察(rogyconf61)RPG世界の形状及び距離の幾何学的考察(rogyconf61)
RPG世界の形状及び距離の幾何学的考察(rogyconf61)
 
Rogyゼミ7thスライドpublic
Rogyゼミ7thスライドpublicRogyゼミ7thスライドpublic
Rogyゼミ7thスライドpublic
 
Rogyゼミスライド6th
Rogyゼミスライド6thRogyゼミスライド6th
Rogyゼミスライド6th
 
Rogy目覚まし(仮)+おまけ
Rogy目覚まし(仮)+おまけRogy目覚まし(仮)+おまけ
Rogy目覚まし(仮)+おまけ
 
ぼくのつくったこうだいさいてんじぶつ
ぼくのつくったこうだいさいてんじぶつぼくのつくったこうだいさいてんじぶつ
ぼくのつくったこうだいさいてんじぶつ
 
情報統計力学のすすめ
情報統計力学のすすめ情報統計力学のすすめ
情報統計力学のすすめ
 
Rogyゼミ2014 10
Rogyゼミ2014 10Rogyゼミ2014 10
Rogyゼミ2014 10
 
Rogyzemi
RogyzemiRogyzemi
Rogyzemi
 

Recently uploaded

Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
Good agricultural practices 3rd year bpharm. herbal drug technology .pptx
Good agricultural practices 3rd year bpharm. herbal drug technology .pptxGood agricultural practices 3rd year bpharm. herbal drug technology .pptx
Good agricultural practices 3rd year bpharm. herbal drug technology .pptxSimeonChristian
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)riyaescorts54
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx023NiWayanAnggiSriWa
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxFarihaAbdulRasheed
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayupadhyaymani499
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationColumbia Weather Systems
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRlizamodels9
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxJorenAcuavera1
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfBUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfWildaNurAmalia2
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptJoemSTuliba
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPirithiRaju
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024innovationoecd
 

Recently uploaded (20)

Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Good agricultural practices 3rd year bpharm. herbal drug technology .pptx
Good agricultural practices 3rd year bpharm. herbal drug technology .pptxGood agricultural practices 3rd year bpharm. herbal drug technology .pptx
Good agricultural practices 3rd year bpharm. herbal drug technology .pptx
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyay
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfBUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
 
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.ppt
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 

IEEESSCI2017-FOCI4-1039

  • 9. Suffering • NMF has a hierarchical structure • The likelihood cannot be approximated by a Gaussian function • Traditional statistics (AIC, BIC) cannot be used. The hierarchical structure causes non-identifiability: XY = (XP)(P^{-1}Y) for some P ≠ I with X, Y, XP, P^{-1}Y ≥ 0. For example, with X = [[1,3],[1,3],[1,4]], Y = [[1,1,4],[5,1,4]], and P = [[2,-3],[1,2]], both (X, Y) and (XP, P^{-1}Y) = ([[5,3],[5,3],[6,5]], (1/7)[[17,5,20],[9,1,4]]) are non-negative factorizations of the same matrix [[16,4,16],[16,4,16],[21,5,20]]. One product matrix thus corresponds to (at least) multiple factorization pairs in NMF.
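The non-identifiability above can be checked directly; a minimal sketch in exact rational arithmetic, using the matrices from the slide:

```python
from fractions import Fraction as F

def matmul(A, B):
    """Exact matrix product over rationals (lists of lists)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

X = [[F(1), F(3)], [F(1), F(3)], [F(1), F(4)]]
Y = [[F(1), F(1), F(4)], [F(5), F(1), F(4)]]
P = [[F(2), F(-3)], [F(1), F(2)]]

# Inverse of the 2x2 matrix P; det(P) = 2*2 - (-3)*1 = 7
det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
Pinv = [[P[1][1] / det, -P[0][1] / det], [-P[1][0] / det, P[0][0] / det]]

XP = matmul(X, P)        # [[5,3],[5,3],[6,5]]       -- still non-negative
PinvY = matmul(Pinv, Y)  # (1/7)[[17,5,20],[9,1,4]]  -- still non-negative
W1 = matmul(X, Y)
W2 = matmul(XP, PinvY)
print(W1 == W2)  # True: two distinct non-negative factorizations of one matrix
```

Because P is not a scaled permutation, (X, Y) and (XP, P^{-1}Y) are genuinely different parameters producing the same product.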
  • 10. Suffering • NMF has a hierarchical structure • The likelihood cannot be approximated by a Gaussian function • Traditional statistics (AIC, BIC) cannot be used. In addition, optimization of NMF • depends strongly on the initial value and • suffers from many local minima: it seldom reaches the global minimum.
  • 11. Learning Theory of NMF • NMF has been used for ``data → knowledge''.
  • 12. Learning Theory of NMF • NMF has been used for ``data → knowledge''. • Its mathematical properties are unknown: – a learning theory has not yet been established; – the prediction accuracy has not yet been clarified. Hence there is no guarantee for the correctness of numerical calculation and no method for theoretical hyperparameter tuning.
  • 13. Learning Theory of NMF • NMF has been used for ``data → knowledge''. • Its mathematical properties are unknown: – a learning theory has not yet been established; – the prediction accuracy has not yet been clarified. Constructing such a theory is an important problem.
  • 14. Index • Introduction – Non-negative Matrix Factorization – Real Log Canonical Threshold – Research Goal • Main Theorem • Discussion • Conclusion • (Appendix: Sketch of Proof)
  • 15. When does the RLCT appear? • In general [Watanabe, 2001]: – let n be the sample size; – the Bayesian generalization error G_n has the asymptotic behavior E[G_n] = λ/n + o(1/n). • The learning coefficient λ depends on the model. • λ is called the real log canonical threshold (RLCT).
  • 16. Error: Bayes << Freq. • In hierarchical models, the Bayesian λ is smaller than the frequentist one and the maximum-a-posteriori one [Watanabe, 2001 and 2009]. • Bayesian inference is effective for reducing the generalization error. • We therefore consider the Bayesian inference framework. – Bayesian inference for NMF has been proposed [Cemgil, 2009]. (Remark: ← applies to the discrete case only.)
  • 17. RLCT of NMF is unknown • NMF has been used for ``data → knowledge''. • Its mathematical properties are unknown: – a learning theory has not been established; – the prediction accuracy has not been clarified. ↑ This means that the RLCT of NMF has not been clarified.
  • 18. Def. RLCT • The RLCT is characterized as a learning coefficient. • It is defined via the largest pole of the zeta function ζ(z) = ∫ K(w)^z φ(w) dw, where K is the KL divergence from the true distribution to the learning machine and φ is the prior. • A statistical model selection method that uses RLCTs has been proposed [Drton et al., 2017].
  • 19. Def. RLCT • The RLCT is characterized as a learning coefficient. • It is defined via the largest pole of the zeta function ζ(z) = ∫ K(w)^z φ(w) dw, where K is the KL divergence from the true distribution to the learning machine and φ is the prior. • The model selection method of [Drton et al., 2017] is known as sBIC (singular BIC).
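As a standard one-dimensional illustration of this definition (our example, not from the slides): take K(w) = w² with a uniform prior φ on [0, 1]. Then

```latex
\zeta(z) = \int_0^1 (w^2)^z \, dw = \int_0^1 w^{2z} \, dw = \frac{1}{2z+1},
```

whose only pole is z = −1/2, so the RLCT is λ = 1/2. This matches the regular-model value d/2 with parameter dimension d = 1; singular models such as NMF have λ strictly smaller than d/2.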
  • 20. Index • Introduction – Non-negative Matrix Factorization – Real Log Canonical Threshold – Research Goal • Main Theorem • Discussion • Conclusion • (Appendix: Sketch of Proof)
  • 21. Research Goal • Constructing a learning theory of NMF → focus on the theoretical generalization error → focus on the RLCT of NMF. • Recently, we derived an upper bound of the RLCT [Hayashi et al., 2017]. • We used an algebraic-geometrical method (resolution of singularities).
  • 22. Research Goal • Constructing a learning theory of NMF → focus on the theoretical generalization error → focus on the RLCT of NMF. • In this research, we newly derive the exact value of the RLCT of NMF in the case rank ≤ 2. • Using this exact value, we make the upper bound tighter than the previous one.
  • 23. 2. MAIN THEOREM
  • 24. Index • Introduction • Main Theorem – Bayesian Framework of NMF – Main Result • Discussion • Conclusion • (Appendix: Sketch of Proof)
  • 25. Formalizing and Setting • Data matrices: W^n = (W_1, …, W_n), each of size M × N (× n). – In general, we treat not only n = 1 but also n > 1. (Figure: an M × N data matrix W; after [Kohjima et al., 2016, modified].)
  • 26. Formalizing and Setting • Data matrices: W^n = (W_1, …, W_n), each of size M × N (× n). – In general, we treat not only n = 1 but also n > 1. • True factorization: A of size M × H_0, B of size H_0 × N. • Learner factorization: X of size M × H, Y of size H × N. (Figure: W ≈ AB, with A of size M × H_0 and B of size H_0 × N; after [Kohjima et al., 2016, modified].)
  • 27. Formalizing and Setting • Data matrices: W^n = (W_1, …, W_n), each of size M × N (× n). • True factorization: A of size M × H_0, B of size H_0 × N. • Learner factorization: X of size M × H, Y of size H × N. • What is the Bayesian framework for this setting?
  • 28. Formalizing and Setting • Notation of probability density functions (PDFs): – q(W): true distribution; – p(W|X,Y): learning machine; – p*(W): predictive distribution (over data; domains are Euclidean spaces); – φ(X,Y): prior distribution; – p(X,Y|W^n): posterior distribution given the data (over parameters; domains are compact subsets of Euclidean spaces).
  • 29. Formalizing and Setting • Assume q(W) ∝ exp(−(1/2)||W − AB||²), p(W|X,Y) ∝ exp(−(1/2)||W − XY||²), and that the prior φ is strictly positive and bounded in a neighborhood of (A, B). • Remark: the Poisson and exponential distributions can also be handled [Hayashi et al., 2017].
  • 30. Bayesian Framework • The posterior is defined by p(X,Y|W^n) = (1/Z_n) ∏_{i=1}^{n} p(W_i|X,Y) φ(X,Y), where Z_n is the normalizing constant. • The predictive distribution is defined by p*(W) = ∫ p(W|X,Y) p(X,Y|W^n) dX dY.
  • 31. Bayesian Framework • The Bayesian generalization error is defined by the KL divergence from the true to the predictive distribution: G_n = ∫ q(W) log( q(W) / p*(W) ) dW. • It depends on the training data, hence it is a random variable. • Its expectation over the data has the asymptotic behavior E[G_n] = λ/n + o(1/n).
  • 32. Index • Introduction • Main Theorem – Bayesian Framework of NMF – Main Result • Discussion • Conclusion • (Appendix: Sketch of Proof)
  • 33. Def. RLCT of NMF • The RLCT of NMF is defined by minus the maximum pole of the zeta function ζ(z) = ∬ (||XY − AB||²)^z dX dY.
  • 34. Def. RLCT of NMF • The RLCT of NMF is defined by minus the maximum pole of ζ(z) = ∬ (||XY − AB||²)^z dX dY. • ζ(z) can be analytically continued to the entire complex plane, and its poles are negative rational numbers. • The largest pole of ζ(z) equals −λ; this λ is the RLCT of NMF.
  • 35. Def. RLCT of NMF • The RLCT of NMF is defined by minus the maximum pole of ζ(z) = ∬ (||XY − AB||²)^z dX dY. • ζ(z) can be analytically continued to the entire complex plane, and its poles are negative rational numbers. • The largest pole of ζ(z) equals −λ; this λ is the RLCT of NMF. (Figure: poles of ζ(z) on the negative real axis of ℂ; the largest pole is z = −λ.)
  • 36. Main Theorem • The RLCT of NMF λ satisfies the inequality λ ≤ (1/2){ (H − H_0) min(M, N) + H_0 (M + N − 2) + δ(H_0) }, where δ(H_0) = 1 if H_0 ≡ 1 (mod 2), and 0 otherwise. The equality holds if H = H_0 = 1 or 2, or if H_0 = 0.
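The bound is elementary to evaluate; a minimal sketch (function and variable names are ours):

```python
def delta(h0: int) -> int:
    """delta(H0) = 1 if H0 is odd, 0 otherwise."""
    return h0 % 2

def rlct_upper_bound(h: int, h0: int, m: int, n: int) -> float:
    """Upper bound of the RLCT of NMF from the main theorem.

    h, h0 : inner dimensions of the learner and the true factorization (h >= h0),
    m, n  : numbers of rows and columns of the data matrix.
    """
    return 0.5 * ((h - h0) * min(m, n) + h0 * (m + n - 2) + delta(h0))

# Example: H = H0 = 1, M = N = 3, where the bound is exact: (M + N - 1)/2.
print(rlct_upper_bound(1, 1, 3, 3))  # 2.5
```

For H_0 = 0 the bound reduces to H·min(M, N)/2, the other case in which the slide states equality.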
  • 37. 3. DISCUSSION
  • 38. Index • Introduction • Main Theorem • Discussion – Tightness – Theoretical Application – Numerical Experiment and Conjecture • Conclusion • (Appendix: Sketch of Proof)
  • 39. Tighter than previous • The Main Theorem gives an upper bound of the RLCT of NMF. • We derived another bound in previous research. • How tight is the new bound?
  • 40. Tighter than previous • In previous work: λ ≤ λ_prv = (1/2){ (H − H_0) min(M, N) + H_0 (M + N − 1) }. • In this paper: λ ≤ λ_new = (1/2){ (H − H_0) min(M, N) + H_0 (M + N − 2) + δ(H_0) }. • Comparing them, the bound improves by λ_prv − λ_new = (1/2)( H_0 − δ(H_0) ) ≥ 0. • The more complex the true distribution (the larger H_0), the tighter the new bound becomes relative to the old one.
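The gap between the two bounds can be verified symbolically or, as sketched here, numerically over a small grid (our code, using the two formulas from the slide):

```python
def delta(h0: int) -> int:
    # delta(H0) = 1 if H0 is odd, 0 otherwise
    return h0 % 2

def bound_prev(h, h0, m, n):
    # previous upper bound [Hayashi et al., 2017]
    return 0.5 * ((h - h0) * min(m, n) + h0 * (m + n - 1))

def bound_new(h, h0, m, n):
    # new upper bound (main theorem of this paper)
    return 0.5 * ((h - h0) * min(m, n) + h0 * (m + n - 2) + delta(h0))

# The gap (H0 - delta(H0))/2 depends only on H0, not on H, M, N:
for h0 in range(5):
    for (h, m, n) in [(h0 + 1, 3, 4), (h0 + 2, 5, 5)]:
        gap = bound_prev(h, h0, m, n) - bound_new(h, h0, m, n)
        assert gap == (h0 - delta(h0)) / 2
print("gap check passed")
```

In particular, for H_0 ∈ {0, 1} the two bounds coincide, and for larger H_0 the new bound is strictly smaller.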
  • 41. Index • Introduction • Main Theorem • Discussion – Tightness – Theoretical Application – Numerical Experiment and Conjecture • Conclusion • (Appendix: Sketch of Proof)
  • 42. Bound of Error • The Main Theorem yields an upper bound of the Bayesian generalization error via E[G_n] = λ/n + o(1/n). • Concretely, E[G_n] ≤ (1/(2n)){ (H − H_0) min(M, N) + H_0 (M + N − 2) + δ(H_0) } + o(1/n). – This gives a guarantee of accuracy! • For which distributions can we bound the error?
  • 43. Robustness on Dist. • The Main Theorem assumes that the matrix elements follow a normal distribution: q(W) ∝ N(W | AB), p(W|X,Y) ∝ N(W | XY). • Can the Main Theorem be used for other distributions?
  • 44. Robustness on Dist. • In prior work [Hayashi et al., 2017], we proved that the same zeta function applies to the Poisson and exponential distributions: even if q(W) ∝ Poi(W | AB), p(W|X,Y) ∝ Poi(W | XY), or q(W) ∝ Expo(W | AB), p(W|X,Y) ∝ Expo(W | XY), we can still use ζ(z) = ∬ (||XY − AB||²)^z dX dY.
  • 45. Robustness on Dist. • This follows from the fact that the I-divergence and the Itakura-Saito divergence have the same RLCT as the squared error. Distribution: Normal / Poisson / Exponential ↔ discrepancy: squared error / I-divergence / IS-divergence. All three lead to the same ζ(z), hence the same RLCTs.
  • 46. Robustness on Dist. • This follows from the fact that the I-divergence and the Itakura-Saito divergence have the same RLCT as the squared error. Distribution: Normal / Poisson / Exponential ↔ discrepancy: squared error / I-divergence / IS-divergence. All three lead to the same ζ(z), hence the same RLCTs. The Main Theorem is therefore attained for all three!
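The three discrepancy measures, in their commonly used element-wise forms (our sketch, not code from the paper; entries are assumed strictly positive for the I- and IS-divergences):

```python
import math

def sq_error(a, b):
    """Squared Euclidean distance between two equal-size matrices (lists of lists)."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def i_divergence(a, b):
    """Generalized KL (I-)divergence, associated with the Poisson likelihood."""
    return sum(x * math.log(x / y) - x + y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def is_divergence(a, b):
    """Itakura-Saito divergence, associated with the exponential likelihood."""
    return sum(x / y - math.log(x / y) - 1 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

A = [[1.0, 2.0], [3.0, 4.0]]
# Each divergence vanishes iff the matrices coincide:
print(sq_error(A, A), i_divergence(A, A), is_divergence(A, A))  # 0.0 0.0 0.0
```

All three vanish to second order where XY = AB, which is the analytic fact behind the shared zeta function.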
  • 47. Index • Introduction • Main Theorem • Discussion – Tightness – Theoretical Application – Numerical Experiment and Conjecture • Conclusion • (Appendix: Sketch of Proof)
  • 48. Numerical Experiment • We carried out experiments to estimate the exact value of the RLCT. – (+) to compare with the RLCT of reduced rank regression (RRR), i.e. unconstrained matrix factorization. • The posterior cannot be derived analytically → Markov chain Monte Carlo (MCMC). – We used the Metropolis-Hastings method.
  • 49. Non-negative Rank • We made artificial data for the following cases: – 1. the exact value of the RLCT of NMF is known; – 2. the exact value is unknown and rank = rank+; – 3. the exact value is unknown and rank ≠ rank+. • rank+ : the minimal inner dimension of an exact NMF, called the non-negative rank. – In general, rank+ ≥ rank holds. – If min{rows, columns} ≤ 3 or rank ≤ 2, then rank = rank+. – There exist non-negative matrices with rank < rank+.
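A classical example of rank < rank+ (our illustration, not taken from the slides; the example is standard in the non-negative rank literature) is the 4 × 4 matrix below. Its ordinary rank is 3 (the rows satisfy r1 − r2 − r3 + r4 = 0), but every all-positive submatrix covers at most two of its eight positive entries, so any exact non-negative factorization needs at least four factors: rank+ = 4.

```python
import numpy as np

M = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
], dtype=float)

print(np.linalg.matrix_rank(M))  # 3, while the non-negative rank is 4
```

Matrices like this fall into case 3 of the experimental design above.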
  • 50. Condition of Experiments • Sample size n = 200 (parameter dimension ≤ 50). • Number of data sets D = 100. → We empirically estimated the RLCT using E[G_n] = λ/n + o(1/n): λ ≈ n E[G_n] ≈ (n/D) Σ_{j=1}^{D} G_n^{(j)}. • MCMC sample size K = 1,000. – Burn-in = 20,000, thinning = 20, i.e. 40,000 sampling iterations in total. • To compute G_n, we generated T = 20,000 test data from the true distribution. • Total cost: 100 × (40,000 + 1,000 × 20,000) ≈ O(DKT).
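The empirical estimator above is straightforward; a minimal sketch (the G values below are hypothetical stand-ins for the per-dataset generalization errors):

```python
def empirical_rlct(g_values, n):
    """lambda_hat = n * mean of G_n^{(j)} over D independent data sets."""
    return n * sum(g_values) / len(g_values)

# Hypothetical generalization errors for D = 4 data sets at n = 200:
g = [0.011, 0.009, 0.013, 0.007]
print(empirical_rlct(g, n=200))  # approximately 2.0
```

The bias of this estimator is the o(1/n) remainder scaled by n, which vanishes as n grows.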
  • 51. Numerical Result (Legend: λ_N = numerically calculated value; λ = exact value for NMF; λ_B = upper bound for NMF; λ_R = exact value for RRR; r = true rank. The result table itself is an image and is omitted in this transcript.)
  • 52. Numerical Result • The numerical results equal the theoretical values: λ_N = λ. The numerical calculation is correct!
  • 53. Numerical Result • If rank = rank+, then the numerical results equal the RRR case: λ_N = λ_R. It seems that if rank = rank+, then the RLCT of NMF satisfies λ = λ_R.
  • 54. Numerical Result • If rank ≠ rank+, then the numerical results are larger than the RRR case: λ_N > λ_R. This suggests that if rank ≠ rank+, the RLCT of NMF is larger than in the RRR case: λ > λ_R.
  • 55. Conjecture • From the paper. (The formal conjecture statement is shown as an image in the slide; its content is summarized in slides 53-54 and in the conclusion.)
  • 56. 4. CONCLUSION
  • 57. Index • Introduction • Main Theorem • Discussion • Conclusion • (Appendix: Sketch of Proof)
  • 58. Conclusion • (Main contribution) We mathematically improved the upper bound of the RLCT of NMF. – This makes the bound of the generalization error in Bayesian NMF tighter. • (Minor contribution) We carried out experiments and proposed a conjecture about the exact value of the RLCT: – rank = rank+ ⇒ RLCT of NMF = RLCT of RRR; – rank ≠ rank+ ⇒ RLCT of NMF > RLCT of RRR.
  • 59. APPENDIX: SKETCH OF PROOF FOR MAIN THEOREM
  • 60. Sketch of Proof • We had already derived the exact value in the cases H_0 = 0 and H = H_0 = 1. • We newly clarified the exact value in the case H = H_0 = 2 by considering the dimension of an algebraic subvariety of the parameter space. • We bounded the RLCT in the case H = H_0 by using this exact value. • We bounded the RLCT in the general case by using the above results.