2. Overview
Differential privacy (DP)
• Degrees of privacy protection [Dwork+06]
Gibbs posterior
• A generalization of the Bayesian posterior
Contribution
We proved (𝜀, 𝛿)-DP of the Gibbs posterior without boundedness
of the loss
3. Outline
1. Differential privacy
2. Differentially private learning
1. Background
2. Main result: Differential privacy of the Gibbs posterior [Minami+16]
3. Applications
1. Logistic regression
2. Posterior approximation method
6. Privacy constraint in ML & statistics
[Figure: users' data 𝐷 = {𝑋1, 𝑋2, …, 𝑋𝑛} is sent to a curator, who releases a statistic 𝜃]
In many applications of ML & statistics, the data 𝐷 = {𝑋1, …, 𝑋𝑛} contains users' personal information
Problem: Calculate a statistic of interest 𝜃 privately
TBD.
11. Adversarial formulation of privacy
Example: Mean of binary-valued query (Yes: 1, No: 0)
[Figure: the noisy mean of 𝐷 = {𝑋1, 𝑋2, …, 𝑋𝑛} is released; an adversary holding the adjacent dataset 𝐷′ = {𝑋1′, 𝑋2, …, 𝑋𝑛} as auxiliary information tries to infer 𝑋1]
• The noise is small at the scale of 𝜃, so adding it need not deteriorate the accuracy
• The same noise is large at the scale of a single 𝑋𝑖, so privacy is preserved
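The noise-addition scheme above can be made concrete with the Laplace mechanism: replacing one respondent's answer moves a binary mean by at most 1/𝑛, so Laplace noise of scale 1/(𝑛𝜀) gives (𝜀, 0)-DP. A minimal sketch in Python (the function name and inverse-CDF sampling are my own choices, not from the talk):

```python
import math
import random

def private_binary_mean(data, epsilon, rng=None):
    """Release the mean of 0/1 responses with (epsilon, 0)-DP.

    Changing one respondent's answer moves the mean by at most 1/n
    (the sensitivity), so Laplace noise of scale 1/(n * epsilon) suffices.
    """
    rng = rng or random.Random()
    n = len(data)
    true_mean = sum(data) / n
    # Sample Laplace(0, b) noise by inverse-CDF from a uniform draw.
    b = 1.0 / (n * epsilon)
    u = rng.random() - 0.5
    noise = -b * math.copysign(math.log(1 - 2 * abs(u)), u)
    return true_mean + noise
```

With 𝑛 = 1000 respondents and 𝜀 = 1, the noise scale is only 0.001, so the released mean stays accurate while any single answer is masked.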
13. Differential privacy
Idea:
Two “adjacent” datasets differing in a single individual
should be statistically indistinguishable
[Figure: 𝐷 = {𝑋1, 𝑋2, …, 𝑋𝑛} and 𝐷′ = {𝑋1′, 𝑋2, …, 𝑋𝑛} are close in the sense of a “statistical distance”]
14. Differential privacy
Def: Differential Privacy [Dwork+06]
• 𝜀 > 0, 𝛿 ∈ [0, 1): privacy parameters
• 𝜌𝐷 satisfies (𝜀, 𝛿)-differential privacy if,
1. for any adjacent datasets 𝐷, 𝐷′, and
2. for any set 𝐴 ⊂ Θ of outputs,
the following inequality holds:
𝜌𝐷(𝐴) ≤ 𝑒^𝜀 𝜌𝐷′(𝐴) + 𝛿
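The definition can be checked numerically for a simple mechanism. Binary randomized response (answer truthfully with probability 𝑒^𝜀 / (1 + 𝑒^𝜀)) satisfies the bound with 𝛿 = 0; the sketch below, with illustrative names of my own, verifies the probability ratio for every output:

```python
import math

def rr_output_prob(x, report, epsilon):
    """P(report | true answer x) for binary randomized response:
    tell the truth with probability e^eps / (1 + e^eps)."""
    p_truth = math.exp(epsilon) / (1 + math.exp(epsilon))
    return p_truth if report == x else 1 - p_truth

def check_dp(epsilon):
    # The DP inequality rho_D(A) <= e^eps * rho_D'(A) must hold for
    # every output set; for one bit, adjacency means flipping x.
    for report in (0, 1):
        ratio = rr_output_prob(0, report, epsilon) / rr_output_prob(1, report, epsilon)
        if ratio > math.exp(epsilon) + 1e-12:
            return False
    return True
```

The worst-case ratio equals 𝑒^𝜀 exactly, so randomized response is (𝜀, 0)-DP and the bound is tight.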
15. Interpretation of DP
• DP prevents identification with statistical significance
• e.g. the adversary cannot construct a test of power 𝛾 for
𝐻0: 𝑋𝑖 = 𝑋 vs. 𝐻1: 𝑋𝑖 ≠ 𝑋
at the 5% significance level
16. DP and statistical learning
Example: Linear classification
• Find an (𝜀, 𝛿)-DP distribution of hyperplanes
that minimizes the expected classification error:
inf over (𝜀, 𝛿)-DP 𝜌𝐷 of 𝔼𝜃∼𝜌𝐷 𝑅(𝜃)
17. Differentially private learning
Question: What kind of random estimators should we use?
1. Noise addition to a deterministic estimator
• e.g. maximum likelihood estimator + noise
2. Modification of the Bayesian posterior (this work)
18. Outline
1. Differential privacy
2. Differentially private learning
1. Background
2. Main result: Differential privacy of the Gibbs posterior [Minami+16]
3. Applications
1. Logistic regression
2. Posterior approximation method
20. Gibbs posterior
A natural data-dependent distribution in statistics & ML:
𝐺𝛽(𝜃 ∣ 𝐷) ∝ 𝜋(𝜃) exp(−𝛽 ∑ᵢ₌₁ⁿ ℓ(𝜃, 𝑥𝑖))
• ℓ(𝜃, 𝑥): loss function, 𝜋: prior distribution, 𝛽 > 0: inverse temperature
• Includes the Bayesian posterior as the special case
ℓ(𝜃, 𝑥) = − log 𝑝(𝑥 ∣ 𝜃), 𝛽 = 1
• Important in PAC-Bayes theory [Catoni07][Zhang06]
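The definition is easy to evaluate pointwise. A minimal sketch (function and variable names are illustrative): with the loss set to a negative log-likelihood and 𝛽 = 1, the unnormalized log-density below is exactly the Bayesian posterior, here for a Gaussian location model with an 𝑁(0, 1) prior.

```python
def gibbs_log_density(theta, data, loss, log_prior, beta):
    """Unnormalized log Gibbs posterior:
    log G_beta(theta | D) = -beta * sum_i loss(theta, x_i) + log pi(theta) + const.
    """
    return -beta * sum(loss(theta, x) for x in data) + log_prior(theta)

# Special case: loss = negative log-likelihood, beta = 1 recovers Bayes.
nll = lambda theta, x: 0.5 * (x - theta) ** 2   # -log N(x | theta, 1) + const
log_prior = lambda theta: -0.5 * theta ** 2     # N(0, 1) prior
```

For data {1, 2, 3} this posterior peaks at Σ𝑥ᵢ/(𝑛 + 1) = 1.5, matching the conjugate-Gaussian formula.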
23. Gibbs posterior
Problem
• As 𝛽 ↓ 0, 𝐺𝛽(𝜃 ∣ 𝐷) is flattened and gets close to the prior
• Is DP satisfied if we choose 𝛽 > 0 sufficiently small?
Answer: Yes, if…
• ℓ is bounded (previously known)
• 𝛻ℓ is bounded (this work)
25. The exponential mechanism
Theorem [MT07]
An algorithm that draws 𝜃 from a distribution with density proportional to
exp(−𝛽 ℒ(𝜃, 𝐷)) 𝜋(𝜃)
satisfies (𝜀, 0)-DP
• This is the Gibbs posterior if ℒ(𝜃, 𝐷) = ∑ᵢ₌₁ⁿ ℓ(𝜃, 𝑥𝑖)
• 𝛽 has to satisfy 𝛽 ≤ 𝜀 / (2Δℒ)
• Δℒ: the sensitivity, i.e. the worst-case change of ℒ(⋅, 𝐷) over adjacent datasets
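On a finite candidate set the exponential mechanism is a few lines of code. A sketch with 𝛽 = 𝜀/(2Δℒ) as in [MT07]; the sensitivity is passed in by hand and all names are illustrative:

```python
import math
import random

def exponential_mechanism(candidates, loss_fn, data, epsilon, sensitivity, rng):
    """Sample theta with probability proportional to exp(-beta * L(theta, D)),
    where beta = epsilon / (2 * sensitivity)."""
    beta = epsilon / (2 * sensitivity)
    scores = [-beta * sum(loss_fn(t, x) for x in data) for t in candidates]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    # Draw a candidate proportionally to its weight.
    r = rng.random() * sum(weights)
    for t, w in zip(candidates, weights):
        r -= w
        if r <= 0:
            return t
    return candidates[-1]
```

Low-loss candidates are exponentially more likely, yet every candidate keeps positive probability, which is what yields the privacy guarantee.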
29. Loss function that does not satisfy (𝜀, 0)-DP
• Logistic loss
ℓ(𝜃, (𝑧, 𝑦)) = log(1 + exp(−𝑦⟨𝜃, 𝑧⟩))
• The max difference of the losses (≈ 𝑀) grows toward +∞ as Diam Θ → ∞
[Figure: ℓ(𝜃, (𝑧, +1)) and ℓ(𝜃, (𝑧, −1)) plotted against 𝜃; the gap 𝑀 diverges to +∞]
We need differential privacy without sensitivity!
30. From bounded to Lipschitz
• In the example of the logistic loss, the first derivative is bounded
• The Lipschitz constant 𝐿 is not influenced by the size Diam Θ of the parameter space
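This boundedness is easy to verify numerically. A one-dimensional sketch (names are my own): the gradient of the logistic loss has magnitude |𝑧| · 𝜎(−𝑦𝜃𝑧) ≤ |𝑧| for every 𝜃, even as 𝜃 → ±∞ where the loss itself diverges.

```python
import math

def logistic_loss_grad(theta, z, y):
    """d/dtheta of log(1 + exp(-y * theta * z)) for scalar theta, z and y in {-1, +1}."""
    t = y * theta * z
    # sigma(-t) = 1 / (1 + e^t), computed without overflow for large |t|
    if t > 0:
        s = math.exp(-t) / (1.0 + math.exp(-t))
    else:
        s = 1.0 / (1.0 + math.exp(t))
    return -y * z * s

# |gradient| <= |z| for every theta: the Lipschitz constant does not
# grow with Diam(Theta), unlike the loss values themselves.
```

So the loss is |𝑧|-Lipschitz on the whole real line, which is exactly the property the main theorem exploits in place of bounded sensitivity.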
31. Main theorem
Theorem [Minami+16]
Assumptions:
1. For all 𝑥 ∈ 𝒳, ℓ(⋅, 𝑥) is 𝐿-Lipschitz and convex
2. The prior is log-strongly-concave, i.e. − log 𝜋(⋅) is 𝑚𝜋-strongly convex
3. Θ = ℝ𝑑
Then the Gibbs posterior 𝐺𝛽,𝐷 satisfies (𝜀, 𝛿)-DP if 𝛽 > 0 is chosen below a threshold (1) depending only on 𝜀, 𝛿, 𝐿, and 𝑚𝜋
Independent of the sensitivity!
32. Outline
1. Differential privacy
2. Differentially private learning
1. Background
2. Main result: Differential privacy of the Gibbs posterior [Minami+16]
3. Applications
1. Logistic regression
2. Posterior approximation method
34. Example: Logistic Loss
• Gaussian prior
𝜋(𝜃) = 𝑁(𝜃 ∣ 0, (𝑛𝜆)⁻¹ 𝐼)
• The Gibbs posterior is given by
𝐺𝛽(𝜃 ∣ 𝐷) ∝ exp(−𝛽 ∑ᵢ₌₁ⁿ log(1 + exp(−𝑦𝑖⟨𝜃, 𝑧𝑖⟩)) − (𝑛𝜆/2)‖𝜃‖²)
• 𝐺𝛽 satisfies (𝜀, 𝛿)-DP if 𝛽 is chosen as in (1) of the main theorem
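The log-density of this Gibbs posterior is straightforward to code. A one-dimensional sketch (function name and the `log1p`-based stabilization are my own choices), combining the logistic loss with the 𝑁(0, (𝑛𝜆)⁻¹) prior:

```python
import math

def gibbs_log_post(theta, data, beta, lam):
    """Unnormalized log Gibbs posterior for the logistic loss with a
    Gaussian prior N(0, (n * lam)^{-1}) (d = 1 for simplicity):
    -beta * sum_i log(1 + exp(-y_i * theta * z_i)) - (n * lam / 2) * theta**2
    """
    n = len(data)
    total_loss = 0.0
    for z, y in data:
        t = -y * theta * z
        # stable evaluation of log(1 + exp(t))
        total_loss += t + math.log1p(math.exp(-t)) if t > 0 else math.log1p(math.exp(t))
    return -beta * total_loss - 0.5 * n * lam * theta ** 2
```

At 𝜃 = 0 every logistic loss equals log 2 and the prior term vanishes, which gives a quick sanity check of the formula.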
35. Langevin Monte Carlo method
• In practice, sampling from the Gibbs posterior can be a
computationally hard problem
• Some approximate sampling methods are used
(e.g. MCMC, VB)
37. Langevin Monte Carlo method
• “Mixing-time” results have been derived for log-concave
distributions [Dalalyan14][Durmus & Moulines15]
• LMC can attain a 𝛾-approximation after a finite number 𝑇 of iterations
• Polynomial time in 𝑛 and 𝛾⁻¹:
𝑇 ∼ 𝑂((𝑛/𝛾²) log(𝑛/𝛾²))
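The Langevin update behind these results is only one line per iteration. A one-dimensional sketch of the unadjusted LMC (the step size, iteration count, and toy target are illustrative, not the tuned choices of the mixing-time analyses):

```python
import math
import random

def lmc_sample(grad_log_post, theta0, step, iters, rng):
    """Unadjusted Langevin Monte Carlo:
    theta <- theta + step * grad_log_post(theta) + sqrt(2 * step) * N(0, 1).
    Returns the final iterate as an approximate posterior sample."""
    theta = theta0
    scale = math.sqrt(2.0 * step)
    for _ in range(iters):
        theta = theta + step * grad_log_post(theta) + scale * rng.gauss(0.0, 1.0)
    return theta

# Toy target: a standard Gaussian, whose grad log-density is -theta.
samples = [lmc_sample(lambda t: -t, 0.0, 0.01, 500, random.Random(i)) for i in range(300)]
```

For a log-concave target like this one, the iterates approach the target distribution; the sample mean and variance land near 0 and 1 respectively.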
38. • I have a Privacy Preservation guarantee
• I have an Approximate Posterior
• (Ah…)
39. Privacy Preserving Approximate Posterior (PPAP)
• We can prove (𝜀, 𝛿′)-DP of the LMC approximation of the Gibbs posterior
Proposition [Minami+16]
• Assume that ℓ and 𝜋 satisfy the assumptions of the main theorem.
• Assume also that ℓ(⋅, 𝑥) is 𝑀-smooth for every 𝑥 ∈ 𝒳.
• Then, after 𝑂((𝑛/𝛾²) log(𝑛/𝛾²)) iterations, the output of the LMC satisfies
(𝜀, 𝛿 + (𝑒^𝜀 + 1)𝛾)-DP.
40. Summary
1. Differentially private learning
= Differential privacy + Statistical learning
2. We developed a new method to prove (𝜀, 𝛿)-DP
for Gibbs posteriors without “sensitivity”
• Applicable to Lipschitz & convex losses
• (+) Guarantee for an approximate sampling method
Thank you!
Editor's Notes
In practical data analysis or machine learning settings, the dataset, denoted by 𝐷, contains users' personal information.
So we want to protect users' data by DP.
I now introduce the formal definition of differential privacy for data-dependent distributions.
(Differential privacy defines the robustness of randomized statistics.)
𝜌𝐷 is a randomized statistic or, equivalently, a data-dependent probability measure on a certain parameter space.
We say that 𝜌𝐷 satisfies (𝜀, 𝛿)-DP if…
Here “adjacent” means “Hamming distance 1”.
The figure is an example of linear classification.
Here the dataset 𝐷 consists of binary-labeled points, and our classifier 𝜃 is a hyperplane.
In the differentially private manner, we release a random hyperplane instead of the usual deterministic one.
\inf_{\rho_D: \; (\varepsilon, \delta)\text{-DP}} \mathbb{E}_{\theta \sim \rho_D} R(\theta)
So our problem (in general) is stated like this.