PriPeARL: A Framework for Privacy-Preserving Analytics and Reporting at LinkedIn
CIKM 2018
Krishnaram Kenthapadi, Thanh Tran
Data @ LinkedIn
Analytics Products at LinkedIn
● Profile View Analytics
● Content Analytics
● Ad Campaign Analytics
All show the demographics of members engaging with the product.
Product Requirements: Utility and Privacy
Utility
• Insights into the audience engaging with the product (e.g., profile, article, or ad)
→ Aggregate statistics should ideally be available and accurate.
• Different aspects of data consistency:
- Repeated queries
- Over time
- Total vs. demographic breakdowns
- Hierarchy (e.g., time, entity)
Privacy
• Member actions could be considered sensitive information (e.g., a click on an article or an ad).
→ An individual's action must not be inferable from the analytics results.
• Assume malicious use cases, e.g., an attacker can set up ad campaigns to infer the behavior of a specific member.
Application: LinkedIn Ads Analytics
Objective:
Compute robust, reliable analytics in a privacy-preserving manner, while addressing product desiderata such as utility, coverage, and consistency.
[Diagram components: Advertiser, Ad, Ad Targeting, LI Ad Serving, Ad Analytics]
Possible Attacks
● Targeting: senior directors in the US who studied at Cornell
○ Matches ~16k LinkedIn members → over the minimum targeting threshold
● Demographic breakdown, e.g., company = X
○ Matches exactly one person → the attacker can determine whether that person clicks on the ad
● Enforcing a minimum reporting threshold
○ The attacker could create fake profiles, e.g., if the threshold is 10, create 9 fake profiles that all click.
● Rounding mechanism, e.g., report counts in increments of 10
○ Still amenable to attacks, e.g., using incremental counts over time to infer individuals' actions
Need rigorous techniques that preserve member privacy without revealing exact aggregate counts.
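To make the two naive defenses concrete, here is a minimal sketch of how fake profiles defeat them. It assumes a threshold rule that suppresses counts below the threshold and a rounding rule that rounds down to a multiple of 10 (the slide does not say which rounding is used). The rounding attack shown is a simple boundary variant; the slide's own example exploits incremental counts over time. All function names are hypothetical.

```python
from typing import Optional


def threshold_report(true_count: int, threshold: int) -> Optional[int]:
    """Report the exact count only if it meets the minimum reporting threshold."""
    return true_count if true_count >= threshold else None


def rounded_report(true_count: int, increment: int) -> int:
    """Report the count rounded down to a multiple of `increment` (assumed rule)."""
    return (true_count // increment) * increment


def target_clicked_via_threshold(target_clicked: bool, threshold: int = 10) -> bool:
    # The breakdown contains only the target plus threshold - 1 fake profiles,
    # all of which click, so the count reaches the threshold iff the target clicked.
    fake_clicks = threshold - 1
    return threshold_report(fake_clicks + int(target_clicked), threshold) is not None


def target_clicked_via_rounding(target_clicked: bool, increment: int = 10) -> bool:
    # Fake clicks park the count one below a rounding boundary, so a single
    # extra click moves the reported value up to the next increment.
    fake_clicks = increment - 1
    return rounded_report(fake_clicks + int(target_clicked), increment) == increment


for clicked in (True, False):
    print(clicked, target_clicked_via_threshold(clicked), target_clicked_via_rounding(clicked))
```

In both cases the reported value is an exact function of a single member's action, which is why the slide calls for noise rather than suppression or rounding.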
Differential Privacy: Definition
● ε-Differential Privacy: For neighboring databases D and D’ (differing by one record), the distribution of the curator’s output is nearly the same on both databases.
● Parameter ε (ε > 0) quantifies information leakage
○ Smaller ε, more private
Dwork, McSherry, Nissim, Smith [TCC 2006]
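The informal "nearly the same" above corresponds to the standard formal definition from Dwork et al. The mechanism symbol $\mathcal{M}$ and the output set $S$ are notation introduced here: a randomized mechanism $\mathcal{M}$ is ε-differentially private if

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S]
\quad \text{for all neighboring } D, D' \text{ and all } S \subseteq \operatorname{Range}(\mathcal{M}).
```

A smaller ε forces the two output distributions closer together, which is why smaller ε means stronger privacy.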
Differential Privacy: Random Noise Addition
● Achieving differential privacy via random noise addition.
● Common approach: noise drawn from the Laplace distribution.
○ Let s be the L1 sensitivity of the query function f:
$s = \max_{D, D'} \lVert f(D) - f(D') \rVert_1$, over D and D’ that differ by one record,
○ and let ε be the privacy parameter.
○ Then the scale parameter of the Laplace distribution is s/ε.
Dwork, McSherry, Nissim, Smith [TCC 2006]
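A minimal sketch of the Laplace mechanism described above, using only the Python standard library. Noise is sampled by inverting the Laplace CDF; the function names are illustrative, not from any particular library.

```python
import math
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale) by inverting its CDF."""
    u = rng.random() - 0.5              # uniform on [-0.5, 0.5)
    while u == -0.5:                    # exclude the endpoint so log() never sees 0
        u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))


def laplace_mechanism(true_answer: float, sensitivity: float, epsilon: float,
                      rng: random.Random) -> float:
    """Release true_answer plus Laplace noise with scale sensitivity / epsilon."""
    return true_answer + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(2018)
# A count query has L1 sensitivity 1, so epsilon = 1 gives noise of scale 1.
print(laplace_mechanism(120, sensitivity=1.0, epsilon=1.0, rng=rng))
```

The expected absolute noise equals the scale s/ε, so halving ε doubles the typical error.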
Ad Analytics Canonical Queries
SELECT COUNT(*)
FROM table(statType, entity)                         -- e.g., clicks on a given ad
WHERE timestamp ≥ startTime AND timestamp ≤ endTime
AND dAttr = dVal                                     -- e.g., Title = “Senior Director”
● The application admits a predetermined query form.
○ This query form also applies to other analytics applications.
● Privacy is preserved by adding Laplace noise.
○ Protects privacy at the event level.
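The canonical query can be read as a plain filter-and-count. The sketch below is a hypothetical Python equivalent over an in-memory list of events; the `Event` record and its field names are illustrative, not LinkedIn's schema. Because each event contributes at most 1 to the count, adding or removing a single event changes the result by at most 1, so the event-level L1 sensitivity is 1 and Laplace noise of scale 1/ε suffices.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable


@dataclass
class Event:
    # Hypothetical event record; field names are illustrative only.
    stat_type: str                  # e.g., "click" or "impression"
    entity: str                     # e.g., an ad creative id
    timestamp: datetime
    demographics: Dict[str, str]    # e.g., {"title": "Senior Director"}


def canonical_count(events: Iterable[Event], stat_type: str, entity: str,
                    start_time: datetime, end_time: datetime,
                    d_attr: str, d_val: str) -> int:
    """True (un-noised) answer to the canonical COUNT(*) query."""
    return sum(
        1 for e in events
        if e.stat_type == stat_type
        and e.entity == entity
        and start_time <= e.timestamp <= end_time
        and e.demographics.get(d_attr) == d_val
    )
```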
PriPeARL: A Framework for Privacy-Preserving Analytics
Pseudo-random noise generation, inspired by differential privacy:
● Inputs: entity id (creative/campaign/campaign group/account), demographic dimension, stat type (impressions, clicks), time range, and a fixed secret seed
● Cryptographic hash of the inputs, normalized to (0,1) → uniformly random fraction
● Random fraction → Laplace noise, with a fixed ε
● True count + noise → reported count
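A minimal Python sketch of this pipeline, under stated assumptions: HMAC-SHA256 keyed by the secret seed stands in for the "cryptographic hash" of the inputs plus seed, the query attributes are joined into one string, 52 bits of the digest are normalized into (0, 1), and the fraction is mapped to Laplace noise through the inverse CDF with scale 1/ε (a count with event-level sensitivity 1). The function names are hypothetical, not PriPeARL's actual code, and no rounding or clamping of the reported count is shown.

```python
import hashlib
import hmac
import math


def uniform_fraction(secret_seed: bytes, *query_key: str) -> float:
    """Map a canonical query key to a pseudo-random fraction strictly inside (0, 1)."""
    message = "\x1f".join(query_key).encode("utf-8")      # unit-separator join
    digest = hmac.new(secret_seed, message, hashlib.sha256).digest()
    n = int.from_bytes(digest[:8], "big") >> 12           # keep 52 bits so the float math is exact
    return (n + 0.5) / 2**52                              # lies in [2**-53, 1 - 2**-53]


def laplace_inverse_cdf(p: float, scale: float) -> float:
    """Inverse CDF of the zero-mean Laplace distribution with the given scale."""
    if p < 0.5:
        return scale * math.log(2 * p)
    return -scale * math.log(2 * (1 - p))


def noisy_count(true_count: int, epsilon: float, secret_seed: bytes,
                entity_id: str, demographic: str, stat_type: str,
                time_range: str) -> float:
    """Reported count = true count + noise derived deterministically from the query."""
    p = uniform_fraction(secret_seed, entity_id, demographic, stat_type, time_range)
    return true_count + laplace_inverse_cdf(p, 1.0 / epsilon)


seed = b"example-secret"   # hypothetical; a real seed must be kept secret
print(noisy_count(120, 1.0, seed, "campaign-42", "title=Senior Director", "clicks", "2018-10"))
```

Because the noise is a deterministic function of the query attributes and the seed, re-issuing the same query yields the same noisy count. The seed has to stay secret: anyone holding it could recompute the noise and subtract it.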
To satisfy consistency requirements:
● Pseudo-random noise → the same query returns the same result over time, which avoids averaging attacks (repeating a query to average out fresh noise).
● For non-canonical queries (e.g., arbitrary time ranges, aggregates over multiple entities):
○ Use the hierarchy to partition the query into canonical queries.
○ Compute noise for each canonical query and sum the noisy counts.
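One way to sketch the partition step for time ranges, assuming the canonical time units are whole calendar months and single days (the slide does not specify the hierarchy's levels). The noisy count for each canonical piece is supplied by the caller as a function, so any deterministic per-query noise can be plugged in; `decompose` and `noisy_range_count` are hypothetical names.

```python
import calendar
from datetime import date, timedelta
from typing import Callable, List, Tuple

CanonicalRange = Tuple[date, date]   # inclusive start and end


def decompose(start: date, end: date) -> List[CanonicalRange]:
    """Split [start, end] into whole calendar months plus leftover single days."""
    parts: List[CanonicalRange] = []
    d = start
    while d <= end:
        last_day = calendar.monthrange(d.year, d.month)[1]
        month_end = date(d.year, d.month, last_day)
        if d.day == 1 and month_end <= end:
            parts.append((d, month_end))       # whole month: one canonical query
            d = month_end + timedelta(days=1)
        else:
            parts.append((d, d))               # single day: one canonical query
            d += timedelta(days=1)
    return parts


def noisy_range_count(start: date, end: date,
                      canonical_noisy_count: Callable[[date, date], float]) -> float:
    """Sum the (already noisy) counts of the canonical pieces."""
    return sum(canonical_noisy_count(s, e) for s, e in decompose(start, end))


for s, e in decompose(date(2018, 1, 30), date(2018, 3, 2)):
    print(s, e)
```

Since the per-piece noise terms are independent, their variances add; covering a month with one canonical query instead of ~30 daily ones keeps the summed noise much smaller, which is one reason to use the hierarchy.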
System Architecture
Implemented and integrated into the Ads Analytics product; the framework can also be used for general analytics products.
Performance Evaluation: Setup
● Experiments using LinkedIn ad analytics data
○ Consider the distribution of impression and click queries across (account, ad campaign) pairs and demographic breakdowns.
● Examine
○ Tradeoff between privacy and utility
○ Effect of varying minimum threshold (non-negative)
○ Top-n queries
Performance Evaluation: Results
Privacy and Utility Tradeoff
● For ε = 1, the average absolute and signed errors are small for both query types (impressions and clicks).
● Variance is also small: ~95% of queries have an error of at most 2.
Top-N Queries
● A common use case in LinkedIn applications.
● Jaccard distance as a function of ε and n.
● (This shows the worst case, as queries with return sets of size ≤ n and zero error were omitted.)
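A minimal sketch of the top-n error metric: take the top-n keys under the true counts and under the noisy counts, and compare the two sets by Jaccard distance. Breaking ties by key is an assumption, and the example counts are made up for illustration, not taken from the evaluation.

```python
from typing import Dict, Set


def top_n(counts: Dict[str, float], n: int) -> Set[str]:
    """Keys of the n largest counts (ties broken by key, an assumption)."""
    ranked = sorted(counts, key=lambda k: (-counts[k], k))
    return set(ranked[:n])


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B|; defined as 0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def top_n_error(true_counts: Dict[str, float], noisy_counts: Dict[str, float], n: int) -> float:
    return jaccard_distance(top_n(true_counts, n), top_n(noisy_counts, n))


true_counts = {"Engineering": 50, "Sales": 40, "Marketing": 39, "Legal": 5}
noisy_counts = {"Engineering": 51.3, "Sales": 38.6, "Marketing": 40.2, "Legal": 4.1}
print(top_n_error(true_counts, noisy_counts, 2))
```

Here noise swaps Sales and Marketing at the cutoff, so the top-2 sets share one of three distinct keys. When a query returns at most n keys, both top-n sets contain every key and the distance is always 0.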
Lessons Learned
● Lessons from privacy breaches → need for “Privacy by Design”
● Consider business requirements and usability
○ Various consistency desiderata to ensure results are useful and insightful
● Scaling across analytics applications
○ Abstract away application specifics, build libraries, and optimize for
performance
Acknowledgements
▹ Team:
▸ AI/ML: Krishnaram Kenthapadi, Thanh T. L. Tran
▸ Ad Analytics Product & Engineering: Mark Dietz, Taylor Greason, Ian Koeppe
▸ Legal / Security: Sara Harrington, Sharon Lee, Rohit Pitke
▹ Additional Acknowledgements
▸ Deepak Agarwal, Igor Perisic, Arun Swami, Ya Xu, Yang Zhou
Summary
▹ Framework to compute robust, privacy-preserving analytics
▸ Addresses challenges such as preserving member privacy, product coverage, utility, and data consistency
▹ Future
▸ Utility maximization problem given constraints on the ‘privacy loss budget’ per user
⬩ E.g., add noise with larger variance to impressions but less noise to clicks (or conversions)
⬩ E.g., add more noise to broader time-range sub-queries and less noise to granular time-range sub-queries
▹ Tech Report: K. Kenthapadi, T. Tran, PriPeARL: A Framework for Privacy-Preserving Analytics and Reporting at LinkedIn, ACM CIKM 2018
(https://arxiv.org/pdf/1809.07754)
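One standard way to reason about a per-user privacy loss budget is basic sequential composition: if sub-query i is answered by an $\varepsilon_i$-differentially private mechanism, releasing all k answers is $(\sum_i \varepsilon_i)$-differentially private. With Laplace noise of scale $b_i = s/\varepsilon_i$, a fixed total budget can be split unevenly, e.g., a larger $\varepsilon_i$ (less noise) for clicks and a smaller one for impressions. This is a generic illustration of the future-work idea, not the authors' planned method:

```latex
\varepsilon_{\text{total}} = \sum_{i=1}^{k} \varepsilon_i, \qquad
b_i = \frac{s}{\varepsilon_i}, \qquad
\operatorname{Var}[\text{noise}_i] = 2 b_i^2 = \frac{2 s^2}{\varepsilon_i^2}
```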
What’s Next: Privacy for ML / Data Applications
▹ Hard open questions
▸ Can we simultaneously develop highly personalized models
and ensure that the models do not encode private information
of members?
▸ How do we guarantee member privacy over time without
exhausting the “privacy loss budget”?
▸ How do we enable privacy-preserving mechanisms for data
marketplaces?
▹ Thanks!
Appendix
Algorithm for Computing Noisy Analytics
Performance Evaluation: Results
Varying minimum thresholds
