The Beauty of (Big) Data Privacy Engineering

Privacy engineering is an emerging discipline within software and data engineering that aims to provide methodologies, tools, and techniques so that engineered systems offer acceptable levels of privacy. In this talk, I will present our recent work on anonymization and privacy-preserving analytics on large-scale geolocation datasets. In particular, the focus is on how to scale anonymization and geospatial analytics workloads with Spark, maximizing performance by combining multi-dimensional spatial indexing with Spark in-memory computations.

  1. The Beauty of (Big) Data Privacy Engineering
     Yangcheng Huang, Director of Software Engineering, Data & Analytics, Truata
  2. Who we are
     § Truata was founded in 2018, with investment by Mastercard and IBM
     § Our goal is to be the world’s leading provider of privacy-enhanced data analytics and management solutions
     § Based in Dublin, we have a team of 70 people with an R&D focus on developing cutting-edge privacy-enhancing technologies (PETs)
     § International client base across major industry verticals
     § Multiple EU regulators consulted on the Truata solution
     Truata Anonymization Service is a cutting-edge solution for GDPR-grade data anonymization & analytics, allowing companies to analyse and monetize customer data in fully anonymized form.
     § Sophisticated and proprietary technologies for data anonymization and risk calibration
     § Able to generate fully anonymized data sets ideally suited for privacy-preserving analytics
     § Experienced in delivering large-scale anonymized data analytics projects
     § Able to drive significant value from data while maintaining customer trust
     § Delivered by our customer success team of data science and privacy experts
     § Fully focused on using privacy-centric techniques to generate value from data
     § Consulting solutions based on our proprietary methodologies, IP and expertise delivered by industry leading, subject matter experts.
  3. A big-data privacy engineering problem
     • Geo privacy
       • Zip-level targeted advertising
       • Lat/Long GPS
       • Shapefile of zip codes
       • Using neighbouring zip’s shopping behaviour
     • Problems
       • Lat/Long mapping (generalisation of GPS information) & nearest 10
       • (32m) customer’s Lat long mapping onto (1.7m) UK Zips
       • (1.7m) nearest 10 Zips out of 1.3m Zips (with Customer transactions)
     • A ‘Trillion’ Problem
       • 1,000,000,000,000
       • Google processes 61.6 billion web pages today
       • Dublin Population 2019: 1,214,666
       • Measure the similarity of any two Dubliners
  4. Definition of beauty
     : the quality of being physically attractive.
     : the qualities in a person or a thing that give pleasure to the senses or the mind.
     …
     Beauty | Definition of Beauty by Merriam-Webster
     https://www.merriam-webster.com › dictionary › beauty
  5. Processing result is beautiful
  6. Engineering (algorithm) is beautiful
  7. Engineering (journey) is beautiful
     A combination of 99% passion and 1% skills (big data engineering, spatial engineering and software engineering)
  8. Craftsmanship spirit is beautiful
  9. Summary
     ▪ Beauty of big data privacy engineering
       ▪ Geo anonymization and geospatial analytics workloads with Spark
       ▪ Maximizing the performance by combining multi-dimensional spatial indexing with Spark in-memory computations
       ▪ Journey of productionizing the geo anonymization workloads
       ▪ Craftsmanship spirit
     ▪ Ongoing work
       ▪ Mobility trajectory anonymization (patents pending)
       ▪ Mobility pattern anonymization
     Contact
     Email: yangcheng.huang_AT_truata.com
     Linkedin: https://www.linkedin.com/in/yhuang
     www.truata.com
  10. Feedback
      Your feedback is important to us. Don’t forget to rate and review the sessions.
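
The slides do not show how the "nearest 10 Zips" step on slide 3 was implemented. Read naively, comparing 1.7m zips against 1.3m zips is about 2.2 trillion distance computations, which is where a spatial index pays off. The sketch below illustrates the general idea with a plain uniform grid in pure Python; it is not Truata's method and it does not use Spark. It assumes zip centroids have already been projected to planar coordinates (for example metres), and the names `GridIndex`, `nearest`, and `cell_size` are hypothetical.

```python
import math
from collections import defaultdict


class GridIndex:
    """Uniform grid over planar points (illustrative only, not from the talk)."""

    def __init__(self, points, cell_size):
        # points: non-empty dict mapping a key (e.g. a zip code) to (x, y).
        self.cell_size = cell_size
        self.cells = defaultdict(list)
        for key, (x, y) in points.items():
            self.cells[self._cell(x, y)].append((key, x, y))
        xs = [c[0] for c in self.cells]
        ys = [c[1] for c in self.cells]
        self.bounds = (min(xs), max(xs), min(ys), max(ys))

    def _cell(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    @staticmethod
    def _ring_cells(cx, cy, ring):
        # Cells whose Chebyshev distance from (cx, cy) is exactly `ring`.
        if ring == 0:
            yield (cx, cy)
            return
        for dx in range(-ring, ring + 1):
            yield (cx + dx, cy - ring)
            yield (cx + dx, cy + ring)
        for dy in range(-ring + 1, ring):
            yield (cx - ring, cy + dy)
            yield (cx + ring, cy + dy)

    def nearest(self, x, y, k):
        """Return up to k (distance, key) pairs, closest first."""
        cx, cy = self._cell(x, y)
        min_cx, max_cx, min_cy, max_cy = self.bounds
        # Largest ring that can still contain an occupied cell.
        max_ring = max(abs(cx - min_cx), abs(cx - max_cx),
                       abs(cy - min_cy), abs(cy - max_cy))
        found = []
        for ring in range(max_ring + 1):
            for cell in self._ring_cells(cx, cy, ring):
                for key, px, py in self.cells.get(cell, ()):
                    found.append((math.hypot(px - x, py - y), key))
            found.sort()
            # Any point outside the searched block is at least
            # ring * cell_size away, so the first k cannot change.
            if len(found) >= k and found[k - 1][0] <= ring * self.cell_size:
                break
        return found[:k]


if __name__ == "__main__":
    # Made-up centroids, purely for demonstration.
    centroids = {"ZIP_A": (0.0, 0.0), "ZIP_B": (3.0, 4.0),
                 "ZIP_C": (10.0, 0.0), "ZIP_D": (-1.0, -1.0)}
    index = GridIndex(centroids, cell_size=5.0)
    print(index.nearest(0.5, 0.5, k=2))
```

Each query inspects only the cells around the query point instead of every centroid, so the cost per lookup depends on local density rather than on the total number of zips. In a distributed setting the same index could plausibly be built once and shared with workers, but how the talk combined indexing with Spark is not stated in the slides.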
