Webinar - Product Matching - Palombo (20160428)

Presented by Alon Palombo


  1. Dato Confidential. Fraud Detection Webinar. Alon Palombo, Data Scientist, alon@dato.com. Product Matching Webinar.
  2. Agenda
     • Who is Dato?
     • Data science workflow
     • What is product matching?
     • Demo using real public data
     • Questions
  3. Dato: We Intelligent Applications 45+ and growing fast!
  4. Customers
  5. Data science workflow: Ingest, Transform, Model, Deploy. Unstructured Data.
  6. What is product matching?
     • In 2016, global e-commerce sales are expected to reach $1.92 trillion.
     • Online retailers and price comparison sites curate product catalogues by aggregating from multiple sources.
     • Product matching is the task of keeping these catalogues free of duplicates, full of attributes per product, and consistent across different sites.
  7. Difficulty. Aggregate multiple sources: structured attributes, reviews, images, description. Source: Thor, Andreas. "Toward an adaptive String Similarity Measure for Matching Product Offers." GI Jahrestagung (1). 2010.
  8. Definition
     • Ironically, very similar problems go by many similar names: entity resolution, record linking, de-duplication, reference reconciliation, data matching, and more.
  9. Definition
     • In GraphLab Create we distinguish between Record Linkage and De-duplication.
     • Record Linkage refers to matching structured query records to a fixed set of reference records with the same schema.
     • De-duplication refers to assigning an entity label to each row. Records with the same label are likely to correspond to the same real-world entity.
  10. Product matching demo – using real public data
  11. Summary
     • Product matching is at the heart of e-commerce.
     • Many relevant similar problems with similar solutions.
     • Easy exploration, modeling, and evaluation using GraphLab Create.
  12. Our machine learning course: https://www.coursera.org/learn/ml-foundations
  13. Questions? alon@dato.com
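
The distinction on slide 9 between record linkage and de-duplication can be illustrated with a minimal sketch. This is not the GraphLab Create API and not the code from the webinar demo: it uses only the Python standard library, and the function names (jaccard, link_records, deduplicate), the token-based Jaccard similarity, the 0.5 threshold, and the product names are all assumptions chosen for illustration.

```python
from itertools import combinations


def tokens(text):
    """Lowercase a product name and split it into a set of word tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between the token sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def link_records(queries, references, threshold=0.5):
    """Record linkage: map each query to the index of its most similar
    reference record, or None if no reference reaches the threshold."""
    links = []
    for query in queries:
        best_idx, best_score = None, 0.0
        for idx, ref in enumerate(references):
            score = jaccard(query, ref)
            if score > best_score:
                best_idx, best_score = idx, score
        links.append(best_idx if best_score >= threshold else None)
    return links


def deduplicate(records, threshold=0.5):
    """De-duplication: assign an entity label to every row. Rows joined by
    a chain of sufficiently similar pairs receive the same label."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent pointers to the root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if jaccard(records[i], records[j]) >= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                # Keep the smallest row index as the entity label.
                parent[max(ri, rj)] = min(ri, rj)
    return [find(i) for i in range(len(records))]


# Hypothetical catalogue rows, invented for this example.
references = ["Apple iPhone 6 16GB Silver", "Samsung Galaxy S6 32GB Black"]
queries = ["apple iphone 6 silver 16gb", "Nokia Lumia 930"]
catalogue = [
    "Apple iPhone 6 16GB Silver",
    "iPhone 6 16GB Silver Apple",
    "Samsung Galaxy S6 32GB Black",
    "samsung galaxy s6 black 32gb",
    "Nokia Lumia 930",
]

print(link_records(queries, references))
print(deduplicate(catalogue))
```

Record linkage returns one answer per query against a fixed reference set, while de-duplication labels every row of a single table. The all-pairs comparison here is quadratic in the number of rows, so a catalogue of realistic size would need blocking or a nearest-neighbour index instead.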
