You experience the benefits of machine learning every day through product recommendations on Amazon and Bol.com, credit card fraud prevention, and more. So how can we leverage machine learning together with SharePoint and Yammer? We will first look into the fundamentals of machine learning and big data solutions, and then explore how we can combine tools such as Windows Azure HDInsight, R, and Azure Machine Learning to extend and support collaboration and content management scenarios within your organization.
4. Agenda
Introduction to Delve
Office Graph
Big Data and Machine Learning
Building your own Delve - architectural concept
6. Delve – Search and Discovery Across O365
Powered by Office Graph
• Stay in the Know: Discover new information tailored to you from your network
• Find What You Need: Find just the right results from any source and take action
• Discover New Connections: Connect with the right experts and learn more about their content
9. What is The Office Graph?
User Documents People Conversations
10. What is The Office Graph?
Manager
Direct report
Works with
Shared with me
Viewed by me
Trending around me
Presented to me
Liked by me
13. Signals sent from Delve, Exchange, O365, …
Click person
Modify/Save
Elevate
Share
Follow
Like
Comments
Email
Ignore
Presented to
Shown document
Open document
Shown board
++
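One way to picture how signals like these could feed a graph is to aggregate them into weighted edges between an actor and a target. The sketch below is illustrative only: the weights, the `(actor, action, target)` tuple shape, and the function names are assumptions made for this example and do not describe how Office Graph is actually implemented.

```python
from collections import defaultdict

# Hypothetical weights: deliberate actions count more than passive impressions.
SIGNAL_WEIGHTS = {
    "Like": 3.0,
    "Share": 3.0,
    "Modify/Save": 2.0,
    "Open document": 1.0,
    "Shown document": 0.1,
}
DEFAULT_WEIGHT = 0.5  # assumed weight for any signal not listed above


def build_edges(signals):
    """Aggregate (actor, action, target) signals into weighted actor -> target edges."""
    edges = defaultdict(float)
    for actor, action, target in signals:
        edges[(actor, target)] += SIGNAL_WEIGHTS.get(action, DEFAULT_WEIGHT)
    return dict(edges)


def strongest_targets(edges, actor, top_n=3):
    """Return the targets an actor is most strongly connected to."""
    mine = {target: w for (a, target), w in edges.items() if a == actor}
    return sorted(mine, key=mine.get, reverse=True)[:top_n]


if __name__ == "__main__":
    signals = [
        ("ann", "Open document", "plan.docx"),
        ("ann", "Like", "plan.docx"),
        ("ann", "Shown document", "memo.docx"),
        ("bob", "Share", "memo.docx"),
    ]
    print(strongest_targets(build_edges(signals), "ann"))
```

Summing weights per edge keeps the example small; a real system would likely also decay old signals over time.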
14. Content and signals across O365 auto-populating the Office Graph insights
Insights derived with machine learning for proactive and intelligent experiences
16. Big data is what happened when the cost of storing user data became cheaper than making the decision to throw it away
17. Transactions + Interactions + Observations = Big Data
• ERP (megabytes): Purchase detail, Purchase record, Payment record
• CRM (gigabytes): Segmentation, Offer details, Customer Touches, Support Contacts
• WEB (terabytes): Web logs, Offer history, A/B testing, Dynamic Pricing, Affiliate Networks, Search Marketing, Behavioral Targeting, Dynamic Funnels
• Big Data (petabytes): User Generated Content, Mobile Web, SMS/MMS, Sentiment, External Demographics, HD Video, Audio, Images, Speech to Text, Product/Service Logs, Social Interactions & Feeds, Business Data Feeds, User Click Stream, Sensors / RFID / Devices, Spatial & GPS Coordinates
Increasing Data Variety and Complexity
18. Big Data Core Technology landscape
NoSQL
• New paradigm for storing data
• 100+ non-SQL DBs and growing
NewSQL
• Support SQL querying
• Internal architecture different from classic DBs
MPP
• Appliances
• Teradata
• Microsoft PDW/APS
• Oracle BDA X4-2
Hadoop
• Hadoop/HDFS + MapReduce
• Key Big Data technology
19. Modern Data Architecture
• Apache Hadoop is an open source framework that supports data-intensive distributed applications
– Uses HDFS storage to enable applications to work with 1000s of nodes and petabytes of data using a scale-out model
– Uses MapReduce to process data
– Inspired by Google MapReduce and the Google File System
• Related projects: HBase, Hive, Mahout, Pig, Sqoop, Ambari, Storm, ZooKeeper, and many more
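To make the MapReduce idea on this slide concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases applied to a word count. It only imitates the programming model in plain Python; it does not use Hadoop, and the function names are chosen for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would do."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(documents):
    """Run the three phases over a list of input splits."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    splits = ["big data big insights", "data beats opinions"]
    print(word_count(splits))
```

On a real cluster, each call to `map_phase` would run on the node holding that split, and the shuffle would move data across the network before the reducers run.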
22. Microsoft Azure HDInsight
• Hadoop in Azure
• Based on Hortonworks Data Platform
• Able to leverage Azure Blob Storage
• Pay per use model
• Supports HBase as NoSQL columnar database on Azure Blobs
• Supports Storm for stream processing
[Diagram: Name Node and Job Tracker over four Data Node / Task Tracker pairs; HMaster and Coordination over four Region Servers]
24. Hive
• Hadoop feature to perform data warehouse operations
• HiveQL
– High-level, SQL-like language, abstraction over MapReduce
– Supports equi-joins
– Schema on read, NOT schema on write
– Automatically invokes MapReduce jobs
– Much simpler than using MapReduce directly
• Metadata store
– Contains descriptions of tables
• Acts as a bridge to many BI products which expect tabular data
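The "schema on read" point can be illustrated without Hive at all. In the sketch below the raw data is stored untouched and a schema is applied only when the data is queried. The log format, the column names, and the helper functions are assumptions made for this example; this is plain Python, not HiveQL.

```python
import csv
import io

# Raw data is stored exactly as it arrived; nothing is validated on write.
RAW_LOG = "alice,report.docx,view\nbob,report.docx,edit\nalice,plan.pptx,view\n"

# The schema is just a description applied at query time (hypothetical columns).
SCHEMA = [("user", str), ("document", str), ("action", str)]


def read_with_schema(raw_text, schema):
    """Parse raw text into typed rows only at the moment it is read."""
    for fields in csv.reader(io.StringIO(raw_text)):
        yield {name: cast(value) for (name, cast), value in zip(schema, fields)}


def count_by(rows, column):
    """Rough equivalent of a GROUP BY with COUNT(*) on one column."""
    counts = {}
    for row in rows:
        counts[row[column]] = counts.get(row[column], 0) + 1
    return counts


if __name__ == "__main__":
    print(count_by(read_with_schema(RAW_LOG, SCHEMA), "action"))
```

Because the schema lives outside the data, the same raw file can be read later with a different schema without rewriting what was stored.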
28. Machine learning
finding the needle in the haystack
• Formal definition: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E” – Tom M. Mitchell
• Another definition: “The goal of machine learning is to program computers to use example data or past experience to solve a given problem.” – Introduction to Machine Learning, 2nd Edition, MIT Press
• ML often involves two primary techniques:
– Supervised Learning: Finding the mapping between inputs and outputs using correct values to “train” a model
– Unsupervised Learning: Finding patterns in the input data (similar to density estimation in statistics)
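As a small illustration of supervised learning in Mitchell's terms, the sketch below fits a straight line to labelled examples (the experience E), uses it to predict outputs (the task T), and measures mean squared error (the performance measure P). The data and function names are invented for this example, and ordinary least squares is just one simple choice of model.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned mapping from input to output."""
    slope, intercept = model
    return slope * x + intercept


def mean_squared_error(model, xs, ys):
    """Performance measure P: average squared prediction error."""
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]  # made-up training examples
    model = fit_line(xs, ys)
    print(model, mean_squared_error(model, xs, ys))
```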
29. Vision Analytics
Recommendation engines
Advertising analysis
Weather forecasting for business planning
Social network analysis
Legal discovery and document archiving
Pricing analysis
Fraud detection
Churn analysis
Equipment monitoring
Location-based tracking and services
Personalized Insurance
32. Typical machine learning algorithms
• Clustering (k-means, orthogonal partitioning,…)
• Association rule learning (Apriori)
• Regression (linear/logistic)
• Recommendation engines
• Classification (C4.5, decision trees, SVM, Naïve Bayes, AdaBoost, Random Forest, …)
• Similarity matching
• Neural networks
• Bayesian networks
• Genetic algorithms
• Ensembles
See http://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/,
http://www.cs.umd.edu/~samir/498/10Algorithms-08.pdf and
http://www.quora.com/What-are-the-top-10-data-mining-or-machine-learning-algorithms
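To give a feel for the first entry in the list, here is a minimal k-means sketch on one-dimensional points. It is a teaching example under simplifying assumptions: the starting centroids are passed in by the caller, the number of iterations is fixed, and the function name is made up for this example.

```python
def k_means_1d(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


if __name__ == "__main__":
    print(k_means_1d([1, 2, 3, 10, 11, 12], [0, 5]))
```

No labels are involved: the algorithm only sees the points, which is what makes it an unsupervised technique.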
33. Doing recommendations – some approaches
• Collaborative filtering
• Feature based recommendations
• K-nearest neighbours
34. Collaborative filtering
• A set of items (books, beers, blogposts,…)
• Ratings from users
• Recommended items based on your ratings and other people’s ratings
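The idea on this slide can be sketched as item-based collaborative filtering: two items are considered similar when the same users rate them, and an unseen item is scored by the user's own ratings of similar items. The ratings, the cosine similarity choice, and the function names below are assumptions for illustration, not the method of any particular product.

```python
from math import sqrt

# Made-up ratings: user -> {item: rating}
RATINGS = {
    "ann": {"book": 5, "beer": 3, "blog": 4},
    "bob": {"book": 4, "beer": 1},
    "cat": {"beer": 5, "blog": 2},
}


def item_vector(ratings, item):
    """All ratings given to one item, keyed by user."""
    return {user: r[item] for user, r in ratings.items() if item in r}


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(ratings, user, top_n=3):
    """Score unseen items by a similarity-weighted average of the user's ratings."""
    seen = ratings[user]
    all_items = {item for r in ratings.values() for item in r}
    scores = {}
    for candidate in all_items - set(seen):
        cand_vec = item_vector(ratings, candidate)
        num = den = 0.0
        for item, rating in seen.items():
            sim = cosine(cand_vec, item_vector(ratings, item))
            num += sim * rating
            den += sim
        if den > 0:
            scores[candidate] = num / den
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


if __name__ == "__main__":
    print(recommend(RATINGS, "bob"))
```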
35. Feature based recommendations
• Use user’s ratings of items
– Create an algorithm to define which features (metadata) of items the user likes
• Requires detailed information about items - content based
– An item can be a person as well – see “People you may know”
• Most approaches combine “feature based” and “collaborative filtering”
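A minimal sketch of the feature based approach: build a profile of the features a user likes from their ratings, then score unseen items by how well their metadata matches that profile. The documents, tags, and function names are hypothetical and chosen only for this example.

```python
# Hypothetical item metadata: item -> {feature: weight}
ITEM_FEATURES = {
    "budget.xlsx": {"finance": 1, "planning": 1},
    "roadmap.pptx": {"planning": 1, "product": 1},
    "invoice.pdf": {"finance": 1},
    "launch.docx": {"product": 1, "marketing": 1},
}


def build_profile(user_ratings, item_features):
    """Accumulate rating-weighted feature scores for one user."""
    profile = {}
    for item, rating in user_ratings.items():
        for feature, weight in item_features[item].items():
            profile[feature] = profile.get(feature, 0) + rating * weight
    return profile


def score(profile, features):
    """Dot product between the user profile and an item's features."""
    return sum(profile.get(f, 0) * w for f, w in features.items())


def recommend(user_ratings, item_features, top_n=2):
    """Rank items the user has not rated by profile match."""
    profile = build_profile(user_ratings, item_features)
    unseen = [item for item in item_features if item not in user_ratings]
    ranked = sorted(unseen, key=lambda i: score(profile, item_features[i]), reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    print(recommend({"budget.xlsx": 5, "roadmap.pptx": 2}, ITEM_FEATURES))
```

Unlike collaborative filtering, this needs no other users' ratings, but it depends entirely on the quality of the item metadata.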
36. K-Nearest Neighbours (Classification approach)
• Find ratings from people similar to you and see what they liked
– Use similarity functions (Minkowski distance, RMSE, Pearson Correlation Coefficient,…)
• Take the average ratings of the k people most similar to you
– Display the items with the highest averages
• Conclusion – requires solid background in Math and Statistics
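The steps on this slide can be sketched as follows, using the Pearson correlation coefficient as the similarity function. The ratings are made up, neighbours with non-positive similarity are dropped as a design choice of this example, and the function names are not taken from any library.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation over the items both users have rated."""
    shared = [item for item in a if item in b]
    n = len(shared)
    if n < 2:
        return 0.0
    mean_a = sum(a[i] for i in shared) / n
    mean_b = sum(b[i] for i in shared) / n
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in shared)
    var_a = sum((a[i] - mean_a) ** 2 for i in shared)
    var_b = sum((b[i] - mean_b) ** 2 for i in shared)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def knn_recommend(ratings, user, k=2, top_n=3):
    """Average the ratings of the k most similar users for unseen items."""
    others = [(pearson(ratings[user], r), name)
              for name, r in ratings.items() if name != user]
    neighbours = [name for sim, name in sorted(others, reverse=True)[:k] if sim > 0]
    totals, counts = {}, {}
    for name in neighbours:
        for item, rating in ratings[name].items():
            if item not in ratings[user]:
                totals[item] = totals.get(item, 0) + rating
                counts[item] = counts.get(item, 0) + 1
    averages = {item: totals[item] / counts[item] for item in totals}
    return sorted(averages, key=averages.get, reverse=True)[:top_n]


if __name__ == "__main__":
    ratings = {
        "you": {"a": 5, "b": 1, "c": 4},
        "sam": {"a": 4, "b": 2, "c": 5, "d": 5, "e": 2},
        "kim": {"a": 5, "b": 1, "c": 3, "d": 3, "e": 4},
        "lee": {"a": 1, "b": 5, "c": 2, "d": 1},
    }
    print(knn_recommend(ratings, "you"))
```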
37. Machine Learning and Data Scientists
Developing predictive analytics and machine learning must be simpler; today it requires specialized skills:
• Data management
• Data exploration
• Math & statistics
• Domain expertise
• Machine learning
• Software development
• Data visualization
65% of enterprises feel they have a strategic shortage of data scientists, a role many did not know existed 12 months ago …
39. Microsoft Azure Machine Learning (Ctd.)
Personalized Workspace
– Combine R modules with Microsoft’s best in class algorithms running Xbox and Bing
– Work with anyone, anywhere by simply sharing the workspace
Easy Access to All Data
– Drop desktop data sets into the built-in storage space
– Bring in cloud data with the ease of a drop-down
Deploy Models as Web Services
– Operationalize in minutes and refine models at the speed of the market
Partner Tools
– ML partners enjoy SDK access for robust solutions
Components: Microsoft Azure Machine Learning Studio, Microsoft Azure Machine Learning API service, Microsoft Azure Machine Learning SDK
42. Building your own Delve - high level architecture
• Event producers: Web logs, Documents & metadata
• Transform
• Long-term storage: Azure SQL Database & Azure Storage
• Predictive Analytics: Azure Machine Learning
• Presentation and action
On premise
43. Building your own Delve – remarks
• Graph technology left out for simplicity
– Take a look at Neo4J or Pegasus on Hadoop if you are interested
• Not very realistic to rebuild Delve, but possible to define point solutions
• If you still go ahead
– Think about the end-to-end data pipeline
– Fast track with Recommendation API in datamarket: http://datamarket.azure.com/dataset/amla/recommendations
– Cache recommendations for performance and cost optimization
– Learn R or Python to extend AzureML capabilities
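The caching remark can be sketched as a small time-to-live cache placed in front of whatever service produces recommendations. The `fetch` callable stands in for a call to a pay-per-use recommendation service; it, the class name, and the default TTL are assumptions of this example rather than part of any Azure API.

```python
import time


class RecommendationCache:
    """Keep recommendations per user for a limited time to avoid repeat calls."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch      # hypothetical function: user_id -> list of items
        self._ttl = ttl_seconds  # how long a cached answer stays valid
        self._clock = clock      # injectable clock, useful for testing
        self._entries = {}       # user_id -> (stored_at, recommendations)

    def get(self, user_id):
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: no remote call, no cost
        result = self._fetch(user_id)
        self._entries[user_id] = (now, result)
        return result


if __name__ == "__main__":
    def slow_fetch(user_id):
        # Placeholder for a remote call that costs time and money.
        return [f"{user_id}-doc-1", f"{user_id}-doc-2"]

    cache = RecommendationCache(slow_fetch, ttl_seconds=600)
    print(cache.get("ann"))
    print(cache.get("ann"))  # served from the cache
```

The TTL is the trade-off knob: a longer value lowers cost and latency but shows users staler recommendations.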