The Science Behind Data Science
Presented at Big Data for Decision Makers
Ruhollah Farchtchi – Director of Big Data
December 5, 2013
Agenda
• Introductions
• Big Data Analytics Overview

• Use Cases – Examples of Data Products
• Building Blocks
• Data Mining

• Technologies
• Operational Models

© 2013 Unisys Corporation. All rights reserved.

So we’ve got a lot of data…
• What can we get out of it?
• How does it help with our business decision making?
• How is this complex landscape changing?
Multiple Types
• Tabular / Structured
• Unstructured: documents, emails, pictures, video

Multiple Sources
• Sensors, networks, cyber infrastructure
• Web, email, social media, enterprise applications
• Mobile devices, GPS, and many more!

Multiple Domains
• Defense: logistics / workforce analytics; cyber and EW; intelligence analysis
• Health: drug discovery; EHR; epidemic/pandemic prediction
• Finance: fraud detection; identity resolution; customer support
• Other: supply/demand forecasting; MTTB prediction; context-based IR

And we’ve got a lot of tools…

Source: http://www.ongridventures.com/wp-content/uploads/2012/10/Big-Data-Landscape.jpg

Big Data and Data Analytics – A Unisys Point of View
• Unisys Point of View: Today’s big data is tomorrow’s normal data
– What remains is the need to extract insights and value out of the data

• Data analytics is often the goal, or end product, of what organizations want to get out of their data (big or otherwise)
– Focused around the capabilities of:
• Efficient Data Processing – get data in and processed in time to make use of it, and in a tenable manner
• Effective Information Management – the ability to make the data accessible and to manage the downstream data products as assets
• Expressive Analytics – make sense of the data in a format that is easily digested and incorporated into decision making; i.e., if you need a PhD to interpret the results, you still have work to do here

– With the aim of increasing business value

• It’s about understanding the data and what you can get out of it
– "…40% of business leaders had no response when asked what types of information would transform their industries over the next 10 years."¹
¹ Anne Lapkin, Hype Cycle for Big Data, 2012. Gartner, 2012.

Data Analytics is the culmination of Analytics and IT

[Figure: axes Data Volume (low to high volume, variety, velocity, with a "Multi-TB Turning Point") and Complexity (backward-looking/forensic to forward-looking/predictive); arrows labeled "Scale-out", "Extend", "Leverage for large-scale analytics and data mining", and "Leverage for large-scale application development & information management".]
• Business Intelligence & Data Warehousing: STAR schema, OLAP, RDBMS, SQL, ETL
• Big Data & NoSQL: Hadoop, Map/Reduce, Hive, HBase, Google BigTable, Dynamo, Cassandra, MongoDB, Splunk, EMC Greenplum
• Data Analytics: modeling and forecasting, pattern recognition, classification, machine learning, simulation, linear programming, global optimization

Data Analytics is at the intersection of high volume data processing and advanced analysis. The tools
and methodologies here represent a mix of both worlds and there is currently no ‘killer app’.
Challenges

• Misaligned IT, analytics, and business strategies
• Ineffective data management strategy
• Ineffective/inefficient storage and security platforms
• Inaccessible or siloed analytics ("Cylinders of Excellence")
• Untrusted analytic products, or analytics that are not timely, accurate, or repeatable (untested)
• Inability to scale analytic generation (lack of training)

Analytic Environment That Supports Data Processing, Enhances Information Management and Improves Decision Making

Building Analytic Environment
1. Work with business leaders and decision makers to understand and quantify the data value chain
2. View data as an enterprise asset
3. Innovate through creation of new data products and services
4. Retrain staff and/or acquire Data Scientist skills
5. Integrate teams across big data, data warehousing, and business analysis
6. Revise information management strategies to incorporate big data
7. Develop new ways of capturing information, e.g., mobile and streaming data
8. Identify and leverage previously unused internal and external data

[Figure labels: Raw Data, Building Analytic Environment, Data Products; IT Focused, Analyst Focused.]
Creation of data products is key to analytic reuse
• What are Data Products?
– Essentially, this is the output of a data science or data mining activity
– Non-trivial; more than a simple query
– Requires a platform for processing

• They can manifest themselves as many things
– Analytical "engines" running in a larger application (Amazon's
recommender engine is a great Data Product)
– Lists (e.g., Top 10 things I need to know today)
– Entire applications (e.g., customer baseball cards)

• However they are defined, one thing is true for all
– It takes a combination of domain-agnostic analytic techniques together with domain-specific knowledge to produce something relevant and consumable that can be monetized or operationalized.
Examples of Data Products
Use Case #1 - Netflix Recommendation
• Netflix is about connecting people to the movies they love by leveraging its movie recommendation system: Cinematch℠
• Cinematch℠ was initially a linear model that helped to predict users' choices
• The predictions are used to make personal movie recommendations based on each customer's unique tastes
– Challenge: Can the recommendation engine be improved upon?
– Resolution: Set a target accuracy improvement (10%) and create a contest with a $1 million prize
• Crowdsourcing: teams merged together in an Internet-enabled approach to improve results
• Netflix provided a training dataset of 100+ million ratings that 480,000 users gave to 17K movies, each a quadruplet of the form (user, movie, date of grade, grade)
– The goal is to predict the grade
– An example of supervised machine learning
– Submitted predictions are scored against the true grades in terms of Root Mean Squared Error (RMSE)
– RMSE is a frequently used measure of the difference between the values predicted by a model and the values observed (i.e., the residuals)
– Similarity is determined by a distance measure such as Jaccard or cosine distance

Source: Netflixprize.com and Mining of Massive Datasets by Anand Rajaraman and Jeffrey Ullman
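The scoring metric and the two distance measures named above can be sketched in a few lines of Python. This is a minimal illustration, not Netflix's scoring code: the function names and the tiny rating dictionaries are hypothetical, and treating movies a user did not rate as 0 in the cosine distance is one simple convention among several.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between equal-length sequences of grades."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| for two sets, e.g. the movies two users rated."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def cosine_distance(u, v):
    """1 - cosine of the angle between two {movie: grade} rating vectors.
    Unrated movies count as 0, so only shared movies add to the dot product."""
    dot = sum(u[m] * v[m] for m in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)

# Hypothetical ratings for two users.
alice = {"A": 4, "B": 5, "C": 1}
bob = {"A": 5, "B": 4, "D": 3}
print(rmse([3.5, 4.0, 2.0], [4, 4, 1]))
print(jaccard_distance(set(alice), set(bob)))  # 0.5
print(cosine_distance(alice, bob))
```

Jaccard distance looks only at which movies were rated, while cosine distance also uses the grades themselves; which is more useful depends on whether "rated at all" or "rated similarly" is the signal of interest.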

Use Case #2 - Google PageRank
• Google wanted to be able to measure and rank the importance of web pages.
– Challenge: Identify and rank the pages that users would want to view, in terms of their relevance
– Resolution: Develop an algorithm that leverages link analysis and implement it as part of Google’s infrastructure
• The PageRank algorithm considers a web page to be important if many other web pages point to it. The linking pages that point to a given page aren’t treated equally.
• The algorithm takes into account both the importance (PageRank) of each linking page and the number of outgoing links it has, similar to social network analysis.
• Linking pages with higher PageRank are given more weight, while pages with more outgoing links are given less weight.
• Example of unsupervised machine learning

[Figure: link graph of Page 1, Page 2, Page 3, and Page 4]
Link Matrix:
$$
L = \begin{pmatrix}
0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 \\
0 & 0 & 0 & 0
\end{pmatrix}
$$

Source: The Elements of Statistical Learning, Trevor Hastie, Robert Tibshirani and Jerome Friedman
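The ranking can be sketched as a power iteration over the four-page link matrix above. This assumes the convention used in The Elements of Statistical Learning (entry L[i][j] is 1 when page j links to page i), a damping factor d = 0.85, and ranks normalized to sum to the number of pages; the function name is hypothetical, and this is not Google's production algorithm.

```python
# Link matrix from the slide; assumed convention: L[i][j] = 1 if page j+1 links to page i+1.
L = [
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]

def pagerank(links, d=0.85, iterations=100):
    """Iterate p_i = (1 - d) + d * sum_j L[i][j] * p_j / c_j until (approximately) stable,
    where c_j is the number of outgoing links of page j. Ranks sum to the number of pages."""
    n = len(links)
    # Column sums = outgoing links per page; assumes every page links to at least one page.
    out_degree = [sum(links[i][j] for i in range(n)) for j in range(n)]
    p = [1.0] * n
    for _ in range(iterations):
        p = [
            (1 - d) + d * sum(links[i][j] * p[j] / out_degree[j] for j in range(n))
            for i in range(n)
        ]
    return p

ranks = pagerank(L)
print([round(r, 2) for r in ranks])  # [1.49, 0.78, 1.58, 0.15]
```

Page 4 has no incoming links, so it keeps only the baseline (1 - d) = 0.15, while Page 3, which is linked from Pages 1, 2, and 4, ranks highest. Dividing by each linking page's outgoing-link count is what gives pages with many outgoing links less weight per link.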

Use Case #3 - Walmart Data-Driven Value Chain
• Walmart is the largest retailer in the world.
• Walmart has been a catalyst for technology adoption among its suppliers, including requiring partners to leverage RFID technology to track and coordinate inventories.
• Walmart has a rich cross-section of data, including individual Social Security information, geographic detail, and product purchases.
• Walmart uses econometric and marketing mix modeling (multiplicative, log-log, power additive, adstocks, lags, and powers) for a number of its key analyses.
• Walmart mines its data to get its product mix right under different and changing environmental conditions.
– Challenge: Identify the correct product mix in order to protect the firm from too much or too little inventory
– Resolution: Mine multiple data sources for data products that help tighten and improve operational forecasts

• Ahead of impending hurricanes, Walmart found that:
– Pop-Tarts sales increased (to 7 times their normal rate)
– The top-selling pre-hurricane item was beer
– This allows the firm to get the supply to the store ahead of time

$\mathrm{GAs} = a + b\,(\mathrm{TV})$
$\mathrm{GAs} = a + b\,(\mathrm{TV})^{\gamma}$

[Figure: chart of sales by item (beer, Pop-Tarts)]

Source: What Wal-Mart Knows About Customers' Habits, New York Times
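The simplest marketing-mix regression on this slide, GAs = a + b(TV), can be fit by ordinary least squares, and a log-log (multiplicative) model is fit the same way after taking logarithms. This is a minimal sketch with hypothetical data; the function names are illustrative, and real marketing-mix models add adstocks, lags, and many more variables.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def fit_log_log(x, y):
    """Multiplicative model y = A * x**b, fit as the line ln y = ln A + b ln x.
    Requires strictly positive x and y."""
    ln_a, b = fit_line([math.log(v) for v in x], [math.log(v) for v in y])
    return math.exp(ln_a), b

# Hypothetical data generated from GAs = 10 + 3 * TV, so the fit should recover a = 10, b = 3.
tv = [1.0, 2.0, 3.0, 4.0, 5.0]
gas = [10 + 3 * t for t in tv]
print(fit_line(tv, gas))  # (10.0, 3.0)

# Hypothetical data generated from y = 2 * x**0.5.
print(fit_log_log([1, 4, 9, 16], [2, 4, 6, 8]))
```

In the log-log form the slope b reads directly as an elasticity (the percentage change in the response per 1% change in spend), which is one reason that form is popular in marketing-mix work.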

Use Case #4 - Amazon Targeted Marketing
• Amazon is the world’s largest online retailer and is known for its e-commerce website, where it uses input about a customer’s interests to generate a list of recommendations.
• Like Netflix, Amazon uses recommendation algorithms, but it applies them to targeted marketing of items a customer would want to buy, based on previous purchase patterns.
• The recommendation algorithms personalize the online store for each customer, and the store changes radically based on the customer’s interests.
– Challenge(s): Analyze massive amounts of data and return results in real time; new customers have very little data, and customer data is very volatile
– Resolution: Cluster modeling, search-based methods, and item-to-item collaborative filtering
• Cluster Modeling: Identify customers similar to the user by dividing the customer base into segments and treating the task as a classification problem. Typically uses an unsupervised learning algorithm such as K-means or hierarchical clustering.
• Search-Based Methods: Treat the recommendation problem as a search for related items. Given a user’s purchased and rated items, the algorithm constructs a search query to find other popular items by the same author, artist, or director, or with similar keywords.
• Item-to-Item Collaborative Filtering: A customized algorithm that scales to massive data sets and produces high-quality recommendations in real time. It matches each of the user’s purchased and rated items to similar items, then combines those similar items into a recommendation list. Separate offline and online components increase performance.
Source: Amazon.com Recommendations: Item-to-Item Collaborative Filtering. Greg Linden, Brent Smith and Jeremy York
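A minimal sketch of the item-to-item idea: an offline step builds an item-similarity table (here, cosine similarity over the customers who bought each item), and an online step scores unseen items by their similarity to what the user already bought. The purchase data and function names are hypothetical, and this simplified version is not Amazon's implementation.

```python
import math
from collections import defaultdict

# Hypothetical purchase history: customer -> set of items bought.
PURCHASES = {
    "c1": {"book_a", "book_b", "dvd_x"},
    "c2": {"book_a", "book_b"},
    "c3": {"book_b", "dvd_x", "cd_y"},
    "c4": {"book_a", "cd_y"},
}

def item_similarities(purchases):
    """Offline step: cosine similarity between items, each item being a binary
    vector over customers. Only pairs bought by a common customer get an entry."""
    buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)
    co_counts = defaultdict(int)  # (i, j) -> number of customers who bought both
    for items in purchases.values():
        for i in items:
            for j in items:
                if i != j:
                    co_counts[(i, j)] += 1
    return {
        (i, j): count / math.sqrt(len(buyers[i]) * len(buyers[j]))
        for (i, j), count in co_counts.items()
    }

def recommend(user_items, similarities, top_n=3):
    """Online step: score items the user lacks by summed similarity to items
    the user already has; ties are broken alphabetically."""
    scores = defaultdict(float)
    for (i, j), sim in similarities.items():
        if i in user_items and j not in user_items:
            scores[j] += sim
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]

sims = item_similarities(PURCHASES)
print(recommend({"book_a"}, sims))  # ['book_b', 'cd_y', 'dvd_x']
```

The expensive pairwise work happens offline, so the online step only looks up precomputed similarities for the user's items, which is what lets this approach respond in real time.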

Unisys Big Data Analytics
Building Blocks
Big Data Analytics Methodology

Modeling Components
• Decision Making & Forecasting
– Provide actionable intelligence into the future state
• Models
– A statistical model applied to input data that separates the portion of volume due to each of the variables or factors. We use the term "model" because it is a simplification of reality.
• Data
– Internal data, demographic data, 3rd-party data

Data Mining
Data Mining - Motivations

• We’ve covered big data
– There’s a lot of it!

• New Modus Operandi
– Gather whatever data you can, whenever and wherever possible

• New Expectation
– Data gathered will have value; either for the purpose it was
collected or for a purpose not yet envisioned

• Challenge: There will never be enough analysts to sift
through it all
Data Mining Definitions
• Non-trivial extraction of implicit, previously unknown and potentially
useful information from data (normally large databases)
• Exploration & analysis, by automatic or semiautomatic means, of large
quantities of data in order to discover meaningful patterns.
• Part of the Knowledge Discovery in Databases process.

Source: http://liris.cnrs.fr/abstract/abstract.html

Data Mining Tasks
Prediction Methods: Use some variables to predict unknown or future values of other variables.
• Classification
– For a given set of attributes, apply a model for the class (what you want to predict) as a function of the attributes
• Regression
– Predict the value of a given continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency
• Deviation Detection
– Detect significant deviations from normal behavior

Description Methods: Find human-interpretable patterns that describe the data.
• Clustering
– Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that:
• Data points in one cluster are more similar to one another
• Data points in separate clusters are less similar to one another
• Association Rule Discovery
– Given a set of records, each of which contains some number of items from a given collection, produce dependency rules that will predict the occurrence of an item based on occurrences of other items
• Sequential Pattern Discovery
– Given a set of sequences and a support threshold, find the complete set of frequent subsequences
Classification - Example
Tax Fraud

Training Data Set
Tid   Refund   Marital Status   Taxable Income   Cheat
1     Yes      Single           125k             No
2     No       Married          100k             No
3     No       Single           70k              No
4     Yes      Married          120k             No
5     No       Divorced         95k              Yes
6     No       Married          60k              No
7     Yes      Divorced         220k             No
8     No       Single           85k              Yes
9     No       Married          75k              No
10    No       Single           90k              Yes

Test Data Set
Refund   Marital Status   Taxable Income   Cheat
Yes      Single           125k             ?
No       Married          100k             ?
No       Single           70k              ?
Yes      Married          120k             ?

[Figure: Training Data Set → Learn Classifier → Model; the Model is applied to the Test Data Set.]
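The "Learn Classifier → Model" step can be illustrated with a small decision tree grown by Gini impurity on the ten training records above and then applied to the four test records. The slide does not name a classifier; the decision tree, its split rules (equality tests on categories, midpoint thresholds on income), and the function names are assumptions chosen for illustration.

```python
from collections import Counter

# Training records from the slide (taxable income in $k).
TRAIN = [
    {"refund": "Yes", "marital": "Single",   "income": 125, "cheat": "No"},
    {"refund": "No",  "marital": "Married",  "income": 100, "cheat": "No"},
    {"refund": "No",  "marital": "Single",   "income": 70,  "cheat": "No"},
    {"refund": "Yes", "marital": "Married",  "income": 120, "cheat": "No"},
    {"refund": "No",  "marital": "Divorced", "income": 95,  "cheat": "Yes"},
    {"refund": "No",  "marital": "Married",  "income": 60,  "cheat": "No"},
    {"refund": "Yes", "marital": "Divorced", "income": 220, "cheat": "No"},
    {"refund": "No",  "marital": "Single",   "income": 85,  "cheat": "Yes"},
    {"refund": "No",  "marital": "Married",  "income": 75,  "cheat": "No"},
    {"refund": "No",  "marital": "Single",   "income": 90,  "cheat": "Yes"},
]
# Test records from the slide; the class ("cheat") is unknown.
TEST = [
    {"refund": "Yes", "marital": "Single",  "income": 125},
    {"refund": "No",  "marital": "Married", "income": 100},
    {"refund": "No",  "marital": "Single",  "income": 70},
    {"refund": "Yes", "marital": "Married", "income": 120},
]

def gini(rows):
    """Gini impurity of the class labels: 1 - sum of squared class proportions."""
    counts = Counter(r["cheat"] for r in rows)
    return 1.0 - sum((c / len(rows)) ** 2 for c in counts.values())

def candidate_splits(rows):
    """Yield (description, test): equality tests for categorical attributes and
    <= thresholds at midpoints between distinct incomes."""
    for attr in ("refund", "marital"):
        for value in sorted({r[attr] for r in rows}):
            yield f"{attr} == {value}", lambda r, a=attr, v=value: r[a] == v
    incomes = sorted({r["income"] for r in rows})
    for lo, hi in zip(incomes, incomes[1:]):
        t = (lo + hi) / 2
        yield f"income <= {t}", lambda r, t=t: r["income"] <= t

def build_tree(rows):
    """Greedy recursive tree: a leaf is a class label, an internal node is
    (description, test, yes_subtree, no_subtree)."""
    labels = Counter(r["cheat"] for r in rows)
    majority = labels.most_common(1)[0][0]
    if len(labels) == 1:
        return majority  # pure node
    best = None
    for desc, test in candidate_splits(rows):
        yes = [r for r in rows if test(r)]
        no = [r for r in rows if not test(r)]
        if not yes or not no:
            continue  # split does not separate anything
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(rows)
        if best is None or score < best[0]:
            best = (score, desc, test, yes, no)
    if best is None:
        return majority  # identical attributes with mixed labels
    _, desc, test, yes, no = best
    return (desc, test, build_tree(yes), build_tree(no))

def classify(tree, record):
    """Follow the tests from the root down to a leaf label."""
    while isinstance(tree, tuple):
        _, test, yes_branch, no_branch = tree
        tree = yes_branch if test(record) else no_branch
    return tree

tree = build_tree(TRAIN)
print([classify(tree, r) for r in TEST])  # ['No', 'No', 'No', 'No']
```

The four test records happen to repeat training records 1 to 4, so the model simply reproduces their labels here; a meaningful evaluation would use records the tree has not seen.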

Classification – Your Turn

• Fraud Detection
• Goal: Predict fraudulent cases in credit card transactions.
• Approach:
– What kind of data will you try to get?
– Can you say something about the characteristics of the data?
– Estimate the size of the data.
– What kind of pitfalls might you run into?

Fraud Detection

• Fraud Detection
• Goal: Predict fraudulent cases in credit card transactions.
• Approach:
– Use credit card transactions and the information on the account holder as attributes.
– When does a customer buy, what does he buy, how often does he pay on time, etc.
– Label past transactions as fraud or fair transactions. This forms the
class attribute.
– Learn a model for the class of the transactions.
– Use this model to detect fraud by observing credit card transactions
on an account.

Clustering - Example

• Document Clustering:
– Goal: To find groups of documents that are similar to each other
based on the important terms appearing in them.
– Approach: To identify frequently occurring terms in each document.
Form a similarity measure based on the frequencies of different
terms. Use it to cluster.
– Gain: Search tools can utilize the clusters to relate a new document
or search term to clustered documents.
• Clustering Points: 3204 Articles of
Los Angeles Times.
• Similarity Measure: How many
words are common in these
documents (after some word
filtering).
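The slide's similarity measure (the number of words two documents have in common after filtering) can be sketched as follows, with documents grouped whenever they share enough words. The stop-word list, the threshold `min_shared`, the sample headlines, and the function names are hypothetical; the grouping is single-link clustering at a fixed cutoff, one of many ways to turn a similarity measure into clusters.

```python
import re
from itertools import combinations

# A tiny illustrative stop-word list (hypothetical; real filters are much larger).
STOP_WORDS = {"a", "an", "and", "as", "in", "of", "on", "the", "to"}

def terms(text):
    """Lower-cased word tokens with stop words filtered out."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}

def shared_words(terms_a, terms_b):
    """Similarity: how many filtered words two documents have in common."""
    return len(terms_a & terms_b)

def cluster_documents(docs, min_shared=2):
    """Put documents in the same cluster when they share at least `min_shared`
    words, directly or through a chain of such documents (single link)."""
    term_sets = [terms(d) for d in docs]
    parent = list(range(len(docs)))  # union-find forest over document indices

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for i, j in combinations(range(len(docs)), 2):
        if shared_words(term_sets[i], term_sets[j]) >= min_shared:
            parent[find(i)] = find(j)  # merge the two clusters

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

docs = [
    "Lakers win the basketball game in overtime",
    "Basketball fans cheer as the Lakers win",
    "Stock market falls on interest rate fears",
    "Interest rate rise hits the stock market",
]
print(cluster_documents(docs))  # [[0, 1], [2, 3]]
```

A search tool could then relate a new document to a cluster by comparing its filtered terms against the cluster's members.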

Clustering - Illustration

Seems straightforward for a small number of dimensions…
what if there were more?
Clustering - Illustration

Source: http://salsahpc.indiana.edu/plotviz

We [human beings] have a limited ability to visualize and reason over a large
number of dimensions – clustering helps
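K-means is one common algorithm for finding such groups without having to visualize them. This minimal sketch of Lloyd's algorithm uses hypothetical 2-D points and simply takes the first k points as starting centroids (an illustrative choice; practical implementations use smarter initialization). Because it relies only on distances, the same code runs unchanged on points with many more dimensions.

```python
import math

def kmeans(points, k, iterations=20):
    """Lloyd's k-means: alternately assign each point to its nearest centroid
    and move each centroid to the mean of its assigned points."""
    centroids = [tuple(p) for p in points[:k]]  # illustrative initialization
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Mean of each cluster, coordinate by coordinate; an empty cluster keeps its centroid.
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D points forming two obvious groups.
points = [(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 0.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
print(clusters)  # [[(1, 1), (1.5, 2), (1, 0.5)], [(8, 8), (9, 8.5), (8.5, 9)]]
```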
Association Rules

• Classic Association Rule Example:
– If a customer buys diapers and milk, then he is very likely to buy beer.

• Applications: Supermarket shelf management.
– Goal: To identify items that are bought together by sufficiently many
customers.
– Approach: Process the point-of-sale data collected with barcode
scanners to find dependencies among items.
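Rules like the one above are usually judged by support (how often the items appear together across all transactions) and confidence (how often the rule's right-hand side appears when its left-hand side does). This minimal sketch uses hypothetical baskets and hypothetical function names; it scores one given rule rather than mining all rules, as Apriori-style algorithms would.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) = support(A | C) / support(A)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

# Hypothetical point-of-sale baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
lhs, rhs = {"diaper", "milk"}, {"beer"}
print(support(lhs | rhs, baskets))    # 0.4 (2 of 5 baskets)
print(confidence(lhs, rhs, baskets))  # roughly 0.67 (2 of the 3 diaper-and-milk baskets)
```

A rule is typically kept only if both numbers clear chosen thresholds; high confidence with very low support can simply reflect a handful of coincidental baskets.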

Technologies
Hadoop -- So what is Hadoop, Really?

- Dilbert
It’s just a framework
Hadoop and MapReduce

• Hadoop is an open-source framework (written in Java) to store and process gobs of data across many commodity computers.
• Hadoop is designed to solve a different problem: the fast, reliable analysis of structured, unstructured, and complex data.

• Hadoop and related software are designed for the 3 V's: (1) Volume – commodity hardware and open-source software lower cost and increase capacity; (2) Velocity – data ingest speed is aided by append-only and schema-on-read design; and (3) Variety – multiple tools to structure, process, and access data.

• Hadoop consists of two elements: very large, reliable, low-cost data storage using the Hadoop Distributed File System (HDFS), and a high-performance parallel/distributed data processing framework called MapReduce.
• HDFS is self-healing, high-bandwidth clustered storage. MapReduce is essentially fault-tolerant distributed computing.
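The MapReduce model can be illustrated with the classic word count, simulated here in a single Python process: map emits (word, 1) pairs, a shuffle groups them by word, and reduce sums each group. This sketches the programming model only; the function names are hypothetical, it does not use Hadoop's Java API, and it performs none of the distribution, HDFS storage, or fault tolerance described above.

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc_id, text):
    """Mapper: emit (word, 1) for each word in one input record.
    doc_id plays the role of the input key and is unused here."""
    for word in text.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, counts):
    """Reducer: sum all counts emitted for one word."""
    return word, sum(counts)

def word_count(documents):
    mapped = chain.from_iterable(map_phase(i, d) for i, d in enumerate(documents))
    return dict(reduce_phase(w, c) for w, c in shuffle(mapped).items())

print(word_count(["big data big", "data science"]))  # {'big': 2, 'data': 2, 'science': 1}
```

On a real cluster, each map call runs where its block of input is stored, and the framework reruns failed map or reduce tasks elsewhere; that is where the fault tolerance comes from.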
The Hadoop Stack
• Hadoop runs on a collection/cluster of commodity, shared-nothing x86 servers.
• You can add or remove servers in a Hadoop cluster (sizes from 50 or 100 to even 2000+ nodes) at will; the system detects and compensates for hardware or system problems on any server.
• Hadoop is self-healing. It can deliver data, and can run large-scale, high-performance batch processing jobs, in spite of system changes or failures.

The four primary areas in which to use Hadoop:
1) To aggregate "data exhaust": messages, posts, blog entries, photos, video clips, maps, web graph…
2) To give data context: friends networks, social graphs, recommendations, collaborative filtering…
3) To keep apps running: web logs, system logs, system metrics, database query logs…
4) To deliver novel mashup services: mobile location data, clickstream data, SKUs, pricing…
Operational Models
Data Products Become the Drivers to Identify New Insights and Cost Savings and to Increase Efficiencies

[Figure labels: Internal Data Sets, External Data Sets, Populate, Data Analytics Environment, Knowledge Repository, Analytics Engine, Your Customers, Feedback.]

• Decreased time to analytics
• Reuse of analytics tools
• Focus on analytics vs. IT integration

• More self-service
• Incorporation of external data
• Ability to scale to analytic needs
• Support for the analytics lifecycle

Thank you

 
The Bigger They Are The Harder They Fall
The Bigger They Are The Harder They FallThe Bigger They Are The Harder They Fall
The Bigger They Are The Harder They FallTrillium Software
 
Big data and Predictive Analytics By : Professor Lili Saghafi
Big data and Predictive Analytics By : Professor Lili SaghafiBig data and Predictive Analytics By : Professor Lili Saghafi
Big data and Predictive Analytics By : Professor Lili SaghafiProfessor Lili Saghafi
 
Predictive Analytics: Context and Use Cases
Predictive Analytics: Context and Use CasesPredictive Analytics: Context and Use Cases
Predictive Analytics: Context and Use CasesKimberley Mitchell
 
BIG DATA CHAPTER 2 IN DSS.pptx
BIG DATA CHAPTER 2 IN DSS.pptxBIG DATA CHAPTER 2 IN DSS.pptx
BIG DATA CHAPTER 2 IN DSS.pptxmuflehaljarrah
 
Implementing Data Mesh WP LTIMindtree White Paper
Implementing Data Mesh WP LTIMindtree White PaperImplementing Data Mesh WP LTIMindtree White Paper
Implementing Data Mesh WP LTIMindtree White Papershashanksalunkhe12
 
Data-Ed Webinar: Data Warehouse Strategies
Data-Ed Webinar: Data Warehouse StrategiesData-Ed Webinar: Data Warehouse Strategies
Data-Ed Webinar: Data Warehouse StrategiesDATAVERSITY
 

Similar to Big data analytics presented at meetup big data for decision makers (20)

02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big data
 
Big data Analytics
Big data AnalyticsBig data Analytics
Big data Analytics
 
Data-Ed: Data Warehousing Strategies
Data-Ed: Data Warehousing StrategiesData-Ed: Data Warehousing Strategies
Data-Ed: Data Warehousing Strategies
 
Data-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse StrategiesData-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse Strategies
 
Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...
Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...
Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...
 
data analytics lecture2.pptx
data analytics lecture2.pptxdata analytics lecture2.pptx
data analytics lecture2.pptx
 
Building Data Science Teams
Building Data Science TeamsBuilding Data Science Teams
Building Data Science Teams
 
Sgcp14dunlea
Sgcp14dunleaSgcp14dunlea
Sgcp14dunlea
 
From Foundation to Mastery – Building a Mature Analytics Roadmap - Manav Misra
From Foundation to Mastery – Building a Mature Analytics Roadmap - Manav MisraFrom Foundation to Mastery – Building a Mature Analytics Roadmap - Manav Misra
From Foundation to Mastery – Building a Mature Analytics Roadmap - Manav Misra
 
Intro to Data Science Big Data
Intro to Data Science Big DataIntro to Data Science Big Data
Intro to Data Science Big Data
 
Fight Fraud with Big Data Analytics
Fight Fraud with Big Data AnalyticsFight Fraud with Big Data Analytics
Fight Fraud with Big Data Analytics
 
Lecture 1.13 & 1.14 &1.15_Business Profiles in Big Data.pptx
Lecture 1.13 & 1.14 &1.15_Business Profiles in Big Data.pptxLecture 1.13 & 1.14 &1.15_Business Profiles in Big Data.pptx
Lecture 1.13 & 1.14 &1.15_Business Profiles in Big Data.pptx
 
"Hadoop: What we've learned in 5 years", Martin Oberhuber, Senior Data Scient...
"Hadoop: What we've learned in 5 years", Martin Oberhuber, Senior Data Scient..."Hadoop: What we've learned in 5 years", Martin Oberhuber, Senior Data Scient...
"Hadoop: What we've learned in 5 years", Martin Oberhuber, Senior Data Scient...
 
Big dataplatform operationalstrategy
Big dataplatform operationalstrategyBig dataplatform operationalstrategy
Big dataplatform operationalstrategy
 
The Bigger They Are The Harder They Fall
The Bigger They Are The Harder They FallThe Bigger They Are The Harder They Fall
The Bigger They Are The Harder They Fall
 
Big data and Predictive Analytics By : Professor Lili Saghafi
Big data and Predictive Analytics By : Professor Lili SaghafiBig data and Predictive Analytics By : Professor Lili Saghafi
Big data and Predictive Analytics By : Professor Lili Saghafi
 
Predictive Analytics: Context and Use Cases
Predictive Analytics: Context and Use CasesPredictive Analytics: Context and Use Cases
Predictive Analytics: Context and Use Cases
 
BIG DATA CHAPTER 2 IN DSS.pptx
BIG DATA CHAPTER 2 IN DSS.pptxBIG DATA CHAPTER 2 IN DSS.pptx
BIG DATA CHAPTER 2 IN DSS.pptx
 
Implementing Data Mesh WP LTIMindtree White Paper
Implementing Data Mesh WP LTIMindtree White PaperImplementing Data Mesh WP LTIMindtree White Paper
Implementing Data Mesh WP LTIMindtree White Paper
 
Data-Ed Webinar: Data Warehouse Strategies
Data-Ed Webinar: Data Warehouse StrategiesData-Ed Webinar: Data Warehouse Strategies
Data-Ed Webinar: Data Warehouse Strategies
 

Recently uploaded

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 

Recently uploaded (20)

E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 

Big data analytics presented at meetup big data for decision makers

  • 1. The Science Behind Data Science. Presented at Big Data for Decision Makers. Ruhollah Farchtchi – Director of Big Data. December 5, 2013
  • 2. Agenda • Introductions • Big Data Analytics Overview • Use Cases – Examples of Data Products • Building Blocks • Data Mining • Technologies • Operational Models © 2013 Unisys Corporation. All rights reserved. 2
  • 3. So we’ve got a lot of data… • What can we get out of it? • How does it help with our business decision making? • How is this complex landscape changing? [Diagram] • Multiple Types: Tabular / Structured; Unstructured (Documents, Pictures, Emails, Video) • Multiple Sources: Sensors, Networks, Cyber Infrastructure; Web, Email, Social Media; Enterprise Applications; Mobile Devices, GPS, and many more! • Multiple Domains: Defense (Logistics / Workforce analytics, Cyber and EW, Intelligence Analysis); Health (Drug Discovery, EHR, Epidemic/pandemic prediction); Finance (Fraud Detection, Identity Resolution, Customer Support); Other (Supply/Demand Forecasting, MTTB Prediction, Context-based IR)
  • 4. And we’ve got a lot of tools… [Figure: big data landscape] Source: http://www.ongridventures.com/wp-content/uploads/2012/10/Big-Data-Landscape.jpg
  • 5. Big Data and Data Analytics – A Unisys Point of View • Unisys Point of View: Today’s big data is tomorrow’s normal data – What remains is the need to extract insights and value from the data • Data analytics is often the goal or end product of what organizations want to get out of their data (big or otherwise) – Focused on the capabilities of: • Efficient Data Processing – get data in and processed in time to make use of it, and in a tenable manner • Effective Information Management – the ability to make the data accessible and to manage the downstream data products as assets • Expressive Analytics – make sense of the data in a format that is easily digested and incorporated into decision making; i.e., if you need a PhD to interpret the results, you still have work to do here – With the aim of increasing business value • It’s about understanding the data and what you can get out of it – “…40% of business leaders had no response when asked what types of information would transform their industries over the next 10 years.”[1] [1] Anne Lapkin, 2012. Hype Cycle for Big Data, 2012, Gartner.
  • 6. Data Analytics is the culmination of Analytics and IT [Diagram: complexity, from backward-looking (forensic) to forward-looking (predictive), plotted against data volume, from low to high volume, variety, and velocity, with a multi-TB turning point. Business Intelligence & Data Warehousing: STAR schema, OLAP, RDBMS, SQL, ETL. Big Data & NoSQL: Hadoop, Google BigTable, Map/Reduce, Splunk, Dynamo, Hive, MongoDB, Cassandra, EMC Greenplum, HBase. Data Analytics: modeling and forecasting, pattern recognition, scale-out linear programming, global optimization, classification, machine learning, simulation. Arrows labeled “Extend”, “Leverage for large-scale analytics and data mining”, and “Leverage for large-scale application development & information management” connect the three areas.] Data Analytics is at the intersection of high-volume data processing and advanced analysis. The tools and methodologies here represent a mix of both worlds, and there is currently no ‘killer app’.
  • 7. Challenges • Misaligned IT, Analytics, and Business Strategies • Ineffective Data Management Strategy • Ineffective/inefficient storage and security platforms • Inaccessible or siloed analytics (“Cylinders of Excellence”) • Untrusted analytic products, or analytics that are not timely, accurate, or repeatable (untested) • Inability to scale analytic generation (lack of training)
  • 8. Analytic Environment That Supports Data Processing, Enhances Information Management and Improves Decision Making [Diagram labels: Raw Data, Building Analytic Environment, Data Products; IT Focused, Analyst Focused] 1. Work with business leaders and decision makers to understand and quantify the data value chain 2. View data as an enterprise asset 3. Innovate through the creation of new data products and services 4. Retrain staff and/or acquire Data Scientist skills 5. Integrate teams across big data, data warehousing, and business analysis 6. Revise information management strategies to incorporate big data 7. Develop new ways of capturing information, e.g., mobile and streaming data 8. Identify and leverage previously unused internal and external data
  • 9. Creation of data products is key to analytic reuse • What are Data Products? – Essentially, this is the output of a data science or data mining activity – Non-trivial; more than a simple query – Requires a platform for processing • They can manifest themselves as many things – Analytical "engines" running in a larger application (Amazon's recommender engine is a great Data Product) – Lists (e.g., Top 10 things I need to know today) – Entire applications (e.g., customer baseball cards) • However, once they are defined, one thing is true for all – It takes a combination of domain-agnostic analytic techniques together with domain-specific knowledge to produce something relevant and consumable that can be monetized or operationalized.
  • 10. Examples of Data Products
  • 11. Use Case #1 – Netflix Recommendation • Netflix is about connecting people to the movies they love by leveraging its movie recommendation system, Cinematch℠ • Cinematch initially was a linear model that helped predict users’ choices • The predictions are used to make personal movie recommendations based on each customer’s unique tastes – Challenge: Can the recommendation engine be improved upon? – Resolution: Set an accuracy-improvement target (10%) and create a contest with a $1 million prize • Crowdsourcing: Teams merged together, taking an internet-enabled approach to improve results • Netflix provided a training dataset of 100+ million ratings that 480,000 users gave to 17K movies; each rating is a quadruplet of the form (user, movie, date of grade, grade) – Goal is to predict the grade – Example of supervised machine learning – Submitted predictions are scored against the true grades in terms of root mean squared error (RMSE) – RMSE is a frequently used measure of the difference between values predicted by a model and the values observed (i.e., residuals) – Similarity is determined by a distance measure such as Jaccard or cosine distance Source: Netflixprize.com and Mining of Massive Datasets by Anand Rajaraman and Jeffrey Ullman
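The Netflix slide names three measures without defining them. A minimal sketch of RMSE (how submissions were scored) and of Jaccard and cosine similarity (two common ways to compare users or movies). The function names and ratings are hypothetical, and treating an unrated movie as 0 in the cosine vectors is a simplifying assumption, not necessarily what contest entrants did.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error: sqrt(mean((p - a)^2)) over paired ratings."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def cosine_similarity(u, v):
    """Cosine of the angle between two rating vectors (0 stands for 'not rated')."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard_similarity(a, b):
    """|A ∩ B| / |A ∪ B|, e.g. for the sets of movies two users have rated."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical ratings on a 1-5 scale, for illustration only.
print(round(rmse([3.5, 4.0, 2.0], [4, 4, 1]), 4))  # 0.6455
print(jaccard_similarity({"Heat", "Alien", "Up"}, {"Alien", "Up", "Jaws"}))  # 0.5
```

The corresponding distances follow directly: Jaccard distance is 1 minus the Jaccard similarity, and cosine distance is usually taken as the angle between the vectors (or 1 minus the cosine).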
  • 12. Use Case #2 – Google PageRank • Google wanted to be able to measure and rank the importance of web pages. – Challenge: Identify and rank the pages that users would want to view, in terms of their relevance – Resolution: Develop an algorithm that leverages link analysis and implement it as part of Google’s infrastructure • The PageRank algorithm considers a webpage to be important if many other webpages point to it. The linking webpages that point to a given page aren’t treated equally • The algorithm takes into account both the importance (PageRank) of the linking pages and the number of outgoing links each one has – Similar to social network analysis • Linking pages with higher PageRank are given more weight, while pages with more outgoing links are given less weight • Example of unsupervised machine learning • Link matrix for a four-page example (Page 1 to Page 4): $L = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$ Source: The Elements of Statistical Learning, Trevor Hastie, Robert Tibshirani and Jerome Friedman
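The weighting rule described on the PageRank slide can be made concrete with its four-page link matrix. A minimal power-iteration sketch, assuming the convention of the cited textbook: L[i][j] = 1 when page j links to page i, and p = (1 − d)·e + d·L·D_c⁻¹·p, where D_c holds each page's out-link count c_j and d = 0.85 is the damping factor. Dividing by c_j is what gives pages with many outgoing links less weight per link. The function name is hypothetical, and dangling pages (no out-links) are rejected rather than handled.

```python
def pagerank(link_matrix, d=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for p = (1 - d) * e + d * L * D_c^{-1} * p.

    link_matrix[i][j] == 1 means page j links to page i; out_degree[j] is the
    column sum c_j. Ranks are scaled so that they sum to the number of pages.
    """
    n = len(link_matrix)
    out_degree = [sum(link_matrix[i][j] for i in range(n)) for j in range(n)]
    if any(c == 0 for c in out_degree):
        raise ValueError("every page needs at least one out-link in this sketch")
    p = [1.0] * n
    for _ in range(max_iter):
        # Each page keeps a base share (1 - d) plus a damped share of every
        # linking page's rank, split evenly over that page's out-links.
        new_p = [
            (1 - d) + d * sum(link_matrix[i][j] * p[j] / out_degree[j] for j in range(n))
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


# Link matrix from the slide (rows and columns are Page 1..Page 4).
L = [
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
ranks = pagerank(L)
print([round(r, 2) for r in ranks])  # [1.49, 0.78, 1.58, 0.15]
```

Page 3, which receives links from the other three pages, ends up ranked highest. Page 4, which no page links to, keeps only the base share 1 − d.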
  • 13. Use Case #3 – Walmart Data-Driven Value Chain • Walmart is the leading and largest retailer in the world. • Walmart has been a catalyst for technology adoption among its suppliers, including requiring partners to use RFID technology to track and coordinate inventories. • It has a great cross-section of data, from individual Social Security information to geographic detail and product purchases • It uses econometric and marketing-mix modeling (multiplicative, log-log, power additive, adstocks, lags and powers) for a number of its key analyses • Walmart mines its data to get its product mix correct under different and changing environmental conditions. – Challenge: Identify the correct product mix in order to protect the firm from having too much or not enough inventory – Resolution: Mine its multiple data sources for data products that will help tighten and improve operational forecasts • For impending hurricane warnings, Walmart found that: – Pop-Tarts increase in sales (7 times their normal rate) – The top-selling pre-hurricane item was beer – This allows the firm to get the supply to the store ahead of time [Chart: sales by item (beer, Pop-Tarts), annotated with model forms such as GAs = a + b(TV)] Source: What Walmart Knows about Customer Habits, New York Times
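The Walmart slide lists adstocks and log-log models among its marketing-mix techniques but gives no specification. A minimal sketch of those two ingredients, assuming a geometric adstock A_t = x_t + λ·A_{t−1} (advertising carry-over) and an ordinary-least-squares fit of ln y = a + b·ln x, whose slope b is read as an elasticity. The function names, decay rate, and synthetic series are all hypothetical.

```python
import math


def geometric_adstock(spend, decay):
    """Carry-over effect of advertising: A_t = x_t + decay * A_{t-1}, starting from 0."""
    adstocked, carry = [], 0.0
    for x in spend:
        carry = x + decay * carry
        adstocked.append(carry)
    return adstocked


def fit_log_log(x, y):
    """Least-squares fit of ln(y) = a + b * ln(x); returns (a, b).

    In a log-log model, b is an elasticity: a 1% increase in x is associated
    with roughly a b% change in y.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxy = sum((p - mean_x) * (q - mean_y) for p, q in zip(lx, ly))
    sxx = sum((p - mean_x) ** 2 for p in lx)
    b = sxy / sxx
    return mean_y - b * mean_x, b


# Synthetic weekly TV spend and sales generated with a known elasticity of 0.3.
tv = geometric_adstock([100, 120, 80, 150, 90], decay=0.5)
sales = [2.0 * a ** 0.3 for a in tv]
intercept, elasticity = fit_log_log(tv, sales)
print(round(elasticity, 3))  # 0.3
```

Because the synthetic sales follow the model exactly, the fit recovers the planted elasticity. With real data, the decay rate would itself be a parameter to estimate.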
  • 14. Use Case #4 – Amazon Targeted Marketing • Amazon is the world’s largest online retailer and is known for its e-commerce website, where it uses input about a customer’s interests to generate a list of recommendations. • Similar to Netflix, Amazon uses recommendation algorithms, but it does targeted marketing for items that a customer would want to buy based on previous purchase patterns • The recommendation algorithms personalize the online store for each customer, and the store changes radically based on the customer’s interests – Challenge(s): Analyze massive amounts of data, submit results in real time, handle new customers who have very little data, and handle customer data that is very volatile – Resolution: Cluster modeling, search-based methods, and item-to-item collaborative filtering • Cluster Modeling: Identify customers similar to the user by dividing the customer base into segments, and treat the task as a classification problem. Typically uses an unsupervised learning algorithm such as K-means or hierarchical clustering • Search-Based Methods: Treat the recommendation problem as a search for related items. Given a user’s purchased and rated items, the algorithm constructs a search query to find other popular items by the same author, artist, or director, or with similar keywords • Item-to-Item Collaborative Filtering: A customized algorithm that scales to massive data sets and produces high-quality recommendations in real time. It matches each of the user’s purchased and rated items to similar items and then combines those similar items into a recommendation list, with offline and online components to increase performance Source: Amazon.com Recommendations: Item-to-Item Collaborative Filtering, Greg Linden, Brent Smith and Jeremy York
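The item-to-item approach on the Amazon slide splits into an offline table of similar items and a fast online lookup. A minimal sketch of that split, assuming each item is represented by the set of customers who bought it and compared with cosine similarity, which for such binary vectors reduces to |A ∩ B| / sqrt(|A|·|B|). The function names and purchase data are hypothetical. Amazon's production system is certainly more involved.

```python
import math
from collections import defaultdict


def build_similar_items(purchases):
    """Offline step: cosine similarity between items that share at least one buyer.

    purchases maps customer -> set of items. Returns item -> {other_item: similarity}.
    """
    buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)
    similar = defaultdict(dict)
    for items in purchases.values():  # only co-purchased pairs are ever compared
        for a in items:
            for b in items:
                if a != b and b not in similar[a]:
                    overlap = len(buyers[a] & buyers[b])
                    similar[a][b] = overlap / math.sqrt(len(buyers[a]) * len(buyers[b]))
    return similar


def recommend(user_items, similar, top_n=3):
    """Online step: score unseen items by summed similarity to the user's items."""
    scores = defaultdict(float)
    for item in user_items:
        for other, sim in similar.get(item, {}).items():
            if other not in user_items:
                scores[other] += sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))  # ties broken by name
    return [item for item, _ in ranked[:top_n]]


# Hypothetical purchase histories.
purchases = {
    "ann": {"book", "lamp"},
    "bob": {"book", "lamp", "pen"},
    "cat": {"book", "pen"},
    "dan": {"mug"},
}
similar = build_similar_items(purchases)
print(recommend({"lamp"}, similar))  # ['book', 'pen']
```

Because the expensive pairwise work happens offline, the online step only touches the handful of items the user already has, which is what lets the approach respond in real time.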
  • 15. Unisys Big Data Analytics Building Blocks
  • 16. Big Data Analytics Methodology – Modeling Components • Decision Making & Forecasting: Provide actionable intelligence into the future state • Models: A statistical model applied to input data that separates the portion of volume due to each of the variables or factors. We use the term model because it is a simplification of reality. • Data: Internal Data, Demographic Data, 3rd Party Data
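The "Models" component above separates total volume into the portion due to each factor. A minimal sketch of that idea, assuming a linear additive model volume = b0 + Σ b_k·x_k fitted by ordinary least squares. The slide does not say which model form Unisys used, and multiplicative forms are also common. Each factor's contribution is b_k times the sum of that factor, and the intercept supplies a "base" volume. All names and data are hypothetical.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear(factors, volume):
    """OLS fit of volume = b0 + sum_k b_k * factor_k via the normal equations."""
    names = list(factors)
    rows = [[1.0] + [factors[k][i] for k in names] for i in range(len(volume))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, volume)) for i in range(p)]
    coef = solve(xtx, xty)
    return coef[0], dict(zip(names, coef[1:]))


def decompose(factors, volume):
    """Split total volume into a base (intercept) part and one part per factor."""
    intercept, coefs = fit_linear(factors, volume)
    parts = {"base": intercept * len(volume)}
    for name, b in coefs.items():
        parts[name] = b * sum(factors[name])
    return parts


# Hypothetical weekly data built as volume = 10 + 2 * tv + 3 * price_cut.
factors = {"tv": [1, 2, 3, 4, 5], "price_cut": [0, 1, 0, 1, 1]}
volume = [12, 17, 16, 21, 23]
print({k: round(v, 6) for k, v in decompose(factors, volume).items()})
```

With an intercept in the model, the parts add back up to the observed total volume (89 here), which is what makes the split readable as "how much of the volume each factor drove".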
  • 18. Data Mining – Motivations • We’ve covered big data – There’s a lot of it! • New Modus Operandi – Gather whatever data you can, whenever and wherever possible • New Expectation – Data gathered will have value, either for the purpose it was collected for or for a purpose not yet envisioned • Challenge: There will never be enough analysts to sift through it all
  • 19. Data Mining Definitions • Non-trivial extraction of implicit, previously unknown, and potentially useful information from data (normally large databases) • Exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns • Part of the Knowledge Discovery in Databases (KDD) process Source: http://liris.cnrs.fr/abstract/abstract.html
  • 20. Data Mining Tasks • Prediction Methods: Use some variables to predict unknown or future values of other variables – Classification: For a given set of attributes, apply a model for the class (what you want to predict) as a function of the attributes – Regression: Predict the value of a given continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency – Deviation Detection: Detect significant deviations from normal behavior • Description Methods: Find human-interpretable patterns that describe the data – Clustering: Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that data points in one cluster are more similar to one another and data points in separate clusters are less similar to one another – Association Rule Discovery: Given a set of records, each of which contains some number of items from a given collection, produce dependency rules that predict the occurrence of an item based on occurrences of other items – Sequential Pattern Discovery: Given a set of sequences and a support threshold, find the complete set of frequent subsequences
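Deviation detection, listed among the prediction methods above, is the only task in this list that gets no worked example elsewhere in the deck. A minimal sketch of one simple form of it, assuming "normal behavior" is summarized by the mean and sample standard deviation of past values and that anything more than three standard deviations away (|z| > 3) counts as significant. Both the threshold and the data are hypothetical, and production systems typically use more robust baselines.

```python
import statistics


def fit_baseline(history):
    """Summarize 'normal behavior' as the mean and sample standard deviation of past values."""
    return statistics.mean(history), statistics.stdev(history)


def flag_deviations(values, baseline, threshold=3.0):
    """Return (index, value, z) for each value whose z-score exceeds the threshold in magnitude."""
    mean, std = baseline
    flagged = []
    for i, v in enumerate(values):
        z = (v - mean) / std
        if abs(z) > threshold:
            flagged.append((i, v, z))
    return flagged


# Hypothetical daily transaction counts for one account.
history = [52, 48, 50, 51, 49, 50, 47, 53]  # mean 50, standard deviation 2
baseline = fit_baseline(history)
print(flag_deviations([51, 58, 44, 49], baseline))  # [(1, 58, 4.0)]
```

The value 44 sits exactly three standard deviations below the mean and is not flagged, because the test is strictly greater than the threshold.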
  • 21. Classification – Example: Tax Fraud
    Training Data Set
    Tid  Refund  Marital Status  Taxable Income  Cheat
    1    Yes     Single          125k            No
    2    No      Married         100k            No
    3    No      Single          70k             No
    4    Yes     Married         120k            No
    5    No      Divorced        95k             Yes
    6    No      Married         60k             No
    7    Yes     Divorced        220k            No
    8    No      Single          85k             Yes
    9    No      Married         75k             No
    10   No      Single          90k             Yes
    Test Data Set
    Refund  Marital Status  Taxable Income  Cheat
    Yes     Single          125k            ?
    No      Married         100k            ?
    No      Single          70k             ?
    Yes     Married         120k            ?
    [Diagram: the training data set is used to learn a classifier, producing a model that is then applied to the test data set.]
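The "learn classifier, then apply model" flow on the tax-fraud slide can be run end to end on its own records. A minimal sketch using one possible classifier: a small CART-style decision tree that greedily picks the binary split with the lowest weighted Gini impurity and grows until every leaf is pure. The slide does not name a classifier, so this choice, the attribute keys, and the function names are assumptions. Incomes are in $k.

```python
from collections import Counter

# Training records transcribed from the slide: attributes -> Cheat label.
TRAIN = [
    ({"refund": "Yes", "marital": "Single", "income": 125}, "No"),
    ({"refund": "No", "marital": "Married", "income": 100}, "No"),
    ({"refund": "No", "marital": "Single", "income": 70}, "No"),
    ({"refund": "Yes", "marital": "Married", "income": 120}, "No"),
    ({"refund": "No", "marital": "Divorced", "income": 95}, "Yes"),
    ({"refund": "No", "marital": "Married", "income": 60}, "No"),
    ({"refund": "Yes", "marital": "Divorced", "income": 220}, "No"),
    ({"refund": "No", "marital": "Single", "income": 85}, "Yes"),
    ({"refund": "No", "marital": "Married", "income": 75}, "No"),
    ({"refund": "No", "marital": "Single", "income": 90}, "Yes"),
]
# Test records from the slide; their Cheat label is unknown ("?").
TEST = [
    {"refund": "Yes", "marital": "Single", "income": 125},
    {"refund": "No", "marital": "Married", "income": 100},
    {"refund": "No", "marital": "Single", "income": 70},
    {"refund": "Yes", "marital": "Married", "income": 120},
]


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def candidate_splits(rows):
    """Equality tests on categorical attributes; <= midpoints on income."""
    for attr in ("refund", "marital"):
        for value in sorted({r[attr] for r, _ in rows}):
            yield f"{attr} == {value}", (lambda r, a=attr, v=value: r[a] == v)
    incomes = sorted({r["income"] for r, _ in rows})
    for lo, hi in zip(incomes, incomes[1:]):
        t = (lo + hi) / 2
        yield f"income <= {t}", (lambda r, t=t: r["income"] <= t)


def build_tree(rows):
    """Return a class label (leaf) or a (description, test, yes_subtree, no_subtree) node."""
    labels = [y for _, y in rows]
    if len(set(labels)) == 1:
        return labels[0]
    best = None
    for desc, test in candidate_splits(rows):
        yes = [(r, y) for r, y in rows if test(r)]
        no = [(r, y) for r, y in rows if not test(r)]
        if not yes or not no:
            continue
        score = (len(yes) * gini([y for _, y in yes]) + len(no) * gini([y for _, y in no])) / len(rows)
        if best is None or score < best[0]:  # first split wins ties
            best = (score, desc, test, yes, no)
    if best is None:  # identical attributes, mixed labels: fall back to majority
        return Counter(labels).most_common(1)[0][0]
    _, desc, test, yes, no = best
    return desc, test, build_tree(yes), build_tree(no)


def classify(tree, record):
    """Walk from the root to a leaf, following each node's test."""
    while isinstance(tree, tuple):
        _, test, yes_branch, no_branch = tree
        tree = yes_branch if test(record) else no_branch
    return tree


tree = build_tree(TRAIN)
print(tree[0])                            # marital == Married
print([classify(tree, r) for r in TEST])  # ['No', 'No', 'No', 'No']
```

On this data the learned tree tests marital status first, then refund, then income ≤ 77.5k, and it labels all four test records as not cheating. Ten records are far too few to trust such a model; the point is only the shape of the workflow.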
  • 22. Classification – Your Turn • Fraud Detection • Goal: Predict fraudulent cases in credit card transactions. • Approach: – What kind of data will you try to get? – Can you say something about the characteristics of the data? – Estimate the size of the data. – What kinds of pitfalls might you run into?
  • 23. Fraud Detection • Goal: Predict fraudulent cases in credit card transactions. • Approach: – Use credit card transactions and the information on the account holder as attributes – When does a customer buy, what does he buy, how often does he pay on time, etc. – Label past transactions as fraud or fair transactions; this forms the class attribute – Learn a model for the class of the transactions – Use this model to detect fraud by observing credit card transactions on an account
  • 24. Clustering – Example • Document Clustering: – Goal: To find groups of documents that are similar to each other based on the important terms appearing in them – Approach: Identify frequently occurring terms in each document, form a similarity measure based on the frequencies of different terms, and use it to cluster – Gain: Search tools can use the clusters to relate a new document or search term to clustered documents • Clustering Points: 3,204 articles from the Los Angeles Times • Similarity Measure: How many words are common to these documents (after some word filtering)
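The document-clustering approach on this slide has three steps: find frequent terms, build a frequency-based similarity, and cluster. A minimal sketch of that pipeline, assuming whitespace tokenization, a tiny hypothetical stop-word list as the "word filtering", cosine similarity between term-frequency vectors as the similarity measure, and single-link grouping at a fixed threshold via union-find. The slide does not specify any of these, and the four documents are made up.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "as", "in", "of", "to"}  # illustrative filter list


def term_frequencies(text):
    """Lower-case, split on whitespace, drop stop words, and count the remaining terms."""
    return Counter(w for w in text.lower().split() if w not in STOP_WORDS)


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(docs, threshold=0.3):
    """Single-link clustering: any pair at or above the threshold joins the same group."""
    vectors = [term_frequencies(d) for d in docs]
    parent = list(range(len(docs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


DOCS = [
    "the team won the football game",
    "football team scores in the final game",
    "stock prices fell as the market closed",
    "the market rallied and stock prices rose",
]
print(cluster(DOCS))  # [[0, 1], [2, 3]]
```

A new document or search term could then be related to a group by comparing its vector with the group's members, which is the "Gain" described above.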
  • 25. Clustering - Illustration: Seems straightforward for a small number of dimensions… but what if there were more?
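The slide does not name a clustering algorithm. One common choice is k-means (Lloyd's algorithm), sketched below. Its distance and mean computations do not depend on how many dimensions each point has, which is why clustering keeps working where visual inspection stops. Seeding the centroids with the first `k` points is a simplifying assumption; real implementations usually use random or k-means++ seeding. The sample points are hypothetical.

```python
from math import dist


def kmeans(points, k, iters=20):
    """Lloyd's k-means on tuples of any (equal) dimension; returns (centroids, labels)."""
    centroids = [tuple(p) for p in points[:k]]  # naive seeding: first k points
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty).
        new = [tuple(sum(xs) / len(c) for xs in zip(*c)) if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:
            break  # converged: assignments will no longer change
        centroids = new
    labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
    return centroids, labels
```

Usage: on the 2-D points `[(1, 1), (8, 8), (1.5, 2), (8, 9), (0.5, 1.5), (9, 8)]` with `k=2`, the labels come out as `[0, 1, 0, 1, 0, 1]`. The same call works unchanged on 3-D or 500-D tuples.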
  • 26. Clustering - Illustration (Source: http://salsahpc.indiana.edu/plotviz) We [human beings] have a limited ability to visualize and reason over a large number of dimensions; clustering helps.
  • 27. Association Rules • Classic Association Rule Example: – If a customer buys diapers and milk, then he is very likely to buy beer. • Application: Supermarket shelf management. – Goal: To identify items that are bought together by sufficiently many customers. – Approach: Process the point-of-sale data collected with barcode scanners to find dependencies among items.
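"Bought together by sufficiently many customers" is usually measured with two standard quantities. Support is the fraction of baskets containing an itemset. Confidence is support(X ∪ Y) / support(X), an estimate of P(Y | X). The sketch below computes both on a handful of hypothetical point-of-sale baskets. It finds rules by brute force over small itemsets, which stands in for scalable algorithms such as Apriori and would not suit real POS volumes. The thresholds and function names are assumptions.

```python
from itertools import combinations

# Hypothetical point-of-sale baskets (one set of items per checkout).
BASKETS = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]


def support(itemset, baskets=BASKETS):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)


def confidence(lhs, rhs, baskets=BASKETS):
    """Estimated P(rhs in basket | lhs in basket)."""
    return support(lhs | rhs, baskets) / support(lhs, baskets)


def rules(min_support=0.4, min_confidence=0.6, baskets=BASKETS):
    """Brute-force rules X -> {y} from itemsets of size 2 and 3 (toy-scale only)."""
    items = sorted(set().union(*baskets))
    found = []
    for size in (2, 3):
        for combo in combinations(items, size):
            itemset = set(combo)
            s = support(itemset, baskets)
            if s < min_support:
                continue  # not bought together often enough
            for y in combo:
                lhs = itemset - {y}
                conf = confidence(lhs, {y}, baskets)
                if conf >= min_confidence:
                    found.append((sorted(lhs), y, s, conf))
    return found
```

Usage: in this toy data, {diaper, milk, beer} appears in 2 of 5 baskets (support 0.4). {diaper, milk} appears in 3 of 5, so the rule {diaper, milk} → beer has confidence 2/3, and `rules()` reports it.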
  • 29. Hadoop: So What Is Hadoop, Really? (Dilbert) It’s just a framework.
  • 30. Hadoop and MapReduce • Hadoop is an open-source framework (written in Java) to store and process gobs of data across many commodity computers. • Hadoop is designed to solve a different problem: the fast, reliable analysis of structured, unstructured, and complex data. • Hadoop and related software are designed for the 3 V’s: (1) Volume – commodity hardware and open-source software lower cost and increase capacity; (2) Velocity – data ingest speed is aided by an append-only, schema-on-read design; and (3) Variety – multiple tools to structure, process, and access the data. • Hadoop consists of two elements: reliable, very large, low-cost data storage using the Hadoop Distributed File System (HDFS), and a high-performance parallel/distributed data processing framework called MapReduce. • HDFS is self-healing, high-bandwidth clustered storage. MapReduce is essentially fault-tolerant distributed computing.
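The MapReduce model described above has three phases. A mapper emits key/value pairs. The framework groups ("shuffles") all values by key. A reducer combines each key's values. The sketch below runs the classic word-count example in a single Python process to show those phases. It is not Hadoop's Java API, and it leaves out the distribution, HDFS block placement, and task retries that make the real framework fault tolerant. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word, 1) for word in line.lower().split()]


def shuffle(pairs):
    """Shuffle/sort: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reducer: combine all counts emitted for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map -> shuffle -> reduce over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())
```

Usage: `word_count(["big data big insights", "data products"])` returns `{"big": 2, "data": 2, "insights": 1, "products": 1}`. In a cluster, each mapper processes its own slice of the input independently and each reducer owns a subset of the keys. That independence is what lets the work spread across many commodity machines.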
  • 31. The Hadoop Stack • Hadoop runs on a collection/cluster of commodity, shared-nothing x86 servers. • You can add or remove servers in a Hadoop cluster (sizes from 50 or 100 to even 2000+ nodes) at will; the system detects and compensates for hardware or system problems on any server. • Hadoop is self-healing: it can deliver data, and can run large-scale, high-performance batch processing jobs, in spite of system changes or failures. The four primary areas in which to use Hadoop: 1) To aggregate "data exhaust": messages, posts, blog entries, photos, video clips, maps, web graph… 2) To give data context: friend networks, social graphs, recommendations, collaborative filtering… 3) To keep apps running: web logs, system logs, system metrics, database query logs… 4) To deliver novel mashup services: mobile location data, clickstream data, SKUs, pricing…
  • 33. Data Products Become the Drivers to Identify New Insights, Cost Savings, and Increased Efficiencies [Diagram labels: Internal Data Sets, External Data Sets, Populate, Data Analytics Environment, Knowledge Repository, Analytics Engine, Your Customers, Feedback] • Decreased time to analytics • Reuse of analytics tools • Focus on analytics rather than IT integration • More self-service • Incorporation of external data • Ability to scale to analytic needs • Supports the analytics lifecycle
  • 34. Thank you

Editor's Notes

  1. Think about access to top talent and how crowdsourcing is allowing organizations to put a bounty on solutions to hard problems.
  2. Think about graph analysis and the work being done with SNA (social network analysis) today.
  3. Think about common patterns and pattern discovery. For example, in cargo shipping: if a ship stops at certain ports, is the probability higher or lower that it picked up illegal substances on the way?
  4. A great example of how different techniques can be combined and reused. This is driving the need for an enterprise analytic data set, since you can start to chain analytics together to perform many types of operations.
  5. Think about automation of analysis tasks. If I’ve figured out how to bucket things, I may be able to triage the data better according to my organization’s priorities.
  6. Clustering is really BIG in the big data world right now due to its wide applicability.