SQL Server 2008 for Business Intelligence UTS Short Course
Peter Gfader Specializes in C# and .NET (Java not anymore), Testing, Automated tests, Agile, Scrum, Certified Scrum Trainer Technology aficionado: Silverlight, ASP.NET, Windows Forms
Admin Stuff Attendance: You initial the sheet Hands On Lab: You get me to initial the sheet Homework Certificate: At the end of the 5 sessions, if I say you have completed successfully
Course Website Course Timetable & Materials http://www.ssw.com.au/ssw/Events/2010UTSSQL/ Resources http://sharepoint.ssw.com.au/Training/UTSSQL/
Course Overview
Last week(s): Other cube browsers: Microsoft Data Analyzer, ProClarity, Excel 2003/2007/2010, Excel Services, Thinslicer, PerformancePoint, PowerPivot
Homework: Create a report on top of Northwind with: Top 10 customers (table), Top 10 products (table), Top 10 employees (table), 1 chart that shows the top 10 customers, 1 usage of the gauge control (surprise me)
The plan
Step by step to BI: Create the data warehouse, Copy data to the data warehouse, Create OLAP cubes, Create reports, Browse the cube, Do some data mining: Discovering relationships, Predicting future events
Agenda What is Data Mining? Why? Uses Algorithms Demo Hands on Lab
What is Data Mining? “Data mining is the use of powerful software tools to discover significant traits or relationships, from databases or data warehouses and often used to predict future events”
What is Data Mining? It exploits statistical algorithms. Once the “knowledge” is extracted, it can be used to discover patterns and to predict values of other cases
Why Data Mining? Marketing: Who picks the movie? The kids, the wife, me? Who are our customers and what sort of films do they hire? Is a 30-year-old woman with 2 children going to hire Arnie’s latest film? Validation: Is this data sensible? Terminator 2 and Toy Story Prediction: Sales next year
Why? It's all about money. Get new information from data: future trends, past trends, outliers, maximums, minimums. Analyse data from different perspectives and summarize it into useful information. New information to increase revenue, cut costs, or both :-)
Which Questions are Data Mining? Who are our biggest customers? What are customers buying with cigars? What are the customer retention levels of our branches? Which customers have bought olives and feta cheese but no ciabatta bread? Which regions have the highest male/female ratio of single 20-somethings? Which region has the lowest customer retention levels, and who are the lost customers?
What's not data mining: Ad hoc query, Drill through to details, Business Intelligence tool
Good raw material  good data miningSamples should be representative Samples "similar" to domain Not all-seeing crystal ball Verify and Validate! Data - Uncover patterns in samples
OLAP versus Data Mining (learning to ride a bike before a car) OLAP: Is about fast ad hoc querying; Analysis by dimensions and measures; Gives precise answers Data Mining: May use an RDBMS or OLAP source; Is about discovering and predicting; Gives imprecise answers OLAP is not a prerequisite for data mining, but it almost always comes first
Types of Data Mining Algorithms: Classification algorithms predict one or more discrete variables, based on the other attributes in the dataset. Regression algorithms predict one or more continuous variables, such as profit or loss, based on other attributes in the dataset. Segmentation algorithms divide data into groups, or clusters, of items that have similar properties. Association algorithms find correlations between different attributes in a dataset. Sequence analysis algorithms summarize frequent sequences or episodes in data, such as a Web path flow.
Complete Set of Algorithms: Ways to analyze your data: Clustering, Time Series, Decision Trees, Naïve Bayes, Association, Linear Regression, Neural Network, Sequence Clustering, Logistic Regression
Decision Trees: Split the data. Each branch is like an attribute. Brightness = amount of data
Decision Trees (1) Decision Trees assign (classify) each case to one of a few (discrete) broad categories of the selected attribute (variable) and explain the classification with a few selected input variables. The building process is recursive partitioning: splitting the data into partitions and then splitting those partitions further. Initially all cases are in one big box
Decision Trees (2) The algorithm tries all possible breaks in classes using all possible values of each input attribute; it then selects the split that partitions the data into the purest classes of the searched variable. There are several measures of purity. Then it repeats the splitting for each new class, again testing all possible breaks. Unhelpful branches of the tree can be pre-pruned or post-pruned
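The recursive partitioning described above can be sketched in a few lines. This is a minimal illustration, not the SQL Server algorithm: it assumes discrete attributes, uses Gini impurity as the purity measure (one of several possible measures), only tries "attribute equals value" splits, and does no pruning. The customer data and the names `best_split` and `build_tree` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 means the partition holds a single class."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, target):
    """Try every (attribute == value) break and return the purest one.

    Returns (weighted_impurity, attribute, value), or None if no split is possible.
    """
    best = None
    n = len(rows)
    for attr in (a for a in rows[0] if a != target):
        for value in sorted({r[attr] for r in rows}):
            left = [r[target] for r in rows if r[attr] == value]
            right = [r[target] for r in rows if r[attr] != value]
            if not left or not right:
                continue  # a split must produce two non-empty partitions
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, attr, value)
    return best


def build_tree(rows, target, min_rows=2):
    """Recursive partitioning: split, then split each partition again."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if gini(labels) == 0.0 or len(rows) < min_rows:
        return majority  # leaf: pure (or too small) partition
    split = best_split(rows, target)
    if split is None:
        return majority
    _, attr, value = split
    match = [r for r in rows if r[attr] == value]
    rest = [r for r in rows if r[attr] != value]
    return (attr, value, build_tree(match, target, min_rows), build_tree(rest, target, min_rows))


# Hypothetical customers: predict which customers will leave ("churn").
customers = [
    {"age": "young", "gender": "F", "churn": "yes"},
    {"age": "young", "gender": "M", "churn": "yes"},
    {"age": "old", "gender": "F", "churn": "no"},
    {"age": "old", "gender": "M", "churn": "no"},
    {"age": "young", "gender": "F", "churn": "yes"},
]
print(build_tree(customers, "churn"))  # ('age', 'old', 'no', 'yes')
```

The tree is returned as nested tuples `(attribute, value, subtree_if_equal, subtree_otherwise)`; a real implementation would also handle continuous attributes and prune unhelpful branches.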
Decision Trees (3) Decision trees are used for classification and prediction. Typical questions: Predict which customers will leave; Help in mailing and promotion campaigns; Explain reasons for a decision; What movies do young female customers like to buy?
Decision Trees – Who Decides
Naïve Bayes: Based on Bayes' formula. Uses statistics to say, with a probability, whether a case falls into a certain category or not. Spam filtering: spam score (Bayes). Looks at each input attribute on its own
Naïve Bayes Quickly builds mining models that can be used for classification and prediction. It calculates probabilities for each possible state of the input attribute, given each state of the predictable attribute. These probabilities can later be used to predict the outcome of the predictable attribute from the known input attributes. This makes the model a good option for exploring the data
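The counting step described above (probability of each input state, given each state of the predictable attribute) can be sketched as follows. This is a hypothetical illustration using the spam example from the previous slide: the e-mail features are invented, and the Laplace smoothing (`alpha`) is an assumption added so that unseen states do not zero out a prediction.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, target):
    """Count P(class) and P(input state | class) from discrete rows."""
    class_counts = Counter(r[target] for r in rows)
    cond_counts = defaultdict(Counter)  # (attribute, class) -> counts of input states
    states = defaultdict(set)           # attribute -> every state seen in training
    for r in rows:
        for attr, value in r.items():
            if attr != target:
                cond_counts[(attr, r[target])][value] += 1
                states[attr].add(value)
    return class_counts, cond_counts, states


def predict_proba(model, case, alpha=1.0):
    """Combine the per-attribute probabilities, assuming attributes are independent."""
    class_counts, cond_counts, states = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        p = n_cls / total  # prior P(class)
        for attr, value in case.items():
            n_value = cond_counts[(attr, cls)][value]
            # Laplace-smoothed P(attr = value | class)
            p *= (n_value + alpha) / (n_cls + alpha * len(states[attr]))
        scores[cls] = p
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}


# Hypothetical training e-mails with a known outcome.
emails = [
    {"link": "yes", "caps": "yes", "spam": "yes"},
    {"link": "yes", "caps": "no", "spam": "yes"},
    {"link": "no", "caps": "no", "spam": "no"},
    {"link": "no", "caps": "yes", "spam": "no"},
    {"link": "no", "caps": "no", "spam": "no"},
]
model = train_naive_bayes(emails, "spam")
probs = predict_proba(model, {"link": "yes", "caps": "yes"})
print(round(probs["yes"], 3))  # 0.758
```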
Cluster Analysis (1) Grouping data into clusters: objects within a cluster have high similarity based on the attribute values; the class label of each object is not known. Several techniques: partitioning methods, hierarchical methods, density-based methods, model-based methods, and more…
Cluster Analysis (2) Segments a heterogeneous population into a number of more homogeneous subgroups or clusters. Some typical questions: Discover distinct groups of customers; Identify groups of houses in a city; In biology, derive animal and plant taxonomies; Find outliers
Clustering [Chart: clusters plotted by Annual Income vs. Age]
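The income/age chart can be approximated with k-means, one of the partitioning methods listed above. This is a minimal sketch, not what SQL Server runs internally: it assumes income in thousands so both axes have similar ranges (real data would normally be normalized), uses the first `k` points as naive initial centroids, and the customer points are invented.

```python
import math


def kmeans(points, k, iterations=100):
    """Partition points into k clusters by repeatedly moving each centroid to its cluster mean."""
    centroids = [tuple(p) for p in points[:k]]  # naive initialisation: first k points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers: (annual income in $1000s, age).
customers = [(20, 22), (25, 25), (22, 23), (90, 45), (95, 50), (88, 48)]
centroids, clusters = kmeans(customers, k=2)
for centre, members in zip(centroids, clusters):
    print(centre, members)
```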
Time series Timebaseddata  prediction
Sequence Clustering: Numbers show the order and stronger associations. Direction of association matters (it does not necessarily hold in the other direction)
Association: If you own certain stocks, you maybe own other ones as well. Probability = thickness of line
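The stock example can be expressed with the two usual association measures: support (how often two items occur together) and confidence (how often the right-hand item appears when the left-hand one does). This is a minimal pair-only sketch under stated assumptions: the portfolios and thresholds are hypothetical, and a real association algorithm also mines larger itemsets.

```python
from collections import Counter
from itertools import combinations


def association_rules(baskets, min_support=0.3, min_confidence=0.6):
    """Return single-item rules (lhs, rhs, support, confidence), sorted."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    rules = []
    for (x, y), n_xy in pair_counts.items():
        support = n_xy / n  # share of all baskets containing both items
        if support < min_support:
            continue
        # Rules are directional: confidence(x -> y) can differ from confidence(y -> x).
        for lhs, rhs in ((x, y), (y, x)):
            confidence = n_xy / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical portfolios: each set lists the stocks one investor owns.
portfolios = [
    {"StockA", "StockB"},
    {"StockA", "StockB", "StockC"},
    {"StockA", "StockB"},
    {"StockC"},
    {"StockA"},
]
print(association_rules(portfolios))
# [('StockA', 'StockB', 0.6, 0.75), ('StockB', 'StockA', 0.6, 1.0)]
```

Owning StockB implies owning StockA every time (confidence 1.0), while the reverse rule is weaker (0.75), which is why the direction of an association matters.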
Neural Nets: Let the system learn how to classify data; the neural network adapts to new data. Formulate a statement/hypothesis. The outcome is known (data / surveys). 1. Use 70% of the data to train the network (outcome is known) 2. Use 30% of the data to test the network (outcome is known) 3. New data (no survey needed; predict with the network) Other example: OCR
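The three steps (train on 70%, test on 30%, then predict new cases) can be sketched with a network reduced to a single sigmoid neuron trained by gradient descent. This is a simplifying assumption: the SQL Server Neural Network algorithm uses a multi-layer network, and the one-feature survey data below is invented.

```python
import math
import random


def split_train_test(cases, train_fraction=0.7, seed=42):
    """Shuffle reproducibly, then cut into a training part and a test part."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_neuron(cases, epochs=3000, rate=0.5):
    """Train one sigmoid neuron by stochastic gradient descent on log-loss."""
    weights = [0.0] * len(cases[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in cases:
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            weights = [w - rate * error * xi for w, xi in zip(weights, x)]
            bias -= rate * error
    return weights, bias


def predict(model, x):
    weights, bias = model
    return 1 if bias + sum(w * xi for w, xi in zip(weights, x)) >= 0 else 0


def accuracy(model, cases):
    return sum(predict(model, x) == y for x, y in cases) / len(cases)


# Hypothetical survey: one feature scaled to 0..1, outcome known (0 or 1).
survey = [((i / 20,), 0) for i in range(0, 7)] + [((i / 20,), 1) for i in range(14, 21)]
train, test = split_train_test(survey)  # steps 1 and 2: 70% train, 30% test
model = train_neuron(train)
print("test accuracy:", accuracy(model, test))
print("new case:", predict(model, (0.9,)))  # step 3: predict without a survey
```

Holding back the test cases is what makes the accuracy figure honest: the network never sees them during training.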
Difference between algorithms: Association and Sequence Clustering. Both have directions. Sequence Clustering shows probability with a number and a colour. They are very similar. The difference is that Association analyses items that occur together, whereas Sequence Clustering analyses items that follow one another. For example, credit card companies might use Sequence Clustering to spot fraud: a petrol station refill, followed by another petrol station refill, followed by a big purchase may indicate fraud (different transactions). Association is more like: when someone buys popcorn at the cinema, they also buy a drink (same transaction)
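The "items that follow one another" idea can be sketched as first-order transition probabilities estimated from ordered card histories. This covers only the sequence part; the clustering of similar sequences that Sequence Clustering also performs is left out. The event names and histories are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate P(next event | current event) from ordered event sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    probs = {}
    for current, following in counts.items():
        total = sum(following.values())
        probs[current] = {nxt: n / total for nxt, n in following.items()}
    return probs


def sequence_probability(probs, seq):
    """Probability of seq's transitions under the model (0.0 if a step was never seen)."""
    p = 1.0
    for current, nxt in zip(seq, seq[1:]):
        p *= probs.get(current, {}).get(nxt, 0.0)
    return p


# Hypothetical card histories: each list is one customer's transactions in order.
history = [
    ["groceries", "petrol", "groceries"],
    ["petrol", "groceries", "restaurant"],
    ["groceries", "restaurant", "petrol"],
]
probs = transition_probabilities(history)
print(sequence_probability(probs, ["petrol", "groceries", "restaurant"]))
print(sequence_probability(probs, ["petrol", "petrol", "big purchase"]))  # 0.0
```

A sequence with a very low (or zero) probability under the learned transitions is a candidate for a closer look, which is the intuition behind the petrol-petrol-big-purchase example.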
Conclusion: When To Use What
There is more... Visual Numerics: 3rd party algorithms http://www.vni.com/company/whitepapers/MicrosoftBIwithNumericalLibraries.pdf
Excel Data Mining Microsoft SQL Server 2008 Data Mining Add-ins for Microsoft Office 2007 http://www.microsoft.com/downloads/en/details.aspx?familyid=896A493A-2502-4795-94AE-E00632BA6DE7&displaylang=en
Other usages of data mining: Find patterns - Profiling. Train station / airport → Who is the bad guy? Farmers → Find the best crops. Supermarket → Figure out how to get you to buy more, and where to put the expensive items
Tip: SSIS 2008 - Data Profiling task. Get a profile of the data in a table: potential candidate keys, length of data values in columns, null percentage of rows, distribution of values, ...
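To show what such a profile contains, here is a small Python approximation of a few of the measures listed above (null percentage, candidate key, value length, value distribution). It is not the SSIS task itself: the table is assumed to arrive as a list of dicts with `None` for NULL, and the customer rows are invented.

```python
from collections import Counter


def profile(rows):
    """Summarise each column of a table given as a list of dicts (None = NULL)."""
    report = {}
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        values = [r.get(col) for r in rows]
        non_null = [v for v in values if v is not None]
        report[col] = {
            "null_pct": 100.0 * (len(values) - len(non_null)) / len(values),
            # a candidate key has no NULLs and no duplicate values
            "candidate_key": len(non_null) == len(values) and len(set(non_null)) == len(values),
            "max_length": max((len(str(v)) for v in non_null), default=0),
            "distribution": Counter(non_null),
        }
    return report


# Hypothetical customer table.
customers = [
    {"id": 1, "city": "Sydney", "phone": None},
    {"id": 2, "city": "Sydney", "phone": "0299"},
    {"id": 3, "city": "Melbourne", "phone": None},
    {"id": 4, "city": None, "phone": "0398"},
]
report = profile(customers)
print(report["city"]["null_pct"])  # 25.0
```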
Resources 1 Video: Simple data mining model http://www.sqlservercentral.com/articles/Video/65055/ Video: Data mining and Reporting Services http://www.sqlservercentral.com/articles/Video/64190/ Data Mining Algorithms http://msdn.microsoft.com/en-us/library/ms175595.aspx
Resources 2 Jamie MacLennan http://blogs.msdn.com/b/jamiemac/ Richard Lees on BI http://richardlees.blogspot.com/ Book: Data Mining with Microsoft SQL Server 2008 http://www.amazon.com/gp/product/0470277742?ie=UTF8&tag=sqlserverda09-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0470277742
Summary Why Data Mining? Uses Algorithms Demo Hands on Lab
Data Mining with SQL Server 2008

  • 1. SQL Server 2008 for Business Intelligence UTS Short Course
  • 2. Peter Gfader: specializes in C# and .NET (Java not anymore); testing and automated tests; Agile and Scrum (Certified Scrum Trainer); technology aficionado (Silverlight, ASP.NET, Windows Forms)
  • 3. Admin Stuff: Attendance: you initial the sheet. Hands-on lab: you get me to initial the sheet. Homework. Certificate: at the end of the 5 sessions, if I say you have completed successfully.
  • 4. Course Website: course timetable and materials at http://www.ssw.com.au/ssw/Events/2010UTSSQL/; resources at http://sharepoint.ssw.com.au/Training/UTSSQL/
  • 6. Last week(s): other cube browsers: Microsoft Data Analyzer, ProClarity, Excel 2003/2007/2010, Excel Services, Thinslicer, PerformancePoint, PowerPivot
  • 7. Homework: create a report on top of Northwind with the top 10 customers (table), top 10 products (table), top 10 employees (table), 1 chart that shows the top 10 customers, and 1 usage of the gauge control (surprise me)
  • 9. Step by step to BI: create the data warehouse; copy data to the data warehouse; create OLAP cubes; create reports; browse the cube; do some data mining (discovering relationships, predicting future events)
  • 10. Agenda: What is data mining? Why? Uses; Algorithms; Demo; Hands-on lab
  • 11. What is Data Mining? “Data mining is the use of powerful software tools to discover significant traits or relationships from databases or data warehouses, and is often used to predict future events.”
  • 12. What is Data Mining? It exploits statistical algorithms. Once the “knowledge” is extracted, it can be used to discover relationships and to predict values of other cases.
  • 13. Why Data Mining? Marketing: Who picks the movie: the kids, the wife or me? Who are our customers and what sort of films do they hire? Is a 30-year-old woman with 2 children going to hire Arnie’s latest film? Validation: Is this data sensible (e.g. Terminator 2 and Toy Story)? Prediction: sales next year.
  • 14. Why? It’s all about money. Get new information from data: future trends, past trends, outliers, maximums, minimums. Analyse data from different perspectives and summarise it into useful information. New information to increase revenue, cut costs, or both :-)
  • 15. Which Questions are Data Mining? Who are our biggest customers? What are customers buying with cigars? What are the customer retention levels of our branches? Which customers have bought olives and feta cheese but no ciabatta bread? Which regions have the highest male/female ratio of single 20-somethings? Which region has the lowest customer retention levels, and who are the lost customers?
  • 16. What’s not data mining: ad hoc queries; drilling through to details; a business intelligence tool
  • 18. Data: uncover patterns in samples. Good raw material leads to good data mining. Samples should be representative, i.e. "similar" to the domain. Data mining is not an all-seeing crystal ball. Verify and validate!
  • 19. OLAP versus Data Mining (learning to ride a bike before a car): OLAP is about fast ad hoc querying, analyses by dimensions and measures, and gives precise answers. Data mining may use an RDBMS or OLAP source, is about discovering and predicting, and gives imprecise answers. OLAP is not a prerequisite for data mining, but it almost always comes first.
  • 20. Types of Data Mining Algorithms: Classification algorithms predict one or more discrete variables based on the other attributes in the dataset. Regression algorithms predict one or more continuous variables, such as profit or loss, based on other attributes in the dataset. Segmentation algorithms divide data into groups, or clusters, of items that have similar properties. Association algorithms find correlations between different attributes in a dataset. Sequence analysis algorithms summarize frequent sequences or episodes in data, such as a Web path flow.
  • 21. Complete Set of Algorithms (ways to analyze your data): Clustering, Time Series, Decision Trees, Naïve Bayes, Association, Linear Regression, Neural Network, Sequence Clustering, Logistic Regression
  • 22. Decision Trees: split the data. Each branch is like an attribute. Brightness = amount of data.
  • 23. Decision Trees (1): Decision trees assign (classify) each case to one of a few (discrete) broad categories of the selected attribute (variable), and explain the classification with a few selected input variables. The building process is recursive partitioning: splitting the data into partitions and then splitting those partitions further. Initially all cases are in one big box.
  • 24. Decision Trees (2): The algorithm tries all possible breaks into classes, using all possible values of each input attribute; it then selects the split that partitions the data into the purest classes of the searched variable. There are several measures of purity. It then repeats the splitting for each new class, again testing all possible breaks. Unhelpful branches of the tree can be pre-pruned or post-pruned.
  • 25. Decision Trees (3): Decision trees are used for classification and prediction. Typical questions: Which customers will leave? How can we help mailing and promotion campaigns? What are the reasons for a decision? What are the movies young female customers like to buy?
  • 26. Decision Trees – Who Decides
  • 27. Naïve Bayes: Bayes’ formula. Uses statistics to say, with a probability, whether a case falls into a certain category or not. Spam filtering: spam score (Bayes). Tests only a particular attribute.
  • 28. Naïve Bayes: quickly builds mining models that can be used for classification and prediction. It calculates probabilities for each possible state of the input attribute, given each state of the predictable attribute. These probabilities can later be used to predict an outcome of the predictable attribute based on the known input attributes. This makes the model a good option for exploring the data.
  • 29. Cluster Analysis (1): grouping data into clusters. Objects within a cluster have high similarity based on their attribute values. The class label of each object is not known. Several techniques: partitioning methods, hierarchical methods, density-based methods, model-based methods, and more…
  • 30. Cluster Analysis (2): segments a heterogeneous population into a number of more homogeneous subgroups, or clusters. Some typical questions: discover distinct groups of customers; identify groups of houses in a city; in biology, derive animal and plant taxonomies; find outliers.
  • 31. Clustering (chart: Annual Income vs. Age)
  • 32. Time Series: time-based data → prediction
  • 33. Sequence Clustering: numbers = order; stronger associations. Direction of association (not necessarily the other direction).
  • 34. Association: if you own certain stocks, you may own other ones as well. Probability = thickness of the line.
  • 35. Neural Nets: let the system learn how to classify data; the neural network adapts to new data. Formulate a statement/hypothesis whose outcome is known (data/surveys). 1. Use 70% of the data to train the network (outcome is known). 2. Use 30% of the data to test the network (outcome is known). 3. New data: no survey needed, predict from the network. Other example: OCR.
  • 36. Difference between algorithms: Association and Sequence. Both have directions; Sequence Clustering shows a probability number and colour. They are very similar. The difference is that Association analyses items that occur together, whereas Sequence Clustering analyses items that follow one another. For example, credit card companies might use Sequence Clustering to spot fraud: a petrol station refill followed by another petrol station refill followed by a big purchase = fraud (different transactions). Association is more like: when someone buys popcorn at the cinema, they also buy a drink (same transaction).
  • 38. There is more... Visual Numerics 3rd-party algorithms: http://www.vni.com/company/whitepapers/MicrosoftBIwithNumericalLibraries.pdf
  • 39. Excel Data Mining: Microsoft SQL Server 2008 Data Mining Add-ins for Microsoft Office 2007: http://www.microsoft.com/downloads/en/details.aspx?familyid=896A493A-2502-4795-94AE-E00632BA6DE7&displaylang=en
  • 40. Other usages of data mining (find patterns, profiling): Train station/airport: who is the bad guy? Farmers: find the best crops. Supermarket: figure out how to get you to buy more, and where to put the expensive items.
  • 41. Tip: SSIS 2008 Data Profiling Task. Get a profile of the data in a table: potential candidate keys, length of data values in columns, null percentage of rows, distribution of values, ...
  • 42. Resources 1: Video: Simple data mining model http://www.sqlservercentral.com/articles/Video/65055/; Video: Data mining and Reporting Services http://www.sqlservercentral.com/articles/Video/64190/; Data Mining Algorithms http://msdn.microsoft.com/en-us/library/ms175595.aspx
  • 43. Resources 2: Jamie MacLennan http://blogs.msdn.com/b/jamiemac/; Richard Lees on BI http://richardlees.blogspot.com/; Book: Data Mining with Microsoft SQL Server 2008 http://www.amazon.com/gp/product/0470277742?ie=UTF8&tag=sqlserverda09-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0470277742
  • 44. Summary: Why data mining? Uses; Algorithms; Demo; Hands-on lab
  • 46. Thank You! Gateway Court Suite 10, 81-91 Military Road, Neutral Bay, Sydney NSW 2089, AUSTRALIA. ABN: 21 069 371 900. Phone: +61 2 9953 3000. Fax: +61 2 9953 3105. info@ssw.com.au, www.ssw.com.au
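
Slide 20 describes regression algorithms as predicting a continuous variable, such as profit, from other attributes, and slide 21 lists Linear Regression. The Python sketch below illustrates the idea with ordinary least squares on a single input attribute. It is only an illustration of the technique, not the SQL Server implementation; the names `fit_line` and `predict` and the spend/profit figures are hypothetical.

```python
from statistics import mean


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x (one input attribute)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def predict(model: tuple[float, float], x: float) -> float:
    """Predict the continuous target for a new input value."""
    intercept, slope = model
    return intercept + slope * x


if __name__ == "__main__":
    # Hypothetical data: marketing spend (x) against profit (y), both in $k.
    spend = [1, 2, 3, 4]
    profit = [3, 5, 7, 9]
    model = fit_line(spend, profit)
    print(model, predict(model, 5))
```

Real data needs more than one input and a check of the residuals; this sketch only shows how a continuous prediction falls out of the fitted line.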
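
Slides 23 and 24 describe recursive partitioning: try every possible break on every input attribute, keep the split that gives the purest partitions, then repeat inside each partition. A minimal Python sketch, assuming Gini impurity as the purity measure (one of the "several measures of purity"; SQL Server's Decision Trees algorithm offers its own scoring methods) and equality splits on categorical attributes. All function names and the customer rows are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure partition, larger when classes are mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, target):
    """Try every value of every input attribute as a binary split
    (attr == value vs. attr != value); return (impurity, attr, value)
    for the split with the lowest weighted Gini impurity, or None."""
    best = None
    n = len(rows)
    for attr in rows[0]:
        if attr == target:
            continue
        for value in {row[attr] for row in rows}:
            left = [row[target] for row in rows if row[attr] == value]
            right = [row[target] for row in rows if row[attr] != value]
            if not right:
                continue  # every row has this value, so nothing is separated
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, attr, value)
    return best


def build_tree(rows, target, depth=0, max_depth=3):
    """Recursive partitioning; max_depth acts as a crude form of pre-pruning."""
    labels = [row[target] for row in rows]
    split = None
    if len(set(labels)) > 1 and depth < max_depth:
        split = best_split(rows, target)
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    _, attr, value = split
    matching = [row for row in rows if row[attr] == value]
    others = [row for row in rows if row[attr] != value]
    return (attr, value,
            build_tree(matching, target, depth + 1, max_depth),
            build_tree(others, target, depth + 1, max_depth))


def classify(tree, case):
    """Walk from the root to a leaf and return the predicted class."""
    while isinstance(tree, tuple):
        attr, value, if_match, otherwise = tree
        tree = if_match if case[attr] == value else otherwise
    return tree


# Hypothetical movie-rental cases (cf. slide 25).
customers = [
    {"gender": "F", "age": "young", "buys": "comedy"},
    {"gender": "F", "age": "young", "buys": "comedy"},
    {"gender": "F", "age": "older", "buys": "drama"},
    {"gender": "M", "age": "young", "buys": "action"},
    {"gender": "M", "age": "older", "buys": "drama"},
]
tree = build_tree(customers, "buys")
print(classify(tree, {"gender": "F", "age": "young"}))
```

Here the first split is on `age`, because it leaves one partition completely pure; the young partition is then split again on `gender`.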
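
Slide 28 says Naïve Bayes calculates the probability of each input state given each state of the predictable attribute, and slide 27 mentions spam scoring. The Python sketch below counts those conditional probabilities and combines them with Bayes' formula under the "naive" independence assumption. The +1 (Laplace) smoothing is an assumption added here so that an unseen state does not zero the score; the function names and e-mail rows are hypothetical.

```python
from collections import Counter, defaultdict


def train_nb(rows, target):
    """Count class frequencies and (attribute, state, class) frequencies."""
    class_counts = Counter(row[target] for row in rows)
    cond = Counter()            # (attr, value, cls) -> number of rows
    states = defaultdict(set)   # attr -> distinct states seen in training
    for row in rows:
        for attr, value in row.items():
            if attr != target:
                cond[(attr, value, row[target])] += 1
                states[attr].add(value)
    return class_counts, cond, states


def predict_nb(model, case):
    """Return P(class | case) for every class, assuming independent inputs."""
    class_counts, cond, states = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        p = n_cls / total  # prior P(class)
        for attr, value in case.items():
            # Smoothed estimate of P(attr = value | class).
            p *= (cond[(attr, value, cls)] + 1) / (n_cls + len(states[attr]))
        scores[cls] = p
    norm = sum(scores.values())
    return {cls: score / norm for cls, score in scores.items()}


# Hypothetical e-mails described by two yes/no attributes.
emails = [
    {"offer": "yes", "link": "yes", "label": "spam"},
    {"offer": "yes", "link": "no", "label": "spam"},
    {"offer": "no", "link": "yes", "label": "ham"},
    {"offer": "no", "link": "no", "label": "ham"},
    {"offer": "no", "link": "no", "label": "ham"},
]
model = train_nb(emails, "label")
print(predict_nb(model, {"offer": "yes", "link": "yes"}))
```

Training is a single counting pass over the data, which is why the slide calls the algorithm quick to build and a good option for exploring the data.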
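
Slides 29 to 31 describe clustering as grouping cases without known class labels, with partitioning methods as one family, illustrated by an annual income vs. age chart. The Python sketch below uses k-means, a plain partitioning method, on hypothetical (age, annual income in $k) points. Seeding with the first k points is a simplification chosen to keep the result deterministic; the function name and data are hypothetical.

```python
import math


def kmeans(points, k, max_iterations=100):
    """Plain k-means. The first k points seed the centroids so the
    result is deterministic for this sketch."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical customers: (age, annual income in $k).
people = [(22, 30), (25, 35), (24, 32), (45, 90), (50, 95), (48, 88)]
labels, centroids = kmeans(people, 2)
print(labels, centroids)
```

Distances mix the units of every attribute, so in practice you would rescale age and income before clustering. As slide 29 notes, the clusters come without names; you label them yourself.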
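
Slide 32 summarises time series as "time-based data → prediction", and slide 13 gives "sales next year" as an example. The Python sketch below forecasts with Holt's linear exponential smoothing, a simple swapped-in technique that is not what the Microsoft Time Series algorithm uses internally. The smoothing constants `alpha` and `beta`, the function name and the sales figures are assumptions.

```python
def holt_forecast(series, alpha=0.5, beta=0.3, horizon=1):
    """Holt's linear exponential smoothing: track a level and a trend,
    then extrapolate `horizon` steps ahead."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    if not (0 < alpha <= 1 and 0 < beta <= 1):
        raise ValueError("alpha and beta must be in (0, 1]")
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        previous_level = level
        level = alpha * y + (1 - alpha) * (level + trend)              # smoothed level
        trend = beta * (level - previous_level) + (1 - beta) * trend  # smoothed slope
    return [level + h * trend for h in range(1, horizon + 1)]


# Hypothetical yearly sales in $k; forecast the next two years.
sales = [100, 110, 125, 130, 145]
print(holt_forecast(sales, horizon=2))
```

Larger `alpha` and `beta` make the forecast react faster to recent values; smaller values smooth out noise.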
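
Slides 33 and 36 stress that sequence analysis looks at items that follow one another, and that an association in one direction does not imply the other. The Python sketch below counts first-order transitions in ordered sessions to show that P(B follows A) can differ from P(A follows B). It covers only the transition-counting idea; the Microsoft Sequence Clustering algorithm additionally groups similar sequences into clusters. The function name and click streams are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """First-order transition probabilities P(next | current) from ordered sessions."""
    counts = defaultdict(Counter)
    for sequence in sequences:
        for current, following in zip(sequence, sequence[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


# Hypothetical click streams on a movie site (cf. the "Toy Story 2" notes).
sessions = [
    ["Home", "Toy Story", "Toy Story 2"],
    ["Home", "Toy Story", "Toy Story 2"],
    ["Home", "Toy Story 2", "Checkout"],
    ["Toy Story 2", "Toy Story", "Checkout"],
]
probs = transition_probabilities(sessions)
print(probs["Toy Story"]["Toy Story 2"], probs["Toy Story 2"]["Toy Story"])
```

In this data "Toy Story" is followed by "Toy Story 2" two times out of three, while the reverse step happens only half the time.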
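
Slides 34 and 36 describe association as items that occur together in the same transaction, such as popcorn and a drink at the cinema. The Python sketch below computes support (share of baskets containing both items) and confidence (P(right-hand item | left-hand item)) for item pairs. It only counts pairs and is not a full frequent-itemset miner; the function name, the `min_support` threshold and the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.3):
    """Support and confidence for every item pair that occurs in the same
    transaction. Returns {(lhs, rhs): (support, confidence)}."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # order inside a basket is irrelevant
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = {}
    for (a, b), together in pair_counts.items():
        support = together / n  # share of all baskets containing both items
        if support < min_support:
            continue
        rules[(a, b)] = (support, together / item_counts[a])  # P(b | a)
        rules[(b, a)] = (support, together / item_counts[b])  # P(a | b)
    return rules


# Hypothetical cinema baskets.
baskets = [
    ["popcorn", "drink"],
    ["popcorn", "drink", "candy"],
    ["popcorn"],
    ["drink"],
    ["drink", "candy"],
]
for (lhs, rhs), (support, confidence) in pair_rules(baskets).items():
    print(f"{lhs} -> {rhs}: support {support:.2f}, confidence {confidence:.2f}")
```

Both directions of a rule share the same support but usually have different confidence, which matches the "direction" shown by the arrows in the association viewer.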
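
Slide 35 describes the neural-net workflow: train on 70% of cases whose outcome is known, test on the remaining 30%, then predict new cases. The Python sketch below illustrates that workflow with a single perceptron, the simplest neural unit; the Microsoft Neural Network algorithm uses a larger multi-layer network. The grid data, the seed, the learning rate and all function names are hypothetical.

```python
import random


def make_cases():
    """Hypothetical labelled cases on a 0.1 grid: label 1 ("loyal") when
    x + y > 1. Points near the boundary are dropped so the classes are
    linearly separable with a clear margin."""
    cases = []
    for i in range(11):
        for j in range(11):
            if abs(i + j - 10) >= 4:  # keep only points with x + y <= 0.6 or >= 1.4
                cases.append(((i / 10, j / 10), 1 if i + j > 10 else 0))
    return cases


def split_70_30(cases, seed=42):
    """Shuffle reproducibly, then keep 70% for training and 30% for testing."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.7)
    return shuffled[:cut], shuffled[cut:]


def predict(weights, bias, x):
    """A single neuron with a step activation."""
    return 1 if weights[0] * x[0] + weights[1] * x[1] + bias > 0 else 0


def train_perceptron(train, epochs=1000, learning_rate=0.1):
    """Perceptron learning rule: nudge the weights after every mistake."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, label in train:
            error = label - predict(weights, bias, x)
            if error:
                weights[0] += learning_rate * error * x[0]
                weights[1] += learning_rate * error * x[1]
                bias += learning_rate * error
                mistakes += 1
        if mistakes == 0:
            break  # every training case is classified correctly
    return weights, bias


def accuracy(weights, bias, cases):
    """Share of cases whose known outcome matches the prediction."""
    return sum(predict(weights, bias, x) == label for x, label in cases) / len(cases)


train, test = split_70_30(make_cases())
weights, bias = train_perceptron(train)
print(f"train accuracy {accuracy(weights, bias, train):.2f}, "
      f"test accuracy {accuracy(weights, bias, test):.2f}")
```

The test accuracy is the number to trust, because those 30% of cases were never seen during training; this is the "verify and validate" step from slide 18.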
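
Slide 41 recommends the SSIS 2008 Data Profiling Task for a first look at a table: candidate keys, value lengths, null percentages and value distributions. To show what those profiles compute, here is a small Python sketch over in-memory rows. It does not use SSIS, and the function name, column names and rows are hypothetical.

```python
from collections import Counter


def profile_column(rows, column):
    """A few of the statistics slide 41 lists, computed for one column."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    return {
        "null_ratio": (len(values) - len(present)) / len(values) if values else 0.0,
        "length_distribution": dict(Counter(len(str(v)) for v in present)),
        "value_distribution": dict(Counter(present)),
        # Candidate key: no NULLs and every value unique.
        "candidate_key": (bool(values) and len(present) == len(values)
                          and len(set(present)) == len(present)),
    }


# Hypothetical rows from a customer table.
customers = [
    {"CustomerID": "C001", "Region": None},
    {"CustomerID": "C002", "Region": None},
    {"CustomerID": "C003", "Region": "WA"},
    {"CustomerID": "C004", "Region": "OR"},
]
for column in ("CustomerID", "Region"):
    print(column, profile_column(customers, column))
```

A column with many NULLs, like `Region` here, is a weak input for mining; a column that qualifies as a candidate key is usually the case key rather than an input.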

Editor's Notes

  1. Peter Gfader shows SQL Server
  2. Java: current version 1.6 Update 21; 1.7 to be released next year (2010). Dynamic languages, parallel computing, maybe closures.
  3. 3. Create the following report on top of Northwind: top 10 customers (table), top 10 products (table), top 10 employees (table), 1 chart that shows the top 10 customers, and 1 usage of the gauge control (surprise me). a. Download Report Builder 2 from http://www.microsoft.com/downloads/en/details.aspx?FamilyID=9f783224-9871-4eea-b1d5-f3140a253db6&displaylang=en b. Send me a screenshot of the final report.
  4. Data mining can be used to uncover patterns in data samples, but it is important to be aware that using non-representative samples of data may produce results that are not indicative of the domain. Similarly, data mining will not find patterns that may be present in the domain if those patterns are not present in the sample being "mined". There is a tendency for insufficiently knowledgeable "consumers" of the results to attribute "magical abilities" to data mining, treating the technique as a sort of all-seeing crystal ball. Like any other tool, it only functions in conjunction with the appropriate raw material: in this case, indicative and representative data that the user must first collect. Further, the discovery of a particular pattern in a particular set of data does not necessarily mean that the pattern is representative of the whole population from which that data was drawn. Hence, an important part of the process is the verification and validation of patterns on other samples of data.
  6. http://msdn.microsoft.com/en-us/library/ms175595.aspx Ways to analyze your data. DT = split data; each branch is like an attribute; brightness = amount of data. TODO: Check out bars. Clustering = mapping of popular points; number of children; darkness; lines are links between clusters (associations). Time series: time-based data → prediction. Sequence clustering: numbers = order, stronger associations; direction of association (not necessarily the other direction). Association: if you own certain stocks, you may own other ones as well; probability = thickness of line. Naive Bayes: Bayes formula; uses statistics to say whether a case falls into a certain category or not (with probability); spam filtering → spam score (Bayes); tests only a particular attribute. Neural nets: let the system learn how to classify data; formulate a statement/hypothesis; the outcome is known (data/surveys). 1. 70% of the data to train the network (outcome is known). 2. 30% of the data to test the network (outcome is known). 3. New data (no survey needed, predict from the network). Example: OCR. Example above = get loyalty of customers. The neural network adapts to the new data.
  7. Which attributes am I interested in? The algorithm splits the data for me.
  8. Pruned = trimmed (German: "gestutzt")
  9. Different colour = relationship. The user clicked on Toy Story 2.
  10. Very easy to set up. Classifies and gives a score (prediction).
  11. Class label: a combination of different attributes. Name the clusters yourself.
  14. Get loyalty of customers