Chapter 2
Related Concepts
2.1 Database/OLTP Systems
• Unlike a simple set, data in a database are usually viewed as having a particular
structure, or schema, associated with them.
• Unlike a file, a database is independent of the physical method used to store it.
• A data model is used to describe the data, their attributes, and the relationships among
them. A common data model is the ER (entity-relationship) data model. It can be
viewed as a documentation and communication tool that conveys the type and structure
of the actual data. A data model is independent of the particular DBMS used.
• Basic database queries are well defined and have precise results. Data mining
applications, conversely, are often vaguely defined and have imprecise results. A data
mining query outputs a KDD (knowledge discovery in databases) object.
• A KDD object is a rule, a classification, or a cluster. It does not exist
before the query is executed and is not part of the database being queried.
2.2 Fuzzy Sets and Fuzzy Logic
• A fuzzy set is a set, 𝐹, in which the set membership function, 𝑓, is a real-valued (as opposed to
Boolean) function with output in the range [0,1]. An element 𝑥 is said to belong to 𝐹 with
probability 𝑓(𝑥) and simultaneously to be in ¬𝐹 with probability 1 − 𝑓(𝑥).
• Because the membership function is not Boolean, the results of a query over a fuzzy set are fuzzy. A
classification problem can be solved by assigning a set membership function to each record for each
class. The record is then assigned to the class that has the highest membership function value.
• Association rules are generated with a confidence value that indicates the degree to which each rule
holds in the entire database. This value can be thought of as a membership function.
• Fuzzy logic uses rules and membership functions to estimate a continuous function. Fuzzy logic is
a valuable tool to develop control systems for such things as elevators, trains, and heating
systems.
$\mathrm{mem}(\lnot x) = 1 - \mathrm{mem}(x)$
$\mathrm{mem}(x \land y) = \min(\mathrm{mem}(x), \mathrm{mem}(y))$
$\mathrm{mem}(x \lor y) = \max(\mathrm{mem}(x), \mathrm{mem}(y))$
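The three membership rules above, together with the "highest membership wins" classification rule, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the height classes with their membership values are assumptions made for the example, not part of the original slides.

```python
def fuzzy_not(m):
    """Membership in the complement: mem(not x) = 1 - mem(x)."""
    return 1.0 - m


def fuzzy_and(mx, my):
    """Membership in the conjunction: the minimum of the two memberships."""
    return min(mx, my)


def fuzzy_or(mx, my):
    """Membership in the disjunction: the maximum of the two memberships."""
    return max(mx, my)


def classify(memberships):
    """Assign a record to the class with the highest membership value."""
    return max(memberships, key=memberships.get)


# Hypothetical membership values of one record in three height classes.
height_memberships = {"short": 0.0, "medium": 0.75, "tall": 0.25}

tall = height_memberships["tall"]
medium = height_memberships["medium"]

print(fuzzy_not(tall))               # 0.75
print(fuzzy_and(tall, medium))       # 0.25
print(fuzzy_or(tall, medium))        # 0.75
print(classify(height_memberships))  # medium
```

Note that the memberships of one record across classes do not have to sum to 1; each class has its own membership function.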
2.3 Information Retrieval
The effectiveness of an IR system in processing a
query is often measured by precision and recall:
• $\text{Precision} = \frac{|\text{Relevant and Retrieved}|}{|\text{Retrieved}|}$
• $\text{Recall} = \frac{|\text{Relevant and Retrieved}|}{|\text{Relevant}|}$
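As a small worked example of precision and recall, the sketch below treats the relevant and retrieved documents as sets of document identifiers. The function name and the identifiers `d1` to `d5` are hypothetical, chosen only for illustration.

```python
def precision_recall(relevant, retrieved):
    """Return (precision, recall) for two collections of document ids."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = relevant & retrieved  # documents that are relevant AND retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query result: 4 relevant documents exist, 3 were retrieved.
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d2", "d3", "d5"}

p, r = precision_recall(relevant, retrieved)
print(f"precision = {p:.2f}, recall = {r:.2f}")  # precision = 0.67, recall = 0.50
```

Two of the three retrieved documents are relevant (precision 2/3), and two of the four relevant documents were found (recall 2/4).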
The inverse document frequency (IDF) is often used
by similarity measures. Given a keyword, $k$, and
$n$ documents, IDF can be defined as:
• $\mathrm{IDF}_k = \log \frac{n}{|\text{documents containing } k|} + 1$
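The IDF definition can be checked on a toy collection. The sketch below assumes a base-2 logarithm and whitespace tokenization, neither of which is fixed by the slide, and the four sample documents are made up for the example.

```python
import math


def idf(keyword, documents):
    """IDF_k = log2(n / number of documents containing k) + 1 (base 2 assumed)."""
    n = len(documents)
    containing = sum(1 for doc in documents if keyword in doc.lower().split())
    if containing == 0:
        raise ValueError(f"keyword {keyword!r} does not occur in any document")
    return math.log2(n / containing) + 1


# Hypothetical document collection.
documents = [
    "data mining finds patterns",
    "a data warehouse stores data",
    "fuzzy logic controls elevators",
    "mining association rules",
]

print(idf("data", documents))   # 2.0  (2 of 4 documents contain "data")
print(idf("fuzzy", documents))  # 3.0  (1 of 4 documents contains "fuzzy")
```

Rarer keywords receive a higher IDF, which is why similarity measures use it to weight distinctive terms more heavily.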
Concept hierarchies (trees or directed acyclic
graphs, DAGs) can be used in spatial data mining.
2.5 Data Warehousing
• A data warehouse (DW) is a set of data that supports decision support systems (DSS) and is
subject-oriented, integrated, time-variant, and nonvolatile.
• A DW contains informational data, which are used to support
other functions such as planning and forecasting.
• OLAP retrieval tools facilitate quick query response at all granularities.
2.6 Dimensional Modeling
[Figures: multidimensional star schema; multidimensional constellation schema]
• A dimension is a collection of logically related attributes and is viewed as an axis
for modeling the data.
• The specific data stored are called facts and usually are numeric data. In a
relational system each dimension is a table and facts are stored in a fact table.
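To make the split between dimension tables and a fact table concrete, here is a minimal in-memory sketch of a star schema. The table names, keys, and sales figures are all hypothetical; a real system would hold these as relational tables.

```python
# Dimension tables: each key maps to a set of logically related attributes.
product_dim = {
    1: {"name": "laptop", "category": "electronics"},
    2: {"name": "desk", "category": "furniture"},
}
store_dim = {
    10: {"city": "Dallas", "state": "TX"},
    11: {"city": "Austin", "state": "TX"},
}

# Fact table: foreign keys into the dimensions plus a numeric fact.
sales_fact = [
    {"product_id": 1, "store_id": 10, "quantity": 3},
    {"product_id": 1, "store_id": 11, "quantity": 2},
    {"product_id": 2, "store_id": 10, "quantity": 5},
    {"product_id": 2, "store_id": 11, "quantity": 1},
]


def quantity_by(attribute, dimension, key):
    """Total the quantity fact grouped by one attribute of one dimension."""
    totals = {}
    for row in sales_fact:
        value = dimension[row[key]][attribute]
        totals[value] = totals.get(value, 0) + row["quantity"]
    return totals


print(quantity_by("category", product_dim, "product_id"))
# {'electronics': 5, 'furniture': 6}
print(quantity_by("city", store_dim, "store_id"))
# {'Dallas': 8, 'Austin': 3}
```

Each dimension acts as an axis along which the same facts can be grouped and totaled.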
[Figure: multidimensional data cube]
2.7 Online Analytic Processing (OLAP)
OLAP supports ad hoc querying of the data warehouse. OLAP requires
a multidimensional view of the data and involves some analysis.
Operations supported: slice, dice, roll up, drill down, visualization
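The slice, dice, and roll-up operations can be illustrated on a tiny data cube stored as a dictionary keyed by (product, city, quarter). This is a sketch under assumed names and made-up sales values, not a description of any particular OLAP tool.

```python
from collections import defaultdict

DIMENSIONS = ("product", "city", "quarter")

# Hypothetical cube: one cell per (product, city, quarter) with a sales value.
cube = {
    ("laptop", "Dallas", "Q1"): 10,
    ("laptop", "Dallas", "Q2"): 12,
    ("laptop", "Austin", "Q1"): 7,
    ("desk", "Dallas", "Q1"): 4,
    ("desk", "Austin", "Q2"): 6,
}


def slice_cube(cube, dimension, value):
    """Slice: fix one dimension to a single value."""
    i = DIMENSIONS.index(dimension)
    return {cell: v for cell, v in cube.items() if cell[i] == value}


def dice_cube(cube, allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return {
        cell: v
        for cell, v in cube.items()
        if all(cell[DIMENSIONS.index(d)] in values for d, values in allowed.items())
    }


def roll_up(cube, keep):
    """Roll up: sum away every dimension that is not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for cell, v in cube.items():
        totals[tuple(cell[i] for i in idx)] += v
    return dict(totals)


print(slice_cube(cube, "quarter", "Q1"))
print(dice_cube(cube, {"product": {"laptop"}, "city": {"Dallas"}}))
print(roll_up(cube, ["product"]))          # {('laptop',): 29, ('desk',): 10}
print(roll_up(cube, ["product", "city"]))  # finer granularity (a drill down)
```

Drill down is the reverse of roll up: moving from `roll_up(cube, ["product"])` to `roll_up(cube, ["product", "city"])` shows the same totals at a finer granularity.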
2.8 Statistics
• Even such simple concepts as determining a data distribution and calculating a mean or a
variance can be viewed as data mining techniques in their own right: each is a descriptive model of
the data under consideration.
• When a model is generated, the goal is to fit it to the entire data, not just a sample
searched. Assumptions often made about independence of data may be incorrect, thus
leading to errors in the resulting model. Any model should be statistically significant,
meaningful, and valid.
• Sampling is an often-used tool in data mining and machine learning. Here a subset
of the total population is examined, and a generalization (model) about the entire
population is made from this subset.
• The term exploratory data analysis describes the fact that the data can actually drive the
creation of the model and any statistical characteristics.
• Some data mining applications determine correlations among data. These relationships,
however, are not necessarily causal in nature. Care must be taken when assigning significance to
such relationships.
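The ideas of descriptive statistics and sampling from the bullets above can be sketched with the Python standard library. The population here is synthetic (normally distributed values with an assumed mean of 50 and standard deviation of 10), and the sample size of 200 is an arbitrary choice for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Synthetic population; in practice this would be the full data set.
population = [random.gauss(50, 10) for _ in range(10_000)]

# Examine only a subset and generalize from it to the whole population.
sample = random.sample(population, 200)

pop_mean = statistics.fmean(population)
pop_var = statistics.pvariance(population)  # population variance
sample_mean = statistics.fmean(sample)
sample_var = statistics.variance(sample)    # unbiased sample variance

print(f"population: mean={pop_mean:.2f} variance={pop_var:.2f}")
print(f"sample:     mean={sample_mean:.2f} variance={sample_var:.2f}")
```

The sample statistics only approximate the population values, which is why a model fitted to a sample still has to be checked against the entire data.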
2.9 Machine Learning
• Data mining involves not only modeling but also the development of effective and
efficient algorithms and data structures to perform the modeling on large data sets.
• Machine learning is the area of AI that examines how to write programs that can learn. In
data mining, machine learning is often used for prediction or classification.
• Predictive modeling is done in two phases. During the training phase, historical or
sampled data are used to create a model that represents those data. It is assumed to
hold for the whole database and its future states. The testing phase then applies this
model to the remaining and future data.
• With supervised learning, a sample of the database is used to train the system to properly
perform the desired task. The quality of the training data determines how well the
program learns. With unsupervised learning, the correct answers for applying the
model to the data are not known in advance.
• The objective for data mining is to uncover useful information and provide it to humans,
while machine learning research is focused more on the learning portion.
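The two phases of predictive modeling can be sketched with a deliberately simple supervised learner. The nearest-centroid classifier, the synthetic one-feature data, and the 70/30 split below are all assumptions chosen to keep the example short; they are not taken from the slides.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Synthetic labeled data: (feature value, class label).
data = [(random.gauss(2.0, 0.5), "low") for _ in range(50)]
data += [(random.gauss(6.0, 0.5), "high") for _ in range(50)]
random.shuffle(data)

# Hold out part of the data: 70 records for training, 30 for testing.
train, test = data[:70], data[70:]


def fit(records):
    """Training phase: model each class by the mean of its feature values."""
    sums, counts = {}, {}
    for x, label in records:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(model, x):
    """Assign x to the class whose centroid is closest."""
    return min(model, key=lambda label: abs(x - model[label]))


model = fit(train)

# Testing phase: apply the model to data it has not seen.
correct = sum(predict(model, x) == label for x, label in test)
accuracy = correct / len(test)
print(f"accuracy on held-out data: {accuracy:.2f}")
```

Because the labels are known during training, this is supervised learning; a clustering method applied to the same feature values without labels would be the unsupervised counterpart.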
References:
Dunham, Margaret H. Data Mining: Introductory and Advanced Topics. Pearson Education, Inc., 2003.

More Related Content

What's hot

Graph Clustering and cluster
Graph Clustering and clusterGraph Clustering and cluster
Graph Clustering and clusterAdil Mehmoood
 
Deep vs diverse architectures for classification problems
Deep vs diverse architectures for classification problemsDeep vs diverse architectures for classification problems
Deep vs diverse architectures for classification problemsColleen Farrelly
 
Survey on Various Classification Techniques in Data Mining
Survey on Various Classification Techniques in Data MiningSurvey on Various Classification Techniques in Data Mining
Survey on Various Classification Techniques in Data Miningijsrd.com
 
1.7 data reduction
1.7 data reduction1.7 data reduction
1.7 data reductionKrish_ver2
 
Data Reduction Stratergies
Data Reduction StratergiesData Reduction Stratergies
Data Reduction StratergiesAnjaliSoorej
 
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerStudy and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerIJERA Editor
 
A Survey on Constellation Based Attribute Selection Method for High Dimension...
A Survey on Constellation Based Attribute Selection Method for High Dimension...A Survey on Constellation Based Attribute Selection Method for High Dimension...
A Survey on Constellation Based Attribute Selection Method for High Dimension...IJERA Editor
 
Lect 3 background mathematics
Lect 3 background mathematicsLect 3 background mathematics
Lect 3 background mathematicshktripathy
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clusteringArshad Farhad
 
Machine Learning by Analogy
Machine Learning by AnalogyMachine Learning by Analogy
Machine Learning by AnalogyColleen Farrelly
 
IRJET- Missing Data Imputation by Evidence Chain
IRJET- Missing Data Imputation by Evidence ChainIRJET- Missing Data Imputation by Evidence Chain
IRJET- Missing Data Imputation by Evidence ChainIRJET Journal
 
Presentation_Malware Analysis.pptx
Presentation_Malware Analysis.pptxPresentation_Malware Analysis.pptx
Presentation_Malware Analysis.pptxnishanth kurush
 
Data Science - Part VII - Cluster Analysis
Data Science - Part VII -  Cluster AnalysisData Science - Part VII -  Cluster Analysis
Data Science - Part VII - Cluster AnalysisDerek Kane
 
Machine Learning Clustering
Machine Learning ClusteringMachine Learning Clustering
Machine Learning ClusteringRupak Roy
 
Lect 3 background mathematics for Data Mining
Lect 3 background mathematics for Data MiningLect 3 background mathematics for Data Mining
Lect 3 background mathematics for Data Mininghktripathy
 
The pertinent single-attribute-based classifier for small datasets classific...
The pertinent single-attribute-based classifier  for small datasets classific...The pertinent single-attribute-based classifier  for small datasets classific...
The pertinent single-attribute-based classifier for small datasets classific...IJECEIAES
 

What's hot (20)

Graph Clustering and cluster
Graph Clustering and clusterGraph Clustering and cluster
Graph Clustering and cluster
 
Data reduction
Data reductionData reduction
Data reduction
 
Deep vs diverse architectures for classification problems
Deep vs diverse architectures for classification problemsDeep vs diverse architectures for classification problems
Deep vs diverse architectures for classification problems
 
Survey on Various Classification Techniques in Data Mining
Survey on Various Classification Techniques in Data MiningSurvey on Various Classification Techniques in Data Mining
Survey on Various Classification Techniques in Data Mining
 
Clustering
ClusteringClustering
Clustering
 
1.7 data reduction
1.7 data reduction1.7 data reduction
1.7 data reduction
 
Data Reduction Stratergies
Data Reduction StratergiesData Reduction Stratergies
Data Reduction Stratergies
 
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerStudy and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
 
Data mining
Data miningData mining
Data mining
 
A Survey on Constellation Based Attribute Selection Method for High Dimension...
A Survey on Constellation Based Attribute Selection Method for High Dimension...A Survey on Constellation Based Attribute Selection Method for High Dimension...
A Survey on Constellation Based Attribute Selection Method for High Dimension...
 
Lect 3 background mathematics
Lect 3 background mathematicsLect 3 background mathematics
Lect 3 background mathematics
 
Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clustering
 
Machine Learning by Analogy
Machine Learning by AnalogyMachine Learning by Analogy
Machine Learning by Analogy
 
IRJET- Missing Data Imputation by Evidence Chain
IRJET- Missing Data Imputation by Evidence ChainIRJET- Missing Data Imputation by Evidence Chain
IRJET- Missing Data Imputation by Evidence Chain
 
Presentation_Malware Analysis.pptx
Presentation_Malware Analysis.pptxPresentation_Malware Analysis.pptx
Presentation_Malware Analysis.pptx
 
M033059064
M033059064M033059064
M033059064
 
Data Science - Part VII - Cluster Analysis
Data Science - Part VII -  Cluster AnalysisData Science - Part VII -  Cluster Analysis
Data Science - Part VII - Cluster Analysis
 
Machine Learning Clustering
Machine Learning ClusteringMachine Learning Clustering
Machine Learning Clustering
 
Lect 3 background mathematics for Data Mining
Lect 3 background mathematics for Data MiningLect 3 background mathematics for Data Mining
Lect 3 background mathematics for Data Mining
 
The pertinent single-attribute-based classifier for small datasets classific...
The pertinent single-attribute-based classifier  for small datasets classific...The pertinent single-attribute-based classifier  for small datasets classific...
The pertinent single-attribute-based classifier for small datasets classific...
 

Viewers also liked

01 Introduction to Data Mining
01 Introduction to Data Mining01 Introduction to Data Mining
01 Introduction to Data MiningValerii Klymchuk
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data MiningValerii Klymchuk
 
05 Clustering in Data Mining
05 Clustering in Data Mining05 Clustering in Data Mining
05 Clustering in Data MiningValerii Klymchuk
 
Artificial Intelligence for Automated Decision Support Project
Artificial Intelligence for Automated Decision Support ProjectArtificial Intelligence for Automated Decision Support Project
Artificial Intelligence for Automated Decision Support ProjectValerii Klymchuk
 
A Study on Data Mining Based Intrusion Detection System
A Study on Data Mining Based Intrusion Detection SystemA Study on Data Mining Based Intrusion Detection System
A Study on Data Mining Based Intrusion Detection SystemAM Publications
 
Computing Accuracy Precision And Recall
Computing Accuracy Precision And RecallComputing Accuracy Precision And Recall
Computing Accuracy Precision And RecallNicolas Bettenburg
 
Probability
ProbabilityProbability
Probabilityacsteele
 
Presentación Constitución
Presentación Constitución Presentación Constitución
Presentación Constitución sasalinda41
 
Presentatie les 2 externe communicatie - vacatures schrijven
Presentatie les 2 externe communicatie - vacatures schrijvenPresentatie les 2 externe communicatie - vacatures schrijven
Presentatie les 2 externe communicatie - vacatures schrijveningevandelst
 
Bab 5-ting-4.ppt
Bab 5-ting-4.pptBab 5-ting-4.ppt
Bab 5-ting-4.pptMangkai Ram
 
Periodo simples e_composto
Periodo simples e_compostoPeriodo simples e_composto
Periodo simples e_compostoEquipe_FAETEC
 
L'apport de l'analyse d'entreprise dans les projets
L'apport de l'analyse d'entreprise dans les projetsL'apport de l'analyse d'entreprise dans les projets
L'apport de l'analyse d'entreprise dans les projetsMarc Bonnemains
 
Intrusion detection using data mining
Intrusion detection using data miningIntrusion detection using data mining
Intrusion detection using data miningbalbeerrawat
 
Databse Intrusion Detection Using Data Mining Approach
Databse Intrusion Detection Using Data Mining ApproachDatabse Intrusion Detection Using Data Mining Approach
Databse Intrusion Detection Using Data Mining ApproachSuraj Chauhan
 

Viewers also liked (19)

01 Introduction to Data Mining
01 Introduction to Data Mining01 Introduction to Data Mining
01 Introduction to Data Mining
 
Data Warehouse Project
Data Warehouse ProjectData Warehouse Project
Data Warehouse Project
 
Database Project
Database ProjectDatabase Project
Database Project
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data Mining
 
05 Clustering in Data Mining
05 Clustering in Data Mining05 Clustering in Data Mining
05 Clustering in Data Mining
 
Artificial Intelligence for Automated Decision Support Project
Artificial Intelligence for Automated Decision Support ProjectArtificial Intelligence for Automated Decision Support Project
Artificial Intelligence for Automated Decision Support Project
 
DM for IDS
DM for IDSDM for IDS
DM for IDS
 
A Study on Data Mining Based Intrusion Detection System
A Study on Data Mining Based Intrusion Detection SystemA Study on Data Mining Based Intrusion Detection System
A Study on Data Mining Based Intrusion Detection System
 
Computing Accuracy Precision And Recall
Computing Accuracy Precision And RecallComputing Accuracy Precision And Recall
Computing Accuracy Precision And Recall
 
Probability
ProbabilityProbability
Probability
 
Global KTech Corporate Deck
Global KTech Corporate DeckGlobal KTech Corporate Deck
Global KTech Corporate Deck
 
Presentación Constitución
Presentación Constitución Presentación Constitución
Presentación Constitución
 
Presentatie les 2 externe communicatie - vacatures schrijven
Presentatie les 2 externe communicatie - vacatures schrijvenPresentatie les 2 externe communicatie - vacatures schrijven
Presentatie les 2 externe communicatie - vacatures schrijven
 
Bab 5-ting-4.ppt
Bab 5-ting-4.pptBab 5-ting-4.ppt
Bab 5-ting-4.ppt
 
Periodo simples e_composto
Periodo simples e_compostoPeriodo simples e_composto
Periodo simples e_composto
 
L'apport de l'analyse d'entreprise dans les projets
L'apport de l'analyse d'entreprise dans les projetsL'apport de l'analyse d'entreprise dans les projets
L'apport de l'analyse d'entreprise dans les projets
 
Intrusion detection using data mining
Intrusion detection using data miningIntrusion detection using data mining
Intrusion detection using data mining
 
03 Data Representation
03 Data Representation03 Data Representation
03 Data Representation
 
Databse Intrusion Detection Using Data Mining Approach
Databse Intrusion Detection Using Data Mining ApproachDatabse Intrusion Detection Using Data Mining Approach
Databse Intrusion Detection Using Data Mining Approach
 

Similar to 02 Related Concepts

DM_Notes.pptx
DM_Notes.pptxDM_Notes.pptx
DM_Notes.pptxWorkingad
 
Lecture 1. Data Structure & Algorithm.pptx
Lecture 1. Data Structure & Algorithm.pptxLecture 1. Data Structure & Algorithm.pptx
Lecture 1. Data Structure & Algorithm.pptxArifKamal36
 
DatabaseManagementSystem.pptx
DatabaseManagementSystem.pptxDatabaseManagementSystem.pptx
DatabaseManagementSystem.pptxuwmctesting
 
data structures and its importance
 data structures and its importance  data structures and its importance
data structures and its importance Anaya Zafar
 
Data modelling it's process and examples
Data modelling it's process and examplesData modelling it's process and examples
Data modelling it's process and examplesJayeshGadhave1
 
Data Mining Module 2 Business Analytics.
Data Mining Module 2 Business Analytics.Data Mining Module 2 Business Analytics.
Data Mining Module 2 Business Analytics.Jayanti Pande
 
Introduction of data science
Introduction of data scienceIntroduction of data science
Introduction of data scienceTanujaSomvanshi1
 
Data Modeling Training.pptx
Data Modeling Training.pptxData Modeling Training.pptx
Data Modeling Training.pptxssuser23b3eb
 
Data Mining System and Applications: A Review
Data Mining System and Applications: A ReviewData Mining System and Applications: A Review
Data Mining System and Applications: A Reviewijdpsjournal
 
M. FLORENCE DAYANA/DATABASE MANAGEMENT SYSYTEM
M. FLORENCE DAYANA/DATABASE MANAGEMENT SYSYTEMM. FLORENCE DAYANA/DATABASE MANAGEMENT SYSYTEM
M. FLORENCE DAYANA/DATABASE MANAGEMENT SYSYTEMDr.Florence Dayana
 
Relational data base management system (Unit 1)
Relational data base management system (Unit 1)Relational data base management system (Unit 1)
Relational data base management system (Unit 1)Ismail Mukiibi
 
UNIT 2: Part 2: Data Warehousing and Data Mining
UNIT 2: Part 2: Data Warehousing and Data MiningUNIT 2: Part 2: Data Warehousing and Data Mining
UNIT 2: Part 2: Data Warehousing and Data MiningNandakumar P
 
Unit 2_DBMS_10.2.22.pptx
Unit 2_DBMS_10.2.22.pptxUnit 2_DBMS_10.2.22.pptx
Unit 2_DBMS_10.2.22.pptxMaryJoseph79
 
Database Management System
Database Management SystemDatabase Management System
Database Management SystemNishant Munjal
 

Similar to 02 Related Concepts (20)

DM_Notes.pptx
DM_Notes.pptxDM_Notes.pptx
DM_Notes.pptx
 
Lecture 1. Data Structure & Algorithm.pptx
Lecture 1. Data Structure & Algorithm.pptxLecture 1. Data Structure & Algorithm.pptx
Lecture 1. Data Structure & Algorithm.pptx
 
Data Mining Technniques
Data Mining TechnniquesData Mining Technniques
Data Mining Technniques
 
DatabaseManagementSystem.pptx
DatabaseManagementSystem.pptxDatabaseManagementSystem.pptx
DatabaseManagementSystem.pptx
 
data structures and its importance
 data structures and its importance  data structures and its importance
data structures and its importance
 
Data modelling it's process and examples
Data modelling it's process and examplesData modelling it's process and examples
Data modelling it's process and examples
 
Data Mining Module 2 Business Analytics.
Data Mining Module 2 Business Analytics.Data Mining Module 2 Business Analytics.
Data Mining Module 2 Business Analytics.
 
Introduction of data science
Introduction of data scienceIntroduction of data science
Introduction of data science
 
Ch1_Intro-95(1).ppt
Ch1_Intro-95(1).pptCh1_Intro-95(1).ppt
Ch1_Intro-95(1).ppt
 
Data Modeling Training.pptx
Data Modeling Training.pptxData Modeling Training.pptx
Data Modeling Training.pptx
 
Data Mining System and Applications: A Review
Data Mining System and Applications: A ReviewData Mining System and Applications: A Review
Data Mining System and Applications: A Review
 
M. FLORENCE DAYANA/DATABASE MANAGEMENT SYSYTEM
M. FLORENCE DAYANA/DATABASE MANAGEMENT SYSYTEMM. FLORENCE DAYANA/DATABASE MANAGEMENT SYSYTEM
M. FLORENCE DAYANA/DATABASE MANAGEMENT SYSYTEM
 
Data processing
Data processingData processing
Data processing
 
Ch_2.pdf
Ch_2.pdfCh_2.pdf
Ch_2.pdf
 
Relational data base management system (Unit 1)
Relational data base management system (Unit 1)Relational data base management system (Unit 1)
Relational data base management system (Unit 1)
 
UNIT 2: Part 2: Data Warehousing and Data Mining
UNIT 2: Part 2: Data Warehousing and Data MiningUNIT 2: Part 2: Data Warehousing and Data Mining
UNIT 2: Part 2: Data Warehousing and Data Mining
 
Unit 2_DBMS_10.2.22.pptx
Unit 2_DBMS_10.2.22.pptxUnit 2_DBMS_10.2.22.pptx
Unit 2_DBMS_10.2.22.pptx
 
DISE - Database Concepts
DISE - Database ConceptsDISE - Database Concepts
DISE - Database Concepts
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Database Management System
Database Management SystemDatabase Management System
Database Management System
 

More from Valerii Klymchuk

Sample presentation slides template
Sample presentation slides templateSample presentation slides template
Sample presentation slides templateValerii Klymchuk
 
Crime Analysis based on Historical and Transportation Data
Crime Analysis based on Historical and Transportation DataCrime Analysis based on Historical and Transportation Data
Crime Analysis based on Historical and Transportation DataValerii Klymchuk
 

More from Valerii Klymchuk (6)

Sample presentation slides template
Sample presentation slides templateSample presentation slides template
Sample presentation slides template
 
Toronto Capstone
Toronto CapstoneToronto Capstone
Toronto Capstone
 
05 Scalar Visualization
05 Scalar Visualization05 Scalar Visualization
05 Scalar Visualization
 
06 Vector Visualization
06 Vector Visualization06 Vector Visualization
06 Vector Visualization
 
07 Tensor Visualization
07 Tensor Visualization07 Tensor Visualization
07 Tensor Visualization
 
Crime Analysis based on Historical and Transportation Data
Crime Analysis based on Historical and Transportation DataCrime Analysis based on Historical and Transportation Data
Crime Analysis based on Historical and Transportation Data
 

Recently uploaded

MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhYasamin16
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxdolaknnilon
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 

Recently uploaded (20)

MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 

02 Related Concepts

  • 2. 2.1 Database/OLTP Systems • Unlike a simple set, the data in a database are usually viewed as having a particular structure, or schema, associated with them. • Unlike a file, a database is independent of the physical method used to store it. • A data model describes the data, their attributes, and the relationships among them. A common data model is the ER (entity-relationship) model. It can be viewed as a documentation and communication tool that conveys the type and structure of the actual data. A data model is independent of the particular DBMS used. • Basic database queries are well defined and return precise results. Data mining applications, conversely, are often vaguely defined and return imprecise results. A data mining query outputs a KDD (knowledge discovery in databases) object. • A KDD object is a rule, a classification, or a cluster. It does not exist before the query is executed and is not part of the database being queried.
  • 3. 2.2 Fuzzy Sets and Fuzzy Logic • A fuzzy set is a set, $F$, in which the set membership function, $f$, is a real-valued (as opposed to Boolean) function with output in the range $[0,1]$. An element $x$ is said to belong to $F$ with membership degree $f(x)$ and simultaneously to belong to $\neg F$ with membership degree $1 - f(x)$. • Because the membership function is not Boolean, the results of a query over a fuzzy set are themselves fuzzy. A classification problem can be solved by assigning a set membership function to each record for each class. The record is then assigned to the class with the highest membership function value. • Each association rule is generated with a confidence value that indicates the degree to which the rule holds in the entire database. This value can be thought of as a membership function. • Fuzzy logic uses rules and membership functions to estimate a continuous function. It is a valuable tool for developing control systems for such things as elevators, trains, and heating systems. The basic fuzzy logic operations are: $mem(\neg x) = 1 - mem(x)$; $mem(x \wedge y) = \min(mem(x), mem(y))$; $mem(x \vee y) = \max(mem(x), mem(y))$.
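The three fuzzy operations and the "highest membership wins" classification rule from this slide can be sketched in a few lines of Python. This is only an illustration: the function names and the example membership values are assumptions, not part of the slides.

```python
def fuzzy_not(mem_x: float) -> float:
    """mem(not x) = 1 - mem(x)."""
    return 1.0 - mem_x


def fuzzy_and(mem_x: float, mem_y: float) -> float:
    """mem(x and y) = min(mem(x), mem(y))."""
    return min(mem_x, mem_y)


def fuzzy_or(mem_x: float, mem_y: float) -> float:
    """mem(x or y) = max(mem(x), mem(y))."""
    return max(mem_x, mem_y)


def classify(memberships: dict) -> str:
    """Assign a record to the class with the highest membership value."""
    return max(memberships, key=memberships.get)


# Hypothetical membership values of one record in three height classes.
record_memberships = {"short": 0.0, "medium": 0.25, "tall": 0.75}
assigned_class = classify(record_memberships)
```

Here the record is assigned to `tall`, because 0.75 is the largest of its three membership values.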
  • 4. 2.3 Information Retrieval The effectiveness of an IR system in processing a query is often measured by precision and recall: • $Precision = \frac{|Relevant \cap Retrieved|}{|Retrieved|}$ • $Recall = \frac{|Relevant \cap Retrieved|}{|Relevant|}$ The inverse document frequency (IDF) is often used by similarity measures. Given a keyword, $k$, and $n$ documents, IDF can be defined as: • $IDF_k = \log \frac{n}{|\text{documents containing } k|} + 1$ Concept hierarchies (a tree or a DAG, directed acyclic graph) can be used in spatial data mining.
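A minimal Python sketch of the three measures on this slide follows. It rests on several assumptions: documents are modelled as sets of keywords, the logarithm is the natural log (the slide does not state a base), and the `+ 1` is applied outside the logarithm. All function names and sample data are hypothetical.

```python
import math


def precision(relevant: set, retrieved: set) -> float:
    """Fraction of retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return len(relevant & retrieved) / len(retrieved)


def recall(relevant: set, retrieved: set) -> float:
    """Fraction of relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    return len(relevant & retrieved) / len(relevant)


def idf(keyword: str, documents: list) -> float:
    """IDF_k = log(n / |documents containing k|) + 1 (natural log assumed)."""
    containing = sum(1 for doc in documents if keyword in doc)
    if containing == 0:
        raise ValueError("keyword does not occur in any document")
    return math.log(len(documents) / containing) + 1


# Hypothetical query result: document ids 1-4 are relevant, 3-5 were retrieved.
relevant_ids = {1, 2, 3, 4}
retrieved_ids = {3, 4, 5}

# Hypothetical corpus: each document is a set of keywords.
corpus = [{"data", "mining"}, {"data", "warehouse"}, {"data"}, {"data", "olap"}]
```

With these sets, two of the three retrieved documents are relevant, and two of the four relevant documents were retrieved. A keyword that occurs in every document, such as `data` here, gets the minimum IDF of 1.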
  • 5. 2.5 Data Warehousing • A data warehouse (DW) is a set of data that supports decision support systems (DSS) and is subject-oriented, integrated, time-variant, and nonvolatile. • A DW contains informational data, which are used to support other functions such as planning and forecasting. • OLAP retrieval tools facilitate quick query response at all granularities.
  • 6. 2.6 Dimensional Modeling [Figures: multidimensional star schema; multidimensional constellation schema] • A dimension is a collection of logically related attributes and is viewed as an axis for modeling the data. • The specific data stored are called facts and are usually numeric. In a relational system, each dimension is a table and the facts are stored in a fact table.
  • 8. 2.7 Online Analytic Processing (OLAP) OLAP supports ad hoc querying of the data warehouse. It requires a multidimensional view of the data and involves some analysis. Operations supported: slice, dice, roll up, drill down, and visualization.
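The slice, dice, roll-up, and drill-down operations named on this slide can be illustrated on a tiny in-memory fact table. The sketch below uses plain Python rather than a real OLAP engine. The dimension names, the sales figures, and the helper functions are all made up for the example.

```python
from collections import defaultdict

# Hypothetical fact table: three dimensions (time, region, product), one measure.
FACTS = [
    {"year": 2002, "quarter": "Q1", "region": "East", "product": "A", "sales": 10},
    {"year": 2002, "quarter": "Q2", "region": "East", "product": "B", "sales": 20},
    {"year": 2002, "quarter": "Q1", "region": "West", "product": "A", "sales": 5},
    {"year": 2003, "quarter": "Q1", "region": "West", "product": "B", "sales": 15},
]


def slice_cube(facts, dim, value):
    """Slice: fix a single dimension to one value."""
    return [f for f in facts if f[dim] == value]


def dice_cube(facts, **allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [f for f in facts if all(f[d] in vals for d, vals in allowed.items())]


def aggregate(facts, dims, measure="sales"):
    """Sum the measure over every combination of the given dimensions."""
    totals = defaultdict(int)
    for f in facts:
        totals[tuple(f[d] for d in dims)] += f[measure]
    return dict(totals)


east_only = slice_cube(FACTS, "region", "East")
west_product_a = dice_cube(FACTS, region={"West"}, product={"A"})
by_quarter = aggregate(FACTS, ["year", "quarter"])  # finer level (drill down)
by_year = aggregate(FACTS, ["year"])                # coarser level (roll up)
```

Going from `by_quarter` to `by_year` is a roll up along the time dimension, and going the other way is a drill down. Both use the same aggregation at a different granularity.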
  • 9. 2.8 Statistics • Even such simple concepts as determining a data distribution or calculating a mean and a variance can be viewed as data mining techniques in their own right, because each yields a descriptive model of the data under consideration. • When a model is generated, the goal is to fit it to the entire data set, not just to the sample that was examined. Assumptions often made about the independence of data may be incorrect, leading to errors in the resulting model. Any model should be statistically significant, meaningful, and valid. • A frequently used tool in data mining and machine learning is sampling. A subset of the total population is examined, and a generalization (model) about the entire population is made from this subset. • The term exploratory data analysis describes the fact that the data can actually drive the creation of the model and of any statistical characteristics. • Some data mining applications determine correlations among data. These relationships, however, are not causal in nature. Care must be taken when assigning significance to such relationships.
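As a small illustration of sampling, the sketch below draws a random subset of a made-up population and uses its mean and variance as estimates for the whole population. The population, the sample size, and the seed are arbitrary assumptions chosen for the example.

```python
import random
import statistics

# Hypothetical population: the integers 1..100.
population = list(range(1, 101))

# Draw a reproducible random sample of 20 records (the seed is arbitrary).
sample = random.Random(42).sample(population, 20)

# Descriptive statistics of the full population.
population_mean = statistics.mean(population)
population_variance = statistics.pvariance(population)

# Estimates generalized from the sample alone.
sample_mean = statistics.mean(sample)
sample_variance = statistics.variance(sample)  # unbiased (n - 1) estimator
```

The sample estimates will generally differ from the population values. How far they differ depends on the sample size and on whether the sample is representative.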
  • 10. 2.9 Machine Learning • Data mining involves not only modeling but also the development of effective and efficient algorithms and data structures to perform the modeling on large data sets. • Machine learning is the area of AI that examines how to write programs that can learn. In data mining, machine learning is often used for prediction or classification. • Predictive modeling is done in two phases. During the training phase, historical or sampled data are used to create a model that represents those data. The model is assumed to hold for the whole database and its future states. The testing phase then applies this model to the remaining and future data. • With supervised learning, a sample of the database is used to train the system to perform the desired task properly. The quality of the training data determines how well the program learns. With unsupervised learning, the correct answers that the model should produce for the data are not known in advance. • The objective of data mining is to uncover useful information and provide it to humans, while machine learning research focuses more on the learning portion.
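The two phases of predictive modeling can be sketched with a deliberately simple supervised learner. The nearest-centroid classifier below is only an illustration chosen for brevity and is not a method named in the slides. The labelled records and the alternating train/test split are assumptions.

```python
# Hypothetical labelled records: (measurement, class label).
RECORDS = [
    (1.0, "low"), (1.2, "low"), (0.8, "low"), (1.1, "low"),
    (5.0, "high"), (5.3, "high"), (4.7, "high"), (5.1, "high"),
]

# Deterministic split: even positions for training, odd positions for testing.
training_data = RECORDS[::2]
testing_data = RECORDS[1::2]


def train(training):
    """Training phase: build a model (one centroid per class) from labelled data."""
    sums, counts = {}, {}
    for value, label in training:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(model, value):
    """Apply the model: choose the class whose centroid is closest to the value."""
    return min(model, key=lambda label: abs(model[label] - value))


def accuracy(model, testing):
    """Testing phase: fraction of held-out records the model labels correctly."""
    correct = sum(1 for value, label in testing if predict(model, value) == label)
    return correct / len(testing)


model = train(training_data)
test_accuracy = accuracy(model, testing_data)
```

Because the two classes in this toy data are well separated, the model labels every held-out record correctly. Real training data are rarely this clean, which is why the quality of the training sample matters.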
  • 12. References: Dunham, Margaret H. “Data Mining: Introductory and Advanced Topics”. Pearson Education, Inc., 2003.