Data Mining and Data Warehouse
• INTRODUCTION
• DATA MINING
• WHY DATA MINING
• APPLICATION OF DATA MINING
• STEPS OF DATA MINING
• DATA MINING TECHNIQUES
• THREAT OF DATA MINING
• SOLUTION OF THREAT
• ROLE OF DATA MINING
• DATA WAREHOUSE
• OLTP & OLAP
• DATA MINING TOOLS
• LATEST RESEARCH
INTRODUCTION
Data mining, the extraction of hidden predictive information
from large databases, is a powerful new technology with great
potential to help companies focus on the most important
information in their data warehouses.
DATA MINING
It is the extraction of previously unknown, valid, and understandable
information or patterns from data in repositories or sources:
• Databases
• Text files
• Social networks
• Computer simulation
The information obtained should be such that it can be used by any
organization or enterprise for business decision making.
Why Data Mining?
Data, data everywhere, yet:
• I can’t find the data I need
• I can’t get the data I need
• I can’t understand the data I found
• I can’t use the data I found
• Data explosion problem
Advanced data collection tools and database technology lead to
tremendous amounts of data stored in databases.
• We are drowning in data, but starving for
knowledge!
• Solution: Data warehousing and data mining
- Data warehousing and on-line analytical processing.
- Extraction of interesting knowledge using data mining.
APPLICATION OF DATA MINING
Data Mining is primarily used today by companies with a strong
consumer focus — retail, financial, communication, and marketing
organizations.
1. FINANCE INDUSTRY
Credit Card Analysis
2. INSURANCE INDUSTRY
Claims and Fraud Analysis
3. TELECOMMUNICATION
Call Record Analysis
4. TRANSPORT
Logistics Management
5. CONSUMER GOODS
Promotion Analysis
6. SCIENTIFIC RESEARCH
Image, Video, Speech
7. UTILITIES
Power Usage Analysis
STEPS OF DATA MINING
• Data integration
• Data selection
• Data transformation
• Data mining
• Pattern evaluation
• Knowledge presentation
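The six steps can be read as a pipeline, each stage feeding the next. The sketch below is only an illustration: the two sales sources, their field names, and the "count purchases per item" mining step are all hypothetical stand-ins, not part of the original slides.

```python
from collections import Counter

# Two hypothetical sources holding the same kind of record.
store_sales = [
    {"customer": "a", "item": "bread", "amount": "2.50"},
    {"customer": "b", "item": "milk", "amount": "1.20"},
]
web_sales = [
    {"customer": "a", "item": "milk", "amount": "1.30"},
    {"customer": "c", "item": "bread", "amount": "2.40"},
    {"customer": "c", "item": "gift card", "amount": "0.00"},
]

def integrate(*sources):
    """Data integration: combine records from several sources."""
    return [row for source in sources for row in source]

def select(rows):
    """Data selection: keep only rows relevant to the analysis."""
    return [r for r in rows if float(r["amount"]) > 0]

def transform(rows):
    """Data transformation: convert fields into analysis-ready types."""
    return [(r["customer"], r["item"], float(r["amount"])) for r in rows]

def mine(records):
    """Data mining: here, simply count purchases per item."""
    return Counter(item for _, item, _ in records)

def evaluate(patterns, min_count=2):
    """Pattern evaluation: keep patterns that meet a threshold."""
    return {item: n for item, n in patterns.items() if n >= min_count}

def present(patterns):
    """Knowledge presentation: render the surviving patterns as text."""
    return [f"{item}: bought {n} times" for item, n in sorted(patterns.items())]

rows = integrate(store_sales, web_sales)
report = present(evaluate(mine(transform(select(rows)))))
print(report)
```

In a real project each stage would be far larger; the point here is only the order of the stages and what each one hands to the next.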
DATA MINING TECHNIQUES
Classification and Prediction
example – Focused Hiring
Cluster Analysis
example – Market Segmentation
Outlier Analysis
example – Fraud Detection
Association Analysis
example – Market Basket Analysis
Evolution Analysis
example – Forecasting a stock market index using time series analysis
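As one concrete case, association analysis for market baskets usually comes down to two numbers per rule: support (how often the items appear together) and confidence (how often the second item appears given the first). A minimal sketch, assuming a toy list of baskets and a hypothetical helper `pair_rules` limited to item pairs:

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; each basket is the set of items bought together.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def pair_rules(baskets, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be interesting
        # Confidence of "a => b" is P(b | a) = count(a, b) / count(a).
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return sorted(rules)

rules = pair_rules(baskets)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} => {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

Production algorithms such as Apriori extend the same idea to larger item sets and prune the search space; this sketch stops at pairs to keep the arithmetic visible.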
Threat To Privacy From Data Mining
Companies mine information about your buying habits and the sites you visit
so that they can personalize your search results when you use their search
engine. This is frightening, but in theory it is also a way for companies to
tailor your online experience. The problem, of course, is that although the
data generally isn't scoured by humans, it is still used by machines.
SOLUTION OF DATA MINING THREAT
SOLUTIONS:
• Purpose Specification & Use Limitation
• Openness
• Security Measures like Encryption
ROLE OF DATA MINING IN IT
[Diagram: Business Intelligence. Labels: Model, Tool, Method, Behavioral Basics, Information Technology, Data, Problem, Decision]
DATA WAREHOUSE
Data warehousing is a technology that aggregates
structured data from one or more sources so that it can
be compared and analyzed for greater business
intelligence.
DATA WAREHOUSE
• A data warehouse provides the enterprise with a memory.
• Data mining provides the enterprise with intelligence.
OLTP & OLAP
On-Line Transaction Processing (OLTP)
Short, simple, frequent queries and modifications
Each involving a small number of tuples
Example – answering queries from a web interface, sales at cash registers,
selling airline tickets.
On-Line Analytical Processing (OLAP)
Few but complex queries that may run for hours.
Queries do not depend on having an absolutely up-to-date
database.
Example – analysts at Wal-Mart look for items with increasing sales in some
region.
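The contrast can be sketched with Python's built-in sqlite3 module. The `sales` table, its columns, and the sample rows below are hypothetical; the first function is an OLTP-style write touching one tuple, and the second is an OLAP-style aggregate that scans many rows, in the spirit of the increasing-sales example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, region TEXT, item TEXT, year INTEGER, qty INTEGER)"
)

def record_sale(conn, region, item, year, qty):
    """OLTP-style: a short transaction that inserts a single tuple."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO sales (region, item, year, qty) VALUES (?, ?, ?, ?)",
            (region, item, year, qty),
        )

def items_with_increasing_sales(conn, region, prev_year, curr_year):
    """OLAP-style: aggregate over many rows to compare two years."""
    query = """
        SELECT item,
               SUM(CASE WHEN year = :prev THEN qty ELSE 0 END) AS prev_qty,
               SUM(CASE WHEN year = :curr THEN qty ELSE 0 END) AS curr_qty
        FROM sales
        WHERE region = :region
        GROUP BY item
        HAVING SUM(CASE WHEN year = :curr THEN qty ELSE 0 END)
             > SUM(CASE WHEN year = :prev THEN qty ELSE 0 END)
        ORDER BY item
    """
    params = {"prev": prev_year, "curr": curr_year, "region": region}
    return conn.execute(query, params).fetchall()

sample = [
    ("north", "umbrella", 2022, 10), ("north", "umbrella", 2023, 25),
    ("north", "sunscreen", 2022, 30), ("north", "sunscreen", 2023, 20),
    ("south", "umbrella", 2022, 5), ("south", "umbrella", 2023, 4),
]
for row in sample:
    record_sale(conn, *row)

growing = items_with_increasing_sales(conn, "north", 2022, 2023)
print(growing)
```

In practice the analytical query would typically run against a separate warehouse copy of the data rather than the live transactional database, which is why it does not need absolutely up-to-date rows.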
DATA MINING TOOLS
• Microsoft SQL Server 2005
• Microsoft SQL Server 2008
• Oracle Data Mining
• DBMiner
Latest Research and Reviews on Data Mining
1. Systematic discovery of mutation-specific synthetic lethals by mining
pan-cancer human primary tumor data.
2. Multi-label Learning for Predicting the Activities of Antimicrobial
Peptides.
3. Semantic correction system - A little complex but interesting. Retrieved
text often contains semantic errors, which lead to wrong results. Applying
semantic correction as a preprocessing step leads to better outcomes.
4. Syntactic correction system - Much needed nowadays. Non-native English
speakers make many syntactic errors. It can also be used as a
preprocessing step in many projects. Your algorithm should
automatically detect such errors and suggest correct grammar.
5. Search engine for Wikipedia - Wikipedia data is available as a dump file.
Check DBpedia for reference. Apply indexing techniques and build a
small search engine for wiki pages. Wikipedia already provides this
functionality, but you can work on a better user experience and result
optimization.
6. Twitter tweet classifier - Pretty easy and interesting too. Create a
learning system for categories such as sports, entertainment,
business, politics, and Hollywood. Train a classifier (naive Bayes,
SVM) and predict the category of incoming tweets.
7. Sentiment analysis for Twitter, reviews, and conversations - A few
packages available in R can help with this job. One needs to add a few
additional features on top of them to make the results more intuitive. NLTK
and the Stanford tools are good open source options for the same task.
8. Spam mail detection - Again a learning-based classification system. Train
the classifier on mail that users have already marked as spam so that it can
classify new incoming mail. If users mark new mail as spam, retrain
(there may be a better option).
9. Sarcasm detection - This can be a very interesting one. In sentiment
analysis we identify users' sentiment regarding something; here we identify
sarcasm expressed by users. Check out the page on psu.edu - Sarcasm detection
on Twitter.
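The tweet classifier and spam detection ideas both rest on the same learning-based text classification. Below is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing, written with the standard library only. The class name `NaiveBayesText`, the whitespace tokenizer, and the four training sentences are assumptions made for illustration; a real project would more likely use an established library and far more data.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesText:
    """Multinomial naive Bayes over whitespace-separated lowercase words."""

    def __init__(self):
        self.class_counts = Counter()            # documents per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.class_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word] + 1  # add-one smoothing
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical training data standing in for labeled tweets.
texts = [
    "team wins the final match",
    "striker scores late goal",
    "shares fall as markets close",
    "bank reports record profit",
]
labels = ["sports", "sports", "business", "business"]

model = NaiveBayesText().fit(texts, labels)
print(model.predict("late goal wins match"))
print(model.predict("bank profit markets"))
```

Retraining after users flag new spam, as item 8 suggests, would amount to calling `fit` again with the newly labeled messages, since the counts simply accumulate.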