2. Outline
Introduction
Data Mining Vs. Text Mining
Motivation for Text Mining
I/O Model for Text Mining
Steps for Text Mining
Key Terms in Text Mining
Text Mining Frameworks
Merits of Text Mining
Applications of Text Mining
Demerits of Text Mining
References
Prakhyath Rai, Asst. Professor, Dept. of ISE, SCEM, Mangaluru-575007
3. Introduction
Text Mining is a discovery process.
Text Mining is also referred to as Text Data Mining (TDM) and Knowledge Discovery in Textual Databases (KDT).
Text Mining is used to extract relevant information, knowledge, or patterns from different sources that are in unstructured or semi-structured form.
4. Introduction Cont.
Extract and discover knowledge hidden in text automatically
Aid domain experts by automatically:
  identifying concepts
  extracting facts/relations
  discovering implicit links
  generating hypotheses
5. Data Mining vs. Text Mining
Data Mining | Text Mining
Processes data directly | Requires linguistic processing or natural language processing (NLP)
Identifies causal relationships | Discovers heretofore unknown information
Structured data | Semi-structured and unstructured data (text)
Structured numeric transaction data residing in a relational data warehouse | Applications deal with much more diverse and eclectic collections of systems and formats
6. Motivation for Text Mining
Approximately 90% of the world’s data is held in unstructured formats (source: Oracle Corporation)
Information-intensive business processes demand that we move beyond simple document retrieval to “knowledge” discovery.
7. Input-Output Model for Text Mining
Input: Documents
Text Mining Technique
Output: Patterns, Connections, Trends
8. Steps for Text Mining
Pre-Processing the Text
Applying Text Mining Techniques
  Summarization
  Classification
  Clustering
  Visualization
  Information Extraction
Analyzing the Text
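
The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not a prescribed implementation: the stop-word list, the function names, and the choice of term counting as the "technique" are assumptions made for the example, and a real system would substitute summarization, classification, or clustering at step 2.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; real systems use larger lists.
STOP_WORDS = {"the", "is", "a", "of", "and", "to", "in", "from"}

def preprocess(text):
    """Step 1: lowercase the text, tokenize on letters, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def term_frequencies(tokens):
    """Step 2: apply a (very simple) technique, counting term occurrences."""
    return Counter(tokens)

def top_terms(freqs, n=3):
    """Step 3: analyze the result by ranking the most frequent terms."""
    return [term for term, _ in freqs.most_common(n)]

docs = [
    "Text mining extracts patterns from text.",
    "Mining text is the discovery of patterns and trends.",
]
tokens = [t for d in docs for t in preprocess(d)]
freqs = term_frequencies(tokens)
print(top_terms(freqs))
```

Keeping each step in its own function makes it easy to swap the middle step for a different technique without touching pre-processing or analysis.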
9. Key Terms in Text Mining
Information Retrieval (IR)
The science of searching for information in documents, for documents themselves, for metadata which describe documents, or for text, sound, images or data within databases, whether relational stand-alone databases or hypertext networked databases such as the Internet or intranets.
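
Searching for information in a collection of documents is commonly supported by an inverted index, which maps each term to the documents that contain it. The sketch below is a minimal illustration under assumed names (`build_index`, `search`) and a toy three-document collection; it is not taken from any particular system.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z]+", text.lower()):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (AND search)."""
    terms = re.findall(r"[a-z]+", query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Toy collection, invented for this example.
docs = {
    1: "Text mining discovers patterns in documents",
    2: "Data mining works on structured data",
    3: "Metadata describe documents",
}
index = build_index(docs)
print(sorted(search(index, "mining")))
print(sorted(search(index, "documents mining")))
```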
Artificial Intelligence (AI)
A branch of computer science and engineering that deals with intelligent behavior, learning, and adaptation in machines.
10. Merits of Text Mining
Databases are limited in the information they can store, whereas Text Mining overcomes this limitation
Extraction of relevant information and relationships from natural-language documents
Extraction of information from unstructured or semi-structured documents
11. Applications of Text Mining
Analysis of Market Trends
  Classification Technique
  Information Extraction Technique
Analysis and Screening of Junk Emails
  Classification on the basis of pre-defined frequently occurring items
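
Screening junk emails on the basis of pre-defined frequently occurring items can be sketched as follows. The term list, the threshold of 0.2, and the function names are hypothetical choices made for illustration; a practical filter would derive its terms and threshold from labeled mail.

```python
import re

# Hypothetical list of terms assumed to occur frequently in junk mail.
JUNK_TERMS = {"free", "winner", "prize", "offer", "click"}

def junk_score(message):
    """Fraction of tokens in the message that are predefined junk terms."""
    tokens = re.findall(r"[a-z]+", message.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in JUNK_TERMS)
    return hits / len(tokens)

def is_junk(message, threshold=0.2):
    """Classify as junk when the share of junk terms reaches the threshold."""
    return junk_score(message) >= threshold

print(is_junk("Click now to claim your free prize"))
print(is_junk("Meeting moved to Monday at ten"))
```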
12. Demerits of Text Mining
Requires an initially trained information system for the initial extraction
Suitable programs have not yet been defined to analyze text for mining knowledge or information
13. References
[1] R Baeza-Yates and B Ribeiro-Neto, “Modern Information Retrieval”, ACM Press, New York, 1999.
[2] Ning Zhong, Yuefeng Li and T. Grance, “Effective Pattern Discovery for Text Mining”, IEEE Transactions on Knowledge and Data Engineering, Vol. 24, No. 1, January 2012.
[3] Raymond J Mooney and Un Yong Nahm, “Text Mining with Information Extraction”, Proceedings of the 4th International MIDP Colloquium, pages 141-160, Van Schaik Pub., South Africa, 2005.
[4] M E Califf and R J Mooney, “Relational Learning of Pattern-Match Rules for Information Extraction”, Proceedings of the 16th National Conference on Artificial Intelligence (AAAI-99), pages 328-334, Orlando, FL, July 1999.
[5] D Freitag and N Kushmerick, “Boosted Wrapper Induction”, Proceedings of the 17th National Conference on Artificial Intelligence (AAAI-2000), pages 577-583, Austin, TX, July 2000.