1. Text mining
Fuzzy document classification
Using Elasticsearch
Lev Ozeryansky
2. Identity Card
• Formed by the merger of Ankor and We!
• Owned by Hilan (publicly traded on the Tel Aviv Stock Exchange)
• Fast growing IT integration company
• Over 2000 systems installed and maintained
• Over 1000 leading customers: Hi-tech, Industry, Academia, Banks,
Insurance
• Strong technological team – over 45 engineers, professional services
and project managers
• Over 120 employees
• Four main divisions – Infrastructure, Big Data, Cloud, Cyber
4. What is classification
• Document classification as document categorization.
• Using classification.
• Our classification data source.
• What do we do with it?
• Java programmer.
• .NET programmer.
11. w-shingling
• In natural language processing a w-shingling is a set of
unique "shingles"—contiguous subsequences of tokens in
a document. (Wikipedia)
• Tokenization
• Elasticsearch analyze mechanism
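
The shingling idea above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the function names (`tokenize`, `w_shingles`, `analyze_request_body`) are hypothetical, the tokenizer is a deliberately simple regex split, and the sample sentence is chosen only for demonstration. The last function builds a request body for Elasticsearch's `_analyze` API using the built-in `shingle` token filter, on the assumption that this is roughly what "Elasticsearch analyze mechanism" refers to; nothing is sent to a cluster.

```python
import json
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simple stand-in tokenizer)."""
    return re.findall(r"\w+", text.lower())


def w_shingles(tokens, w):
    """Return the set of unique contiguous w-token subsequences of `tokens`."""
    if w < 1:
        raise ValueError("w must be at least 1")
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def analyze_request_body(text, w):
    """Build a body for Elasticsearch's _analyze API that emits w-token shingles.

    Assumption: the standard tokenizer plus the lowercase and shingle filters;
    the shingle filter requires a shingle size of at least 2.
    """
    return {
        "tokenizer": "standard",
        "filter": [
            "lowercase",
            {
                "type": "shingle",
                "min_shingle_size": w,
                "max_shingle_size": w,
                "output_unigrams": False,
            },
        ],
        "text": text,
    }


if __name__ == "__main__":
    sentence = "a rose is a rose is a rose"
    # Repeated shingles collapse because the result is a set.
    for shingle in sorted(w_shingles(tokenize(sentence), 4)):
        print(" ".join(shingle))
    # The body below could be sent as the JSON payload of a _analyze request.
    print(json.dumps(analyze_request_body(sentence, 4), indent=2))
```

Because shingles are stored as a set, two documents can then be compared by how many shingles they share, which is what makes the representation useful for fuzzy matching.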