SlideShare a Scribd company logo
1 of 16
Text mining 
Fuzzy document classification 
Using Elasticsearch 
Lev Ozeryansky
Identity Card 
• Merging of Ankor and We! 
• Owned by Hilan (publicly traded in Tel Aviv Stock exchange) 
• Fast growing IT integration company 
• Over 2000 systems installed and maintained 
• Over 1000 leading customers - Hi-tech, Industry, Academy, Banks, 
Insurance, 
• Strong technological team – over 45 engineers, professional services 
and project managers 
• Over 120 employees 
• Four main divisions – Infrastructure, Big Data, Cloud, Cyber
Technology Edge
What is classification 
• Document classification as document categorization. 
• Using classification. 
• Our classification data source. 
• What we do with? 
• Java programmer. 
• .NET programmer.
Data source
The mathematics 
• Let be class set 
• Let be documents set 
• Classification function
Classification method 
• Cosine similarity 
• Function
Build document class 
vector 
• Java programmer 
• Java 
• 5 
• Hibernate 
• .NET programmer 
• C# 
• 5 
• Nhibernate
Let index classificators 
• Add weight manually. 
• For Java programmer: 
• Java = 0.7 
• 5 = 0.5 
• Hibernate = 0.3 
• For .NET programmer 
• C# = 0.7 
• 5 = 0.5 
• Nhibernate = 0.3
DEMO
w-shingling 
• In natural language processing a w-shingling is a set of 
unique "shingles"—contiguous subsequences of tokens in 
a document. (Wikipedia) 
• Tokenization 
• Elasticsearch analyze mechanism
DEMO
Classification process 
• Tokens array. 
• Classification query. 
• Use terms query when terms array == tokens array 
• Two vectors 
• Vector of filtered tokens 
• Classification vector
DEMO
Classification process 
• SciPy to calculate distance.
Q&A

More Related Content

What's hot

Kibana Tutorial | Kibana Dashboard Tutorial | Kibana Elasticsearch | ELK Stac...
Kibana Tutorial | Kibana Dashboard Tutorial | Kibana Elasticsearch | ELK Stac...Kibana Tutorial | Kibana Dashboard Tutorial | Kibana Elasticsearch | ELK Stac...
Kibana Tutorial | Kibana Dashboard Tutorial | Kibana Elasticsearch | ELK Stac...Edureka!
 
Elasticsearch { "Meetup" : "talk" }
Elasticsearch { "Meetup" : "talk" }Elasticsearch { "Meetup" : "talk" }
Elasticsearch { "Meetup" : "talk" }Lutf Ur Rehman
 
A (XPages) developers guide to Cloudant - MeetIT
A (XPages) developers guide to Cloudant - MeetITA (XPages) developers guide to Cloudant - MeetIT
A (XPages) developers guide to Cloudant - MeetITFrank van der Linden
 
Bank management system c++
Bank management system c++Bank management system c++
Bank management system c++Akshay Sorathia
 
Architecture - why so serious?
Architecture - why so serious?Architecture - why so serious?
Architecture - why so serious?Barbara Fusinska
 
Cloud architectural patterns and Microsoft Azure tools
Cloud architectural patterns and Microsoft Azure toolsCloud architectural patterns and Microsoft Azure tools
Cloud architectural patterns and Microsoft Azure toolsPushkar Chivate
 
Getting started with Laravel & Elasticsearch
Getting started with Laravel & ElasticsearchGetting started with Laravel & Elasticsearch
Getting started with Laravel & ElasticsearchPeter Steenbergen
 
Scala for java developers 6 may 2017 - yeni
Scala for java developers   6 may 2017 - yeniScala for java developers   6 may 2017 - yeni
Scala for java developers 6 may 2017 - yeniBaris Dere
 
Tips & Tricks SQL in the City Seattle 2014
Tips & Tricks SQL in the City Seattle 2014Tips & Tricks SQL in the City Seattle 2014
Tips & Tricks SQL in the City Seattle 2014Ike Ellis
 
Not only SQL - Database Choices
Not only SQL - Database ChoicesNot only SQL - Database Choices
Not only SQL - Database ChoicesLynn Langit
 
Adobe Spark Meetup - 9/19/2018 - San Jose, CA
Adobe Spark Meetup - 9/19/2018 - San Jose, CAAdobe Spark Meetup - 9/19/2018 - San Jose, CA
Adobe Spark Meetup - 9/19/2018 - San Jose, CAJaemi Bremner
 
Bleeding Edge Databases
Bleeding Edge DatabasesBleeding Edge Databases
Bleeding Edge DatabasesLynn Langit
 
AWS for Java Developers workshop
AWS for Java Developers workshopAWS for Java Developers workshop
AWS for Java Developers workshopRory Preddy
 
Machine Learning on the Microsoft Stack
Machine Learning on the Microsoft StackMachine Learning on the Microsoft Stack
Machine Learning on the Microsoft StackLynn Langit
 
When Our Serverless Team Chooses Containers
When Our Serverless Team Chooses ContainersWhen Our Serverless Team Chooses Containers
When Our Serverless Team Chooses ContainersSam Goldstein
 
Test driving Azure Search and DocumentDB
Test driving Azure Search and DocumentDBTest driving Azure Search and DocumentDB
Test driving Azure Search and DocumentDBAndrew Siemer
 
Getting started with Azure Cognitive services
Getting started with Azure Cognitive servicesGetting started with Azure Cognitive services
Getting started with Azure Cognitive servicesRick van den Bosch
 

What's hot (19)

Linq architecture
Linq   architectureLinq   architecture
Linq architecture
 
Kibana Tutorial | Kibana Dashboard Tutorial | Kibana Elasticsearch | ELK Stac...
Kibana Tutorial | Kibana Dashboard Tutorial | Kibana Elasticsearch | ELK Stac...Kibana Tutorial | Kibana Dashboard Tutorial | Kibana Elasticsearch | ELK Stac...
Kibana Tutorial | Kibana Dashboard Tutorial | Kibana Elasticsearch | ELK Stac...
 
Elasticsearch { "Meetup" : "talk" }
Elasticsearch { "Meetup" : "talk" }Elasticsearch { "Meetup" : "talk" }
Elasticsearch { "Meetup" : "talk" }
 
A (XPages) developers guide to Cloudant - MeetIT
A (XPages) developers guide to Cloudant - MeetITA (XPages) developers guide to Cloudant - MeetIT
A (XPages) developers guide to Cloudant - MeetIT
 
Bank management system c++
Bank management system c++Bank management system c++
Bank management system c++
 
Architecture - why so serious?
Architecture - why so serious?Architecture - why so serious?
Architecture - why so serious?
 
Cloud architectural patterns and Microsoft Azure tools
Cloud architectural patterns and Microsoft Azure toolsCloud architectural patterns and Microsoft Azure tools
Cloud architectural patterns and Microsoft Azure tools
 
Getting started with Laravel & Elasticsearch
Getting started with Laravel & ElasticsearchGetting started with Laravel & Elasticsearch
Getting started with Laravel & Elasticsearch
 
Laravel and SOLR
Laravel and SOLRLaravel and SOLR
Laravel and SOLR
 
Scala for java developers 6 may 2017 - yeni
Scala for java developers   6 may 2017 - yeniScala for java developers   6 may 2017 - yeni
Scala for java developers 6 may 2017 - yeni
 
Tips & Tricks SQL in the City Seattle 2014
Tips & Tricks SQL in the City Seattle 2014Tips & Tricks SQL in the City Seattle 2014
Tips & Tricks SQL in the City Seattle 2014
 
Not only SQL - Database Choices
Not only SQL - Database ChoicesNot only SQL - Database Choices
Not only SQL - Database Choices
 
Adobe Spark Meetup - 9/19/2018 - San Jose, CA
Adobe Spark Meetup - 9/19/2018 - San Jose, CAAdobe Spark Meetup - 9/19/2018 - San Jose, CA
Adobe Spark Meetup - 9/19/2018 - San Jose, CA
 
Bleeding Edge Databases
Bleeding Edge DatabasesBleeding Edge Databases
Bleeding Edge Databases
 
AWS for Java Developers workshop
AWS for Java Developers workshopAWS for Java Developers workshop
AWS for Java Developers workshop
 
Machine Learning on the Microsoft Stack
Machine Learning on the Microsoft StackMachine Learning on the Microsoft Stack
Machine Learning on the Microsoft Stack
 
When Our Serverless Team Chooses Containers
When Our Serverless Team Chooses ContainersWhen Our Serverless Team Chooses Containers
When Our Serverless Team Chooses Containers
 
Test driving Azure Search and DocumentDB
Test driving Azure Search and DocumentDBTest driving Azure Search and DocumentDB
Test driving Azure Search and DocumentDB
 
Getting started with Azure Cognitive services
Getting started with Azure Cognitive servicesGetting started with Azure Cognitive services
Getting started with Azure Cognitive services
 

Viewers also liked

HOW TO HACK FACEBOOK BY COMPUTER
HOW TO HACK FACEBOOK BY COMPUTER HOW TO HACK FACEBOOK BY COMPUTER
HOW TO HACK FACEBOOK BY COMPUTER June_Johnson
 
I'm Trying To Relax But It's Not Working
I'm Trying To Relax But It's Not WorkingI'm Trying To Relax But It's Not Working
I'm Trying To Relax But It's Not WorkingIrene Cauwels
 
CV for Jason Cowan
CV for Jason CowanCV for Jason Cowan
CV for Jason CowanJason Cowan
 
MBarberResume1Word-A
MBarberResume1Word-AMBarberResume1Word-A
MBarberResume1Word-AMarisa Barber
 
Production process of print advert
Production process of print advertProduction process of print advert
Production process of print advertMatthewTurnbull1998
 
Activity about parts of brain
Activity about parts of brain Activity about parts of brain
Activity about parts of brain Joel Delliva
 
Narrative theory
Narrative theoryNarrative theory
Narrative theoryKiraRachael
 
Tutorial: CASRAI Standards Development (for a non-technology audience) - Davi...
Tutorial: CASRAI Standards Development (for a non-technology audience) - Davi...Tutorial: CASRAI Standards Development (for a non-technology audience) - Davi...
Tutorial: CASRAI Standards Development (for a non-technology audience) - Davi...CASRAI
 
Cathodic protection in_practise
Cathodic protection in_practiseCathodic protection in_practise
Cathodic protection in_practiseabdallahbeca
 
4th primary school of larissa!!!
4th primary school of larissa!!!4th primary school of larissa!!!
4th primary school of larissa!!!sofiathoma
 
Keynote: Today's Data Grow Tomorrow's Citizens - Tracey P. Lauriault
Keynote: Today's Data Grow Tomorrow's Citizens - Tracey P. LauriaultKeynote: Today's Data Grow Tomorrow's Citizens - Tracey P. Lauriault
Keynote: Today's Data Grow Tomorrow's Citizens - Tracey P. LauriaultCASRAI
 
Мень А. Библиологический словарь (Ii том. k-п) - 2002
Мень А.   Библиологический словарь (Ii том. k-п) - 2002Мень А.   Библиологический словарь (Ii том. k-п) - 2002
Мень А. Библиологический словарь (Ii том. k-п) - 2002Михаил Ломоносов
 

Viewers also liked (19)

HOW TO HACK FACEBOOK BY COMPUTER
HOW TO HACK FACEBOOK BY COMPUTER HOW TO HACK FACEBOOK BY COMPUTER
HOW TO HACK FACEBOOK BY COMPUTER
 
Vu-2015_tissue-plasticity-PNET
Vu-2015_tissue-plasticity-PNETVu-2015_tissue-plasticity-PNET
Vu-2015_tissue-plasticity-PNET
 
I'm Trying To Relax But It's Not Working
I'm Trying To Relax But It's Not WorkingI'm Trying To Relax But It's Not Working
I'm Trying To Relax But It's Not Working
 
CV for Jason Cowan
CV for Jason CowanCV for Jason Cowan
CV for Jason Cowan
 
Target audience
Target audienceTarget audience
Target audience
 
MBarberResume1Word-A
MBarberResume1Word-AMBarberResume1Word-A
MBarberResume1Word-A
 
Biologi-Sel
Biologi-SelBiologi-Sel
Biologi-Sel
 
Production process of print advert
Production process of print advertProduction process of print advert
Production process of print advert
 
Vail Style Guide
Vail Style GuideVail Style Guide
Vail Style Guide
 
Activity about parts of brain
Activity about parts of brain Activity about parts of brain
Activity about parts of brain
 
Narrative theory
Narrative theoryNarrative theory
Narrative theory
 
Tutorial: CASRAI Standards Development (for a non-technology audience) - Davi...
Tutorial: CASRAI Standards Development (for a non-technology audience) - Davi...Tutorial: CASRAI Standards Development (for a non-technology audience) - Davi...
Tutorial: CASRAI Standards Development (for a non-technology audience) - Davi...
 
Pokok bahasan hpp 2014
Pokok bahasan hpp 2014Pokok bahasan hpp 2014
Pokok bahasan hpp 2014
 
Cathodic protection in_practise
Cathodic protection in_practiseCathodic protection in_practise
Cathodic protection in_practise
 
Webinar session 2
Webinar session 2Webinar session 2
Webinar session 2
 
4th primary school of larissa!!!
4th primary school of larissa!!!4th primary school of larissa!!!
4th primary school of larissa!!!
 
Bhungroo_Brochure
Bhungroo_BrochureBhungroo_Brochure
Bhungroo_Brochure
 
Keynote: Today's Data Grow Tomorrow's Citizens - Tracey P. Lauriault
Keynote: Today's Data Grow Tomorrow's Citizens - Tracey P. LauriaultKeynote: Today's Data Grow Tomorrow's Citizens - Tracey P. Lauriault
Keynote: Today's Data Grow Tomorrow's Citizens - Tracey P. Lauriault
 
Мень А. Библиологический словарь (Ii том. k-п) - 2002
Мень А.   Библиологический словарь (Ii том. k-п) - 2002Мень А.   Библиологический словарь (Ii том. k-п) - 2002
Мень А. Библиологический словарь (Ii том. k-п) - 2002
 

Similar to Dev ops-presentation

Characerizing and Validating QoS in the Emerging IoT Network
Characerizing and Validating QoS in the Emerging IoT NetworkCharacerizing and Validating QoS in the Emerging IoT Network
Characerizing and Validating QoS in the Emerging IoT NetworkHans Ashlock
 
Full Text Search with Lucene
Full Text Search with LuceneFull Text Search with Lucene
Full Text Search with LuceneWO Community
 
Self Service for IT Infrastructure
Self Service for IT Infrastructure Self Service for IT Infrastructure
Self Service for IT Infrastructure Cisco DevNet
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learningVishwas Lele
 
Delivering big content at NBC News with RavenDB
Delivering big content at NBC News with RavenDBDelivering big content at NBC News with RavenDB
Delivering big content at NBC News with RavenDBJohn Bennett
 
AWS Lambda support for AWS X-Ray
AWS Lambda support for AWS X-RayAWS Lambda support for AWS X-Ray
AWS Lambda support for AWS X-RayEitan Sela
 
Selenium Online Training
Selenium Online Training Selenium Online Training
Selenium Online Training Nagendra Kumar
 
How to Build Deep Learning Models
How to Build Deep Learning ModelsHow to Build Deep Learning Models
How to Build Deep Learning ModelsJosh Patterson
 
scrazzl - A technical overview
scrazzl - A technical overviewscrazzl - A technical overview
scrazzl - A technical overviewscrazzl
 
Tokyo azure meetup #2 big data made easy
Tokyo azure meetup #2   big data made easyTokyo azure meetup #2   big data made easy
Tokyo azure meetup #2 big data made easyTokyo Azure Meetup
 
Dev nexus 2017
Dev nexus 2017Dev nexus 2017
Dev nexus 2017Roy Russo
 
Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014
Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014
Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014NoSQLmatters
 
Tech Spark Presentation
Tech Spark PresentationTech Spark Presentation
Tech Spark PresentationStephen Borg
 
Apache Cayenne: a Java ORM Alternative
Apache Cayenne: a Java ORM AlternativeApache Cayenne: a Java ORM Alternative
Apache Cayenne: a Java ORM AlternativeAndrus Adamchik
 
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...Spark Summit
 

Similar to Dev ops-presentation (20)

Characerizing and Validating QoS in the Emerging IoT Network
Characerizing and Validating QoS in the Emerging IoT NetworkCharacerizing and Validating QoS in the Emerging IoT Network
Characerizing and Validating QoS in the Emerging IoT Network
 
Rdbms
RdbmsRdbms
Rdbms
 
Full Text Search with Lucene
Full Text Search with LuceneFull Text Search with Lucene
Full Text Search with Lucene
 
Self Service for IT Infrastructure
Self Service for IT Infrastructure Self Service for IT Infrastructure
Self Service for IT Infrastructure
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
Delivering big content at NBC News with RavenDB
Delivering big content at NBC News with RavenDBDelivering big content at NBC News with RavenDB
Delivering big content at NBC News with RavenDB
 
AWS Lambda support for AWS X-Ray
AWS Lambda support for AWS X-RayAWS Lambda support for AWS X-Ray
AWS Lambda support for AWS X-Ray
 
Selenium Online Training
Selenium Online Training Selenium Online Training
Selenium Online Training
 
How to Build Deep Learning Models
How to Build Deep Learning ModelsHow to Build Deep Learning Models
How to Build Deep Learning Models
 
Mazenet
MazenetMazenet
Mazenet
 
scrazzl - A technical overview
scrazzl - A technical overviewscrazzl - A technical overview
scrazzl - A technical overview
 
Java
JavaJava
Java
 
Tokyo azure meetup #2 big data made easy
Tokyo azure meetup #2   big data made easyTokyo azure meetup #2   big data made easy
Tokyo azure meetup #2 big data made easy
 
Dev nexus 2017
Dev nexus 2017Dev nexus 2017
Dev nexus 2017
 
Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014
Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014
Sebastian Cohnen – Building a Startup with NoSQL - NoSQL matters Barcelona 2014
 
Tech Spark Presentation
Tech Spark PresentationTech Spark Presentation
Tech Spark Presentation
 
Apache Cayenne: a Java ORM Alternative
Apache Cayenne: a Java ORM AlternativeApache Cayenne: a Java ORM Alternative
Apache Cayenne: a Java ORM Alternative
 
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
 
ppt_on_java.pptx
ppt_on_java.pptxppt_on_java.pptx
ppt_on_java.pptx
 
Elasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetupElasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetup
 

Recently uploaded

Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...Akihiro Suda
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Rob Geurden
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecturerahul_net
 
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxReal-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxRTS corp
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfYashikaSharma391629
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 

Recently uploaded (20)

Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxReal-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 

Dev ops-presentation

  • 1. Text mining Fuzzy document classification Using Elasticsearch Lev Ozeryansky
  • 2. Identity Card • Merging of Ankor and We! • Owned by Hilan (publicly traded in Tel Aviv Stock exchange) • Fast growing IT integration company • Over 2000 systems installed and maintained • Over 1000 leading customers - Hi-tech, Industry, Academy, Banks, Insurance, • Strong technological team – over 45 engineers, professional services and project managers • Over 120 employees • Four main divisions – Infrastructure, Big Data, Cloud, Cyber
  • 4. What is classification • Document classification as document categorization. • Using classification. • Our classification data source. • What we do with? • Java programmer. • .NET programmer.
  • 6. The mathematics • Let be class set • Let be documents set • Classification function
  • 7. Classification method • Cosine similarity • Function
  • 8. Build document class vector • Java programmer • Java • 5 • Hibernate • .NET programmer • C# • 5 • Nhibernate
  • 9. Let index classificators • Add weight manually. • For Java programmer: • Java = 0.7 • 5 = 0.5 • Hibernate = 0.3 • For .NET programmer • C# = 0.7 • 5 = 0.5 • Nhibernate = 0.3
  • 10. DEMO
  • 11. w-shingling • In natural language processing a w-shingling is a set of unique "shingles"—contiguous subsequences of tokens in a document. (Wikipedia) • Tokenization • Elasticsearch analyze mechanism
  • 12. DEMO
  • 13. Classification process • Tokens array. • Classification query. • Use terms query when terms array == tokens array • Two vectors • Vector of filtered tokens • Classification vector
  • 14. DEMO
  • 15. Classification process • SciPy to calculate distance.
  • 16. Q&A