1. Work Together Effectively
Cross Media Concept and Entity Driven Search for Enterprise
Chalitha Perera and Dileepa Jayakody
R&D Engineers
2. Work Together Effectively
• Headquartered in London, with an office in Colombo, Sri Lanka
• Focused on delivering enterprise content management solutions
• Our Skills
3. Work Together Effectively
Zaizi R&D Department
• Giving sense to the content
– Enriching it semantically
• Adding value to ECM/CMS
– More structured content, easy to manage, link and search
• Improving search
– Across different domains, data sources, User Experience
• Machine Learning applied research
5. Work Together Effectively
Problem
• Unstructured Text Content
– Text documents, PDFs, Word …
• Rapid growth in multimedia content
• Heterogeneous Data Sources
– ECMs (Alfresco, SharePoint), File System, Confluence, JIRA …
• Data is not useful without effective methods for
– Knowledge Extraction
– Information Retrieval
6. Work Together Effectively
Current Enterprise Search Limitations
• Limited to keyword-based search
• Search context is not considered
• Ambiguity of terms
• Low precision
• Inability to properly handle multimedia files
7. Work Together Effectively
Desired Traits of a Solution
• Semantically enhance documents
– Unstructured text
– Multimedia documents
• Cross media search
• Search with semantic concepts and entities
• Federated Search
– Search across different content repositories
– User permissions
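The federated search trait above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Sensefy's implementation: the `Document` class, the in-memory repositories, and the `allowed_users` field are hypothetical stand-ins for real repository indexes and their access-control lists.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    repository: str  # illustrative label such as "alfresco" or "sharepoint"
    text: str
    allowed_users: set = field(default_factory=set)  # hypothetical ACL


def search_repository(docs, query):
    """Naive keyword match within one repository, scored by term frequency."""
    term = query.lower()
    hits = []
    for doc in docs:
        score = doc.text.lower().split().count(term)
        if score > 0:
            hits.append((score, doc))
    return hits


def federated_search(repositories, query, user):
    """Query every repository, drop documents the user may not read, merge by score."""
    merged = []
    for docs in repositories.values():
        for score, doc in search_repository(docs, query):
            if user in doc.allowed_users:
                merged.append((score, doc))
    # Highest score first; ties broken by document id for a stable order.
    merged.sort(key=lambda pair: (-pair[0], pair[1].doc_id))
    return [doc.doc_id for _, doc in merged]


repositories = {
    "alfresco": [
        Document("a1", "alfresco", "search roadmap for enterprise search", {"alice", "bob"}),
        Document("a2", "alfresco", "confidential search budget", {"alice"}),
    ],
    "sharepoint": [
        Document("s1", "sharepoint", "holiday schedule", {"alice", "bob"}),
        Document("s2", "sharepoint", "search team contacts", {"bob"}),
    ],
}

print(federated_search(repositories, "search", "alice"))
print(federated_search(repositories, "search", "bob"))
```

The same query returns different result lists for different users, because each hit is checked against the user's permissions before the per-repository results are merged.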
8. Work Together Effectively
Sensefy
• Semantic Enterprise Search Engine
• Cross Media Search
• Federated Search
• Smart Search Assistance
• Open Source
10. Work Together Effectively
Repository Crawler
• Four types of connectors
– Repository Connectors
– Authority Connectors
– Transformation Connectors
– Output Connectors
• Connect different source repositories with different target indexes
– Source repositories (Alfresco, SharePoint, Confluence, etc.)
– Target indexes (Solr, Elasticsearch, Amazon CloudSearch)
• Security Model to enforce source repository security policies
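One plausible reading of how the four connector roles fit together is sketched below. All names here (`repository_connector`, `InMemoryOutputConnector`, the `allow_tokens` field, the user directory) are hypothetical and chosen for illustration; the slides do not specify the crawler's actual interfaces. The sketch assumes the repository connector delivers each document with its access tokens, and the authority connector resolves the tokens a user holds at query time.

```python
from typing import Any, Dict, Iterable, List, Set

Record = Dict[str, Any]


def repository_connector() -> Iterable[Record]:
    """Repository connector: reads documents (and their ACLs) from a source.
    A hard-coded list stands in for a real repository."""
    yield {"id": "doc-1", "body": "  Quarterly REPORT  ", "allow_tokens": ["group:finance"]}
    yield {"id": "doc-2", "body": "Team handbook", "allow_tokens": ["group:everyone"]}


def transformation_connector(record: Record) -> Record:
    """Transformation connector: normalises content before it is indexed."""
    cleaned = dict(record)
    cleaned["body"] = str(record["body"]).strip().lower()
    return cleaned


class InMemoryOutputConnector:
    """Output connector: writes to a target index (a dict stands in for it here)."""

    def __init__(self) -> None:
        self.index: Dict[str, Record] = {}

    def write(self, record: Record) -> None:
        self.index[str(record["id"])] = record


def authority_connector(user: str) -> Set[str]:
    """Authority connector: resolves the access tokens a user holds (assumed directory)."""
    directory = {
        "alice": {"group:finance", "group:everyone"},
        "bob": {"group:everyone"},
    }
    return directory.get(user, set())


def crawl(output: InMemoryOutputConnector) -> None:
    """Source -> transformation -> target index."""
    for record in repository_connector():
        output.write(transformation_connector(record))


def visible_ids(output: InMemoryOutputConnector, user: str) -> List[str]:
    """Enforce the source repository's policy: keep documents sharing a token with the user."""
    tokens = authority_connector(user)
    return sorted(
        doc_id
        for doc_id, record in output.index.items()
        if tokens & set(record["allow_tokens"])
    )


output = InMemoryOutputConnector()
crawl(output)
print(visible_ids(output, "alice"))
print(visible_ids(output, "bob"))
```

Keeping the access tokens next to each indexed document is what lets the search layer apply the source repository's security policy without calling back to the repository for every query.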
11. Work Together Effectively
Media In Context (MICO) Platform
• MICO provides an integrated platform for
– Cross media analysis
– Metadata publishing
– Metadata querying
• Sensefy uses MICO as the cross media analysis engine to extract entities and concepts from multimedia
13. Work Together Effectively
Semantic Content Enrichment
• Named Entity Recognition
– People, places, organizations and concepts
• Entity Linking
– DBpedia, YAGO, custom enterprise knowledge bases
• Entity Disambiguation
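The three enrichment steps above can be illustrated with a toy example. It assumes a tiny hand-made knowledge base and a simple context-overlap heuristic for disambiguation; the `kb:` identifiers and context words are invented for illustration and are not real DBpedia or YAGO URIs, and a production system would use far richer models.

```python
import re

# Hypothetical knowledge base: surface form -> candidate entities with context words.
KNOWLEDGE_BASE = {
    "ronaldo": [
        {"id": "kb:Cristiano_Ronaldo", "context": {"portugal", "madrid", "juventus"}},
        {"id": "kb:Ronaldo_Nazario", "context": {"brazil", "inter", "1998"}},
    ],
    "london": [
        {"id": "kb:London", "context": {"england", "uk", "thames"}},
    ],
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def enrich(text):
    """Return (surface form, entity id) pairs found in the text."""
    tokens = tokenize(text)
    token_set = set(tokens)
    annotations = []
    for token in tokens:
        # Recognition and linking: look the surface form up in the knowledge base.
        candidates = KNOWLEDGE_BASE.get(token)
        if not candidates:
            continue
        # Disambiguation: prefer the candidate whose context words overlap the
        # document most (ties fall back to the first candidate listed).
        best = max(candidates, key=lambda c: len(c["context"] & token_set))
        annotations.append((token, best["id"]))
    return annotations


print(enrich("Ronaldo scored for Portugal in London"))
print(enrich("Ronaldo led Brazil in 1998"))
```

The same surface form "Ronaldo" is linked to different entities depending on the surrounding words, which is the effect disambiguation is meant to achieve.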
14. Work Together Effectively
Entity Search with Suggestions
• Named Entity Suggestions
• Ability to query with disambiguated entities
• Search results with high precision
– Keyword search for “ronaldo” returns documents about both “Cristiano Ronaldo” and “Ronaldo”
– Entity search returns only the documents related to the selected entity
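The difference between keyword search and entity search can be sketched with a few sample documents. Everything below is an assumption made for illustration: the documents, the `kb:` entity identifiers, and the function names are hypothetical, and each document is assumed to have been annotated with entities at indexing time.

```python
# Hypothetical entity labels used for suggestions.
ENTITY_LABELS = {
    "kb:Cristiano_Ronaldo": "Cristiano Ronaldo",
    "kb:Ronaldo_Nazario": "Ronaldo",
}

# Hypothetical index: each document carries the entities found during enrichment.
DOCUMENTS = [
    {"id": "d1", "text": "Ronaldo scores twice for Portugal",
     "entities": {"kb:Cristiano_Ronaldo"}},
    {"id": "d2", "text": "Ronaldo remembers the 2002 final with Brazil",
     "entities": {"kb:Ronaldo_Nazario"}},
    {"id": "d3", "text": "CR7 signs a new contract",
     "entities": {"kb:Cristiano_Ronaldo"}},
]


def keyword_search(query):
    """Match the raw query string against document text."""
    needle = query.lower()
    return [doc["id"] for doc in DOCUMENTS if needle in doc["text"].lower()]


def suggest_entities(prefix):
    """Suggest entities with a label word starting with the typed prefix."""
    needle = prefix.lower()
    matches = [
        (label, entity_id)
        for entity_id, label in ENTITY_LABELS.items()
        if any(word.lower().startswith(needle) for word in label.split())
    ]
    return sorted(matches)


def entity_search(entity_id):
    """Return only documents annotated with the selected entity."""
    return [doc["id"] for doc in DOCUMENTS if entity_id in doc["entities"]]


print(keyword_search("ronaldo"))
print(suggest_entities("ron"))
print(entity_search("kb:Cristiano_Ronaldo"))
```

In this sketch the keyword query mixes documents about two different people, while the entity query excludes the unrelated document and also finds one that never mentions the name "Ronaldo" in its text.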