SlideShare a Scribd company logo
1 of 8
Apache UIMA &
Semantic Search
      Tommaso Teofili
   tommaso@apache.org
Apache UIMA - what is it?
Unstructered Information Management Architecture

Architectural Framework to manage (eventually large) volumes of
unstructered data

Former IBM Alphaworks project donated to ASF

Currently an Incubator podling ( http://incubator.apache.org/uima )

Apache UIMA is an Oasis standard ( http://www.oasis-open.org )
Apache UIMA - how?

Many pluggable reusable components (described via XML)
Analysis Engines (primitive or aggregates)
Asynchronous scaleout (JMS, Apache ActiveMQ)
Flow controllers
Type systems
Apache UIMA - what is NOT?

It’s not a semantic search tool inherently
the “Lucas example”
the semantic search package for UIMA is not open source!
( http://www.alphaworks.ibm.com/tech/uima/download )
UIMA & Semantic Search
Metadata generation engine for CM systems
Data enrichment
Linked data
Jeopardy (see http://www.research.ibm.com/deepqa/
faq.shtml#24 )
Let’s see...
RE Market Analysis & UIMA
Macpi: a real estate market analysis tool developed at DIA
   Webpipe (crawling and wrapping data)
   Apache UIMA
   Spring framework
Knowledge extraction
Extract metadata with Apache UIMA to build our search
Apache UIMA & AlchemyAPI
AlchemyAPI from Orchestr8 services wrapped as UIMA AEs
Named-entity recognition, word disambiguation
   “Barack Obama” is http://dbpedia.org/resource/Barack_Obama

Exploiting linked data
   enriching free text with DBpedia, GeoNames, Freebase URIs

Plugging with other UIMA AEs
   providing you with a reusable component to deal with Linked Data
UIMA & Semantic Search



  it’s demo time!

More Related Content

Similar to Apache UIMA and Semantic Search

Don't Cross The Streams - Data Streaming And Apache Flink
Don't Cross The Streams  - Data Streaming And Apache FlinkDon't Cross The Streams  - Data Streaming And Apache Flink
Don't Cross The Streams - Data Streaming And Apache FlinkJohn Gorman (BSc, CISSP)
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksSlim Baltagi
 
ApacheCon NA 2010 - Developing Composite Apps for the Cloud with Apache Tuscany
ApacheCon NA 2010 - Developing Composite Apps for the Cloud with Apache TuscanyApacheCon NA 2010 - Developing Composite Apps for the Cloud with Apache Tuscany
ApacheCon NA 2010 - Developing Composite Apps for the Cloud with Apache TuscanyJean-Sebastien Delfino
 
Security From The Big Data and Analytics Perspective
Security From The Big Data and Analytics PerspectiveSecurity From The Big Data and Analytics Perspective
Security From The Big Data and Analytics PerspectiveAll Things Open
 
Apache Eagle: Architecture Evolvement and New Features
Apache Eagle: Architecture Evolvement and New FeaturesApache Eagle: Architecture Evolvement and New Features
Apache Eagle: Architecture Evolvement and New FeaturesHao Chen
 
ImpressCMS Persistable Framework: Rapid Modules Development
ImpressCMS Persistable Framework: Rapid Modules DevelopmentImpressCMS Persistable Framework: Rapid Modules Development
ImpressCMS Persistable Framework: Rapid Modules DevelopmentINBOX International inc.
 
Harnessing Stack Overflow for the IDE - RSSE 2012
Harnessing Stack Overflow for the IDE - RSSE 2012Harnessing Stack Overflow for the IDE - RSSE 2012
Harnessing Stack Overflow for the IDE - RSSE 2012Luca Ponzanelli
 
Students of Navgujarat College of Computer Applications, Ahmedabad felt excit...
Students of Navgujarat College of Computer Applications, Ahmedabad felt excit...Students of Navgujarat College of Computer Applications, Ahmedabad felt excit...
Students of Navgujarat College of Computer Applications, Ahmedabad felt excit...cresco
 
Cms an overview
Cms an overviewCms an overview
Cms an overviewkmusthu
 
Apache Eagle Architecture Evolvement
Apache Eagle Architecture EvolvementApache Eagle Architecture Evolvement
Apache Eagle Architecture EvolvementHao Chen
 
(WEB203) Building a Website That Costs Pennies to Operate | AWS re:Invent 2014
(WEB203) Building a Website That Costs Pennies to Operate | AWS re:Invent 2014(WEB203) Building a Website That Costs Pennies to Operate | AWS re:Invent 2014
(WEB203) Building a Website That Costs Pennies to Operate | AWS re:Invent 2014Amazon Web Services
 
Amoeba distributed operating System
Amoeba distributed operating SystemAmoeba distributed operating System
Amoeba distributed operating SystemSaurabh Gupta
 
Graymeta C4 use case, Deduplication
Graymeta C4 use case, DeduplicationGraymeta C4 use case, Deduplication
Graymeta C4 use case, DeduplicationETCenter
 
AWS Webcast - Amazon Kinesis and Apache Storm
AWS Webcast - Amazon Kinesis and Apache StormAWS Webcast - Amazon Kinesis and Apache Storm
AWS Webcast - Amazon Kinesis and Apache StormAmazon Web Services
 
Log Data Analysis Platform by Valentin Kropov
Log Data Analysis Platform by Valentin KropovLog Data Analysis Platform by Valentin Kropov
Log Data Analysis Platform by Valentin KropovSoftServe
 
Log Data Analysis Platform
Log Data Analysis PlatformLog Data Analysis Platform
Log Data Analysis PlatformValentin Kropov
 
Apache Phoenix with Actor Model (Akka.io) for real-time Big Data Programming...
Apache Phoenix with Actor Model (Akka.io)  for real-time Big Data Programming...Apache Phoenix with Actor Model (Akka.io)  for real-time Big Data Programming...
Apache Phoenix with Actor Model (Akka.io) for real-time Big Data Programming...Trieu Nguyen
 
IBM i Resources Retiring?
IBM i Resources Retiring?IBM i Resources Retiring?
IBM i Resources Retiring?HelpSystems
 

Similar to Apache UIMA and Semantic Search (20)

Don't Cross The Streams - Data Streaming And Apache Flink
Don't Cross The Streams  - Data Streaming And Apache FlinkDon't Cross The Streams  - Data Streaming And Apache Flink
Don't Cross The Streams - Data Streaming And Apache Flink
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics Frameworks
 
ApacheCon NA 2010 - Developing Composite Apps for the Cloud with Apache Tuscany
ApacheCon NA 2010 - Developing Composite Apps for the Cloud with Apache TuscanyApacheCon NA 2010 - Developing Composite Apps for the Cloud with Apache Tuscany
ApacheCon NA 2010 - Developing Composite Apps for the Cloud with Apache Tuscany
 
Security From The Big Data and Analytics Perspective
Security From The Big Data and Analytics PerspectiveSecurity From The Big Data and Analytics Perspective
Security From The Big Data and Analytics Perspective
 
Apache Eagle: Architecture Evolvement and New Features
Apache Eagle: Architecture Evolvement and New FeaturesApache Eagle: Architecture Evolvement and New Features
Apache Eagle: Architecture Evolvement and New Features
 
Lighty
LightyLighty
Lighty
 
ImpressCMS Persistable Framework: Rapid Modules Development
ImpressCMS Persistable Framework: Rapid Modules DevelopmentImpressCMS Persistable Framework: Rapid Modules Development
ImpressCMS Persistable Framework: Rapid Modules Development
 
Cloud Computing 2019
Cloud Computing 2019Cloud Computing 2019
Cloud Computing 2019
 
Harnessing Stack Overflow for the IDE - RSSE 2012
Harnessing Stack Overflow for the IDE - RSSE 2012Harnessing Stack Overflow for the IDE - RSSE 2012
Harnessing Stack Overflow for the IDE - RSSE 2012
 
Students of Navgujarat College of Computer Applications, Ahmedabad felt excit...
Students of Navgujarat College of Computer Applications, Ahmedabad felt excit...Students of Navgujarat College of Computer Applications, Ahmedabad felt excit...
Students of Navgujarat College of Computer Applications, Ahmedabad felt excit...
 
Cms an overview
Cms an overviewCms an overview
Cms an overview
 
Apache Eagle Architecture Evolvement
Apache Eagle Architecture EvolvementApache Eagle Architecture Evolvement
Apache Eagle Architecture Evolvement
 
(WEB203) Building a Website That Costs Pennies to Operate | AWS re:Invent 2014
(WEB203) Building a Website That Costs Pennies to Operate | AWS re:Invent 2014(WEB203) Building a Website That Costs Pennies to Operate | AWS re:Invent 2014
(WEB203) Building a Website That Costs Pennies to Operate | AWS re:Invent 2014
 
Amoeba distributed operating System
Amoeba distributed operating SystemAmoeba distributed operating System
Amoeba distributed operating System
 
Graymeta C4 use case, Deduplication
Graymeta C4 use case, DeduplicationGraymeta C4 use case, Deduplication
Graymeta C4 use case, Deduplication
 
AWS Webcast - Amazon Kinesis and Apache Storm
AWS Webcast - Amazon Kinesis and Apache StormAWS Webcast - Amazon Kinesis and Apache Storm
AWS Webcast - Amazon Kinesis and Apache Storm
 
Log Data Analysis Platform by Valentin Kropov
Log Data Analysis Platform by Valentin KropovLog Data Analysis Platform by Valentin Kropov
Log Data Analysis Platform by Valentin Kropov
 
Log Data Analysis Platform
Log Data Analysis PlatformLog Data Analysis Platform
Log Data Analysis Platform
 
Apache Phoenix with Actor Model (Akka.io) for real-time Big Data Programming...
Apache Phoenix with Actor Model (Akka.io)  for real-time Big Data Programming...Apache Phoenix with Actor Model (Akka.io)  for real-time Big Data Programming...
Apache Phoenix with Actor Model (Akka.io) for real-time Big Data Programming...
 
IBM i Resources Retiring?
IBM i Resources Retiring?IBM i Resources Retiring?
IBM i Resources Retiring?
 

More from Tommaso Teofili

Affect Enriched Word Embeddings for News IR
Affect Enriched Word Embeddings for News IRAffect Enriched Word Embeddings for News IR
Affect Enriched Word Embeddings for News IRTommaso Teofili
 
Flexible search in Apache Jackrabbit Oak
Flexible search in Apache Jackrabbit OakFlexible search in Apache Jackrabbit Oak
Flexible search in Apache Jackrabbit OakTommaso Teofili
 
Data replication in Sling
Data replication in SlingData replication in Sling
Data replication in SlingTommaso Teofili
 
Search engines in the industry
Search engines in the industrySearch engines in the industry
Search engines in the industryTommaso Teofili
 
Scaling search in Oak with Solr
Scaling search in Oak with Solr Scaling search in Oak with Solr
Scaling search in Oak with Solr Tommaso Teofili
 
Text categorization with Lucene and Solr
Text categorization with Lucene and SolrText categorization with Lucene and Solr
Text categorization with Lucene and SolrTommaso Teofili
 
Machine learning with Apache Hama
Machine learning with Apache HamaMachine learning with Apache Hama
Machine learning with Apache HamaTommaso Teofili
 
Adapting Apache UIMA to OSGi
Adapting Apache UIMA to OSGiAdapting Apache UIMA to OSGi
Adapting Apache UIMA to OSGiTommaso Teofili
 
Domeo, Text Mining, UIMA and Clerezza
Domeo, Text Mining, UIMA and ClerezzaDomeo, Text Mining, UIMA and Clerezza
Domeo, Text Mining, UIMA and ClerezzaTommaso Teofili
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash courseTommaso Teofili
 
OSS Enterprise Search EU Tour
OSS Enterprise Search EU TourOSS Enterprise Search EU Tour
OSS Enterprise Search EU TourTommaso Teofili
 
Apache Solr - Enterprise search platform
Apache Solr - Enterprise search platformApache Solr - Enterprise search platform
Apache Solr - Enterprise search platformTommaso Teofili
 
Information Extraction with UIMA - Usecases
Information Extraction with UIMA - UsecasesInformation Extraction with UIMA - Usecases
Information Extraction with UIMA - UsecasesTommaso Teofili
 
Data and Information Extraction on the Web
Data and Information Extraction on the WebData and Information Extraction on the Web
Data and Information Extraction on the WebTommaso Teofili
 

More from Tommaso Teofili (15)

Affect Enriched Word Embeddings for News IR
Affect Enriched Word Embeddings for News IRAffect Enriched Word Embeddings for News IR
Affect Enriched Word Embeddings for News IR
 
Flexible search in Apache Jackrabbit Oak
Flexible search in Apache Jackrabbit OakFlexible search in Apache Jackrabbit Oak
Flexible search in Apache Jackrabbit Oak
 
Data replication in Sling
Data replication in SlingData replication in Sling
Data replication in Sling
 
Search engines in the industry
Search engines in the industrySearch engines in the industry
Search engines in the industry
 
Scaling search in Oak with Solr
Scaling search in Oak with Solr Scaling search in Oak with Solr
Scaling search in Oak with Solr
 
Text categorization with Lucene and Solr
Text categorization with Lucene and SolrText categorization with Lucene and Solr
Text categorization with Lucene and Solr
 
Machine learning with Apache Hama
Machine learning with Apache HamaMachine learning with Apache Hama
Machine learning with Apache Hama
 
Adapting Apache UIMA to OSGi
Adapting Apache UIMA to OSGiAdapting Apache UIMA to OSGi
Adapting Apache UIMA to OSGi
 
Oak / Solr integration
Oak / Solr integrationOak / Solr integration
Oak / Solr integration
 
Domeo, Text Mining, UIMA and Clerezza
Domeo, Text Mining, UIMA and ClerezzaDomeo, Text Mining, UIMA and Clerezza
Domeo, Text Mining, UIMA and Clerezza
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
 
OSS Enterprise Search EU Tour
OSS Enterprise Search EU TourOSS Enterprise Search EU Tour
OSS Enterprise Search EU Tour
 
Apache Solr - Enterprise search platform
Apache Solr - Enterprise search platformApache Solr - Enterprise search platform
Apache Solr - Enterprise search platform
 
Information Extraction with UIMA - Usecases
Information Extraction with UIMA - UsecasesInformation Extraction with UIMA - Usecases
Information Extraction with UIMA - Usecases
 
Data and Information Extraction on the Web
Data and Information Extraction on the WebData and Information Extraction on the Web
Data and Information Extraction on the Web
 

Recently uploaded

Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 

Apache UIMA and Semantic Search

  • 1. Apache UIMA & Semantic Search Tommaso Teofili tommaso@apache.org
  • 2. Apache UIMA - what is it? Unstructered Information Management Architecture Architectural Framework to manage (eventually large) volumes of unstructered data Former IBM Alphaworks project donated to ASF Currently an Incubator podling ( http://incubator.apache.org/uima ) Apache UIMA is an Oasis standard ( http://www.oasis-open.org )
  • 3. Apache UIMA - how? Many pluggable reusable components (described via XML) Analysis Engines (primitive or aggregates) Asynchronous scaleout (JMS, Apache ActiveMQ) Flow controllers Type systems
  • 4. Apache UIMA - what is NOT? It’s not a semantic search tool inherently the “Lucas example” the semantic search package for UIMA is not open source! ( http://www.alphaworks.ibm.com/tech/uima/download )
  • 5. UIMA & Semantic Search Metadata generation engine for CM systems Data enrichment Linked data Jeopardy (see http://www.research.ibm.com/deepqa/ faq.shtml#24 ) Let’s see...
  • 6. RE Market Analysis & UIMA Macpi: a real estate market analysis tool developed at DIA Webpipe (crawling and wrapping data) Apache UIMA Spring framework Knowledge extraction Extract metadata with Apache UIMA to build our search
  • 7. Apache UIMA & AlchemyAPI AlchemyAPI from Orchestr8 services wrapped as UIMA AEs Named-entity recognition, word disambiguation “Barack Obama” is http://dbpedia.org/resource/Barack_Obama Exploiting linked data enriching free text with DBpedia, GeoNames, Freebase URIs Plugging with other UIMA AEs providing you with a reusable component to deal with Linked Data
  • 8. UIMA & Semantic Search it’s demo time!

Editor's Notes

  1. Hello everybody, I am Tommaso Teofili, I am from Rome and I work in Sourcesense and I’m an Apache UIMA committer. In this talk I am going to tell you about Apache UIMA & Semantic Search.
  2. Apache UIMA stands for Unstructerd Information Management Architecture, it's an architectural framework to manage large volumes of unstructured data. It was an IBM project donated to Apache. It's currently an Incubator podling and the new 2.3.0 release is coming. It's also been approved as an Oasis standard. By the way, I became Apache UIMA committer on August.
  3. UIMA is made of components, which can be put in a pipeline to be executed. Every Apache UIMA component is described via XML descriptors. The most important of which is the Analysis Engine, that is responsible for the very analysis of documents. Each AE can be aggregated using an aggregate XML descriptor. To manage large volumes of data UIMA can scale using Asynchronous Scaleout. Each UIMA pipeline is made of AEs, one or more flow controllers and CAS consumers, the component which is responsible of treating the output (usually CASs).
  4. Apache UIMA is not inherently a semantic search tool. The semantic search tool package for UIMA is not open source. Lucas is a CAS consumer for UIMA capable of putting UIMA annotations inside Lucene indexes, a semi-semantic approach.
  5. UIMA is well suited as a metadata generation engine for CM systems. It can enrich data with annotations, entities, categories and so on, eventually linking them. For those who know the US Jeopardy game, IBM it's building a system able to answer open-domain questions against humans and it's based on UIMA-AS. We'll see two examples of the power of UIMA to boost semantic search.
  6. Here's the first one: Recently during my college studies I used UIMA to extract metadata to build semantic searches on a real estate market analysis tool called Macpi. zone statistics -> UIMA to find zones, price, filter out bad matchings frequent announces -> disambiguation, data filtering
  7. Another little demo I’m going to show is about integrating Apache UIMA and AlchemyAPI from Ochestr8. AlchemyAPI provides, you guess, services to extract Entities from text, but what we are interested in here is using the linked data, to exploit, for example DBPedia, Freebase and so on data clouds. So the power of UIMA here is again putting together in a pipeline many components, once my system learned I can easily substitute this component with a UIMA Dictionary using the ConceptMapper. I get Alchemy component then I can put the OpenCalais component, my personal dictionary of specific terms, a machine learning engine based on what UIMA extracts and so on. Here we’ll see indexing documents by concepts.
  8. Ok, let’s go to the demos!