A framework for knowledge extraction, linked data and semantic search.
What do we want computers to do for us?
We have data.
• From 2005 to 2020, the digital universe will grow in
size by a factor of 300, from 130 exabytes to 40 trillion
gigabytes (40 ZB).

• From now until 2020, the digital universe will about
double every two years.

• Volumes of data are projected to reach 5,247 GB per
person, with emerging economies playing an
increasingly important role (producing two thirds of
the world's data by the end of this decade).

• Only 0.5% of this data is used today for analysis. 

• The amount of information individuals create
themselves - writing documents, taking pictures,
recording audio - is far less than the information
being created about them in the digital universe.
[IDC iView, 2012]
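The two growth figures on this slide can be cross-checked with a little arithmetic. A minimal sketch, assuming steady exponential growth (an assumption made here for illustration, not a claim taken from the IDC study): a 300-fold increase over the 15 years from 2005 to 2020 implies a doubling time of slightly under two years, which is consistent with "about double every two years".

```python
import math


def doubling_time_years(growth_factor: float, period_years: float) -> float:
    """Doubling time implied by a total growth factor over a period,
    assuming steady exponential growth."""
    return period_years / math.log2(growth_factor)


# Factor 300 between 2005 and 2020, as quoted on the slide.
implied = doubling_time_years(300, 2020 - 2005)
print(f"Implied doubling time: {implied:.2f} years")
```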
What do we want computers to do for us?
Text
Images/Video
Audio
"language": "de"
Categorisation,
Summarisation,
Search,
Question/Answer,
…
"label": "outdoor"
Suggest tags,
Image search,
…
Automatic Speech
Recognition,
Speaker identification,
Music classification,
…
[Andrew Ng, 2011]
We want computers to process data.
Natural Language
Processing
We use it every day.
[JURAFSKY & MARTIN, 2008]
a theoretically motivated range of
computational techniques for
analysing naturally occurring
text/speech for the purpose of
achieving human-like language
processing.
Feature extraction in text/speech.
Levels of knowledge encoding in language data.
INPUT
Morphologic
Syntactic
Semantic
FEATURES
NLP
{
Parser
Lexical DB
Stemming
Anaphora
POS Tagging
NER
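To make the idea of feature extraction concrete, here is a toy sketch of three of the steps named on this slide: tokenisation, stemming and a naive stand-in for NER. Everything in it is a simplifying assumption: the three-entry suffix list and the "capitalised word" heuristic are illustrative only and are far cruder than what real NLP toolkits do.

```python
import re

# Hypothetical, deliberately tiny suffix list; real stemmers use many more rules.
SUFFIXES = ("ing", "ed", "s")


def tokenize(text: str) -> list:
    """Split text into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[A-Za-z0-9']+", text)


def stem(token: str) -> str:
    """Lower-case the token and strip one known suffix, keeping at least three characters."""
    lower = token.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            return lower[: -len(suffix)]
    return lower


def entity_candidates(tokens: list) -> list:
    """Naive NER stand-in: capitalised tokens that do not start the sentence."""
    return [t for i, t in enumerate(tokens) if i > 0 and t[0].isupper()]


tokens = tokenize("Computers are analysing texts written in Salzburg")
features = {
    "tokens": tokens,
    "stems": [stem(t) for t in tokens],
    "entity_candidates": entity_candidates(tokens),
}
print(features)
```

The resulting dictionary is the kind of structured "FEATURES" output the diagram refers to: the same text, described at the morphological level (stems) and, very roughly, at the semantic level (entity candidates).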
TEXT
NLP
FEATURES
WISDOM
What do we want computers to do with a text?
STRUCTURED
DATA
CONTEXT
We want computers to make sense of unstructured data.
KNOWLEDGE
{
Semantic Lifting
TEXT WISDOM
A practical example.
CONTEXT
Combining Semantic Web technologies with NLP technologies.
KNOWLEDGE
Lucoli
"label": "Lucoli"
"values": ["13.338889"], "predicate": "http://www.w3.org/2003/01/geo/wgs84_pos#long"
"values": ["42.29194444444445"], "predicate": "http://www.w3.org/2003/01/geo/wgs84_pos#lat"
"values": [(image thumbnails)], "predicate": "http://xmlns.com/foaf/0.1/depiction"
About a 20-minute drive
from L’Aquila.
…
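The entity description on this slide is a list of predicate/values pairs. A minimal sketch of how a client might read the coordinates out of such a structure is shown below; the enclosing "properties" key and the helper names are assumptions, since the slide shows only the inner fragments and not the full response format.

```python
from typing import Optional, Tuple

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"

# Entity shaped like the fragments on the slide. The "properties" wrapper is
# an assumption; only the label and the predicate/values pairs are from the slide.
entity = {
    "label": "Lucoli",
    "properties": [
        {"predicate": GEO + "long", "values": ["13.338889"]},
        {"predicate": GEO + "lat", "values": ["42.29194444444445"]},
    ],
}


def first_value(entity: dict, predicate: str) -> Optional[str]:
    """Return the first value stored for a predicate, or None if absent."""
    for prop in entity["properties"]:
        if prop["predicate"] == predicate and prop["values"]:
            return prop["values"][0]
    return None


def coordinates(entity: dict) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) as floats when both are present."""
    lat = first_value(entity, GEO + "lat")
    lon = first_value(entity, GEO + "long")
    if lat is None or lon is None:
        return None
    return float(lat), float(lon)


print(entity["label"], coordinates(entity))
```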
How we started.
Building an open platform for
knowledge extraction, linked data
and semantic search.

Delivering the world’s most
advanced open source
content analysis and making
linked data publishing and
information discovery accessible
to anyone.
• Incorporating requirements from industry partners:
• CMS companies
• System integrators
• Tool providers
• Inheriting 6 years of IP with R&D on:
• Semantic Information Management and
Publishing (RDF and Semantic Web Technology)
• Semantic Processing
• Conceptual Search
1. CONTENT ANALYSIS
2. CONTENT DISCOVERY
3. LINKED DATA PUBLISHING
Linked Data Cloud
Technology Stack
Text
Legacy Data
Audio/Images (under development)
• Enterprise Linked Data
• Content Enhancement
• Semantic Search
• Semantic enhancement process chaining
• Multiple NLP feature extraction facilities
• Multiple language support
• Content classification and sentiment analysis
• Graduated as Top Level Project of the Apache
Software Foundation in September 2012
STANBOL.APACHE.ORG
A Toolbox for Semantic Processing.
SOLR.APACHE.ORG
The Highly Scalable Search Server.
• Based on Apache Lucene
• Various language specific processing procedures
• Highly scalable (Solr cloud) and highly configurable
• Ultra fast indexing/searching, indexes can be merged/
optimised
• Semantic Search available with an easy-to-install
Redlink Plugin
DEV.REDLINK.IO/PLUGINS/SOLR
Adding Semantic Search to Apache Solr.
• Boost your existing Apache Solr installation with
semantic enhancements via Redlink Content Analysis
• Watch the screencast
• Learn more
• Customising the semantic enhancements
with user-created vocabularies and Redlink NLP extraction
facilities
Managing vocabularies.
Vocabularies DEV.REDLINK.IO/API/1.0-BETA.html#linked-data
• Build your first app
• Learn more
• Redlink allows users to create their own Linked Data server for
managing vocabularies or publishing datasets for Linked (Open)
Data projects
• Datasets managed with Redlink can
be made available for content
analysis and linking
• Datasets can be either private (Linked
Enterprise Data) or public (Linked
Open Data)
• Public Datasets such as DBpedia, Freebase and
GeoNames are available for de-referencing and interlinking
• Read-Write Linked Data
• Triple store with transactions, versioning
and rule-based reasoning
• SPARQL and LDPath query languages
• Transparent Linked Data Caching
• Graduated as Top Level Project of the Apache
Software Foundation in November 2013
MARMOTTA.APACHE.ORG
The Open Platform for Linked Data.
An Open Linked Data Project
for Tourism in Salzburg
• Cross-platform publishing, as more and more travellers use
mobile devices
• Multiple Web CMSs (both proprietary and open source) to be
managed simultaneously
• Costly manual curation and interlinking
• Increasing demand for content syndication (from big players like
foursquare as well as from local application developers)
• Need for better SEO especially for events and sites (too regional to
be understood by commercial search engines)
Remixing existing content and creating new value.
A magazine
running on WordPress
An online
booking system
freshly updated content
on locations and events
a database containing:
events, facilities, accommodations, …
Everything we know already
from Wikipedia
the World’s largest
encyclopedia
Using Linked Data to make sense of the information
Linked Data Publishing
• Data from the online booking system (Feratel) is enriched and transformed
into triples using identified vocabularies and ontologies
• Triples are stored in the Redlink triple store in a dedicated context
• RDF data and SPARQL end-points are published to the data website
(data.salzburgerland.com) running CKAN as Linked Open Data
• CKAN makes the data accessible to third parties in various formats by
querying Redlink
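The first step in the list above, turning a booking-system record into triples, can be sketched as follows. This is an illustration under stated assumptions, not the project's actual transformation: the record field names are hypothetical (not Feratel's schema), and the use of rdfs:label and the LODE property atPlace is an assumed choice of vocabulary. The event and place URIs are the ones shown elsewhere in these slides.

```python
from dataclasses import dataclass

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Assumed property from the LODE ontology for linking an event to a place.
LODE_AT_PLACE = "http://linkedevents.org/ontology/atPlace"


@dataclass(frozen=True)
class Literal:
    """Plain string literal, to tell text values apart from URIs."""
    value: str


def record_to_triples(record: dict) -> list:
    """Map one booking-system record to (subject, predicate, object) triples."""
    subject = record["uri"]
    return [
        (subject, RDFS_LABEL, Literal(record["name"])),
        (subject, LODE_AT_PLACE, record["place_uri"]),
    ]


def to_ntriples(triples: list) -> str:
    """Serialise triples as N-Triples, one statement per line."""
    lines = []
    for subject, predicate, obj in triples:
        if isinstance(obj, Literal):
            escaped = obj.value.replace("\\", "\\\\").replace('"', '\\"')
            rendered = f'"{escaped}"'
        else:
            rendered = f"<{obj}>"
        lines.append(f"<{subject}> <{predicate}> {rendered} .")
    return "\n".join(lines)


# Hypothetical record; the field names are illustrative only.
record = {
    "uri": "http://rdf.salzburgerland.com/events/event/dea7fde1-5583-4002-97eb-0074a182fa9c.html",
    "name": "Florianifeier",
    "place_uri": "http://dbpedia.org/page/Unternberg",
}
print(to_ntriples(record_to_triples(record)))
```

Each printed line is one statement in subject, predicate, object order, which is the form a triple store ingests.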
Transforming Feratel Data
into Semantic Knowledge
from SOAP to Linked Data
Ontologies provide a means
to hold everything together
Data Modelling with LODE
Using LODE: An ontology for
Linking Open Descriptions of
Events
Adding the relationships
between things
Florianifeier
With RDF, different data sources are integrated to provide
robot-friendly information that describes real-world things
<subject><predicate><object>
Semantic Lifting and
Linked Data Principles
• A “word” or “phrase” becomes an
identifier used to denote
“things” (named entities) existing in
the real world

1. Real-world things are
unambiguously represented with
web addresses (URI)
2. By accessing these web addresses
(HTTP-URI) usable data is sent in
return using standard formats (RDF,
SPARQL)
3. This data includes links to other
data so that people can discover
more things
"label": "May",
"reference": "http://dbpedia.org/resource/May"
Type: Thing
"values": ["13.7446"], "predicate": "http://www.w3.org/2003/01/geo/wgs84_pos#long"
"values": ["47.10222"], "predicate": "http://www.w3.org/2003/01/geo/wgs84_pos#lat"
"reference": "http://dbpedia.org/page/Unternberg"
Type: Place
"label": "Florianifeier",
"reference": "http://rdf.salzburgerland.com/events/event/dea7fde1-5583-4002-97eb-0074a182fa9c.html"
Type: Event
Tim Berners-Lee.
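The second principle on this slide, that accessing an HTTP URI returns usable data in a standard format, is usually realised through HTTP content negotiation. Below is a minimal sketch using only the Python standard library. Whether a given server actually answers with Turtle for this Accept header is an assumption; the helper names are made up for this example.

```python
import urllib.request


def build_rdf_request(uri: str, media_type: str = "text/turtle") -> urllib.request.Request:
    """Prepare an HTTP GET that asks for an RDF serialisation via the Accept header."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


def fetch_rdf(uri: str, timeout: float = 10.0) -> str:
    """Dereference the URI and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_rdf_request(uri), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the request does not touch the network; fetch_rdf() would.
request = build_rdf_request("http://dbpedia.org/resource/May")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

The returned description would in turn contain further URIs, which is what the third principle relies on: a client can follow them to discover more things.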
LANGUAGE EVENT THING LOCATION
ENGLISH FLORIANIFEIER MAY UNTERNBERG
[Très Riches Heures du duc de Berry, Raymond Cazelles et Johannes Rathofe]
“This May don't miss the
Florianifeier, we'll have fun
as usual in Unternberg”
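The lifting of this sentence can be imitated with a simple dictionary lookup. The sketch below is a toy stand-in for a real enhancement engine: the gazetteer is a hypothetical three-entry table built from the references shown in these slides, and the matching is plain whole-word search with no disambiguation.

```python
import re

# Hypothetical gazetteer: label -> (type, reference), taken from the slides.
GAZETTEER = {
    "May": ("Thing", "http://dbpedia.org/resource/May"),
    "Unternberg": ("Place", "http://dbpedia.org/page/Unternberg"),
    "Florianifeier": (
        "Event",
        "http://rdf.salzburgerland.com/events/event/dea7fde1-5583-4002-97eb-0074a182fa9c.html",
    ),
}


def lift(text: str) -> list:
    """Annotate every gazetteer label found as a whole word, in order of appearance."""
    annotations = []
    for label, (entity_type, reference) in GAZETTEER.items():
        for match in re.finditer(rf"\b{re.escape(label)}\b", text):
            annotations.append(
                {
                    "label": label,
                    "type": entity_type,
                    "reference": reference,
                    "start": match.start(),
                }
            )
    return sorted(annotations, key=lambda a: a["start"])


text = "This May don't miss the Florianifeier, we'll have fun as usual in Unternberg"
for annotation in lift(text):
    print(annotation["label"], annotation["type"], annotation["reference"])
```

A lookup like this cannot tell the month "May" from the verb "may" at the start of a sentence; resolving that kind of ambiguity is what the NLP features in a real pipeline are for.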
Dynamic Semantic Publishing with WordLift
• Data from the Redlink triple store is made available for content enrichment
and can be edited using WordLift, a semantic plugin for WordPress.
Data Curation
• Using Linked Data the Web
becomes my new CMS

• information is automatically
imported in WordPress

• posts are connected with
entities

• properties for each entity can
be edited using WordPress

• any change is automatically
reflected in the triple-store and
re-published as Open Data
Using Linked Data and WordLift the Web becomes your new CMS.
editing a blog post
editing an entity
Web Search on google.at
19,900 results
no answer
Touristic applications attempting to discover events in Salzburgerland.
“Which events occur in May in Lungau?”
Linked Open Data Query
5 results
5 answers
Unternberg is a village in the area of Lungau.
Better SEO using
Semantic Markup
Florianifeier
Unternberg
• Using schema.org the data
from the triple-store is added
to the pages as semantic
markup
• Search engines can finally
“recognise” entities that were
previously unknown (e.g.
Florianifeier)
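As an illustration of what such markup might look like, the sketch below builds a schema.org description of the event and wraps it in a script tag. The choice of JSON-LD is an assumption (the slides do not say which syntax the project used), and only values that appear in these slides are included; no dates are given in the source, so none are added.

```python
import json

# schema.org description assembled from values shown on the slides.
event_markup = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Florianifeier",
    "location": {
        "@type": "Place",
        "name": "Unternberg",
        "sameAs": "http://dbpedia.org/page/Unternberg",
        "geo": {
            "@type": "GeoCoordinates",
            "latitude": 47.10222,
            "longitude": 13.7446,
        },
    },
}


def to_script_tag(markup: dict) -> str:
    """Serialise the markup as a JSON-LD script element for embedding in a page."""
    body = json.dumps(markup, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(to_script_tag(event_markup))
```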
WordLift
• Media in a cross-media context: analysing
media resources together with their
connected content, including video, images,
audio, text, link structure and metadata;
• Investigating cross-media analysis along the
complete, distributed analysis chain, namely
extraction, metadata publishing, querying
and recommendations;
• Contributing the main software development
results as Open Source components to two
established Apache projects, Apache
Marmotta and Apache Stanbol, simplifying
the use of the technology in industrial
products.
What do we want computers to do with Media?
MICO-PROJECT.EU
“Show me the tempo-regional fragments where
Lewis Jones is right beside Connor Macfarlane?”
MICO-PROJECT.EU
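# Namespaces: FOAF, SPARQL-MM functions, W3C Media Ontology, Dublin Core terms
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX mm: <http://linkedmultimedia.org/sparql-mm/functions#>
PREFIX ma: <http://www.w3.org/ns/ma-ont#>
PREFIX dct: <http://purl.org/dc/terms/>

# Return one bounding box computed from the two fragment locators
SELECT (mm:boundingBox(?l1,?l2) AS ?left_right)
WHERE {
  # ?f1 has Lewis Jones as subject, ?f2 has Connor Macfarlane
  ?f1 ma:locator ?l1; dct:subject ?p1.
  ?p1 foaf:name "Lewis Jones".
  ?f2 ma:locator ?l2; dct:subject ?p2.
  ?p2 foaf:name "Connor Macfarlane".

  # Keep only pairs where ?l1 is right beside ?l2 and both overlap in time
  FILTER mm:rightBeside(?l1,?l2)
  FILTER mm:temporalOverlaps(?l1,?l2)
}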
We want computers to process media.
GRAZIE!
foaf:name
"Andrea Volpini"
Hopefully
soon in the
Linked
Data
Cloud!
CREDITS
This presentation is the result of many inspiring ideas and amazing work from
other people, and here is the list:
Andrew Ng, 2011
Jurafsky & Martin, 2008
"Webscale IA using Linked Open Data", on SlideShare by reduxd
"LODE: Linking Open Descriptions of Events" (ASWC 2009), on SlideShare by Raphael Troncy
"Semantic SEO in the post-Hummingbird era", on SlideShare by Kim Renberg and Andrea Volpini
"Querying of metadata, media content and context in MICO", a demo by Thomas Kurz
Any idea, graphic or meme belonging to us is available
for sharing, copying and re-mixing under a
Creative Commons 3.0 license.

What do we want computers to do for us?

  • 1. A framework for knowledge extraction, linked data and semantic search. What do we want computers to do for us?
  • 2. We have data. • From 2005 to 2020, the digital universe will grow in size by a factor of 300, from 30 exabytes to 40 trillion gigabyte (40 ZB). • From now until 2020, the digital universe will about double every two years. • Volumes of data are projected to reach 5.247 GB per person with emerging economies playing an increasingly important role (producing two thirds of the world data by the end of this decade). • Only 0.5% of this data is used today for analysis. • The amount of information individuals create themselves - writing documents, taking pictures, recording audio - is far less than the information being created about them in the digital universe. [IDC I V I E W, 2012]
  • 3. What do we want computers to do for us? Text Images/Video Audio "language": "de" Categorisation, Summarisation, Search, Question/Answer, … "label": "outdoor" Suggest tags, Image search, … Automatic Speech Recognition, Speaker identification, Music classification, … [Andrew NG, 2011]We want computers to process data.
  • 4. Natural Language Processing We use it everyday. [J U RAFSKY & MARTIN, 2008] a theoretically motivated range of computational techniques for analysing naturally occurring text/speech for the purpose of achieving human-like language processing.
  • 5. Features extraction in text/speech. Levels of knowledge encoding in language data. INPUT Morphologic Syntactic Semantic FEATURES NLP { Parser Lexical DB Stemming AnaphoraPos Tagging NER
  • 6. TEXT NLP FEATURES WISDOM What do we want computers to do with a text? STRUCTURED DATA CONTEXT We want computers to make sense of unstructured data. KNOWLEDGE { Semantic Lifting
  • 7. TEXT WISDOM A practical example. CONTEXT Combining Semantic Web technologies with NLP technologies. KNOWLEDGE Lucoli "label": "Lucoli" "values": ["13.338889"], "predicate": "http:// www.w3.org/2003/01/ geo/wgs84_pos#long" "values": ["42.29194444444445"], "predicate": "http://www.w3.org/2003/01/geo/ wgs84_pos#lat" "values": [ ! ! ! ! ! ], "predicate": "http:// xmlns.com/foaf/0.1/ depiction" About 20 minutes car drive from L’Aquila. …
  • 8. How we started. Building an open platform for knowledge extraction, linked data and semantic search. ! Delivering the world’s most advanced open source content analysis and making linked data publishing and information discovery accessible to anyone.
  • 9. • Incorporating requirements from industry partners: • CMS companies • System integrators • Tool providers • Inheriting 6 years of IP with R&D on: • Semantic Information Management and Publishing (RDF and Semantic Web Technology) • Semantic Processing • Conceptual Search
  • 10. CONTENT ANALYSIS LINKED DATA PUBLISHING 1 3 Linked Data Cloud Technology Stack Text Legacy Data Audio/Images (under development) CONTENT DISCOVERY2 • Enterprise Linked Data • Content Enhancement • Semantic Search
  • 11. • Semantic enhancement process chaining • Multiple NLP features extraction facilities • Multiple language support • Content classification and sentiment analysis • Graduated as Top Level Project of the Apache Foundation in September 2012 STANBOL.APACHE.ORG A Toolbox for Semantic Processing.
  • 12. SOLR.APACHE.ORG The Highly Scalable Search Server. • Based on Apache Lucene • Various language specific processing procedures • Highly scalable (Solr cloud) and highly configurable • Ultra fast indexing/searching, indexes can be merged/ optimised • Semantic Search available with an easy-to-install Redlink Plugin
  • 13. DEV.REDLINK.IO/PLUGINS/SOLR Adding Semantic Search to Apache Solr. • Boost your existing Apache Solr installation with semantic enhancements via Redlink Content Analysis • Watch the screencast • Learn more• Customising the semantic enhancements with user-created vocabularies and Redlink NLP extraction facilities
  • 14. Managing vocabularies. Vocabularies DEV.REDLINK.IO/API/1.0-BETA.html#linked-data • Build your first app • Learn more • Redlink allows users to create their own Linked Data server for managing vocabularies or publishing datasets for Linked (Open) Data projects • Datasets managed with Redlink can be made available for content analysis and linking • Datasets can be either private (Linked Enterprise Data) or public (Linked Open Data) • Public Datasets such as DBpedia, Freebase and GeoNames are available for de-referencing and interlinking
  • 15. • Read-Write Linked Data • Triple store with transactions, versioning and rule-based reasoning • SPARQL and LDPath query languages • Transparent Linked Data Caching • Graduated as a Top Level Project of the Apache Software Foundation in November 2013 MARMOTTA.APACHE.ORG The Open Platform for Linked Data.
  • 16. An Open Linked Data Project for Tourism in Salzburg • Cross platform publishing as more travellers massively begin using mobile devices • Multiple Web CMSs (both proprietary and open source) to be managed simultaneously • Costly manual curation and interlinking • Increasing demand for content syndication (from big players like foursquare as well as from local application developers) • Need for better SEO especially for events and sites (too regional to be understood by commercial search engines)
  • 17. Remixing existing content and creating new value. A magazine running on WordPress An online booking system freshly updated content on locations and events a database containing: events, facilities, accommodations, … Everything we know already from Wikipedia the World’s largest encyclopedia Using Linked Data to make sense of the information
  • 18. Linked Data Publishing • Data from the online booking system (Feratel) is enriched and transformed into triples using identified vocabularies and ontologies • Triples are stored in the Redlink triple store in a dedicated context • RDF data and SPARQL end-points are published to the data website (data.salzburgerland.com) running CKAN as Linked Open Data • CKAN makes the data accessible to third parties in various formats by querying Redlink
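The transformation step above can be sketched as follows. This is an illustration under stated assumptions, not the project's pipeline: the record fields, the `example.org` subject URI and the choice of `rdfs:label` and `lode:atPlace` are all hypothetical, and the output is plain N-Triples built with the standard library.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
LODE = "http://linkedevents.org/ontology/"  # LODE namespace


def escape_literal(text):
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(subject, statements):
    """Serialise (predicate, value, is_uri) tuples as N-Triples lines."""
    lines = []
    for predicate, value, is_uri in statements:
        obj = f"<{value}>" if is_uri else f'"{escape_literal(value)}"'
        lines.append(f"<{subject}> <{predicate}> {obj} .")
    return "\n".join(lines)


def event_to_statements(record):
    """Map a booking-system record (assumed field names) to RDF statements."""
    return [
        (RDF_TYPE, LODE + "Event", True),
        (RDFS_LABEL, record["name"], False),
        (LODE + "atPlace", record["place_uri"], True),
    ]


# Hypothetical record; a real one would come from the booking system export.
record = {
    "id": "example-1",
    "name": "Florianifeier",
    "place_uri": "http://dbpedia.org/resource/Unternberg",
}
subject = "http://example.org/events/" + record["id"]
ntriples = to_ntriples(subject, event_to_statements(record))
print(ntriples)
```

In practice a library such as a dedicated RDF toolkit would handle serialisation and datatype details; the point here is only that each record field becomes one subject, predicate, object statement.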
  • 19. Transforming Feratel Data into Semantic Knowledge: from SOAP to Linked Data
  • 20. Ontologies provide a means to hold everything together. Data Modelling with LODE
  • 21. Using LODE: An ontology for Linking Open Descriptions of Events Adding the relationships between things
  • 22. Florianifeier with RDF: different data sources are integrated to provide robot-friendly information that describes real-world things <subject><predicate><object>
  • 23. Semantic Lifting and Linked Data Principles • A “word” or “phrase” becomes an identifier used to denote “things” (named entities) existing in the real world 1. Real-world things are unambiguously represented with web addresses (URI) 2. By accessing these web addresses (HTTP-URI) usable data is sent in return using standard formats (RDF, SPARQL) 3. This data includes links to other data so that people can discover more things "label": "May", "reference": "http://dbpedia.org/resource/May" Type: Thing "values": ["13.7446"], "predicate": "http://www.w3.org/2003/01/geo/wgs84_pos#long" "values": ["47.10222"], "predicate": "http://www.w3.org/2003/01/geo/wgs84_pos#lat" "reference": "http://dbpedia.org/page/Unternberg" Type: Place "label": "Florianifeier", "reference": "http://rdf.salzburgerland.com/events/event/dea7fde1-5583-4002-97eb-0074a182fa9c.html" Type: Event Tim Berners-Lee. LANGUAGE: ENGLISH, EVENT: FLORIANIFEIER, THING: MAY, LOCATION: UNTERNBERG [Très Riches Heures du duc de Berry, Raymond Cazelles et Johannes Rathofe] “This May don't miss the Florianifeier, we'll have fun as usual in Unternberg”
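As a rough illustration of semantic lifting, the sketch below annotates the slide's sentence using a tiny hand-made lookup table. The table and the `lift` function are hypothetical; a real pipeline would rely on NLP (for instance to tell the month "May" from the verb "may") rather than on exact string matching. The URIs are the ones shown on the slide.

```python
import re

# Hypothetical gazetteer: surface form -> (reference URI, type).
GAZETTEER = {
    "May": ("http://dbpedia.org/resource/May", "Thing"),
    "Unternberg": ("http://dbpedia.org/page/Unternberg", "Place"),
    "Florianifeier": (
        "http://rdf.salzburgerland.com/events/event/"
        "dea7fde1-5583-4002-97eb-0074a182fa9c.html",
        "Event",
    ),
}


def lift(text):
    """Return annotations for every gazetteer label found as a whole word."""
    annotations = []
    for label, (uri, kind) in GAZETTEER.items():
        pattern = r"\b" + re.escape(label) + r"\b"
        for match in re.finditer(pattern, text):
            annotations.append(
                {"label": label, "reference": uri,
                 "type": kind, "start": match.start()}
            )
    # Order annotations by their position in the text.
    return sorted(annotations, key=lambda a: a["start"])


sentence = ("This May don't miss the Florianifeier, "
            "we'll have fun as usual in Unternberg")
for annotation in lift(sentence):
    print(annotation["label"], annotation["type"], annotation["reference"])
```

Each annotation turns a phrase into an identifier, which is the first of the three principles listed on the slide; dereferencing and link following would build on those URIs.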
  • 24. Dynamic Semantic Publishing with WordLift • Data from the Redlink triple store is made available for content enrichment and can be edited using WordLift, a semantic plugin for WordPress.
  • 25. Data Curation • Using Linked Data the Web becomes my new CMS • information is automatically imported in WordPress • posts are connected with entities • properties for each entity can be edited using WordPress • any change is automatically reflected in the triple-store and re-published as Open Data Using Linked Data and WordLift the Web becomes your new CMS. editing a blog post editing an entity
  • 26. Touristic applications attempting to discover events in Salzburgerland. “Which events occur in May in Lungau?” Web Search on google.at: 19,900 results, no answer. Linked Open Data Query: 5 results, 5 answers. Unternberg is a village in the area of Lungau.
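A toy version of the Linked Open Data query, to show why the structured route can answer the question. Everything here is made-up sample data: the prefixed names, the `ex:month` and `ex:locatedIn` predicates and the second event are assumptions, and a real deployment would run SPARQL against the triple store instead of filtering a Python list.

```python
# Sample triples (subject, predicate, object); all identifiers are illustrative.
TRIPLES = [
    ("ev:florianifeier", "rdf:type", "lode:Event"),
    ("ev:florianifeier", "lode:atPlace", "place:unternberg"),
    ("ev:florianifeier", "ex:month", "May"),
    ("ev:citymarket", "rdf:type", "lode:Event"),
    ("ev:citymarket", "lode:atPlace", "place:salzburg"),
    ("ev:citymarket", "ex:month", "May"),
    ("place:unternberg", "ex:locatedIn", "place:lungau"),
]


def objects(subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects of triples matching the given predicate and object."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def events_in(region, month):
    """Events held in `month` at a place that is located in `region`."""
    places = subjects("ex:locatedIn", region)
    matches = []
    for event in subjects("rdf:type", "lode:Event"):
        in_month = month in objects(event, "ex:month")
        in_region = bool(objects(event, "lode:atPlace") & places)
        if in_month and in_region:
            matches.append(event)
    return sorted(matches)


print(events_in("place:lungau", "May"))
```

The answer depends on the explicit link between Unternberg and Lungau, which is exactly the piece of regional knowledge a keyword search lacks.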
  • 27. Better SEO using Semantic Markup with WordLift. Florianifeier Unternberg • Using schema.org the data from the triple-store is added to the pages as semantic markup • Search engines can finally “recognise” entities that were previously unknown (e.g. Florianifeier)
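One way such markup could be produced is shown below, as a sketch only. The slide does not say which syntax is used, so JSON-LD is an assumption here (microdata or RDFa would serve the same purpose), and `event_jsonld` is a hypothetical helper. The `Event`, `Place`, `name`, `url` and `location` terms are standard schema.org vocabulary.

```python
import json


def event_jsonld(name, place_name, url):
    """Build a schema.org Event description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": name,
        "url": url,
        "location": {"@type": "Place", "name": place_name},
    }


markup = event_jsonld(
    "Florianifeier",
    "Unternberg",
    "http://rdf.salzburgerland.com/events/event/"
    "dea7fde1-5583-4002-97eb-0074a182fa9c.html",
)

# Wrap the JSON-LD in the script element a page template would embed.
script = (
    '<script type="application/ld+json">'
    + json.dumps(markup, ensure_ascii=False)
    + "</script>"
)
print(script)
```

Because the markup names the entity type explicitly, a crawler does not need to guess that "Florianifeier" is an event taking place in Unternberg.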
  • 28. • Media in cross-media context, allowing the analysis of media resources as well as connected content, including video, images, audio, text, link structure and metadata; • Investigate cross-media analysis along the complete, distributed analysis chain, namely extraction, metadata publishing, querying and recommendations; • Contribute its main software development results as Open Source components to two established Apache projects, Apache Marmotta and Apache Stanbol, simplifying the use of the technology in industrial products. What do we want computers to do with Media? MICO-PROJECT.EU
  • 29. “Show me the tempo-regional fragments where Lewis Jones is right beside Connor Macfarlane?” MICO-PROJECT.EU PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX mm: <http://linkedmultimedia.org/sparql-mm/functions#> PREFIX ma: <http://www.w3.org/ns/ma-ont#> PREFIX dct: <http://purl.org/dc/terms/> SELECT (mm:boundingBox(?l1,?l2) AS ?left_right) WHERE { ?f1 ma:locator ?l1; dct:subject ?p1. ?p1 foaf:name "Lewis Jones". ?f2 ma:locator ?l2; dct:subject ?p2. ?p2 foaf:name "Connor Macfarlane". FILTER mm:rightBeside(?l1,?l2) FILTER mm:temporalOverlaps(?l1,?l2) } We want computers to process media.
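To make the filters in the query more concrete, here is a sketch of what the three functions could compute over media fragments. The `Fragment` type and the exact semantics are assumptions chosen for illustration (for example, "right beside" is read as "left edge at or past the other's right edge"); the SPARQL-MM definitions may differ in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A media fragment: a time interval plus a rectangle (assumed model)."""
    start: float  # seconds
    end: float    # seconds
    x: float      # left edge, pixels
    y: float      # top edge, pixels
    w: float      # width, pixels
    h: float      # height, pixels


def temporal_overlaps(a, b):
    """True if the two time intervals share some duration."""
    return a.start < b.end and b.start < a.end


def right_beside(a, b):
    """True if a starts at or to the right of b's right edge (assumed reading)."""
    return a.x >= b.x + b.w


def bounding_box(a, b):
    """Smallest rectangle enclosing both, over the shared time interval."""
    left = min(a.x, b.x)
    top = min(a.y, b.y)
    right = max(a.x + a.w, b.x + b.w)
    bottom = max(a.y + a.h, b.y + b.h)
    return Fragment(max(a.start, b.start), min(a.end, b.end),
                    left, top, right - left, bottom - top)


# Made-up fragments for the two people named in the query.
lewis = Fragment(start=10, end=20, x=300, y=50, w=100, h=200)
connor = Fragment(start=15, end=25, x=100, y=60, w=120, h=180)

if right_beside(lewis, connor) and temporal_overlaps(lewis, connor):
    print(bounding_box(lewis, connor))
```

The query does the same thing declaratively: the two `FILTER` clauses select fragment pairs, and the `SELECT` expression returns the region covering both.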
  • 31. CREDITS This presentation is the result of many inspiring ideas and amazing work from other people and here is the list: ANDREW NG, 2011; JURAFSKY & MARTIN, 2008; Webscale IA using Linked Open Data, on slideshare by reduxd; LODE linking open descriptions of events aswc 2009, on slideshare by Raphael Troncy; Semantic SEO in the post-Hummingbird era, on slideshare by Kim Renberg and Andrea Volpini; Querying of metadata, media content and context in MICO, a demo by Thomas Kurz. Any idea, graphics or meme belonging to us is available for sharing, copying and re-mixing under creative commons license 3.0