Gabriel Dragomir

Drupal and Apache Stanbol
SEMANTIC ANNOTATION WITH CUSTOM VOCABULARIES
About me

• Drupal developer, trainer and consultant
• Founding member of Drupal Romania Association
The Semantic Web
• Tim Berners-Lee: “The first step is putting data on the Web in a form that machines can naturally understand, or converting it to that form. This creates what I call a Semantic Web – a Web of data that can be processed directly or indirectly by machines.”
What’s the hype?
• Most organizations need to organize, analyze and relate huge amounts of unstructured, scattered textual data
• Examples:
• keyword extraction from content: annotate abstracts
• text categorization: organize big volumes of text based on a thesaurus
• media monitoring of tags: occurrences of a specific keyword on social media channels
Linked data

http://lod-cloud.net/
Linked data
• Project started in 2007
• Aimed at building the Web of Data by:
• identifying open access data sets
• converting them into RDF vocabularies
• publishing them as open access data sets
Linked data ecosystem
• Linked Open Vocabularies (LOV): http://lov.okfn.org/dataset/lov/
• Provides a conceptual map of the vocabularies
• Various providers: libraries, governmental actors, NGOs
Linked data ecosystem
• Where to find other data sets?
• http://www.w3.org/2001/sw/wiki/SKOS/Datasets
• Swoogle: http://swoogle.umbc.edu/
• PoolParty: http://vocabulary.semantic-web.at
Linked data at work!
Semantic annotation
• Creates specific metadata that enables new ways to retrieve and aggregate information
• Annotations are based on a conceptual scheme, an ontology (e.g. FOAF, Dublin Core)
• For more on ontologies see: http://www.w3.org/wiki/Good_Ontologies

• The annotations build semantic
Semantic annotation
• Most common uses:
• Named Entity Linking: limited to recognizing entities of type person, organization, place (e.g. OpenCalais)
• Entityhub Linking: annotation based on vocabularies, with no limitation on entity types. Requires more natural language processing prior to annotation.
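To make the idea of vocabulary-based annotation concrete, here is a minimal Python sketch. It is not how Stanbol links entities internally; it only shows the principle of matching text against the labels of a custom vocabulary and attaching concept URIs. The vocabulary entries and the example.org URIs are invented for illustration.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical mini-vocabulary: label -> concept URI (URIs are made up).
VOCABULARY: Dict[str, str] = {
    "linked data": "http://example.org/vocab/LinkedData",
    "semantic web": "http://example.org/vocab/SemanticWeb",
    "drupal": "http://example.org/vocab/Drupal",
}


def annotate(text: str, vocabulary: Dict[str, str]) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, matched_text, concept_uri) for every label found in text."""
    annotations = []
    for label, uri in vocabulary.items():
        # Whole-word, case-insensitive match of the vocabulary label.
        pattern = r"\b" + re.escape(label) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append((match.start(), match.end(), match.group(0), uri))
    # Sort by position so annotations follow reading order.
    return sorted(annotations)


if __name__ == "__main__":
    for annotation in annotate("Drupal publishes Linked Data for the Semantic Web.", VOCABULARY):
        print(annotation)
```

A real annotation chain would first detect the language, tokenize and tag the text, which is why vocabulary-based linking needs more natural language processing than this plain string match.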
Apache Stanbol on the fly
• Here comes Apache Stanbol
• A new approach:
• modular semantic analysis of documents
• processing components can be built for virtually any language
• flexible workflows via semantic annotation chains
• any vocabulary (Linked Data, custom) can be used
Service-oriented architecture
• Stanbol is designed to offer service-oriented integration
• RESTful web services API returning RDF or JSON/JSON-LD
• Each component exposes an endpoint independently
• OSGi (Open Services Gateway initiative) compliant via Apache Felix and Apache Sling
• Remote component management
Implementation
• OSGi layer: Apache Felix and Apache Sling
• Build environment: Apache Maven
• RDF framework: Apache Clerezza
• Triple store, reasoning engine: Apache Jena
• Indexing and semantic search: Apache Solr
• Content analysis/metadata extraction: Apache Tika
• Natural language processing: Apache OpenNLP
Architecture
Components
• Semantic layer:
• Enhancer, EntityHub, ContentHub
• Enhancement engines: internal, 3rd party
• User interfaces
• Knowledge integration (rule sets, reasoners)

• Storage integration
Content enhancement
• Examples:
• retrieve additional metadata for a piece of content
• identify the language of a text
• extract entities (persons, places, organizations)
• create annotations to external sources
• use 3rd party services for named entity recognition
Drupal meets Stanbol
• Several modules implement RDF support, allowing data to be transported to Stanbol for semantic annotation
• Taxonomy system allows for complex annotation
• Fieldable taxonomy terms allow for storage of complex semantic data
User scenarios
• Semantic indexing via Stanbol (Solr yard)
• Content enrichment with semantically related information (documents, factual data, images etc.)
• Tag as you type: dynamic annotation of text in editors
How it works
• a POST request sends content via the REST API
• content is processed by an enhancement chain
• the response is returned as JSON-LD, RDF/XML, RDF/JSON, etc.
• JSON-LD (JavaScript Object Notation for Linked Data): a simple, human-readable linked data transport format
• for best results an enhancement chain should do language detection, tokenization and POS tagging prior to performing semantic annotation
• http://stanbol-yle.jelastic.planeetta.net/demo/enhancer
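The request/response cycle above can be sketched in Python with the standard library only. This is a hedged illustration, not the code used in the demo. It assumes a Stanbol instance at http://localhost:8080 with the enhancer mounted at /enhancer and named chains under /enhancer/chain/<name>; the base URL, the chain name and the helper function names are assumptions to adapt to your installation.

```python
import urllib.request
from typing import Optional

# Assumed base URL of a locally running Stanbol instance; adjust as needed.
STANBOL_URL = "http://localhost:8080"


def build_enhancer_request(text: str,
                           chain: Optional[str] = None,
                           accept: str = "application/rdf+xml",
                           base_url: str = STANBOL_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying plain text to the enhancer."""
    # Without a chain name the default chain is addressed; otherwise a named chain.
    path = "/enhancer" if chain is None else f"/enhancer/chain/{chain}"
    return urllib.request.Request(
        url=base_url + path,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            # The Accept header selects the serialization of the annotations.
            "Accept": accept,
        },
        method="POST",
    )


def enhance(text: str, chain: Optional[str] = None,
            accept: str = "application/rdf+xml") -> str:
    """Send the text to Stanbol and return the raw response body (needs a running server)."""
    request = build_enhancer_request(text, chain=chain, accept=accept)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    req = build_enhancer_request("Paris is the capital of France.")
    print(req.get_method(), req.full_url)
```

Building the request separately from sending it keeps the sketch testable without a server; calling enhance() only works against a reachable Stanbol instance.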
Drupal integration

Source: blog.iks-project.eu
Drupal distribution: IKS CE
• IKS CE distribution - Wolfgang Ziegler (fago), Stéphane Corlosquet (scor)

• Components:
• Search API Stanbol
• VIE.js - semantic annotation UI
• https://drupal.org/project/iksce
• http://drupal.org/project/vie
• http://drupal.org/project/search_api_stanbol
• https://github.com/fago/stanbol-for-drupal
Search API Stanbol
• enables the indexing of Drupal entities such as nodes, users, taxonomy terms, files, etc. in the Stanbol EntityHub
• data sent as RDF
• data can be mashed up with data from other sources (Managed Sites, Remote Sites)
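As an illustration of what "data sent as RDF" can look like, the following Python sketch serializes a few fields of an entity as N-Triples. It is not the Search API Stanbol module's actual serializer. The subject URI, the choice of Dublin Core terms as predicates and the function names are assumptions made for the example, and all values are treated as plain string literals.

```python
from typing import Dict

# Dublin Core terms namespace, used here as an example predicate vocabulary.
DCTERMS = "http://purl.org/dc/terms/"


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples string literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def entity_to_ntriples(subject_uri: str, fields: Dict[str, str]) -> str:
    """Emit one N-Triples line per (predicate URI, literal value) pair."""
    lines = [
        f'<{subject_uri}> <{predicate_uri}> "{escape_literal(value)}" .'
        for predicate_uri, value in fields.items()
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical Drupal node exposed at an example.org URI.
    print(entity_to_ntriples(
        "http://example.org/node/1",
        {
            DCTERMS + "title": "Drupal and Apache Stanbol",
            DCTERMS + "creator": "Gabriel Dragomir",
        },
    ))
```

In practice an RDF library would handle datatypes, language tags and URI validation; the hand-rolled version only shows the shape of the triples.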
VIE.js
• “Vienna IKS Editables”
• JavaScript library for implementing decoupled Content Management Systems and semantic interaction in web applications.
Monolithic vs Decoupled Content Management Systems
• Monolithic vs Decoupled Content Management Systems

source: Henri Bergius - http://bergie.iki.fi
Demo setup
• we store Drupal entities in a Solr index
• annotations are based on:
• DBpedia - bundled with Apache Stanbol
• a custom vocabulary of terms related to the semantic web - Social Semantic Web Thesaurus
• SemWeb is imported as a Solr index into Apache Stanbol
Custom vocabularies
• PoolParty Semantic Web
• 224 concepts related to semantic web
• Author: Andreas Blumauer
• http://vocabulary.semantic-web.at/PoolPartySemanticWeb.html
• http://vocabulary.semantic-web.at/PoolPartySemanticWeb/Drupal.html
Demo
• index Drupal entities in Apache Stanbol
• retrieve annotated entities via REST API
• annotate entities using DBpedia and SemWeb indexes
• edit Drupal entities and annotate on the fly
• retrieve linked data tag recommendations
Questions?
Contact me

• gabriel.dragomir@webikon.com
• twitter: gabidrg
Thank you!
