From Big Linked Data to Linked Big Data:
DBpedia as a framework for
data integration
Giuseppe Futia (1), Antonio Vetrò (1), Giuseppe Rizzo (2)
(1) Nexa Center for Internet and Society, DAUIN, Politecnico di Torino
(2) Istituto Superiore Mario Boella (ISMB)
7th DBpedia Community Meeting in Leipzig
15 September 2016
PhD candidate on semantics at
Nexa Center for Internet & Society,
DAUIN, Politecnico di Torino
Experiences with LOD and DBpedia
• TellMeFirst, a tool for classifying and enriching
textual documents built on DBpedia Spotlight
(http://tellmefirst.polito.it)
• Contratti Pubblici, a tool for processing, exploring,
and visualizing Italian Public Procurements
(http://public-contracts.nexacenter.org/)
How TellMeFirst works
TellMeFirst
Results obtained with a
description of the
Eyes Wide Shut movie
Contratti Pubblici (Synapta + Nexa)
Different data sources to build a search engine on
Italian Public Contracts
[Figure labels: Anti-corruption National Authority; Agency for Digital Italy]
Contratti Pubblici (Synapta + Nexa)
Linked Data repository of Public Contracts, linked to
DBpedia and SPC
DBpedia in our projects
• TellMeFirst:
–Training set used for the semantic classification task
–Several interlinks used for the enrichment task
• Contratti Pubblici:
–Data enrichment to enable advanced SPARQL queries
–Data quality improvement (e.g., consistent labels)
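
As an illustration of the label-consistency point, here is a minimal sketch of how a SPARQL query for a canonical DBpedia label might be built. The helper name `build_label_query` and the choice of the Turin resource are assumptions made for this example, not part of Contratti Pubblici; the query is only constructed, never sent to the endpoint.

```python
# Public DBpedia SPARQL endpoint (shown for reference; not contacted in this sketch).
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_label_query(resource_uri: str, lang: str = "it") -> str:
    """Build a SPARQL query asking for the rdfs:label of a resource in one language.

    Hypothetical helper: it only assembles the query text.
    """
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?label WHERE {\n"
        f"  <{resource_uri}> rdfs:label ?label .\n"
        f'  FILTER (lang(?label) = "{lang}")\n'
        "}"
    )


# Example: ask for the Italian label of the DBpedia resource for Turin.
query = build_label_query("http://dbpedia.org/resource/Turin")
print(query)
```

Using one label per linked entity, taken from a shared knowledge base, is one plausible way to keep names consistent across records that refer to the same thing.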
From Big Linked Data to Linked Big Data
• Big Linked Data
–Already a reality, as shown by the exponential growth
of Linked Data in recent years
• Linked Big Data
–RDF data model for Big Data Variety
–Meta information to enable powerful analytics
–Simplify Big Data access, integration, and interlinking
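
The idea of using the RDF data model to absorb variety can be sketched in a few lines. In this toy example, triples are plain Python tuples, the `http://example.org/` namespace and all field names are hypothetical, and two differently shaped sources (a CSV-like row and a JSON-like document) become one graph simply by set union, interlinked through a shared company URI.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

EX = "http://example.org/"  # hypothetical namespace used only for this sketch


def from_csv_row(row: Dict[str, str]) -> Set[Triple]:
    """Map a flat, CSV-like contract record to triples."""
    subject = EX + "contract/" + row["id"]
    return {
        (subject, EX + "amount", row["amount"]),
        (subject, EX + "supplier", EX + "company/" + row["vat"]),
    }


def from_json_doc(doc: Dict[str, str]) -> Set[Triple]:
    """Map a JSON-like company document to triples."""
    subject = EX + "company/" + doc["vatNumber"]
    return {(subject, EX + "name", doc["name"])}


def supplier_names(graph: Set[Triple]) -> List[str]:
    """Follow supplier links to the names asserted by the other source."""
    names = []
    for _, predicate, obj in graph:
        if predicate == EX + "supplier":
            names += [o for s, p, o in graph if s == obj and p == EX + "name"]
    return names


# Integration is a set union: both sources now share one data model.
graph = from_csv_row({"id": "42", "amount": "1000", "vat": "IT123"}) | from_json_doc(
    {"vatNumber": "IT123", "name": "ACME"}
)
print(supplier_names(graph))
```

The two sources never agreed on a schema; they only agreed on how to name the company, which is the interlinking step.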
Big Data notion of Variety
• Variety of data and representation formats
• Variety of conceptualizations and data models
• Variety related to temporal and spatial dependencies
• Variety as a “generalization of the semantic
heterogeneity as studied in the field of Linked Data”
(Pascal Hitzler & Krzysztof Janowicz)
PhD research questions (i)
• RQ1: How can the technological foundations of Linked
Data and Big Data be further improved and
combined to create an open software architecture for a
multi-thematic, multi-perspective, and multi-medial
knowledge graph from heterogeneous sources?
PhD research questions (ii)
• RQ2: What are the features of a research method to
meet and evaluate the security, scalability, performance,
openness, and interoperability of the software architecture
mentioned earlier? And how can we measure the quality
of the knowledge graph produced with this software
architecture?
Key ideas for my PhD
• Get concepts and ontologies from the DBpedia
knowledge base to support semantic alignment during
the integration stage
• Use frameworks for data integration of structured
information with Big Data technologies:
RDF Mapping Language (RML) + Hadoop or Spark
• Exploit Machine Learning techniques to augment
datasets with unstructured data (e.g., Deep Learning)
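
To make the mapping idea concrete, here is a rough sketch of a declarative, template-driven mapping in the spirit of RML. It is not RML syntax and does not use an RML processor: the `MAPPING` dictionary, the `apply_mapping` function, and the `http://example.org/` URIs are assumptions made for this example. Because the function works on one record at a time, the same logic could in principle be applied per record in a Hadoop or Spark job.

```python
from typing import Dict, Iterator, Tuple

Triple = Tuple[str, str, str]

# Hypothetical mapping: a subject URI template plus (predicate, source field) pairs.
MAPPING = {
    "subject_template": "http://example.org/contract/{id}",
    "predicate_object_maps": [
        ("http://purl.org/dc/terms/title", "title"),
        ("http://example.org/amount", "amount"),
    ],
}


def apply_mapping(mapping: dict, record: Dict[str, str]) -> Iterator[Triple]:
    """Yield one triple per mapped field that has a non-empty value."""
    subject = mapping["subject_template"].format(**record)
    for predicate, reference in mapping["predicate_object_maps"]:
        value = record.get(reference)
        if value not in (None, ""):
            yield (subject, predicate, value)


records = [
    {"id": "1", "title": "Road works", "amount": "5000"},
    {"id": "2", "title": "", "amount": "700"},  # empty title produces no triple
]

triples = [t for record in records for t in apply_mapping(MAPPING, record)]
for triple in triples:
    print(triple)
```

Keeping the mapping as data rather than code means a new source can be integrated by writing a new mapping, without touching the transformation logic.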
DBpedia as knowledge base for:
• Entity linking and annotations in documents
• Assertion of additional categories for data
• Improvement of multilingual information
• Estimation of data quality of integrated information
according to different features (e.g., provenance)
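
Entity linking can be illustrated with a deliberately naive sketch: a tiny hand-written gazetteer that maps surface forms to DBpedia resource URIs. This is a toy under stated assumptions, not how DBpedia Spotlight or TellMeFirst work; the `GAZETTEER` contents and the `annotate` function are hypothetical, and no disambiguation is attempted.

```python
import re
from typing import List, Tuple

# Hypothetical gazetteer: lowercase surface form -> DBpedia resource URI.
GAZETTEER = {
    "stanley kubrick": "http://dbpedia.org/resource/Stanley_Kubrick",
    "eyes wide shut": "http://dbpedia.org/resource/Eyes_Wide_Shut",
}


def annotate(text: str) -> List[Tuple[int, str, str]]:
    """Return (offset, matched text, URI) for each gazetteer hit, ordered by offset."""
    annotations = []
    for surface, uri in GAZETTEER.items():
        for match in re.finditer(re.escape(surface), text, flags=re.IGNORECASE):
            annotations.append((match.start(), match.group(0), uri))
    return sorted(annotations)


result = annotate("Eyes Wide Shut is a film by Stanley Kubrick.")
for offset, mention, uri in result:
    print(offset, mention, uri)
```

Once mentions carry URIs, the other uses listed above (extra categories, multilingual labels) become lookups against the knowledge base rather than text processing.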
Challenges
• Greater accuracy (integrating different datasets)
• Immediacy (near-real-time data, from new data sources)
• Flexibility (not constrained by database structure)
• Better analytics (the ability to change the rules)
• Data quality (reliability and effectiveness of data)
Suggestions and/or comments?
Mail
giuseppe.futia@polito.it
Repository GitHub
https://github.com/giuseppefutia/
