Utilizing Linked Open Data (LOD) Resources for
Semantic Enhancement of User-Generated Content
                             Dong-Po Deng1,2, Guan-Shuo Mai3, Cheng-Hsin Hsu3,
                        Chin-Lung Chang1,4, Tyng-Ruey Chuang1, and Kwang-Tsao Shao3

                                   1ITC, University of Twente, Enschede, the Netherlands

                              2Institute of Information Science & 3Biodiversity Research Center,
                                              Academia Sinica, Taipei, Taiwan
                               4Department of Computer Science and Information Engineering
                                   National Taiwan University of Science and Technology
                                                      Taipei, Taiwan



Thursday, February 7, 2013
Outline
                    Background
                    Motivation
                    Data Collection
                    LOD resources - LODE and LOD TGN
                    An approach for processing UGC
                        Information Extraction
                        Information Formalization
                        Information Reuse
                    Concluding remarks




                                                        JIST2012   2012/12/3   2


Background
                    Web 2.0 technologies enable people to contribute
                     their own content to the web, e.g. wikis, blogs, tagging
                    Social media use Web 2.0 technologies to support
                     social interaction on the web, e.g. Twitter,
                     Flickr, Facebook
                    Content contributed by people on the web and/or
                     social media is called “User-Generated
                     Content” (UGC)
                    UGC is mainly multimedia or textual data
                    UGC is considered a potential resource for
                     scientific projects, e.g. citizen science

Background (cont.)
                    Harvesting UGC for scientific purposes raises
                     several problems
                        Unstructured UGC is difficult to handle
                        The semantics of UGC are often ambiguous and/or poor
                        Social media are not designed for scientific purposes




                                     Image courtesy of http://www.datenform.de/mapeng.html

Motivation
                    LOD datasets as resources
                        LOD aims to make data available on the Web and to
                         interconnect data in order to increase its value for
                         users
                        LOD projects comprise about 300 datasets with over
                         31 billion RDF triples
                    Each entry representing a fact in an LOD dataset has
                     a Uniform Resource Identifier (URI) that is
                     referenceable and linkable on the Web
                    The high interconnectivity between entries
                     potentially increases the discoverability, reusability,
                     and utility of information

Motivation (cont.)
                    Therefore, if named entities in UGC can be
                     identified and connected to LOD entries, their
                     semantics can be disambiguated, making the
                     UGC easier to process.




Data collection
                    Two Facebook interest groups for ecological
                     observations in Taiwan




http://www.facebook.com/groups/roadkilled/   http://www.facebook.com/groups/enjoymoths/



Ecological Observations on Facebook




LOD Ecology
                    Linked Open Data of Ecology (LODE) is a validated
                     dataset from a LOD project.
                    LODE integrated 5 previously distributed
                     databases:




          TFRI: Taiwan Forestry Research Institute



LODE in Linked Open Data Cloud




LOD Taiwan Geographic Name (TGN)
                    LOD TGN is derived mainly from the Taiwan
                     Gazetteer, following LOD principles
                    LOD TGN has 159,241 geographic name entries, of
                     which 17,442 are linked to geonames.org




An approach for processing UGC
                    [Diagram: three stages: Information Extraction,
                     Information Formalization, Information Reuse]

Problems in Chinese species names in Facebook ecological observations

               (1) Names of the form adjective + noun are shortened to the adjective part
                     曙鳳蝶 (Atrophaneura horishana), shortened to 曙鳳
                     玉帶鳳蝶 (Papilio polytes), shortened to 玉帶
                     琉璃紋鳳蝶 (Papilio hermosanus), shortened to 琉璃

               (2) A shortened name can match many species
                     細紋 (pronounced Si-Wen, meaning “fine veined”)
                     細紋黃鉤蛾
                     細紋蠍蛉
                     細紋新蠍蛉
                     ... 15 species names share the prefix “細紋”
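
The ambiguity described on this slide can be illustrated with a small prefix lookup. This is only a sketch under stated assumptions: the six-entry `SPECIES_NAMES` list and the function name `candidates_for` are hypothetical, and a real system would draw the names from LODE. It is not the authors' matching procedure.

```python
# Hypothetical species list; a real system would load the names from LODE.
SPECIES_NAMES = [
    "曙鳳蝶",
    "玉帶鳳蝶",
    "琉璃紋鳳蝶",
    "細紋黃鉤蛾",
    "細紋蠍蛉",
    "細紋新蠍蛉",
]


def candidates_for(short_name, names=SPECIES_NAMES):
    """Return every full name that starts with the shortened form."""
    return [name for name in names if name.startswith(short_name)]


unique = candidates_for("曙鳳")     # a single candidate: unambiguous
ambiguous = candidates_for("細紋")  # several candidates: needs more evidence
print(len(unique), len(ambiguous))
```

A shortened form with exactly one candidate can be resolved directly, whereas a form such as “細紋” needs further evidence before a species can be assigned.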

Identifying shortened species names




                                 Confidence value =




Determine a species name for a thread
                    What if several species names are mentioned in
                     one thread? We used three criteria:
                        How many Likes does the post or
                         comment get?
                        How prestigious are the people
                         who post or comment?
                        How many times does the species
                         name occur in the thread?
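
One possible way to combine the three criteria is a weighted sum per species name. The slide does not say how the criteria are combined, so the weights, the message fields (`species`, `likes`, `author_prestige`), and the function `pick_species` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical weights: the slide names the criteria but not how they combine.
W_LIKES, W_PRESTIGE, W_MENTIONS = 1.0, 1.0, 1.0


def pick_species(messages):
    """Pick the species name with the highest combined score in a thread.

    Each message is a dict holding the species name it mentions, its Like
    count, and a prestige value for its author (all assumed inputs).
    """
    scores = defaultdict(float)
    for msg in messages:
        scores[msg["species"]] += (
            W_LIKES * msg["likes"]
            + W_PRESTIGE * msg["author_prestige"]
            + W_MENTIONS  # each mention adds once to the occurrence count
        )
    return max(scores, key=scores.get)


thread = [
    {"species": "玉帶鳳蝶", "likes": 2, "author_prestige": 0.2},
    {"species": "曙鳳蝶", "likes": 5, "author_prestige": 0.9},
    {"species": "玉帶鳳蝶", "likes": 0, "author_prestige": 0.1},
]
best = pick_species(thread)
```

In this made-up thread the name mentioned once by a well-regarded, much-liked author outscores the name mentioned twice, which shows why the weights would need tuning against real data.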




The problems of geographic names in Facebook ecological observations

                  An example:
                  The Endemic Species Research Institute
                 特有生物研究保育中心
                 Te-You-Sheng-Wu-Yan-Jiou-Bao-Yu-Jhong-Sin

                             is shortened to

                    特生中心
                    Te-Sheng-Jhong-Sin

                  There are no rules for shortening long geographic names
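
Because no fixed rule governs the shortening, one heuristic (an assumption here, not necessarily the method used in this work) is to test whether the characters of the short form occur in the same order within a gazetteer name. The two-entry `GAZETTEER` and the function names below are hypothetical.

```python
def is_abbreviation(short, full):
    """True if the characters of `short` appear in `full` in the same order."""
    remaining = iter(full)
    # `ch in remaining` consumes the iterator up to the first match,
    # so each later character must be found after the previous one.
    return all(ch in remaining for ch in short)


def expand(short, gazetteer):
    """Return the gazetteer names that `short` could abbreviate."""
    return [name for name in gazetteer if is_abbreviation(short, name)]


# Hypothetical two-entry gazetteer; LOD TGN would supply the real names.
GAZETTEER = ["特有生物研究保育中心", "生物多樣性研究中心"]
matches = expand("特生中心", GAZETTEER)
```

The check accepts any ordered subsequence, so on a full gazetteer it would likely return several candidates that still need ranking.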



Identifying shortened geographic names




The ontology...
                    is based on the Facebook thread, an entity
                     comprising social media content that involves
                     people, places, time periods, photos, and links to
                     other content
                    uses standard vocabularies:
                        Semantically-Interlinked Online Communities (SIOC)
                         represents the structure of Facebook posts,
                         comments, and threads
                        Friend of a Friend (FOAF) describes content
                         creators
                        Dublin Core describes the interlinked content they created
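
As a rough illustration of how the three vocabularies might fit together, the sketch below emits N-Triples for one thread, one post, and its author. The `http://example.org/` namespace, the identifiers, and the choice of properties are assumptions for illustration; the ontology actually used in this work may model threads differently.

```python
SIOC = "http://rdfs.org/sioc/ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/"  # hypothetical namespace for this sketch


def uri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value):
    """Quote a plain string literal (only backslashes and quotes are escaped)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def describe_thread(thread_id, post_id, author_id, author_name, created):
    """Return N-Triples lines for one thread, its post, and the post's author."""
    thread = uri(BASE + "thread/" + thread_id)
    post = uri(BASE + "post/" + post_id)
    author = uri(BASE + "person/" + author_id)
    triples = [
        (thread, uri(RDF_TYPE), uri(SIOC + "Thread")),
        (post, uri(RDF_TYPE), uri(SIOC + "Post")),
        (post, uri(SIOC + "has_container"), thread),
        (post, uri(DCTERMS + "creator"), author),
        (post, uri(DCTERMS + "created"), literal(created)),
        (author, uri(RDF_TYPE), uri(FOAF + "Person")),
        (author, uri(FOAF + "name"), literal(author_name)),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


lines = describe_thread("t1", "p1", "u1", "Observer A", "2012-12-03")
print("\n".join(lines))
```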




An ontology for formalizing the extracted information from Facebook threads




Transfer ecological observations in Facebook to RDF




                     http://140.109.28.64:2020/page/thread/177883715557195_440860179259546

The extracted species name from the Facebook thread is linked to LOD resources




A taxon of Theretra nessus is the extracted species name

                   This entry is connected to LODE via owl:sameAs
The extracted place name from the Facebook thread is linked to LOD resources




The LOD TGN entry derived from the Taiwan Gazetteer

                             It is linked to geonames.org via owl:sameAs
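
Since owl:sameAs is symmetric and transitive, a consumer can follow such links in both directions to collect every URI that denotes the same place or taxon. The sketch below shows that traversal. All URIs are hypothetical placeholders, and it assumes the extracted place is itself connected to a LOD TGN entry by an equivalence link.

```python
from collections import defaultdict


def same_as_closure(start, links):
    """Collect every URI reachable from `start` through owl:sameAs links.

    `links` is an iterable of (uri_a, uri_b) pairs. Because owl:sameAs is
    symmetric and transitive, the pairs are followed in both directions.
    """
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# All URIs below are hypothetical placeholders.
LINKS = [
    ("http://example.org/thread/1#place", "http://example.org/tgn/42"),
    ("http://example.org/tgn/42", "http://example.org/geonames/999"),
]
equivalents = same_as_closure("http://example.org/thread/1#place", LINKS)
```

Starting from the place mentioned in a thread, the traversal reaches the gazetteer entry and, through it, the external identifier.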


Publish the processed Facebook ecological observations




A semantic annotation plug-in for entering geographic names in Facebook posts




Concluding remarks
                    This study reports our experience in converting
                     Facebook ecological observations to RDF and
                     interlinking them with LOD resources (LODE and
                     LOD TGN)
                    With these information extraction tools and LOD
                     resources, we developed a tool for semantic
                     enhancement of user input.

                    The LOD TGN is an ongoing project.
                    In the future, we will consolidate the feature types
                     of the geographic names, and we plan to make
                     the LOD TGN a geospatial semantics reference
                     resource.
Thank you for your attention

                             Questions?

                             deng@itc.nl




20141001 climate change&osmDongpo Deng
 
20140721 open geomeeting
20140721 open geomeeting20140721 open geomeeting
20140721 open geomeetingDongpo Deng
 
20140710 tca gsdi
20140710 tca gsdi20140710 tca gsdi
20140710 tca gsdiDongpo Deng
 
開放資料: 全球化的草根性運動
開放資料:  全球化的草根性運動開放資料:  全球化的草根性運動
開放資料: 全球化的草根性運動Dongpo Deng
 
Social Web Meets Sensor Web: Linked Crowdsourced Observation Data
Social Web Meets Sensor Web: Linked Crowdsourced Observation DataSocial Web Meets Sensor Web: Linked Crowdsourced Observation Data
Social Web Meets Sensor Web: Linked Crowdsourced Observation DataDongpo Deng
 
20140114 moi open_data
20140114 moi open_data20140114 moi open_data
20140114 moi open_dataDongpo Deng
 

More from Dongpo Deng (20)

20180226 data driven smart governance
20180226 data driven smart governance20180226 data driven smart governance
20180226 data driven smart governance
 
The methods and practices of Linked Open Data
The methods and practices of Linked Open DataThe methods and practices of Linked Open Data
The methods and practices of Linked Open Data
 
Construction and reuse of linked traceable agricultural product records - An ...
Construction and reuse of linked traceable agricultural product records - An ...Construction and reuse of linked traceable agricultural product records - An ...
Construction and reuse of linked traceable agricultural product records - An ...
 
農產品產銷履歷資料鏈結化處理 (Linked Traceable Agricultural Data )
農產品產銷履歷資料鏈結化處理 (Linked Traceable Agricultural Data )農產品產銷履歷資料鏈結化處理 (Linked Traceable Agricultural Data )
農產品產銷履歷資料鏈結化處理 (Linked Traceable Agricultural Data )
 
開放街圖社群經營的不等式
開放街圖社群經營的不等式開放街圖社群經營的不等式
開放街圖社群經營的不等式
 
OSM 與 LocalWiki 的整合: 支援社區層級災害管理
OSM 與 LocalWiki 的整合: 支援社區層級災害管理OSM 與 LocalWiki 的整合: 支援社區層級災害管理
OSM 與 LocalWiki 的整合: 支援社區層級災害管理
 
啟動開放,創新價值
啟動開放,創新價值 啟動開放,創新價值
啟動開放,創新價值
 
2016年歐洲資料論壇
2016年歐洲資料論壇2016年歐洲資料論壇
2016年歐洲資料論壇
 
From Structured Data to Linked Open Governmental Data
From Structured Data to Linked Open Governmental DataFrom Structured Data to Linked Open Governmental Data
From Structured Data to Linked Open Governmental Data
 
開放街圖: 集合群眾之力的製圖 (OpenStreetMap: A crowdsoucing map )
開放街圖: 集合群眾之力的製圖 (OpenStreetMap: A crowdsoucing map )開放街圖: 集合群眾之力的製圖 (OpenStreetMap: A crowdsoucing map )
開放街圖: 集合群眾之力的製圖 (OpenStreetMap: A crowdsoucing map )
 
20150427_NCDR_OSM_Disaster_Mapping
20150427_NCDR_OSM_Disaster_Mapping20150427_NCDR_OSM_Disaster_Mapping
20150427_NCDR_OSM_Disaster_Mapping
 
Crowdsourced mapping for open collaboration: A story of Taiwan so far
Crowdsourced mapping for open collaboration: A story of Taiwan so farCrowdsourced mapping for open collaboration: A story of Taiwan so far
Crowdsourced mapping for open collaboration: A story of Taiwan so far
 
2014_WWW_BTOR
2014_WWW_BTOR2014_WWW_BTOR
2014_WWW_BTOR
 
20141018_OD_meetup#3
20141018_OD_meetup#320141018_OD_meetup#3
20141018_OD_meetup#3
 
20141001 climate change&osm
20141001 climate change&osm20141001 climate change&osm
20141001 climate change&osm
 
20140721 open geomeeting
20140721 open geomeeting20140721 open geomeeting
20140721 open geomeeting
 
20140710 tca gsdi
20140710 tca gsdi20140710 tca gsdi
20140710 tca gsdi
 
開放資料: 全球化的草根性運動
開放資料:  全球化的草根性運動開放資料:  全球化的草根性運動
開放資料: 全球化的草根性運動
 
Social Web Meets Sensor Web: Linked Crowdsourced Observation Data
Social Web Meets Sensor Web: Linked Crowdsourced Observation DataSocial Web Meets Sensor Web: Linked Crowdsourced Observation Data
Social Web Meets Sensor Web: Linked Crowdsourced Observation Data
 
20140114 moi open_data
20140114 moi open_data20140114 moi open_data
20140114 moi open_data
 

Utilizing LOD for Semantic Enhancement of UGC

  • 1. Utilizing Linked Open Data (LOD) Resources for Semantic Enhancement of User-Generated Content
      Dong-Po Deng (1,2), Guan-Shuo Mai (3), Cheng-Hsin Hsu (3), Chin-Lung Chang (1,4), Tyng-Ruey Chuang (1), and Kwang-Tsao Shao (3)
      (1) ITC, University of Twente, Enschede, the Netherlands
      (2) Institute of Information Science and (3) Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
      (4) Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan
      Thursday, February 7, 2013
  • 2. Outline (JIST2012, 2012/12/3)
      • Background
      • Motivation
      • Data Collection
      • LOD resources: LODE and LOD TGN
      • An approach for processing UGC
          • Information Extraction
          • Information Formalization
          • Information Reuse
      • Concluding remarks
  • 4. Background
      • Web 2.0 technologies enable people to contribute their own content on the web, e.g. wikis, blogs, tagging
      • Social media use Web 2.0 technologies to support social interaction on the web, e.g. Twitter, Flickr, Facebook
      • Content contributed by people on the web and/or social media is called “User-Generated Content” (UGC)
      • UGC is mainly multimedia or textual data
      • UGC is considered a potential resource for scientific projects, e.g. citizen science
  • 5. Background (cont.)
      • Harvesting UGC for scientific purposes raises several problems:
          • Unstructured UGC is difficult to handle
          • The semantics of UGC are often ambiguous and/or poor
          • Social media are not designed for scientific purposes
      Image courtesy of http://www.datenform.de/mapeng.html
  • 7. Motivation
      • LOD datasets as resources
          • LOD aims to make data available on the Web and to interconnect data so as to increase its value for users
          • LOD projects comprise about 300 datasets with over 31 billion RDF triples
          • Each entry representing a fact in an LOD dataset has a Uniform Resource Identifier (URI) that is referenceable and linkable on the Web
          • The high interconnectivity between entries potentially increases the discoverability, reusability, and utility of information
  • 8. Motivation (cont.)
      • Therefore, if the named entities in UGC can be identified and connected to LOD entries, their semantics are disambiguated, making the UGC easier to process.
  • 10. Data collection
      • Two Facebook interest groups for ecological observations in Taiwan:
          http://www.facebook.com/groups/roadkilled/
          http://www.facebook.com/groups/enjoymoths/
  • 11. Ecological Observations on Facebook
  • 13. LOD Ecology
      • Linked Open Data of Ecology (LODE) is a validated dataset from an LOD project.
      • LODE integrates 5 previously distributed databases (TFRI: Taiwan Forestry Research Institute)
  • 14. LODE in the Linked Open Data Cloud
  • 16. LOD Taiwan Geographic Name (TGN)
      • LOD TGN is mainly converted from the Taiwan Gazetteer following LOD principles
      • LOD TGN has 159,241 geographic name entries, of which 17,442 are linked to geonames.org
  • 18. An approach for processing UGC: Information Extraction, Information Formalization, Information Reuse
  • 20. Problems in Chinese species names in Facebook ecological observations
      (1) Species names (adjective + noun) are shortened to the adjective part:
          曙鳳蝶 (Atrophaneura horishana) is shortened to 曙鳳
          玉帶鳳蝶 (Papilio polytes) is shortened to 玉帶
          琉璃紋鳳蝶 (Papilio hermosanus) is shortened to 琉璃
      (2) A shortened name can match many species: 細紋 (pronounced Si-Wen, meaning “fine veined”) is the prefix of 15 species names, e.g. 細紋黃鉤蛾, 細紋蠍蛉, 細紋新蠍蛉
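The prefix problem on this slide can be illustrated with a small lookup. This is only a sketch under stated assumptions: the vocabulary is a hypothetical six-name list built from the names shown on the slide (the real LODE vocabulary is far larger), and `candidates_for` and `resolve` are illustrative helper names, not part of the authors' tooling.

```python
# Hypothetical mini-vocabulary built from the names on the slide.
SPECIES_NAMES = [
    "曙鳳蝶",
    "玉帶鳳蝶",
    "琉璃紋鳳蝶",
    "細紋黃鉤蛾",
    "細紋蠍蛉",
    "細紋新蠍蛉",
]


def candidates_for(short_name, vocabulary=SPECIES_NAMES):
    """Return every full name that starts with the shortened form."""
    return [name for name in vocabulary if name.startswith(short_name)]


def resolve(short_name, vocabulary=SPECIES_NAMES):
    """Return the full name only when the prefix is unambiguous, else None."""
    matches = candidates_for(short_name, vocabulary)
    return matches[0] if len(matches) == 1 else None
```

With this list, `resolve("曙鳳")` finds a single full name, while `resolve("細紋")` returns `None` because several names share the prefix, which is the case that needs extra evidence such as a confidence value.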
  • 21. Identifying shortened species names
      Confidence value = [formula not preserved in the transcript]
  • 22. Determine a species name for a thread
      • What if several species names are mentioned in one thread? We used three criteria:
          • How many Likes do the post and its comments get?
          • How prestigious are the people who post or comment?
          • How many times does a species name occur in the thread?
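One way the three criteria could be combined is a weighted sum per species name. The slide does not say how the criteria are weighted or how prestige is measured, so everything below is an assumption: the `Mention` record, the `prestige` score in [0, 1], the equal default weights, and the function name `pick_species` are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Mention:
    species: str      # species name extracted from a post or comment
    likes: int        # Likes received by that post or comment
    prestige: float   # hypothetical author reputation score in [0, 1]


def pick_species(mentions, w_likes=1.0, w_prestige=1.0, w_count=1.0):
    """Return the species with the highest combined score, or None if empty."""
    scores = defaultdict(float)
    for m in mentions:
        # Each mention counts as one occurrence, plus its Likes and author prestige.
        scores[m.species] += w_count + w_likes * m.likes + w_prestige * m.prestige
    if not scores:
        return None
    return max(scores, key=scores.get)
```

Setting `w_likes` and `w_prestige` to zero reduces the rule to a plain frequency count, which makes it easy to compare how much each criterion changes the outcome.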
  • 23. The problems of geographic names in Facebook ecological observations
      • An example: The Endemic Species Research Institute, 特有生物研究保育中心 (Te-You-Sheng-Wu-Yan-Jiou-Bao-Yu-Jhong-Sin), is shortened to 特生中心 (Te-Sheng-Jhong-Sin)
      • There are no rules for shortening long geographic names
  • 25. Identifying shortened geographic names
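The transcript does not preserve how shortened geographic names were identified. As a purely illustrative heuristic, and not the authors' method, one could test whether the short form is an ordered subsequence of a gazetteer name, which holds for the 特生中心 example. The function names and the two-entry gazetteer in the usage note are hypothetical.

```python
def is_abbreviation(short, full):
    """True if the characters of `short` appear in `full` in the same order."""
    remaining = iter(full)
    # `ch in remaining` consumes the iterator up to the match, so order is enforced.
    return all(ch in remaining for ch in short)


def expand(short, gazetteer):
    """Return all gazetteer names that `short` could abbreviate."""
    return [name for name in gazetteer if is_abbreviation(short, name)]
```

For example, `expand("特生中心", ["特有生物研究保育中心", "陽明山國家公園"])` keeps only the first name. Because a subsequence test is permissive, a real gazetteer of this size would likely return several candidates that still need ranking.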
  • 26. The ontology...
      • is based on the Facebook thread, an entity composed of social media content involving people, places, time periods, photos, and links to other content
      • uses standard vocabularies:
          • Semantically-Interlinked Online Communities (SIOC) can be used to represent the structure of Facebook posts, comments, and threads
          • Friend of a Friend (FOAF) can be used to describe content creators
          • Dublin Core can be used to describe the interlinked content they created
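To make the vocabulary choice concrete, the sketch below emits a few N-Triples statements for one post in one thread. The SIOC, FOAF, and Dublin Core terms used (`sioc:Thread`, `sioc:Post`, `sioc:has_container`, `sioc:content`, `foaf:maker`, `foaf:Person`, `foaf:name`, `dcterms:created`) exist in those vocabularies, but the `http://example.org/` namespace, the function names, and the selection of properties are assumptions; the authors' actual ontology is shown only as a diagram on the next slide.

```python
SIOC = "http://rdfs.org/sioc/ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/"  # hypothetical namespace, not the authors'


def uri(value):
    """Wrap an IRI in angle brackets as N-Triples requires."""
    return f"<{value}>"


def literal(value):
    """Encode a plain string literal, escaping backslash, quote, and newline."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def thread_triples(thread_id, post_id, author_name, text, created):
    """Describe one post, its thread, and its author as (s, p, o) triples."""
    thread = uri(BASE + "thread/" + thread_id)
    post = uri(BASE + "post/" + post_id)
    author = uri(BASE + "person/" + author_name.replace(" ", "_"))
    return [
        (thread, uri(RDF_TYPE), uri(SIOC + "Thread")),
        (post, uri(RDF_TYPE), uri(SIOC + "Post")),
        (post, uri(SIOC + "has_container"), thread),
        (post, uri(SIOC + "content"), literal(text)),
        (post, uri(DCT + "created"), literal(created)),
        (post, uri(FOAF + "maker"), author),
        (author, uri(RDF_TYPE), uri(FOAF + "Person")),
        (author, uri(FOAF + "name"), literal(author_name)),
    ]


def to_ntriples(triples):
    """Serialize triples one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
```

Calling `to_ntriples(thread_triples("t1", "p1", "Ann Lee", "moth seen", "2012-12-03"))` yields eight statements that any RDF store accepting N-Triples should be able to load.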
  • 27. An ontology for formalizing the extracted information from Facebook threads
  • 29. Transfer ecological observations on Facebook to RDF
      http://140.109.28.64:2020/page/thread/177883715557195_440860179259546
  • 31. The extracted species name from the Facebook thread is linked to LOD resources
  • 35. The taxon Theretra nessus is the extracted species name
      • This entry is connected to LODE via owl:sameAs
  • 37. The extracted place name from the Facebook thread is linked to LOD resources
  • 41. The LOD TGN entry converted from the Taiwan Gazetteer
      • It is linked to geonames.org via owl:sameAs
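The owl:sameAs links mentioned for both LODE and LOD TGN can be followed to gather every identifier that refers to the same thing. The sketch below does this over an in-memory list of triples. The `owl:sameAs` IRI is the standard one, but the function name and the example URIs in the usage note are hypothetical placeholders, not real LODE, LOD TGN, or geonames.org identifiers.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_closure(start, triples):
    """Collect every URI reachable from `start` via owl:sameAs, in either direction."""
    neighbours = {}
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            # sameAs is symmetric, so record the link both ways.
            neighbours.setdefault(s, set()).add(o)
            neighbours.setdefault(o, set()).add(s)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for other in neighbours.get(node, ()):
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return seen
```

Given triples linking `http://example.org/thread-place/1` to `http://example.org/tgn/1`, and that in turn to `http://example.org/geonames/1`, the closure of the first URI contains all three, so a query against any one dataset can be widened to the others.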
  • 43. Publish the processed Facebook ecological observations
  • 45. A semantic annotation plug-in for entering geographic names in Facebook posts
  • 50. Concluding remarks
      • This study reports our experience in converting Facebook ecological observations into data interlinked with LOD resources (LODE and LOD TGN)
      • With these information extraction tools and LOD resources, we developed a tool for semantic enhancement of user input
      • LOD TGN is an ongoing project
      • In the future, we will consolidate the feature types of the geographic names, and we plan to make LOD TGN a reference resource for geospatial semantics
  • 51. Thank you for your attention. Questions? deng@itc.nl